
Intel Labs makes progress in PC vision development with two new AI models

28/12/2023

A new generation of AI modeling: VI-Depth 1.0 and MiDaS 3.1

The innovative open-source AI models VI-Depth 1.0 and MiDaS 3.1 aim to improve depth estimation in computer vision applications.

Estimating depth is a challenging computer vision problem, essential to countless applications in robotics, augmented reality (AR), and virtual reality (VR). Current solutions often struggle to estimate distances accurately, a vital requirement for motion planning and obstacle avoidance in visual navigation. Intel Labs researchers are responding to this challenge with the launch of two new AI models for monocular depth estimation: one for visual-inertial depth estimation and another for robust relative depth estimation (RDE).


Improving accuracy: the role of MiDaS 3.1

The latest RDE model, MiDaS version 3.1, generates robust relative depth from a single image. Trained on a large and diverse dataset, it performs well across a wide variety of tasks and environments. The latest update increased the accuracy of the RDE model by approximately 30% by enlarging the training set and updating the encoder backbones.
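To make the single-image workflow concrete, here is a minimal sketch of running a MiDaS model through `torch.hub`, following the usage documented in the intel-isl/MiDaS repository. The model name `DPT_Large`, the `dpt_transform`, and the helper `normalize_relative_depth` are illustrative choices, and running the hub portion requires `torch`, `timm`, `opencv-python`, and a network connection. Note that MiDaS predicts *relative inverse depth*: orderings are consistent within one image, but there is no metric scale.

```python
import numpy as np


def normalize_relative_depth(depth):
    """Min-max normalize a relative inverse-depth map to [0, 1] for display.

    MiDaS output has no metric scale, so normalization is only for
    visualization, not for measurement."""
    d = np.asarray(depth, dtype=np.float64)
    lo, hi = d.min(), d.max()
    if hi - lo < 1e-12:
        return np.zeros_like(d)
    return (d - lo) / (hi - lo)


def predict_relative_depth(image_path):
    """Run a MiDaS model on one RGB image via torch.hub (sketch).

    Model and transform names follow the intel-isl/MiDaS README; this
    function downloads weights on first use."""
    import cv2
    import torch

    model = torch.hub.load("intel-isl/MiDaS", "DPT_Large")
    model.eval()
    midas_transforms = torch.hub.load("intel-isl/MiDaS", "transforms")

    img = cv2.cvtColor(cv2.imread(image_path), cv2.COLOR_BGR2RGB)
    input_batch = midas_transforms.dpt_transform(img)
    with torch.no_grad():
        prediction = model(input_batch)
        # Resize the prediction back to the input resolution.
        prediction = torch.nn.functional.interpolate(
            prediction.unsqueeze(1),
            size=img.shape[:2],
            mode="bicubic",
            align_corners=False,
        ).squeeze()
    return normalize_relative_depth(prediction.cpu().numpy())
```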

The MiDaS model has been integrated into numerous projects, most notably Stable Diffusion 2.0, where it facilitates the image depth function, which infers the depth of an input image and then generates new images using both textual and depth information. This technology could open the door to new virtual applications, including the reconstruction of crime scenes for trials, therapeutic environments for mental health and immersive gaming experiences.

VI-Depth: Integration of inertial data to improve accuracy


The high performance of the RDE model makes it broadly useful; however, its lack of metric scale can limit it for downstream tasks that require metric depth, such as mapping, planning, navigation, object recognition, 3D reconstruction, and image editing. With their new AI model, VI-Depth, Intel Labs researchers are providing a solution to this problem.

VI-Depth is a visual-inertial depth estimation system that integrates monocular depth estimation and visual-inertial odometry (VIO) to achieve dense depth estimates with a metric scale. This approach provides accurate depth estimation that can be very useful in scene reconstruction, mapping, and object manipulation.

Incorporating inertial data helps resolve scale ambiguity, and most mobile devices already ship with inertial measurement units (IMUs). Global alignment sets an appropriate global scale, while a dense scale alignment network, the Scale Map Learner (SML), operates locally to bring individual regions onto the correct metric depth. The SML network uses MiDaS as the starting point for its encoder. By combining the MiDaS relative depth prediction model with IMU sensor measurements, VI-Depth can generate a more reliable metric depth for each pixel in an image.
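The global-alignment step can be illustrated with a minimal numeric sketch. It assumes the relative and metric depths are related by a single scale factor recovered by least squares against sparse metric points (e.g. VIO feature depths); the real system also handles shift and then refines per pixel with the SML network. `global_scale_align` is a hypothetical helper, not VI-Depth's actual API.

```python
import numpy as np


def global_scale_align(rel_depth, sparse_metric, mask):
    """Globally rescale a relative depth map to metric units (sketch).

    rel_depth:     dense relative depth from an RDE model such as MiDaS
    sparse_metric: metric depths at a few pixels (e.g. from VIO), zero elsewhere
    mask:          boolean array marking where sparse_metric is valid

    Solves argmin_s ||s * r - m||^2 in closed form over the valid pixels,
    then applies that single scale to the whole map.
    """
    r = rel_depth[mask]
    m = sparse_metric[mask]
    s = (r @ m) / (r @ r)  # closed-form least-squares scale
    return s * rel_depth
```

A local refinement stage like the SML would then correct the remaining per-region errors that one global scale cannot fix.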

The latest Intel Labs innovations, MiDaS 3.1 and VI-Depth 1.0, are now available on GitHub under an open-source MIT license.
