Binocular USB camera module depth perception

The depth perception of a binocular USB camera module is based on the parallax principle. It imitates human vision with two parallel cameras and calculates the depth of an object from the parallax (disparity) produced by the difference in viewing angle between the left and right cameras. The workflow has four stages: calibration, correction, matching and depth calculation, as follows:

Calibration: The binocular cameras must be calibrated to obtain the internal (intrinsic) and external (extrinsic) parameters and the homography matrices of the two cameras. Calibration determines key parameters such as the focal length of each camera and the baseline length between them, providing the basis for the subsequent depth calculation.

Correction: Based on the calibration results, the original images are corrected (rectified) so that the two corrected images lie in the same plane and their rows are aligned. A point seen in the left image then appears on the same row of the right image, so matching only has to search along that row. Correction removes image distortion and the errors caused by factors such as the installation angle and position of the cameras, which improves the accuracy of depth perception.

Matching: Pixels are matched between the two corrected images. This step is the core of depth perception: the parallax is determined by finding the corresponding pixels in the left and right images. Because a single pixel is not robust and is easily affected by changes in illumination and viewing angle, practical systems usually match with methods such as sliding windows or energy optimization. The sliding-window method finds the best matching point by comparing the similarity of the pixels within a local area. Energy optimization defines an energy function and obtains the best matching result by minimizing it.
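The sliding-window idea can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the method of any particular module: the images are assumed to be already corrected, the similarity measure is the sum of absolute differences (SAD), and the names sad, match_disparity, half and max_disp as well as the synthetic test images are hypothetical.

```python
def sad(left, right, row, col_l, col_r, half):
    """Sum of absolute differences between two square windows."""
    total = 0
    for dy in range(-half, half + 1):
        for dx in range(-half, half + 1):
            total += abs(left[row + dy][col_l + dx] - right[row + dy][col_r + dx])
    return total


def match_disparity(left, right, row, col, half=1, max_disp=4):
    """Return the parallax d with the lowest SAD for pixel (row, col) of the left image.

    Assumes corrected images, so the candidate match lies on the same row of
    the right image, at column col - d.
    """
    best_d, best_cost = 0, None
    for d in range(max_disp + 1):
        if col - d - half < 0:  # window would leave the right image
            break
        cost = sad(left, right, row, col, col - d, half)
        if best_cost is None or cost < best_cost:
            best_d, best_cost = d, cost
    return best_d


# Synthetic 5 x 10 test images (assumption): the left image is the right
# image shifted 2 pixels to the right, so the true parallax is 2.
ROWS, COLS, SHIFT = 5, 10, 2
right_img = [[(c * 37 + r * 11) % 101 for c in range(COLS)] for r in range(ROWS)]
left_img = [[right_img[r][c - SHIFT] if c >= SHIFT else 0 for c in range(COLS)]
            for r in range(ROWS)]

print(match_disparity(left_img, right_img, row=2, col=6))  # prints 2
```

A larger window is more robust to noise but blurs depth edges, so the window size is a trade-off. Energy-optimization methods add a smoothness term between neighboring pixels on top of a per-pixel cost like this one.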

Depth calculation: The depth of each pixel is calculated from the matching result to obtain a depth map. Depth and parallax are related by a simple conversion. For instance, in an ideal binocular imaging model, where the left and right cameras lie in the same plane with parallel optical axes and share the same focal length f, similar triangles give the distance (depth) z from a spatial point to the camera as z = f * b / d, where b is the baseline between the left and right cameras and d is the parallax. With f and d both expressed in pixels, z has the same unit as b. Depth is inversely proportional to parallax: nearby points produce a large parallax and distant points a small one.
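As a worked example of z = f * b / d, the following minimal sketch converts a parallax value, and then a small parallax map, into depth. The function names and the numbers (f = 700 pixels, b = 0.06 m) are hypothetical values chosen for illustration, not the parameters of a real module. Pixels with no valid match (parallax of zero or less) are returned as None here, which is one possible convention.

```python
def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """Depth in meters from parallax in pixels, using z = f * b / d.

    Returns None when the parallax is not positive, i.e. there is no valid
    match or the point is effectively at infinity.
    """
    if disparity_px <= 0:
        return None
    return focal_px * baseline_m / disparity_px


def depth_map(disparity_map, focal_px, baseline_m):
    """Apply depth_from_disparity to every pixel of a 2-D parallax map."""
    return [[depth_from_disparity(d, focal_px, baseline_m) for d in row]
            for row in disparity_map]


# Hypothetical parameters: focal length 700 px, baseline 0.06 m.
FOCAL_PX = 700.0
BASELINE_M = 0.06

# A parallax of 21 px gives 700 * 0.06 / 21, about 2 m.
print(depth_from_disparity(21, FOCAL_PX, BASELINE_M))

# Halving the parallax doubles the depth; 0 marks a pixel with no match.
print(depth_map([[42, 21], [10.5, 0]], FOCAL_PX, BASELINE_M))
```

Because depth varies with 1 / d, a one-pixel matching error matters far more for distant points (small d) than for nearby ones, which is why a longer baseline or a longer focal length improves accuracy at range.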