KinectFusion: Real-Time Dense Surface Mapping and Tracking

The surge of interest in augmented and mixed reality applications can at least in part be attributed to research on “real-time, infrastructure-free” tracking of a camera with simultaneous generation of detailed maps of physical scenes. While computer vision research has enabled accurate camera tracking and dense scene surface reconstruction using structure-from-motion and multi-view stereo algorithms, these methods are not well suited to either real-time applications or detailed surface reconstruction. There has also been contemporaneous improvement in camera technologies, especially depth-sensing cameras based on time-of-flight or structured-light sensing, such as the Microsoft Kinect, a consumer-grade offering. The Kinect features a structured-light depth sensor (sensor hereafter) and generates an 11-bit $640 \times 480$ depth map at 30Hz using an on-board ASIC. However, these depth images are usually noisy, with ‘holes’ indicating regions where a depth reading was not possible. This paper proposes a system that processes these noisy depth maps and performs real-time (9 million new point measurements per second) dense simultaneous localization and mapping (SLAM), incrementally generating a consistent 3D scene model while also tracking all 6 degrees of freedom of the sensor’s motion through each frame. While the paper presents quite an involved description of the method, the key components have been briefly summarized here. ...
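The ~9 million measurements-per-second figure follows directly from the sensor's resolution and frame rate; a quick sanity check (a minimal sketch, not from the paper):

```python
# Sanity check: point-measurement throughput of the Kinect depth sensor.
width, height = 640, 480   # depth map resolution in pixels
fps = 30                   # depth frames per second

points_per_second = width * height * fps
print(points_per_second)   # 9216000, i.e. roughly 9 million new depth measurements/s
```

Every pixel of every depth frame is a candidate 3D point measurement, so the system must ingest and fuse on the order of 9.2 million measurements each second to keep up with the sensor.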

October 19, 2020 · 4 min · Kumar Abhishek