In this project, we implemented a monocular visual odometry (VO) pipeline with its most essential components: initialization of 3D landmarks, keypoint tracking between two frames, pose estimation from established 2D-3D correspondences, and triangulation of new landmarks. Building upon this baseline VO, we incorporated local optimization (sliding-window bundle adjustment) to mitigate scale drift, and global optimization (loop detection and loop correction) to turn the VO pipeline into a visual simultaneous localization and mapping (VSLAM) framework. The performance of the pipelines is evaluated on three datasets: Parking, KITTI and Malaga.
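To make the triangulation step concrete, here is a minimal sketch of linear (DLT) two-view triangulation in NumPy. It is an illustration under stated assumptions, not necessarily how this project triangulates landmarks: the function names `triangulate_dlt` and `project`, the intrinsics `K`, and the one-unit baseline are all hypothetical values chosen for the example.

```python
import numpy as np

def triangulate_dlt(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one 3D point from two views.

    P1, P2: 3x4 projection matrices; x1, x2: pixel coordinates (u, v).
    Hypothetical helper for illustration only.
    """
    # Each view contributes two linear constraints on the homogeneous point X.
    A = np.stack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    # The solution is the right singular vector with the smallest singular value.
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]

def project(P, X):
    """Project a 3D point X with projection matrix P to pixel coordinates."""
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]

# Assumed intrinsics and a second camera shifted one unit along +x.
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])

X_true = np.array([0.5, -0.2, 5.0])
X_est = triangulate_dlt(P1, P2, project(P1, X_true), project(P2, X_true))
```

With noise-free observations the estimate matches the true point up to numerical precision. In a monocular pipeline the baseline length is not observable, so the reconstructed scale is arbitrary, which is why scale drift has to be handled by the later optimization stages.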
The baseline VO fails to run accurately and stably even on the Parking dataset, the simplest of the three, as shown in the figure below.
After integrating local optimization, the VO runs stably and with much higher accuracy on different datasets, as the following videos show, but it cannot detect and correct loops in KITTI (see the third video below, or the figure at the beginning).
When further combined with global optimization, the VO detects and closes loops and runs successfully on KITTI. The demo is shown below.