The complementary nature of visual and inertial sensors makes their fusion well suited to bridging GNSS outages in challenging environments. Filtering-based approaches once dominated the integration of visual and inertial cues, but nonlinear optimization is becoming more prevalent. To explore the potential of nonlinear optimization for real-time, large-scale navigation, this paper presents a motion estimation approach that tightly fuses data from stereo cameras and a consumer-grade inertial measurement unit (IMU).
The proposed framework consists of a tracking frontend and a mapping backend. In the frontend, point features are tracked in a circular manner between two consecutive frames of the stereo rig (each frame comprising two synchronized images). In addition, triangulated features from a number of selected frames (keyframes) are tracked into the incoming frame by feature matching. Given the tracked point features and the IMU readings, two windows of constraints are built: a spatial window and a temporal window. The spatial window contains constraints between keyframe poses and observed points. Besides such pose-point constraints, the temporal window also contains constraints that relate consecutive poses of recent frames through the IMU readings. With these constraints, all variables (i.e., states) involved in both windows are refined by a nonlinear optimization. This optimization is performed recursively as new frames and IMU readings stream in and keyframes slide out of the double window. In the backend, the map is updated whenever a new keyframe is inserted. Moreover, features in the keyframes enable large-scale appearance-based loop closure. The current implementation also supports online removal of redundant keyframes and spurious points, keeping the scene representation compact and supporting extended operation.
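The joint refinement over the two windows can be illustrated with a deliberately simplified sketch: a 2D toy problem (not the paper's actual 6-DoF formulation) in which pose-point observations stand in for the spatial-window constraints and relative-pose terms between consecutive poses stand in for the IMU-derived temporal-window constraints. All names and the residual design below are illustrative assumptions, solved here with SciPy's generic least-squares routine rather than a production SLAM solver.

```python
import numpy as np
from scipy.optimize import least_squares

# Toy double-window optimization in 2D: poses and landmarks are planar
# points (a stand-in for 6-DoF keyframe poses and triangulated features).
true_poses = np.array([[0.0, 0.0], [1.0, 0.1], [2.0, 0.0]])
true_points = np.array([[1.0, 2.0], [2.5, 1.5]])

# Spatial window: landmark position observed relative to each pose
# (a stand-in for reprojection constraints between poses and points).
obs = [(i, j, true_points[j] - true_poses[i])
       for i in range(3) for j in range(2)]

# Temporal window: relative displacement between consecutive poses
# (a crude stand-in for constraints from integrated IMU readings).
rel = [(i, true_poses[i + 1] - true_poses[i]) for i in range(2)]

def residuals(x):
    poses = x[:6].reshape(3, 2)
    points = x[6:].reshape(2, 2)
    r = []
    for i, j, z in obs:                  # spatial-window terms
        r.append((points[j] - poses[i]) - z)
    for i, d in rel:                     # temporal-window terms
        r.append((poses[i + 1] - poses[i]) - d)
    r.append(poses[0])                   # gauge fixing: anchor first pose
    return np.concatenate(r)

# Start from a perturbed initial guess and refine all states jointly.
x0 = 0.1 * np.random.default_rng(0).standard_normal(10)
sol = least_squares(residuals, x0)
est_poses = sol.x[:6].reshape(3, 2)
est_points = sol.x[6:].reshape(2, 2)
```

Because the toy observations are noise-free and the first pose is anchored, the optimizer recovers the true poses and landmarks exactly; in the actual system the analogous optimization runs recursively over the sliding double window as new data arrive.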
Experiments on the Tsukuba Stereo and KITTI datasets show that the proposed approach achieves higher accuracy than a sliding-window smoother and an incremental stereo odometry method, confirming the benefits of using keyframes and fusing inertial data for motion estimation. We aim to release the final implementation as open source.