Despite the significant success of Simultaneous Localization and Mapping (SLAM) in robotics research, the assumption of scene rigidity still limits the practical application of visual SLAM systems in the real world. Various methods have therefore been proposed to detect and segment dynamic objects in the scene and eliminate their influence, but their efficiency and accuracy fall short of practical requirements. This paper introduces a new visual SLAM method whose front end fuses YOLOv5 and BiSeNetv2 into a single network that simultaneously provides accurate position, category, and mask information for dynamic objects. Although its output resembles that of an instance segmentation network, the fused network runs at up to 70 FPS, surpassing many detection networks, and can be deployed in a real-time SLAM system. In the tracking thread, the bounding-box and mask information are used to remove feature points that violate geometric constraints, and only static feature points are passed to the localization thread. On the public TUM datasets, our method significantly improves positioning accuracy over state-of-the-art dynamic visual SLAM systems, in addition to its improvement in speed.
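The core front-end step described above, discarding feature points that fall on segmented dynamic objects before they reach the localization thread, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name `filter_dynamic_keypoints` and the use of a binary mask as the network's output are assumptions for the example.

```python
import numpy as np

def filter_dynamic_keypoints(keypoints, dynamic_mask):
    """Keep only keypoints that fall outside dynamic-object mask regions.

    keypoints    : (N, 2) array of (x, y) pixel coordinates
    dynamic_mask : (H, W) boolean array, True where the segmentation
                   network marked a dynamic object
    """
    xs = keypoints[:, 0].astype(int)
    ys = keypoints[:, 1].astype(int)
    h, w = dynamic_mask.shape
    # Guard against keypoints detected at or beyond the image border.
    in_bounds = (xs >= 0) & (xs < w) & (ys >= 0) & (ys < h)
    static = np.zeros(len(keypoints), dtype=bool)
    # A keypoint is kept only if its pixel is not covered by the mask.
    static[in_bounds] = ~dynamic_mask[ys[in_bounds], xs[in_bounds]]
    return keypoints[static]

if __name__ == "__main__":
    mask = np.zeros((10, 10), dtype=bool)
    mask[:5, :5] = True  # dynamic object occupies the top-left region
    kps = np.array([[1, 1], [8, 8], [2, 7]])
    print(filter_dynamic_keypoints(kps, mask))  # drops the point at (1, 1)
```

In a full system, the surviving static points would additionally be checked against the epipolar or reprojection constraints mentioned in the abstract before being used for pose estimation.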
Chunbo Xu, Juan Yan, Huibin Yang, Han Wu, Bo Wang