Intelligent Robots

SDA-Loc: A Semantic-Driven Alignment Algorithm for Cross-Modal Localization in Point Cloud Maps

  • 1. Department of Automation; Institute of Medical Robotics, Shanghai Jiao Tong University, Shanghai 200030, China; 2. Key Laboratory of System Control and Information Processing of Ministry of Education, Shanghai 200030, China; 3. Shanghai Engineering Research Center of Intelligent Control and Management, Shanghai 200030, China; 4. State Grid Intelligence Technology Co., Ltd., Jinan 250101, China

Received date: 2024-11-26

Revised date: 2025-01-23

Accepted date: 2025-02-17

Online published: 2025-08-26

Abstract

Cross-modal localization, which uses only cameras and prior light detection and ranging (LiDAR) point cloud maps, achieves high localization accuracy at low cost. Integrating semantic information can significantly improve accuracy, but at the cost of a heavy computational load during optimization and extensive semantic annotation of the LiDAR point cloud maps. In this paper, we propose SDA-Loc, a semantic cross-modal localization system that relies solely on visual semantic information, making our approach more streamlined than existing methods. We design a semantic-driven alignment algorithm that uses visual semantic labels to select among different types of iterative closest point (ICP) alignment, allowing the system to better exploit the structural information represented by object semantics and thereby achieve accurate localization without the additional burden of point cloud annotation. Coupled with a purpose-designed dynamic error rejection mechanism, our approach balances accuracy and speed. Experiments on the KITTI dataset demonstrate the competitive localization performance of our approach. Moreover, an experiment on an outdoor campus dataset confirms that the proposed system effectively mitigates drift in visual localization under challenging lighting conditions and demonstrates the robustness of SDA-Loc when low-quality LiDAR point cloud maps are used. The runtime analysis further shows that SDA-Loc achieves this localization accuracy while remaining computationally efficient.
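
The idea of choosing an ICP error metric from a visual semantic label, and then rejecting large errors with a data-dependent threshold, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not state which classes use which ICP variant or how the rejection threshold is computed, so the label names, `PLANAR_LABELS`, `residual`, `reject_outliers`, and the median-based rule below are all assumptions made for illustration.

```python
import numpy as np

# Hypothetical label set: classes assumed to be locally planar.
# The actual class-to-metric mapping used by SDA-Loc is not given in the abstract.
PLANAR_LABELS = {"road", "building"}


def residual(src, dst, dst_normal, label):
    """Return a scalar alignment error for one source/map correspondence.

    src, dst   : 3D points (visual point and matched map point).
    dst_normal : unit normal of the map surface at dst.
    label      : semantic label of the visual point.
    """
    diff = np.asarray(src, dtype=float) - np.asarray(dst, dtype=float)
    if label in PLANAR_LABELS:
        # Point-to-plane error: distance along the map surface normal.
        return abs(float(np.dot(diff, dst_normal)))
    # Point-to-point error: plain Euclidean distance.
    return float(np.linalg.norm(diff))


def reject_outliers(errors, scale=3.0):
    """Return a boolean mask keeping errors at or below scale * median.

    The median-based threshold is an assumed stand-in for a dynamic
    rejection rule; it adapts to the current error distribution.
    """
    errors = np.asarray(errors, dtype=float)
    threshold = scale * np.median(errors)
    return errors <= threshold
```

In this sketch a point labeled `"road"` that slides along the road plane contributes no error, whereas a point with any other label is penalized for its full 3D offset; the mask from `reject_outliers` would then be applied before each optimization step.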

Cite this article

Ceng Yuxuan, Zhao Wentao, Chen Yongtao, Xiao Peng, Wang Jingchuan, Guo Rui. SDA-Loc: A Semantic-Driven Alignment Algorithm for Cross-Modal Localization in Point Cloud Maps[J]. Journal of Shanghai Jiao Tong University (Science), 2026, 31(1): 117-129. DOI: 10.1007/s12204-025-2841-5

