SDA-Loc: A Semantic-Driven Alignment Algorithm for Cross-Modal Localization in Point Cloud Maps

doi:10.1007/s12204-025-2841-5

Abstract

Cross-modal localization, which uses only cameras and a prior light detection and ranging (LiDAR) point cloud map, achieves high localization accuracy at low cost. Integrating semantic information can significantly improve accuracy, but at the cost of a heavy computational load during optimization and extensive semantic annotation of the LiDAR point cloud map. In this paper, we propose SDA-Loc, a semantic cross-modal localization system that relies solely on visual semantic information, which makes it more streamlined than existing methods. We design a semantic-driven alignment algorithm that uses visual semantic labels to perform different types of iterative closest point (ICP) registration, allowing the system to better exploit the structural information conveyed by object semantics and to localize accurately without the additional burden of point cloud annotation. Coupled with a dynamic error rejection mechanism, the approach balances accuracy and speed. Experiments on the KITTI dataset demonstrate competitive localization performance. Moreover, an experiment on an outdoor campus dataset confirms that the proposed system effectively mitigates drift in visual localization under challenging lighting conditions and remains robust when the LiDAR point cloud map is of poor quality. Runtime analysis further shows that SDA-Loc achieves a good balance between localization accuracy and computational efficiency.

Key words: cross-modal localization, map-based localization, semantic-driven alignment algorithm, simultaneous localization and mapping
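
The idea of switching the ICP error metric by semantic label, combined with an adaptive rejection threshold, can be illustrated with a minimal sketch. The abstract does not specify which ICP variants are used, how labels map to them, or how the rejection threshold is computed, so everything below is an assumption for illustration only: the label set `PLANAR_LABELS`, the function names, the point-to-plane versus point-to-point split, and the median-plus-MAD threshold are hypothetical and are not taken from the paper.

```python
import numpy as np

# Hypothetical mapping: classes assumed to be locally planar use a
# point-to-plane metric; all other classes use point-to-point.
PLANAR_LABELS = {"road", "building"}

def semantic_residuals(src, tgt, tgt_normals, labels):
    """Per-correspondence distance chosen by the visual semantic label.

    src, tgt     : (N, 3) matched visual points and map points
    tgt_normals  : (N, 3) unit normals at the map points
    labels       : length-N sequence of label strings for the visual points
    """
    diff = src - tgt
    point_to_point = np.linalg.norm(diff, axis=1)
    # Row-wise dot product gives the distance along the map normal.
    point_to_plane = np.abs(np.einsum("ij,ij->i", diff, tgt_normals))
    planar = np.array([lab in PLANAR_LABELS for lab in labels])
    return np.where(planar, point_to_plane, point_to_point)

def dynamic_inlier_mask(residuals, k=3.0):
    """Keep residuals below median + k * MAD (an assumed adaptive rule).

    The threshold is recomputed from the current residuals, so it
    tightens as the alignment converges.
    """
    med = np.median(residuals)
    mad = np.median(np.abs(residuals - med))
    return residuals <= med + k * mad
```

In a full ICP loop, the inlier mask would be recomputed at every iteration before solving for the pose update, so that correspondences with large residuals do not dominate the optimization.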

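Abstract (Chinese version, translated): Cross-modal visual localization, which localizes a camera alone within a prior LiDAR point cloud map, can achieve high accuracy under low-cost constraints. Existing studies show that fusing semantic information effectively improves localization accuracy, but such methods typically incur the overhead of large-scale semantic annotation of the LiDAR point cloud map and a computational burden during pose optimization. To address this problem, we propose SDA-Loc, a lightweight semantic cross-modal localization system that relies only on visual semantic information and is simpler and more efficient than existing methods. We design a semantic-driven alignment algorithm that uses visual semantic labels to guide a modified iterative closest point registration, fully exploiting the structural features represented by object semantics and achieving accurate pose estimation without additional annotation of the point cloud map. Combined with a dynamic error rejection mechanism, the method effectively balances localization accuracy against real-time requirements. Experimental results on the KITTI dataset show that SDA-Loc is competitive with existing methods in localization accuracy. Furthermore, tests on an outdoor campus dataset show that, in environments with complex lighting changes, the system effectively reduces the drift common in conventional visual localization methods. In addition, the system remains robust when the point cloud map is of poor quality. Runtime analysis shows that SDA-Loc achieves a good balance between localization accuracy and computational efficiency.

Key words (Chinese version, translated): cross-modal localization, map-based localization, semantic-driven alignment algorithm, simultaneous localization and mapping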

CLC Number: TP242.6

Ceng Yuxuan, Zhao Wentao, Chen Yongtao, Xiao Peng, Wang Jingchuan, Guo Rui. SDA-Loc: A Semantic-Driven Alignment Algorithm for Cross-Modal Localization in Point Cloud Maps[J]. J Shanghai Jiaotong Univ Sci, 2026, 31(1): 117-129.
