CenterLineFormer: Road Centerlines Graph Generation with Single Onboard Camera

doi:10.1007/s12204-024-2696-1

Abstract

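Abstract (translated from the Chinese version): With the rapid development of autonomous driving systems, the demand of onboard perception algorithms for road structure information has surged. As one of the road-structure-layer elements of high-definition maps, the lane centerline is essential for downstream tasks such as motion prediction and decision planning. Because of the complex topology and overlap of lane centerlines, previous studies have rarely explored lane centerline generation, and deep-learning-based crowdsourced map generation methods often need heuristic post-processing to produce centerline road structure information. This paper proposes CenterLineFormer, an end-to-end lane centerline generation method based on a deep attention network. It uses a monocular onboard camera as the sensor and generates a lane centerline structure graph in bird's-eye-view space that represents the driving situation on the road. A deformable cross-attention mechanism based on dynamic projection is proposed, which generates a dense bird's-eye-view feature map through feature space transformation. The method can describe the connection relationships between different centerlines and generates a vectorized lane centerline structure graph for downstream algorithms (for example, planning and control), avoiding post-processing. Experiments show that the proposed method outperforms existing algorithms on a public autonomous driving dataset and generates more accurate lane centerline structure graphs in night driving and complex intersection scenes.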

Keywords: autonomous driving, lane centerline generation, attention mechanism

Abstract: As autonomous driving systems advance rapidly, there is a surge in demand for high-definition (HD) maps that provide accurate and dependable prior information on the static environment around vehicles. As one of the main high-level elements in HD maps, the road lane centerline is essential for downstream tasks such as autonomous navigation and planning. Because road centerlines have complex topology and overlap significantly, previous studies have rarely examined the centerline HD mapping problem. Recent learning-based pipelines rely on heuristic post-processing of their predictions to generate a structured centerline output, and that output lacks instance information. To address this, we propose a novel end-to-end pipeline for generating vectorized road centerline graphs, termed CenterLineFormer. CenterLineFormer takes a single onboard camera image as input and predicts a directed graph representing the lane-layer map in the bird’s-eye view (BEV). We propose a view transformation strategy that uses a cross-attention mechanism to generate a dense BEV feature map. With our pipeline, we can describe the connection relationships between different centerlines and generate structured lane graphs for downstream modules such as planning and control. Our experiments show that the pipeline outperforms previous baselines on the nuScenes dataset. We also show that CenterLineFormer can generate accurate centerline graph topologies in night driving and complex traffic intersection scenes.
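
The directed lane graph mentioned in the abstract can be pictured with a small data structure. The sketch below is only an illustration under stated assumptions, not the authors' representation: the class name `LaneGraph`, its methods, and the coordinate convention (x, y in metres in the BEV plane) are all hypothetical. Each node is one centerline stored as a polyline, and a directed edge from one centerline to another means a vehicle can continue from the first onto the second.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

# Hypothetical convention: a point is (x, y) in metres in the BEV plane.
Point = Tuple[float, float]


@dataclass
class LaneGraph:
    """Illustrative directed graph of lane centerlines (names are assumptions)."""

    centerlines: Dict[int, List[Point]] = field(default_factory=dict)
    successors: Dict[int, Set[int]] = field(default_factory=dict)

    def add_centerline(self, cid: int, points: List[Point]) -> None:
        # A polyline needs at least two points to have a direction.
        if len(points) < 2:
            raise ValueError("a centerline needs at least two points")
        self.centerlines[cid] = list(points)
        self.successors.setdefault(cid, set())

    def connect(self, src: int, dst: int) -> None:
        # Directed edge: traffic can flow from the end of src into dst.
        if src not in self.centerlines or dst not in self.centerlines:
            raise KeyError("both centerlines must be added before connecting")
        self.successors[src].add(dst)

    def reachable(self, start: int) -> Set[int]:
        # Depth-first search over directed edges; the start node itself
        # appears in the result only if a cycle leads back to it.
        seen: Set[int] = set()
        stack = [start]
        while stack:
            cid = stack.pop()
            for nxt in self.successors[cid]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen


# Toy intersection: lane 0 enters, then continues straight (1) or turns right (2).
graph = LaneGraph()
graph.add_centerline(0, [(0.0, 0.0), (0.0, 10.0)])
graph.add_centerline(1, [(0.0, 10.0), (0.0, 25.0)])
graph.add_centerline(2, [(0.0, 10.0), (5.0, 14.0), (12.0, 15.0)])
graph.connect(0, 1)
graph.connect(0, 2)
```

In this toy example, `graph.reachable(0)` contains centerlines 1 and 2, while `graph.reachable(1)` is empty. A planner could query such a structure directly, which is the kind of use the abstract has in mind when it speaks of avoiding post-processing.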

CLC number:

V323.19

QIN Minghui, LIU Yuanzhi, LÜ Na, TAO Wei, ZHAO Hui. CenterLineFormer: Road Centerlines Graph Generation with Single Onboard Camera[J]. J Shanghai Jiaotong Univ Sci, 2025, 30(5): 1009-1017.
