Fund program: National Natural Science Foundation of China (U2333205); Fundamental Research Funds for the Central Universities (3122023PY04)
Vehicle-Road Collaborative Perception Method Based on Dual-Stream Feature Extraction
Received date: 2024-06-21
Revised date: 2024-07-16
Accepted date: 2024-07-18
Online published: 2024-09-03
NIU Guochen, SUN Xiangyu, YUAN Zhengyan. Vehicle-road collaborative perception method based on dual-stream feature extraction[J]. Journal of Shanghai Jiao Tong University, 2024, 58(11): 1826-1834. DOI: 10.16183/j.cnki.jsjtu.2024.239
To address the inadequate perception of autonomous vehicles in occluded and beyond-line-of-sight scenarios, a vehicle-road collaborative perception method based on a dual-stream feature extraction network is proposed to enhance the 3D object detection of traffic participants. Feature extraction networks are tailored to the respective characteristics of the roadside and vehicle-side scenes. Since the roadside has abundant sensing data and computational resources, a Transformer structure is used to extract richer, higher-level feature representations. Because the vehicle side has limited computational capability and high real-time demands, partial convolution (PConv) is employed to improve computing efficiency, and the Mamba-VSS module is introduced for efficient perception of complex environments. Collaborative perception between the vehicle side and the roadside is accomplished through the selective sharing and fusion of critical perceptual information guided by confidence maps. Trained and tested on the DAIR-V2X dataset, the vehicle-side feature extraction network has a model size of 8.1 MB and achieves average precision of 67.67% and 53.74% at IoU thresholds of 0.5 and 0.7, respectively. Experiments verify the advantages of the method in detection accuracy and model size, providing a low-configuration detection scheme for vehicle-road collaboration.
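The two mechanisms named in the abstract can be illustrated with a short sketch. The following self-contained PyTorch code is a minimal illustration under stated assumptions, not the authors' implementation: a FasterNet-style partial convolution (PConv) block standing in for the lightweight vehicle-side branch, and confidence-map-guided selection and fusion of roadside bird's-eye-view (BEV) features. The split ratio n_div, the threshold thresh, and the overwrite-style fusion rule are illustrative choices not specified in the abstract.

import torch
import torch.nn as nn


class PConv(nn.Module):
    """Partial convolution: convolve only the first dim // n_div channels
    and pass the remaining channels through untouched, which saves FLOPs
    and memory access on the compute-limited vehicle side."""

    def __init__(self, dim: int, n_div: int = 4, kernel_size: int = 3):
        super().__init__()
        self.dim_conv = dim // n_div
        self.dim_keep = dim - self.dim_conv
        self.conv = nn.Conv2d(self.dim_conv, self.dim_conv, kernel_size,
                              padding=kernel_size // 2, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x1, x2 = torch.split(x, [self.dim_conv, self.dim_keep], dim=1)
        return torch.cat([self.conv(x1), x2], dim=1)


def select_by_confidence(feat: torch.Tensor, conf: torch.Tensor,
                         thresh: float = 0.5):
    """Zero out BEV cells whose detection confidence is below thresh, so
    only the sparse high-confidence cells need to be transmitted."""
    mask = (conf > thresh).float()        # (B, 1, H, W) binary map
    return feat * mask, mask


def fuse(vehicle_feat: torch.Tensor, roadside_feat: torch.Tensor,
         mask: torch.Tensor) -> torch.Tensor:
    """Illustrative fusion rule: take roadside features where the roadside
    is confident, keep the ego (vehicle) features everywhere else."""
    return vehicle_feat * (1.0 - mask) + roadside_feat * mask


if __name__ == "__main__":
    bev = torch.randn(1, 64, 100, 100)           # ego BEV features
    bev = PConv(dim=64)(bev)                     # lightweight vehicle-side block
    road_feat = torch.randn(1, 64, 100, 100)     # roadside BEV features
    road_conf = torch.rand(1, 1, 100, 100)       # roadside confidence map
    shared, mask = select_by_confidence(road_feat, road_conf)
    fused = fuse(bev, shared, mask)
    print(fused.shape)                           # torch.Size([1, 64, 100, 100])

Because the binary mask is typically sparse, only the above-threshold cells (plus the mask itself) need to be transmitted, which is what makes confidence-guided sharing bandwidth-efficient compared with broadcasting the full feature map.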