Bird's-Eye-View Semantic Segmentation and Voxel Semantic Segmentation Based on Frustum Voxel Modeling and a Monocular Camera

doi:10.1007/s12204-023-2573-3

J Shanghai Jiaotong Univ Sci, 2023, Vol. 28, Issue (1): 100-113.

Birds-Eye-View Semantic Segmentation and Voxels Semantic Segmentation Based on Frustum Voxels Modeling and Monocular Camera

QIN Chao¹ (秦超), WANG Yafei¹ (王亚飞), ZHANG Yuchao² (张宇超), YIN Chengliang¹∗ (殷承良)

(1. School of Mechanical Engineering, Shanghai Jiao Tong University, Shanghai 200240, China; 2. Shanghai Intelligent and Connected Vehicle R&D Center Co., Ltd., Shanghai 201499, China)

Received: 2022-03-08 Online: 2023-01-28 Published: 2023-02-10

Abstract

Abstract: Semantic segmentation of the bird's-eye view (BEV) is crucial for environment perception in autonomous driving, where a scene contains static elements, such as drivable areas, and dynamic elements, such as cars. This paper proposes an end-to-end deep learning architecture based on 3D convolution that predicts BEV semantic segmentation, as well as voxel semantic segmentation, from monocular images. Voxelization of the scene and feature transformation from the perspective space to the camera space are the key techniques the model uses to improve prediction accuracy. The effectiveness of the proposed method was demonstrated by training and evaluating the model on the nuScenes dataset. A comparison with other state-of-the-art methods showed that the proposed approach outperformed them in BEV semantic segmentation. The model also performs voxel semantic segmentation, which the compared methods do not support.

Key words: semantic segmentation, voxel semantic segmentation, deep learning, convolutional neural network, bird's-eye view (BEV)
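The abstract names two geometric steps: modeling the scene as frustum voxels and transforming features from the perspective space to the camera space. The sketch below illustrates only the geometry of such a transformation under a standard pinhole camera model. It is not the authors' implementation: the function names, the intrinsic matrix values, the feature-map stride, the depth bins, and the BEV grid extents are all assumptions chosen for illustration, and the camera axes are assumed to be x right, y down, z forward.

```python
import numpy as np

def frustum_voxel_centers(K, width, height, stride, depths):
    """Return camera-space centers of frustum voxels, shape (D, H', W', 3).

    Hypothetical helper: each voxel is one feature-map pixel at one depth bin.
    """
    fx, fy = K[0, 0], K[1, 1]
    cx, cy = K[0, 2], K[1, 2]
    # Pixel centers of the downsampled feature map (perspective space).
    us = np.arange(stride / 2, width, stride)
    vs = np.arange(stride / 2, height, stride)
    d, v, u = np.meshgrid(depths, vs, us, indexing="ij")
    # Pinhole back-projection: pixel (u, v) at depth d -> camera (X, Y, Z).
    x = (u - cx) * d / fx
    y = (v - cy) * d / fy
    return np.stack([x, y, d], axis=-1)

def bev_cell_indices(points, x_range, z_range, cell):
    """Map camera-space points to (row, col) BEV cells.

    Hypothetical helper: rows follow the forward axis z, columns the lateral
    axis x. The returned mask marks points that fall inside the grid.
    """
    x, z = points[..., 0], points[..., 2]
    col = np.floor((x - x_range[0]) / cell).astype(int)
    row = np.floor((z - z_range[0]) / cell).astype(int)
    n_cols = int(round((x_range[1] - x_range[0]) / cell))
    n_rows = int(round((z_range[1] - z_range[0]) / cell))
    mask = (col >= 0) & (col < n_cols) & (row >= 0) & (row < n_rows)
    return row, col, mask

# Illustrative intrinsics and grid settings (assumed values, not from the paper).
K = np.array([[1000.0, 0.0, 800.0],
              [0.0, 1000.0, 450.0],
              [0.0, 0.0, 1.0]])
depths = np.arange(1.0, 51.0, 1.0)  # 50 depth bins, 1 m apart
centers = frustum_voxel_centers(K, width=1600, height=900, stride=16, depths=depths)
row, col, mask = bev_cell_indices(
    centers, x_range=(-25.0, 25.0), z_range=(0.0, 50.0), cell=0.5
)
```

In a learned model, per-voxel image features would be gathered or scattered with indices like these before further 3D convolution. How the paper actually performs that step is not stated in the abstract, so this sketch stops at the index computation.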

CLC Number: TP391.4

QIN Chao (秦超), WANG Yafei (王亚飞), ZHANG Yuchao (张宇超), YIN Chengliang (殷承良). Birds-Eye-View Semantic Segmentation and Voxels Semantic Segmentation Based on Frustum Voxels Modeling and Monocular Camera[J]. J Shanghai Jiaotong Univ Sci, 2023, 28(1): 100-113.

