J Shanghai Jiaotong Univ Sci ›› 2023, Vol. 28 ›› Issue (1): 100-113. doi: 10.1007/s12204-023-2573-3

Birds-Eye-View Semantic Segmentation and Voxels Semantic Segmentation Based on Frustum Voxels Modeling and Monocular Camera

QIN Chao1 (秦 超), WANG Yafei1 (王亚飞), ZHANG Yuchao2 (张宇超), YIN Chengliang1∗ (殷承良)   

  1. School of Mechanical Engineering, Shanghai Jiao Tong University, Shanghai 200240, China
  2. Shanghai Intelligent and Connected Vehicle R&D Center Co., Ltd., Shanghai 201499, China
  • Received: 2022-03-08; Online: 2023-01-28; Published: 2023-02-10

Abstract: Bird’s-eye-view (BEV) semantic segmentation is crucial for environment perception in autonomous driving. A BEV map covers both the static elements of a scene, such as drivable areas, and dynamic elements, such as cars. This paper proposes an end-to-end deep learning architecture based on 3D convolution that predicts both BEV semantic segmentation and voxel semantic segmentation from monocular images. Two steps are key to its prediction accuracy: voxelizing the scene and transforming features from perspective space to camera space. The model was trained and evaluated on the NuScenes dataset. In a comparison with other state-of-the-art methods, the proposed approach achieved better BEV semantic segmentation results. It also performs voxel semantic segmentation, which the compared methods do not provide.

Key words: semantic segmentation, voxel semantic segmentation, deep learning, convolutional neural network, bird’s-eye view (BEV)
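
The abstract names a transformation from perspective space to camera space over frustum voxels but does not give its form. As a purely geometric illustration, and not the paper's learned 3D-convolutional method, the sketch below back-projects the centre of a frustum voxel (pixel column `u`, pixel row `v`, depth `depth`) into camera coordinates under an assumed pinhole camera model. The function names and the intrinsics `fx`, `fy`, `cx`, `cy` are hypothetical and chosen only for this example.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]


def frustum_voxel_to_camera(u: float, v: float, depth: float,
                            fx: float, fy: float,
                            cx: float, cy: float) -> Point:
    """Back-project a frustum voxel centre into camera space.

    Assumes an ideal pinhole camera (no lens distortion); this is an
    illustrative assumption, not a detail taken from the paper.
    """
    x = (u - cx) * depth / fx  # lateral offset grows linearly with depth
    y = (v - cy) * depth / fy
    return (x, y, depth)


def frustum_grid(width: int, height: int, depths: List[float],
                 fx: float, fy: float, cx: float, cy: float) -> List[Point]:
    """Camera-space centres for every (depth bin, row, column) voxel."""
    return [
        frustum_voxel_to_camera(u + 0.5, v + 0.5, d, fx, fy, cx, cy)
        for d in depths
        for v in range(height)
        for u in range(width)
    ]
```

Under these assumptions, a voxel on the optical axis maps to `(0.0, 0.0, depth)`, and voxels at a fixed pixel offset spread further apart as depth increases, which is why a regular grid in perspective space becomes a frustum in camera space.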
