Semantic segmentation of the bird's-eye view (BEV) is crucial for environment perception in
autonomous driving. It covers both the static elements of the scene, such as drivable areas, and the dynamic
elements, such as cars. This paper proposes an end-to-end deep learning architecture based on 3D convolution that
predicts both BEV semantic segmentation and voxel semantic segmentation from monocular images. The two key
techniques the model uses to improve prediction accuracy are the voxelization of the scene and the transformation
of features from perspective space to camera space. The effectiveness of the proposed method was demonstrated by
training and evaluating the model on the nuScenes dataset. In a comparison with other state-of-the-art methods,
the proposed approach achieved better BEV semantic segmentation results. It also performs
voxel semantic segmentation, which the other state-of-the-art methods cannot achieve.
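
The abstract does not spell out how features move from perspective space to camera space. As a minimal sketch only, assuming a standard pinhole camera model, the mapping from a frustum voxel (pixel column `u`, pixel row `v`, depth `d`) to a 3D point in the camera frame could look like the following. The function names, the depth bins, and the intrinsics `fx`, `fy`, `cx`, `cy` are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence, Tuple

Point3D = Tuple[float, float, float]


def frustum_voxel_to_camera(u: float, v: float, depth: float,
                            fx: float, fy: float,
                            cx: float, cy: float) -> Point3D:
    """Back-project pixel (u, v) at the given depth into camera space.

    Assumes a pinhole model without lens distortion (an illustrative
    assumption, not the paper's stated method).
    """
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)


def frustum_voxel_centers(width: int, height: int, depths: Sequence[float],
                          fx: float, fy: float,
                          cx: float, cy: float) -> List[Point3D]:
    """Return camera-space centers for every (depth, row, column) frustum voxel."""
    centers: List[Point3D] = []
    for d in depths:
        for row in range(height):
            for col in range(width):
                # Use the pixel center (col + 0.5, row + 0.5) as the voxel's image location.
                centers.append(
                    frustum_voxel_to_camera(col + 0.5, row + 0.5, d, fx, fy, cx, cy)
                )
    return centers
```

Under these assumptions, a pixel at the principal point maps onto the optical axis, and voxels at larger depths spread farther apart laterally. That spreading is the reason a frustum-shaped grid has to be resampled before it can be treated as a regular camera-space voxel grid.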
QIN Chao (秦 超), WANG Yafei (王亚飞), ZHANG Yuchao (张宇超), YIN Chengliang (殷承良). Birds-Eye-View Semantic Segmentation and Voxels Semantic Segmentation Based on Frustum Voxels Modeling and Monocular Camera[J]. Journal of Shanghai Jiaotong University (Science), 2023, 28(1): 100-113.
DOI: 10.1007/s12204-023-2573-3