J Shanghai Jiaotong Univ Sci, 2023, Vol. 28, Issue (1): 100-113. doi: 10.1007/s12204-023-2573-3
QIN Chao1 (秦 超), WANG Yafei1 (王亚飞), ZHANG Yuchao2 (张宇超), YIN Chengliang1∗ (殷承良)
Received:
2022-03-08
Online:
2023-01-28
Published:
2023-02-10
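Abstract: Autonomous driving scenes contain static elements, such as drivable areas, and dynamic objects, such as cars, and bird's-eye-view (BEV) semantic segmentation is essential for environmental perception in autonomous driving. This paper proposes an end-to-end deep learning model based on 3D convolution that takes a monocular camera image as input and predicts both BEV semantic segmentation and voxel semantic segmentation. Voxelized modeling of the scene and the transformation of features from perspective space to camera space are the key techniques for improving the model's prediction accuracy. The model is trained on the nuScenes dataset, which is also used to evaluate the effectiveness of the method. Comparison with other classic models shows that the proposed model outperforms them in BEV semantic segmentation. In addition, the proposed model performs voxel semantic segmentation, a capability the other models lack.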
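As a rough illustration of what a perspective-to-camera-space mapping can look like, the sketch below back-projects the centers of frustum voxels (a pixel position plus a depth bin) into camera coordinates using a standard pinhole model. The abstract does not give the paper's actual formulation, so the function names (`frustum_to_camera`, `frustum_grid`), the intrinsic matrix `K`, and the depth bins are all assumptions made for illustration.

```python
import numpy as np


def frustum_to_camera(u, v, depth, K):
    """Back-project pixel coordinates (u, v) at the given depth(s) to
    camera-space XYZ with a pinhole model (hypothetical helper)."""
    fx, fy = K[0, 0], K[1, 1]  # focal lengths in pixels
    cx, cy = K[0, 2], K[1, 2]  # principal point in pixels
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.stack([x, y, depth], axis=-1)


def frustum_grid(width, height, depths, K):
    """Camera-space centers of all frustum voxels: one voxel per pixel
    and depth bin. Returns an array of shape (width, height, D, 3)."""
    us, vs, ds = np.meshgrid(
        np.arange(width) + 0.5,           # pixel-center columns
        np.arange(height) + 0.5,          # pixel-center rows
        np.asarray(depths, dtype=float),  # assumed depth bin centers
        indexing="ij",
    )
    return frustum_to_camera(us, vs, ds, K)


# Toy intrinsics and a tiny 4x2 feature map with two depth bins (assumed values).
K = np.array([[100.0, 0.0, 2.0],
              [0.0, 100.0, 1.0],
              [0.0, 0.0, 1.0]])
grid = frustum_grid(width=4, height=2, depths=[1.0, 10.0], K=K)
```

Under this model each voxel widens with depth, which is what gives the grid its frustum shape; features indexed by `(u, v, depth bin)` can then be resampled at these camera-space positions.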
QIN Chao1 (秦 超), WANG Yafei1 (王亚飞), ZHANG Yuchao2 (张宇超), YIN Chengliang1∗ (殷承良). Birds-Eye-View Semantic Segmentation and Voxels Semantic Segmentation Based on Frustum Voxels Modeling and Monocular Camera[J]. J Shanghai Jiaotong Univ Sci, 2023, 28(1): 100-113.