BEV-Fused Imitation and Reinforcement Learning for Autonomous Driving Planning

doi:10.1007/s12204-025-2851-3

Abstract

End-to-end autonomous driving breaks from the traditional modular pipeline by integrating perception, prediction, and planning in a single framework, which allows the whole system to be optimized jointly. Typical end-to-end frameworks rely on deep-learning-based planning, which requires large amounts of real-world offline data for training; acquiring and managing that data is time-consuming and labor-intensive. Planning with deep reinforcement learning (DRL) is also popular, because reward functions guide the agent to adapt when the environment changes abruptly. However, DRL frameworks are only loosely coupled to the perception module, so gradients cannot be backpropagated through it. Because the two approaches have complementary strengths and weaknesses, this paper fuses them: a bird’s eye view (BEV) feature extraction network extracts key traffic flow information from camera images, and an end-to-end DRL planning framework is built on those BEV features. This shifts end-to-end autonomous driving from data-driven to behavior-driven. To improve training speed and quality, we also propose an advanced imitation learning algorithm. The proposed approach is validated in the CARLA simulator, and experimental results show that it outperforms algorithms from other frameworks, further improving the agent’s safety and efficiency.

Keywords: end-to-end technology, deep reinforcement learning (DRL), bird’s eye view (BEV), imitation learning

CLC Number: TP242.6

Xia Jie, Wu Xiaodong, Xu Min. BEV-Fused Imitation and Reinforcement Learning for Autonomous Driving Planning[J]. J Shanghai Jiaotong Univ Sci, 2026, 31(1): 154-166.
