Intelligent Robots

BEV-Fused Imitation and Reinforcement Learning for Autonomous Driving Planning

  • 1. Institute of Intelligent Vehicles, Shanghai Jiao Tong University, Shanghai 200240, China; 2. School of Mechanical Engineering, Shanghai Jiao Tong University, Shanghai 200240, China

Received date: 2024-12-16

Revised date: 2025-02-25

Accepted date: 2025-03-21

Online published: 2025-10-14

Abstract

End-to-end autonomous driving integrates perception, prediction, and planning within a single framework, breaking the constraints of traditional modular pipelines and enabling global optimization. Current end-to-end frameworks typically rely on deep-learning-based planning, which requires extensive offline data for training. Deep reinforcement learning (DRL) algorithms are also popular, since they let agents adapt to environmental changes through reward functions; however, such frameworks cannot backpropagate gradients through the perception module. Each approach has its strengths and weaknesses. This paper combines the two by developing a bird's-eye-view (BEV) feature extraction network that captures key traffic-flow information, yielding an end-to-end DRL planning framework built on BEV features and shifting the technology from data-driven to behavior-driven. To improve training speed and quality, we propose an advanced imitation learning algorithm, validated through simulations in the CARLA simulator. Experimental results show that our approach outperforms comparable frameworks, improving the agent's safety and driving efficiency.
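The two-stage idea in the abstract (imitation pretraining followed by reward-driven fine-tuning of a policy conditioned on BEV features) can be illustrated with a minimal sketch. This is a hypothetical toy, not the paper's method: the linear policy, the synthetic "BEV features", the expert data, the toy reward, and the REINFORCE-style update are all illustrative assumptions standing in for the paper's network and PPO-style objective.

```python
import numpy as np

rng = np.random.default_rng(0)

bev_dim, act_dim = 16, 2          # toy BEV feature size; action = [steer, throttle]
W = np.zeros((act_dim, bev_dim))  # linear policy weights (stand-in for a network)

# --- Stage 1: imitation (behavior cloning) on expert demonstrations ---
X = rng.normal(size=(256, bev_dim))                        # synthetic BEV features
A_expert = X @ rng.normal(size=(bev_dim, act_dim)) * 0.1   # synthetic expert actions

lr = 0.05
for _ in range(200):
    pred = X @ W.T
    grad = (pred - A_expert).T @ X / len(X)  # gradient of the MSE imitation loss
    W -= lr * grad

bc_loss = float(np.mean((X @ W.T - A_expert) ** 2))

# --- Stage 2: RL fine-tuning with a reward-weighted update ---
# (a crude stand-in for the paper's DRL objective)
for _ in range(50):
    s = rng.normal(size=bev_dim)
    a = W @ s + rng.normal(scale=0.1, size=act_dim)   # exploration noise
    r = -np.sum(a ** 2)                               # toy reward: prefer small actions
    W += 1e-3 * r * np.outer(a - W @ s, s)            # REINFORCE-style step

print(f"BC loss after pretraining: {bc_loss:.4f}")
```

The warm start matters: without Stage 1 the policy explores from scratch, which is the slow-training problem the abstract's imitation learning algorithm is meant to address.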

Cite this article

Xia Jie, Wu Xiaodong, Xu Min. BEV-Fused Imitation and Reinforcement Learning for Autonomous Driving Planning[J]. Journal of Shanghai Jiaotong University (Science), 2026, 31(1): 154-166. DOI: 10.1007/s12204-025-2851-3

