J Shanghai Jiaotong Univ Sci, 2026, Vol. 31, Issue (1): 154-166. doi: 10.1007/s12204-025-2851-3

• Intelligent Robots •

BEV-Fused Imitation and Reinforcement Learning for Autonomous Driving Planning

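Chinese title (translated): An Imitation and Reinforcement Learning Planning Method for Autonomous Driving with Fused Bird's Eye View Features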

Xia Jie (夏洁)1, Wu Xiaodong (吴晓东)1, Xu Min (许敏)2

  1. Institute of Intelligent Vehicles, Shanghai Jiao Tong University, Shanghai 200240, China
  2. School of Mechanical Engineering, Shanghai Jiao Tong University, Shanghai 200240, China
  • Received: 2024-12-16 Revised: 2025-02-25 Accepted: 2025-03-21 Online: 2026-02-28 Published: 2025-10-14

Abstract: End-to-end autonomous driving breaks the constraints of the traditional modular pipeline by integrating perception, prediction, and planning within a single framework that can be optimized globally. Current end-to-end frameworks typically rely on deep-learning-based planning, which requires extensive offline data for training. Planning with deep reinforcement learning (DRL) is also popular, because the reward function guides the agent to adapt to changes in the environment. However, DRL planning frameworks are not tightly coupled to the perception module, so errors cannot be backpropagated into it. Each approach therefore has its own strengths and weaknesses. This paper combines the two: a bird’s eye view (BEV) feature extraction network captures key traffic flow information, and an end-to-end DRL planning framework is built on the resulting BEV features, shifting end-to-end driving from data-driven to behavior-driven. To improve training speed and quality, we also propose an imitation learning algorithm. The method is validated in the CARLA simulator, where experimental results show that it outperforms algorithms from other frameworks and improves the agent’s safety and efficiency.

Key words: end-to-end technology, deep reinforcement learning (DRL), bird’s eye view (BEV), imitation learning

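Abstract (translated from the Chinese): End-to-end autonomous driving technology breaks free of the modular pipeline form of traditional autonomous driving by integrating perception, prediction, and planning within one framework, achieving global optimization. Typical end-to-end (E2E) frameworks today are based on deep learning planning and need large amounts of real-world offline data to train the network, and acquiring and managing that data is time-consuming and laborious. Planning based on deep reinforcement learning (DRL) algorithms is another popular autonomous driving technique: guided by the reward function, a DRL algorithm lets the agent adapt when the environment changes abruptly. However, this kind of learning framework has no strong coupling with the perception module, that is, backpropagation cannot be realized. Both kinds of learning framework have advantages and disadvantages. This paper fuses the two frameworks and builds a bird's eye view (BEV) feature extraction network that extracts key traffic flow information from camera images, ultimately constructing an end-to-end DRL planning framework based on BEV features; this framework turns end-to-end autonomous driving from data-driven into behavior-driven. To improve the speed and quality of network training, the paper also proposes an advanced imitation learning algorithm. The proposed algorithm is validated in simulation in the CARLA simulator, and the experimental results show that it outperforms algorithms under other frameworks and further improves the agent's safety, efficiency, and other properties.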

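Key words (translated from the Chinese): end-to-end technology, deep reinforcement learning, bird's eye view, imitation learning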
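
The abstract describes a two-stage idea: imitation learning to speed up training, followed by reward-driven DRL planning on BEV features. The abstract does not specify the network, the DRL algorithm, or the imitation learning algorithm, so the following is only a minimal toy sketch of that general pattern and not the authors' implementation. Everything in it is an assumption made for illustration: the flattened feature size `N_FEATURES`, the three discrete actions, the linear softmax policy, the synthetic "expert", the `toy_reward` function, and the use of plain behavior cloning and REINFORCE.

```python
import numpy as np

rng = np.random.default_rng(0)

N_FEATURES = 16  # assumed size of a flattened BEV feature vector (hypothetical)
N_ACTIONS = 3    # assumed discrete actions: 0=brake, 1=keep speed, 2=accelerate


def softmax(logits):
    """Numerically stable softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def behavior_cloning(features, expert_actions, epochs=200, lr=0.5):
    """Stage 1 (imitation): fit a linear softmax policy with cross-entropy."""
    weights = np.zeros((features.shape[1], N_ACTIONS))
    one_hot = np.eye(N_ACTIONS)[expert_actions]
    for _ in range(epochs):
        probs = softmax(features @ weights)
        grad = features.T @ (probs - one_hot) / len(features)
        weights -= lr * grad  # gradient descent on the cross-entropy loss
    return weights


def reinforce_step(weights, features, reward_fn, lr=0.05):
    """Stage 2 (reinforcement): one REINFORCE update with a mean baseline."""
    probs = softmax(features @ weights)
    actions = np.array([rng.choice(N_ACTIONS, p=p) for p in probs])
    rewards = reward_fn(features, actions)
    advantages = rewards - rewards.mean()
    one_hot = np.eye(N_ACTIONS)[actions]
    # For a linear softmax policy, grad log pi(a|x) = x * (one_hot(a) - probs).
    grad = features.T @ ((one_hot - probs) * advantages[:, None]) / len(features)
    return weights + lr * grad  # gradient ascent on expected reward


# Synthetic stand-ins for BEV features and expert demonstrations (assumed data).
true_w = rng.normal(size=(N_FEATURES, N_ACTIONS))
bev_features = rng.normal(size=(512, N_FEATURES))
expert_actions = (bev_features @ true_w).argmax(axis=1)


def toy_reward(features, actions):
    """Hypothetical reward: 1 when the action matches the synthetic expert rule."""
    return (actions == (features @ true_w).argmax(axis=1)).astype(float)


policy_w = behavior_cloning(bev_features, expert_actions)
bc_accuracy = (softmax(bev_features @ policy_w).argmax(axis=1) == expert_actions).mean()

for _ in range(50):
    policy_w = reinforce_step(policy_w, bev_features, toy_reward)

print(f"accuracy after behavior cloning: {bc_accuracy:.2f}")
```

The design point the sketch tries to show is the warm start: the reinforcement stage begins from a policy that already imitates the expert rather than from random weights, which is one common reason to pair imitation learning with DRL. In a real system the feature vectors would come from a learned BEV encoder and the reward from the simulator.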
