J Shanghai Jiaotong Univ Sci, 2024, Vol. 29, Issue (4): 601-612. doi: 10.1007/s12204-024-2732-1

Multi-Agent Path Planning Method Based on Improved Deep Q-Network in Dynamic Environments

LI Shuyi (李舒逸), LI Minzhe (李旻哲), JING Zhongliang* (敬忠良)

  1. (School of Aeronautics and Astronautics, Shanghai Jiao Tong University, Shanghai 200240, China)
  • Accepted: 2023-10-12 Online: 2024-07-14 Published: 2024-07-14

Abstract: Multi-agent path planning in dynamic environments is challenging, primarily because obstacle positions change continuously and the agents' actions interact in complex ways. As a result, solutions tend to converge slowly and in some cases diverge altogether. To address this issue, this paper introduces an approach based on a double dueling deep Q-network (D3QN), tailored to dynamic, complex multi-agent environments. A new reward function based on multi-agent positional constraints is designed, and a training strategy based on incremental learning is adopted to achieve collaborative path planning for multiple agents. To avoid convergence to local extrema, a greedy and Boltzmann probability selection policy is introduced for action selection. To fuse radar and image sensor data, a combined convolutional neural network and long short-term memory (CNN-LSTM) architecture is constructed to extract features from the multi-source measurements as the input to the D3QN. The efficacy and reliability of the algorithm are validated in a simulated environment built with the Robot Operating System (ROS) and Gazebo. The simulation results show that the proposed algorithm provides a real-time solution for path planning tasks in dynamic scenarios. In terms of average success rate and accuracy, the proposed method outperforms several other deep learning algorithms, and its convergence speed is also improved.

Keywords: multi-agent, path planning, deep reinforcement learning, deep Q-network
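The abstract names its building blocks but gives no formulas, so the following Python sketch shows only the standard textbook forms of those blocks: the dueling aggregation of a state value and action advantages, the double DQN target, and Boltzmann (softmax) action probabilities. All function and parameter names here are hypothetical. In particular, the rule in `select_action` that mixes greedy and Boltzmann selection is an assumption for illustration; the abstract does not say how the paper combines the two, and the reward function and CNN-LSTM feature extractor are not reproduced.

```python
import math
import random


def dueling_q(value, advantages):
    """Dueling aggregation: Q(s, a) = V(s) + A(s, a) - mean over a' of A(s, a')."""
    mean_adv = sum(advantages) / len(advantages)
    return [value + a - mean_adv for a in advantages]


def double_dqn_target(reward, done, gamma, q_online_next, q_target_next):
    """Double DQN target: the online network picks the next action,
    the target network evaluates it."""
    if done:
        return reward
    best = max(range(len(q_online_next)), key=q_online_next.__getitem__)
    return reward + gamma * q_target_next[best]


def boltzmann_probs(q_values, temperature):
    """Softmax over Q-values; the maximum is subtracted for numerical stability."""
    q_max = max(q_values)
    exps = [math.exp((q - q_max) / temperature) for q in q_values]
    total = sum(exps)
    return [e / total for e in exps]


def select_action(q_values, explore_prob, temperature, rng=random):
    """ASSUMED mixing rule (not taken from the paper): with probability
    explore_prob sample from the Boltzmann distribution, otherwise act greedily."""
    if rng.random() < explore_prob:
        probs = boltzmann_probs(q_values, temperature)
        return rng.choices(range(len(q_values)), weights=probs, k=1)[0]
    return max(range(len(q_values)), key=q_values.__getitem__)
```

A call such as `select_action(dueling_q(1.0, [0.0, 1.0, 2.0]), explore_prob=0.1, temperature=0.5)` returns an action index. Lowering `temperature` makes the Boltzmann branch behave more like the greedy one, while raising it spreads probability more evenly across actions.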

CLC number: