Multi-Agent Path Planning Method Based on Improved Deep Q-Network in Dynamic Environments

doi:10.1007/s12204-024-2732-1

Abstract

Abstract: Multi-agent path planning in dynamic environments is challenging, mainly because obstacle positions change continuously and the agents' actions interact in complex ways. As a result, the solution tends to converge slowly and in some cases diverges. To address this issue, this paper proposes an approach based on a double dueling deep Q-network (D3QN) tailored to dynamic multi-agent environments. A reward function based on multi-agent positional constraints is designed, and an incremental-learning training strategy is adopted to achieve collaborative path planning for multiple agents. In addition, a combined greedy and Boltzmann probability selection policy is used for action selection to avoid convergence to a local extremum. To accommodate radar and image sensors, a convolutional neural network-long short-term memory (CNN-LSTM) architecture is constructed to extract features from the multi-source measurements, which serve as the input to the D3QN. The algorithm's efficacy and reliability are validated in a simulation environment built on the Robot Operating System (ROS) and Gazebo. The simulation results show that the proposed algorithm provides a real-time solution for path planning tasks in dynamic scenarios. In terms of average success rate and accuracy, the proposed method outperforms the other deep learning algorithms it is compared against, and its convergence speed is also improved.
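The three standard ingredients named in the abstract (dueling aggregation, the double-DQN target, and greedy/Boltzmann action selection) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the mean-advantage baseline, and the rule for mixing the greedy and Boltzmann choices (Boltzmann sampling with probability `epsilon`, greedy otherwise) are assumptions made for illustration, since the abstract does not specify them.

```python
import math
import random


def dueling_q_values(state_value, advantages):
    """Combine V(s) and A(s, a) into Q(s, a) using the mean-advantage baseline."""
    mean_adv = sum(advantages) / len(advantages)
    return [state_value + a - mean_adv for a in advantages]


def double_dqn_target(reward, gamma, done, next_q_online, next_q_target):
    """Double DQN target: the online net picks the action, the target net scores it."""
    if done:
        return reward
    best_action = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    return reward + gamma * next_q_target[best_action]


def boltzmann_probs(q_values, temperature):
    """Softmax over Q / temperature, shifted by max(Q) for numerical stability."""
    q_max = max(q_values)
    weights = [math.exp((q - q_max) / temperature) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]


def select_action(q_values, epsilon, temperature, rng=random):
    """Assumed mixing rule: Boltzmann sampling with probability epsilon, else greedy."""
    if rng.random() < epsilon:
        probs = boltzmann_probs(q_values, temperature)
        return rng.choices(range(len(q_values)), weights=probs, k=1)[0]
    return max(range(len(q_values)), key=lambda a: q_values[a])
```

In a sketch like this, a high temperature spreads the Boltzmann probabilities almost uniformly over actions, while a low temperature concentrates them on the highest-valued action; annealing `epsilon` and `temperature` during training is one common way to shift from exploration to exploitation.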

Key words: multi-agent, path planning, deep reinforcement learning, deep Q-network


CLC Number: TP242.6

LI Shuyi (李舒逸), LI Minzhe (李旻哲), JING Zhongliang* (敬忠良). Multi-Agent Path Planning Method Based on Improved Deep Q-Network in Dynamic Environments [J]. J Shanghai Jiaotong Univ Sci, 2024, 29(4): 601-612.
