Multi-agent path planning in dynamic environments is challenging, primarily because of the ever-changing positions of obstacles and the complex interactions between agents' actions. These factors cause the solution to converge slowly and, in some cases, to diverge altogether. To address this issue, this paper introduces a novel approach based on a double dueling deep Q-network (D3QN), tailored to dynamic multi-agent environments. A novel reward function based on multi-agent positional constraints is designed, and a training strategy based on incremental learning is adopted to achieve collaborative path planning of multiple agents. Moreover, a hybrid greedy and Boltzmann probability selection policy is introduced for action selection, avoiding convergence to local extrema. To accommodate radar and image sensors, a convolutional neural network-long short-term memory (CNN-LSTM) architecture is constructed to extract features from the multi-source measurements as the input of the D3QN. The algorithm's efficacy and reliability are validated in a simulated environment using Robot Operating System (ROS) and Gazebo. The simulation results show that the proposed algorithm provides a real-time solution for path planning tasks in dynamic scenarios. In terms of average success rate and accuracy, the proposed method outperforms other deep learning algorithms, and its convergence speed is also improved.
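The dueling value decomposition at the heart of the D3QN and the hybrid greedy/Boltzmann action-selection policy described above can be sketched compactly. The following is a minimal Python illustration; the function names and the parameters epsilon (exploration rate) and tau (softmax temperature) are assumptions for exposition, not the paper's actual implementation.

```python
import numpy as np

def dueling_q(value: float, advantages: np.ndarray) -> np.ndarray:
    """Dueling aggregation: Q(s, a) = V(s) + A(s, a) - mean_a A(s, a)."""
    return value + advantages - advantages.mean()

def select_action(q_values: np.ndarray, epsilon: float, tau: float,
                  rng: np.random.Generator) -> int:
    """Hybrid greedy/Boltzmann selection: exploit greedily with
    probability 1 - epsilon; otherwise sample from a softmax over
    Q-values with temperature tau, so exploration still favors
    higher-valued actions rather than choosing uniformly at random."""
    if rng.random() > epsilon:
        return int(np.argmax(q_values))         # greedy exploitation
    logits = (q_values - q_values.max()) / tau  # numerically stable softmax
    probs = np.exp(logits)
    probs /= probs.sum()
    return int(rng.choice(len(q_values), p=probs))

# Usage: combine a state value with per-action advantages, then act.
rng = np.random.default_rng(0)
q = dueling_q(value=1.0, advantages=np.array([0.1, 0.5, -0.2, 0.3]))
action = select_action(q, epsilon=0.2, tau=0.5, rng=rng)
```

Compared with uniform epsilon-greedy exploration, the Boltzmann component biases exploratory moves toward actions the network already rates highly, which helps the policy escape local extrema without wasting samples on clearly poor actions.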
LI Shuyi (李舒逸), LI Minzhe (李旻哲), JING Zhongliang∗ (敬忠良). Multi-Agent Path Planning Method Based on Improved Deep Q-Network in Dynamic Environments [J]. Journal of Shanghai Jiaotong University (Science), 2024, 29(4): 601-612. DOI: 10.1007/s12204-024-2732-1