J Shanghai Jiaotong Univ Sci, 2024, Vol. 29, Issue (4): 601-612. doi: 10.1007/s12204-024-2732-1
Special topic: Intelligent Robotics
LI Shuyi (李舒逸), LI Minzhe (李旻哲), JING Zhongliang∗ (敬忠良)
Accepted: 2023-10-12
Online: 2024-07-14
Published: 2024-07-14
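Abstract: Multi-agent path planning in dynamic environments has long been a challenge, mainly because obstacle positions change continuously and the agents interact with one another in complex ways. These factors slow the convergence of solutions and, in some cases, cause them to diverge entirely. To address this problem, a new method based on the dueling double deep Q-network (D3QN) is proposed for dynamic multi-agent settings and complex environments. A new reward function based on multi-agent position constraints is designed, and a training strategy based on incremental learning is adopted to achieve cooperative path planning for multiple agents. In addition, to avoid converging to local extrema, greedy and Boltzmann probability selection strategies are introduced for action selection. To fuse radar and image sensor data, a convolutional neural network-long short-term memory (CNN-LSTM) architecture is constructed to extract features from the multi-source measurements, which serve as the input to D3QN. The effectiveness and reliability of the algorithm are verified in a simulation environment built with the Robot Operating System (ROS) and Gazebo. Simulation results show that the proposed algorithm provides real-time solutions for path planning tasks in dynamic scenarios. In terms of average success rate and accuracy, the proposed method outperforms several other deep learning algorithms, and its convergence speed is also improved.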
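The abstract names the building blocks of D3QN without giving their formulas. The following is a minimal plain-Python sketch of the textbook versions of those blocks: the dueling aggregation Q(s, a) = V(s) + A(s, a) minus the mean advantage, the double-DQN target (the online network selects the next action, the target network evaluates it), and greedy versus Boltzmann (softmax) action selection. It is an illustration under stated assumptions, not the authors' implementation: the Q-values are hypothetical numbers standing in for network outputs, the function names are hypothetical, the paper's position-constraint reward, incremental-learning schedule and CNN-LSTM feature extractor are not reproduced, and because the abstract does not say how the greedy and Boltzmann strategies are combined, they are shown as separate functions.

```python
import math
import random
from typing import List, Optional, Sequence

# Hypothetical helpers sketching textbook D3QN pieces; not the paper's code.


def dueling_q(value: float, advantages: Sequence[float]) -> List[float]:
    """Dueling aggregation: Q(s, a) = V(s) + A(s, a) - mean_a A(s, a)."""
    mean_adv = sum(advantages) / len(advantages)
    return [value + adv - mean_adv for adv in advantages]


def greedy_action(q_values: Sequence[float]) -> int:
    """Index of the largest Q-value (the first index wins on ties)."""
    return max(range(len(q_values)), key=lambda a: q_values[a])


def double_dqn_target(reward: float, done: bool, gamma: float,
                      q_online_next: Sequence[float],
                      q_target_next: Sequence[float]) -> float:
    """Double DQN: the online net selects a', the target net evaluates it."""
    if done:
        # Terminal transition: no bootstrapped future value.
        return reward
    best_next = greedy_action(q_online_next)
    return reward + gamma * q_target_next[best_next]


def boltzmann_probs(q_values: Sequence[float], temperature: float) -> List[float]:
    """Softmax over Q / T; subtracting max(Q) keeps exp() from overflowing."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    q_max = max(q_values)
    weights = [math.exp((q - q_max) / temperature) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]


def boltzmann_action(q_values: Sequence[float], temperature: float,
                     rng: Optional[random.Random] = None) -> int:
    """Sample an action with probability proportional to exp(Q / T)."""
    if rng is None:
        rng = random.Random()
    probs = boltzmann_probs(q_values, temperature)
    return rng.choices(range(len(q_values)), weights=probs, k=1)[0]


if __name__ == "__main__":
    # Hypothetical head outputs for a 4-action agent at the next state s'.
    q_online_next = dueling_q(1.0, [0.2, 0.8, -0.5, 0.1])
    q_target_next = dueling_q(0.9, [0.3, 0.6, -0.4, 0.0])
    y = double_dqn_target(-0.1, False, 0.99, q_online_next, q_target_next)
    print("TD target:", y)
    print("greedy action:", greedy_action(q_online_next))
    print("Boltzmann probs (T=0.5):", boltzmann_probs(q_online_next, 0.5))
    print("Boltzmann sample:", boltzmann_action(q_online_next, 0.5, random.Random(0)))
```

With the hypothetical advantages [0.2, 0.8, -0.5, 0.1] and V = 1.0, the online head yields Q = [1.05, 1.65, 0.35, 0.95], so the greedy action is index 1. A low temperature makes Boltzmann sampling nearly greedy, while a high temperature spreads probability across actions; that extra exploration is the usual reason for using it to escape local optima.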
LI Shuyi (李舒逸), LI Minzhe (李旻哲), JING Zhongliang∗ (敬忠良). Multi-Agent Path Planning Method Based on Improved Deep Q-Network in Dynamic Environments[J]. J Shanghai Jiaotong Univ Sci, 2024, 29(4): 601-612.