J Shanghai Jiaotong Univ Sci ›› 2024, Vol. 29 ›› Issue (4): 601-612. DOI: 10.1007/s12204-024-2732-1
LI Shuyi (李舒逸), LI Minzhe (李旻哲), JING Zhongliang∗ (敬忠良)
Accepted:
2023-10-12
Online:
2024-07-14
Published:
2024-07-14
Abstract: Multi-agent path planning in dynamic environments remains challenging, chiefly because obstacle positions change continuously and the interactions among agents are complex. These factors slow the convergence of solutions and, in some cases, cause outright divergence. To address this problem, a novel method based on the dueling double deep Q-network (D3QN) is proposed for dynamic multi-agent and complex environments. A new reward function based on multi-agent position constraints is designed, and a training strategy based on incremental learning is adopted to achieve cooperative path planning for multiple agents. Furthermore, to avoid convergence to local extrema, greedy and Boltzmann probabilistic selection strategies are introduced for action selection. To fuse radar and image sensor data, a convolutional neural network-long short-term memory (CNN-LSTM) architecture is constructed to extract features from multi-source measurements as the input of the D3QN. The effectiveness and reliability of the algorithm are verified in a simulation environment built with the Robot Operating System and Gazebo. Simulation results show that the proposed algorithm provides a real-time solution for path planning tasks in dynamic scenarios. In terms of average success rate and accuracy, the proposed method outperforms several other deep learning algorithms, and its convergence speed is also improved.
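To make the core algorithm concrete, below is a minimal PyTorch sketch of a dueling Q-network head together with the double-Q target computation that D3QN combines; the layer sizes, names, and hyperparameters are illustrative assumptions, not the authors' implementation.

```python
# Minimal D3QN sketch: dueling value/advantage streams plus a double-DQN
# target. Layer widths and the hidden size are assumed for illustration.
import torch
import torch.nn as nn

class DuelingQNet(nn.Module):
    def __init__(self, state_dim: int, n_actions: int, hidden: int = 128):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU())
        self.value = nn.Linear(hidden, 1)              # state-value stream V(s)
        self.advantage = nn.Linear(hidden, n_actions)  # advantage stream A(s, a)

    def forward(self, s: torch.Tensor) -> torch.Tensor:
        h = self.trunk(s)
        v, a = self.value(h), self.advantage(h)
        # Dueling aggregation: Q(s,a) = V(s) + A(s,a) - mean_a' A(s,a'),
        # with the mean subtracted so V and A are identifiable.
        return v + a - a.mean(dim=1, keepdim=True)

def double_q_target(online: DuelingQNet, target: DuelingQNet,
                    r: torch.Tensor, s_next: torch.Tensor,
                    done: torch.Tensor, gamma: float = 0.99) -> torch.Tensor:
    """Double DQN: the online net selects the next action, the target
    net evaluates it, which reduces overestimation bias."""
    with torch.no_grad():
        a_star = online(s_next).argmax(dim=1, keepdim=True)
        q_next = target(s_next).gather(1, a_star).squeeze(1)
        return r + gamma * (1.0 - done) * q_next
```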
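The mixed greedy/Boltzmann action selection that the abstract describes for escaping local extrema can be sketched as follows; the mixing probability `p_boltzmann` and the temperature `tau` are assumed hyperparameters, since the abstract does not specify them.

```python
# Hedged sketch of combined greedy and Boltzmann (softmax) action selection.
import torch

def select_action(q_values: torch.Tensor, tau: float = 1.0,
                  p_boltzmann: float = 0.3) -> int:
    # q_values: shape (n_actions,), Q estimates for the current state.
    if torch.rand(1).item() < p_boltzmann:
        # Boltzmann exploration: sample actions in proportion to exp(Q / tau);
        # a higher tau flattens the distribution and explores more.
        probs = torch.softmax(q_values / tau, dim=0)
        return int(torch.multinomial(probs, 1).item())
    # Otherwise exploit: act greedily on the current Q estimates.
    return int(q_values.argmax().item())
```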
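Likewise, a CNN-LSTM extractor fusing image and radar measurements into a state embedding for the D3QN might look like the sketch below; all shapes (84x84 RGB frames, 360-beam radar scans, and the sequence handling) are assumptions for illustration only.

```python
# Sketch of a CNN-LSTM feature extractor over a short window of multi-source
# measurements; the last LSTM output serves as the state input to the D3QN.
import torch
import torch.nn as nn

class CNNLSTMExtractor(nn.Module):
    def __init__(self, radar_dim: int = 360, feat_dim: int = 128):
        super().__init__()
        self.cnn = nn.Sequential(                  # image branch
            nn.Conv2d(3, 16, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=4, stride=2), nn.ReLU(),
            nn.Flatten())                          # -> 32 * 9 * 9 for 84x84 input
        self.radar_fc = nn.Linear(radar_dim, 64)   # radar branch
        self.lstm = nn.LSTM(input_size=32 * 9 * 9 + 64,
                            hidden_size=feat_dim, batch_first=True)

    def forward(self, images: torch.Tensor, radar: torch.Tensor) -> torch.Tensor:
        # images: (B, T, 3, 84, 84); radar: (B, T, radar_dim)
        b, t = images.shape[:2]
        img_feat = self.cnn(images.flatten(0, 1)).view(b, t, -1)
        fused = torch.cat([img_feat, self.radar_fc(radar)], dim=-1)
        out, _ = self.lstm(fused)                  # temporal fusion over T steps
        return out[:, -1]                          # last-step feature for the D3QN
```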
LI Shuyi (李舒逸), LI Minzhe (李旻哲), JING Zhongliang∗ (敬忠良). Multi-Agent Path Planning Method Based on Improved Deep Q-Network in Dynamic Environments[J]. J Shanghai Jiaotong Univ Sci, 2024, 29(4): 601-612.