J Shanghai Jiaotong Univ Sci, 2024, Vol. 29, Issue (3): 471-479. doi: 10.1007/s12204-023-2586-y
Automation & Computer Technologies
ZHAO Yingce1 (赵英策), ZHANG Guanghao2 (张广浩), XING Zhengyu2 (邢正宇), LI Jianxun2∗ (李建勋)
Accepted: 2022-02-18
Online: 2024-05-28
Published: 2024-05-28
ZHAO Yingce(赵英策), ZHANG Guanghao(张广浩), XING Zhengyu(邢正宇), LI Jianxun(李建勋). Hierarchical Reinforcement Learning Adversarial Algorithm Against Opponent with Fixed Offensive Strategy[J]. J Shanghai Jiaotong Univ Sci, 2024, 29(3): 471-479.