[1] ZHANG J W, HUANG S C, HAN C C. Analysis of trajectory simulation of proportional guidance based on Matlab [J]. Tactical Missile Technology, 2009(3): 60-64 (in Chinese).
[2] ZHAO W C, NA L, JIN X Y. Research and realization of quasi-parallel approaching method [J]. Measurement & Control Technology, 2009, 28(3): 92-95 (in Chinese).
[3] ZENG J, MOU J, LIU Y. Lightweight issues of swarm intelligence based multi-agent game strategy [J]. Journal of Command and Control, 2020, 6(4): 381-387 (in Chinese).
[4] SCHULMAN J, WOLSKI F, DHARIWAL P, et al. Proximal policy optimization algorithms [DB/OL]. (2017-08-28) [2021-10-25]. https://arxiv.org/abs/1707.06347.
[5] LILLICRAP T P, HUNT J J, PRITZEL A, et al. Continuous control with deep reinforcement learning [DB/OL]. (2019-07-05) [2021-10-25]. https://arxiv.org/abs/1509.02971.
[6] FUJIMOTO S, VAN HOOF H, MEGER D. Addressing function approximation error in actor-critic methods [C]//35th International Conference on Machine Learning. Stockholm: IMLS, 2018: 1587-1596.
[7] HAARNOJA T, ZHOU A, ABBEEL P, et al. Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor [C]//35th International Conference on Machine Learning. Stockholm: IMLS, 2018: 1861-1870.
[8] LANGE S, RIEDMILLER M. Deep auto-encoder neural networks in reinforcement learning [C]//The 2010 International Joint Conference on Neural Networks. Barcelona: IEEE, 2010: 1-8.
[9] ABTAHI F, ZHU Z G, BURRY A M. A deep reinforcement learning approach to character segmentation of license plate images [C]//2015 14th IAPR International Conference on Machine Vision Applications. Tokyo: IEEE, 2015: 539-542.
[10] LANGE S, RIEDMILLER M, VOIGTLÄNDER A. Autonomous reinforcement learning on raw visual input data in a real world application [C]//The 2012 International Joint Conference on Neural Networks. Brisbane: IEEE, 2012: 1-8.
[11] NACHUM O, GU S S, LEE H, et al. Data-efficient hierarchical reinforcement learning [C]//32nd Conference on Neural Information Processing Systems. Montréal: NIPS, 2018: 1-11.
[12] LOWE R, WU Y, TAMAR A, et al. Multi-agent actor-critic for mixed cooperative-competitive environments [C]//31st Conference on Neural Information Processing Systems. Long Beach: NIPS, 2017: 1-12.
[13] FOERSTER J, FARQUHAR G, AFOURAS T, et al. Counterfactual multi-agent policy gradients [J]. Proceedings of the AAAI Conference on Artificial Intelligence, 2018, 32(1): 2974-2982.
[14] DIETTERICH T G. The MAXQ method for hierarchical reinforcement learning [C]//15th International Conference on Machine Learning. Madison: IMLS, 1998: 118-126.
[15] KULKARNI T D, NARASIMHAN K R, SAEEDI A, et al. Hierarchical deep reinforcement learning: Integrating temporal abstraction and intrinsic motivation [C]//30th Conference on Neural Information Processing Systems. Barcelona: NIPS, 2016: 1-9.
[16] SUTTON R S, PRECUP D, SINGH S. Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning [J]. Artificial Intelligence, 1999, 112(1/2): 181-211.
[17] BACON P L, HARB J, PRECUP D. The option-critic architecture [J]. Proceedings of the AAAI Conference on Artificial Intelligence, 2017, 31(1): 1726-1734.
[18] LEVY A, KONIDARIS G, PLATT R, et al. Learning multi-level hierarchies with hindsight [DB/OL]. (2019-09-03) [2021-10-25]. https://arxiv.org/abs/1712.00948.