A Double Competitive Scheme Based Learning Automata Algorithm

doi:10.16183/j.cnki.jsjtu.2018.10.018

Abstract

Abstract: Learning automaton (LA) plays an essential part in the theoretical system of reinforcement learning. Not only does LA have important theoretical value, but also has extensive practical value, including many applications in security field. As classic family of LA, estimator algorithms have improved the performance of algorithms to a novel level. However, because of the limitation of estimators, the estimated values are not reliable at the beginning of learning process, which will lead to wrong updates of actions’ selection probability vector. Using current update strategy, the offset of wrong updates only relies on large number of extra iterations. Thus, there is still room for enhancement of convergence efficiency. In this paper, for improving the update strategy of probability vector, we proposed a double competitive scheme based learning automata algorithm and proved its ε-optimality. The simulation results showed that the proposed algorithms reached the highest convergence efficiency and confirmed the validity of proposed probability vector’s update strategy.

Key words: learning automata, reinforcement learning, stationary environments, estimator algorithms, double competitive scheme

CLC Number:

TP 181

DI Chong,QI Kaiyue,WU Yue,SU Yu,LI Shenghong. A Double Competitive Scheme Based Learning Automata Algorithm[J]. Journal of Shanghai Jiao Tong University, 2018, 52(10): 1292-1297.

References

［1］NARENDRA K S, THATHACHAR M A L. Learning automata-a survey［J］. IEEE Transactions on Systems, Man, and Cybernetics, 1974, (4): 323-334. ［2］KAELBLING L P, LITTMAN M L, MOORE A W. Reinforcement learning: A survey［J］. Journal of Artificial Intelligence Research, 1996, (4): 237-285. ［3］郝圣, 张沪寅, 宋梦凯.基于学习自动机理论与稳定性控制的自适应移动无线AdHoc网络分簇策略［J/OL］. 计算机学报, 2017, 40: 1-20. HAO Sheng, ZHANG Huyin, SONG Mengkai. An adaptive clustering strategy for MANET based on learning automata theory and stability control［J/OL］. Chinese Journal of Computers, 2017, 40: 1-20. ［4］王建平, 陈改霞, 孔德川, 等.一种基于学习自动机的WSN区域覆盖算法［J］.数据采集与处理, 2014, 29(6): 1016-1022. WANG Jianping, CHEN Gaixia, KONG Dechuan, et al. Learning automata-based area coverage algorithm for wireless sensor networks［J］. Journal of Data Acquisition and Processing, 2014, 29(6): 1016-1022. ［5］JIAO Lei, ZHANG Xuan, OOMMEN B J, et al. Optimizing channel selection for cognitive radio networks using a distributed Bayesian learning automata-based approach［J］. Applied Intelligence, 2016, 44(2): 307-321. ［6］REZVANIAN A, SAGHIRI A M, VAHIDIPOUR S M, et al. Learning automata for wireless sensor networks［M］. Cham: Springer, 2018: 91-219. ［7］DI C, SU Y, HAN Z, et al. Learning automata based SVM for intrusion detection［C］//International Conference in Communications, Signal Processing, and Systems. Singapore: Springer, 2017: 2067-2074. ［8］ZHANG Y, DI C, HAN Z, et al. An adaptive honeypot deployment algorithm based on learning automata［C］//2017 IEEE Second International Conference on Data Science in Cyberspace (DSC). Piscataway NJ: ［s.n.］, 2017: 521-527. ［9］GE H, HUANG J, DI C, et al. Learning automata based approach for influence maximization problem on social networks［C］//2017 IEEE Second International Conference on Data Science in Cyberspace (DSC). Piscataway, NJ: ［s.n.］, 2017: 108-117. ［10］TSETLIN M L. On behaviour of finite automata in random medium［J］. Avtomatika i Telemekhanika, 1961 (22): 1345-1354. ［11］VARSHAVSKII V I, VORONTSOVA I P. On the behavior of stochastic automata with a variable structure［J］. Avtomatika i Telemekhanika, 1963, 24(3): 353-360. ［12］OOMMEN B J, HANSEN E. The asymptotic optimality of discretized linear reward-inaction learning automata［J］. IEEE Transactions on Systems, Man, and Cybernetics, 1984 (3): 542-545. ［13］THATHACHAR M A L. A class of rapidly converging algorithms for learning automata［C］//IEEE International Conference on Cybernetics and Society. Piscataway, NJ: ［s.n.］, 1984: 602-606. ［14］OOMMEN B J, LANCTT J K. Discretized pursuit learning automata［J］. IEEE Transactions on Systems, Man, and Cybernetics, 1990, 20(4): 931-938. ［15］AGACHE M, OOMMEN B J. Generalized pursuit learning schemes: New families of continuous and discretized learning automata［J］. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), 2002, 32(6): 738-749. ［16］PAPADIMITRIOU G I, SKLIRA M, POMPORTSIS A S. A new class of/spl epsi/-optimal learning automata［J］. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), 2004, 34(1): 246-254. ［17］GE H, JIANG W, LI S, et al. A novel estimator based learning automata algorithm［J］. Applied Intelligence, 2015, 42(2): 262-275.

[1]	ZHAO Yingying, QIU Yue, ZHU Tianchen, LI Fan, SU Yun, TAI Zhenying, SUN Qingyun, FAN Hang. Online Steady-State Scheduling of New Power Systems Based on Hierarchical Reinforcement Learning [J]. Journal of Shanghai Jiao Tong University, 2025, 59(3): 400-412.
[2]	YANG Yinghe, WEI Handi, FAN Dixia, LI Ang. Optimization Method of Underwater Flapping Foil Propulsion Performance Based on Gaussian Process Regression and Deep Reinforcement Learning [J]. Journal of Shanghai Jiao Tong University, 2025, 59(1): 70-78.
[3]	LIU Huahua, WANG Qing. Multi-Aircraft Target Assignment Method Based on Reinforcement Learning [J]. Air & Space Defense, 2024, 7(5): 65-72.
[4]	ZHOU Yi, ZHOU Liangcai, SHI Di, ZHAO Xiaoying, SHAN Xin. Coordinated Active Power-Frequency Control Based on Safe Deep Reinforcement Learning [J]. Journal of Shanghai Jiao Tong University, 2024, 58(5): 682-692.
[5]	DONG Yubo¹ (董玉博), CUI Tao¹ (崔涛), ZHOU Yufan¹ (周禹帆), SONG Xun² (宋勋), ZHU Yue² (祝月), DONG Peng^1∗ (董鹏). Reward Function Design Method for Long Episode Pursuit Tasks Under Polar Coordinate in Multi-Agent Reinforcement Learning [J]. J Shanghai Jiaotong Univ Sci, 2024, 29(4): 646-655.
[6]	LI Shuyi (李舒逸), LI Minzhe (李旻哲), JING Zhongliang^∗ (敬忠良). Multi-Agent Path Planning Method Based on Improved Deep Q-Network in Dynamic Environments [J]. J Shanghai Jiaotong Univ Sci, 2024, 29(4): 601-612.
[7]	ZHAO Yingce(赵英策), ZHANG Guanghao(张广浩), XING Zhengyu(邢正宇), LI Jianxun(李建勋). Hierarchical Reinforcement Learning Adversarial Algorithm Against Opponent with Fixed Offensive Strategy [J]. J Shanghai Jiaotong Univ Sci, 2024, 29(3): 471-479.
[8]	MIAO Zhenhua(苗镇华), HUANG Wentao(黄文焘), ZHANG Yilian(张依恋), FAN Qinqin(范勤勤). Multi-Robot Task Allocation Using Multimodal Multi-Objective Evolutionary Algorithm Based on Deep Reinforcement Learning [J]. J Shanghai Jiaotong Univ Sci, 2024, 29(3): 377-387.
[9]	QUAN Jiale, MA Xianlong, SHEN Yuheng. Multi-agent Formation Method Based on Dynamic Optimization of Proximal Policies [J]. Air & Space Defense, 2024, 7(2): 52-62.
[10]	ZHANG Weizhen, HE Zhen, TANG Zhangfan. Reinforcement Learning Control Design for Perching Maneuver of Unmanned Aerial Vehicles with Wind Disturbances [J]. Journal of Shanghai Jiao Tong University, 2024, 58(11): 1753-1761.
[11]	WANG Xu, CAI Yuanli, ZHANG Xuecheng, ZHANG Rongliang, HAN Chenglong. Intercept Guidance Law with a Low Acceleration Ratio Based on Hierarchical Reinforcement Learning [J]. Air & Space Defense, 2024, 7(1): 40-47.
[12]	GUO Jianguo, HU Guanjie, XU Xinpeng, LIU Yue, CAO Jin. Reinforcement Learning-Based Target Assignment Method for Many-to-Many Interceptions [J]. Air & Space Defense, 2024, 7(1): 24-31.
[13]	MA Chi, ZHANG Guoqun, SUN Junge, LYU Guangzhe, ZHANG Tao. Deep Reinforcement Learning-Based Reconfiguration Method for Integrated Electronic Systems [J]. Air & Space Defense, 2024, 7(1): 63-70.
[14]	LI Mengxuan, GUO Jianguo, XU Xinpeng, SHEN Yuheng. Guidance Law Based on Proximal Policy Optimization [J]. Air & Space Defense, 2023, 6(4): 51-57.
[15]	SUN Jie, LI Zihao, ZHANG Shuyu. Application of Machine Learning in Chemical Synthesis and Characterization [J]. Journal of Shanghai Jiao Tong University, 2023, 57(10): 1231-1244.