Learning automaton (LA) plays an essential part in the theoretical system of reinforcement learning. Not only does LA have important theoretical value, but also has extensive practical value, including many applications in security field. As classic family of LA, estimator algorithms have improved the performance of algorithms to a novel level. However, because of the limitation of estimators, the estimated values are not reliable at the beginning of learning process, which will lead to wrong updates of actions’ selection probability vector. Using current update strategy, the offset of wrong updates only relies on large number of extra iterations. Thus, there is still room for enhancement of convergence efficiency. In this paper, for improving the update strategy of probability vector, we proposed a double competitive scheme based learning automata algorithm and proved its ε-optimality. The simulation results showed that the proposed algorithms reached the highest convergence efficiency and confirmed the validity of proposed probability vector’s update strategy.
DI Chong,QI Kaiyue,WU Yue,SU Yu,LI Shenghong
. A Double Competitive Scheme Based Learning Automata Algorithm[J]. Journal of Shanghai Jiaotong University, 2018
, 52(10)
: 1292
-1297
.
DOI: 10.16183/j.cnki.jsjtu.2018.10.018
[1]NARENDRA K S, THATHACHAR M A L. Learning automata-a survey[J]. IEEE Transactions on Systems, Man, and Cybernetics, 1974, (4): 323-334.
[2]KAELBLING L P, LITTMAN M L, MOORE A W. Reinforcement learning: A survey[J]. Journal of Artificial Intelligence Research, 1996, (4): 237-285.
[3]郝圣, 张沪寅, 宋梦凯.基于学习自动机理论与稳定性控制的自适应移动无线AdHoc网络分簇策略[J/OL]. 计算机学报, 2017, 40: 1-20.
HAO Sheng, ZHANG Huyin, SONG Mengkai. An adaptive clustering strategy for MANET based on learning automata theory and stability control[J/OL]. Chinese Journal of Computers, 2017, 40: 1-20.
[4]王建平, 陈改霞, 孔德川, 等.一种基于学习自动机的WSN区域覆盖算法[J].数据采集与处理, 2014, 29(6): 1016-1022.
WANG Jianping, CHEN Gaixia, KONG Dechuan, et al. Learning automata-based area coverage algorithm for wireless sensor networks[J]. Journal of Data Acquisition and Processing, 2014, 29(6): 1016-1022.
[5]JIAO Lei, ZHANG Xuan, OOMMEN B J, et al. Optimizing channel selection for cognitive radio networks using a distributed Bayesian learning automata-based approach[J]. Applied Intelligence, 2016, 44(2): 307-321.
[6]REZVANIAN A, SAGHIRI A M, VAHIDIPOUR S M, et al. Learning automata for wireless sensor networks[M]. Cham: Springer, 2018: 91-219.
[7]DI C, SU Y, HAN Z, et al. Learning automata based SVM for intrusion detection[C]//International Conference in Communications, Signal Processing, and Systems. Singapore: Springer, 2017: 2067-2074.
[8]ZHANG Y, DI C, HAN Z, et al. An adaptive honeypot deployment algorithm based on learning automata[C]//2017 IEEE Second International Conference on Data Science in Cyberspace (DSC). Piscataway NJ: [s.n.], 2017: 521-527.
[9]GE H, HUANG J, DI C, et al. Learning automata based approach for influence maximization problem on social networks[C]//2017 IEEE Second International Conference on Data Science in Cyberspace (DSC). Piscataway, NJ: [s.n.], 2017: 108-117.
[10]TSETLIN M L. On behaviour of finite automata in random medium[J]. Avtomatika i Telemekhanika, 1961 (22): 1345-1354.
[11]VARSHAVSKII V I, VORONTSOVA I P. On the behavior of stochastic automata with a variable structure[J]. Avtomatika i Telemekhanika, 1963, 24(3): 353-360.
[12]OOMMEN B J, HANSEN E. The asymptotic optimality of discretized linear reward-inaction learning automata[J]. IEEE Transactions on Systems, Man, and Cybernetics, 1984 (3): 542-545.
[13]THATHACHAR M A L. A class of rapidly converging algorithms for learning automata[C]//IEEE International Conference on Cybernetics and Society. Piscataway, NJ: [s.n.], 1984: 602-606.
[14]OOMMEN B J, LANCTT J K. Discretized pursuit learning automata[J]. IEEE Transactions on Systems, Man, and Cybernetics, 1990, 20(4): 931-938.
[15]AGACHE M, OOMMEN B J. Generalized pursuit learning schemes: New families of continuous and discretized learning automata[J]. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), 2002, 32(6): 738-749.
[16]PAPADIMITRIOU G I, SKLIRA M, POMPORTSIS A S. A new class of/spl epsi/-optimal learning automata[J]. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), 2004, 34(1): 246-254.
[17]GE H, JIANG W, LI S, et al. A novel estimator based learning automata algorithm[J]. Applied Intelligence, 2015, 42(2): 262-275.