基于双重竞争策略的学习自动机算法

DI Chong; QI Kaiyue; WU Yue; SU Yu; LI Shenghong

doi:10.16183/j.cnki.jsjtu.2018.10.018

上海交通大学学报 >

2018 , Vol. 52 >Issue 10: 1292 - 1297

DOI: https://doi.org/10.16183/j.cnki.jsjtu.2018.10.018

学报（中文）

基于双重竞争策略的学习自动机算法

DI Chong ,
QI Kaiyue ,
WU Yue ,
SU Yu ,
LI Shenghong

展开

上海交通大学 a. 网络空间安全学院； b. 电子信息与电气工程学院，上海 200240

狄冲（1995-），男，山东省滕州市人，博士生，主要从事机器学习、增强学习的研究. E-MAIL: dichong95@sjtu.edu.cn

网络出版日期: 2025-07-02

基金资助

国家电网公司总部科技项目(SGRIXTKJ［2017］133)

收起

A Double Competitive Scheme Based Learning Automata Algorithm

狄冲a，齐开悦b，吴越a，苏宇a，李生红a

Expand

a. School of Cyber Security; b. School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, Shanghai 200240, China

Online published: 2025-07-02

Fold

摘要

学习自动机是增强学习理论体系中的重要组成部分，在应用数学的随机函数优化、信息安全的异常检测等理论和实际问题中发挥着重要作用.估计器算法是目前学习自动机中最为主流的一类算法，具有最高的算法性能.但是，由于估计器本身的局限性导致在学习初期估计值不准确，行为选择概率向量无法一直保持最优更新，且概率向量的更新完全依赖于固定步长，一次错误的更新需要大量额外的迭代来对其进行弥补，算法的收敛效率仍存在提升空间.针对上述问题，通过改进估计器算法的概率向量更新策略，提出一种基于双重竞争策略的学习自动机算法，并对其ε-收敛特性进行数学证明.实验结果显示，该算法提高了学习自动机的收敛效率，从而验证并确立了所提策略的有效性和算法的优越性.

关键词： 学习自动机; 增强学习; 估计器算法; 平稳环境; 双重竞争策略

本文引用格式

DI Chong , QI Kaiyue , WU Yue , SU Yu , LI Shenghong . 基于双重竞争策略的学习自动机算法[J]. 上海交通大学学报, 2018 , 52(10) : 1292 -1297 . DOI: 10.16183/j.cnki.jsjtu.2018.10.018

Abstract

Learning automaton (LA) plays an essential part in the theoretical system of reinforcement learning. Not only does LA have important theoretical value, but also has extensive practical value, including many applications in security field. As classic family of LA, estimator algorithms have improved the performance of algorithms to a novel level. However, because of the limitation of estimators, the estimated values are not reliable at the beginning of learning process, which will lead to wrong updates of actions’ selection probability vector. Using current update strategy, the offset of wrong updates only relies on large number of extra iterations. Thus, there is still room for enhancement of convergence efficiency. In this paper, for improving the update strategy of probability vector, we proposed a double competitive scheme based learning automata algorithm and proved its ε-optimality. The simulation results showed that the proposed algorithms reached the highest convergence efficiency and confirmed the validity of proposed probability vector’s update strategy.

Key words： learning automata; reinforcement learning; stationary environments; estimator algorithms; double competitive scheme

参考文献

［1］NARENDRA K S, THATHACHAR M A L. Learning automata-a survey［J］. IEEE Transactions on Systems, Man, and Cybernetics, 1974, (4): 323-334. ［2］KAELBLING L P, LITTMAN M L, MOORE A W. Reinforcement learning: A survey［J］. Journal of Artificial Intelligence Research, 1996, (4): 237-285. ［3］郝圣, 张沪寅, 宋梦凯.基于学习自动机理论与稳定性控制的自适应移动无线AdHoc网络分簇策略［J/OL］. 计算机学报, 2017, 40: 1-20. HAO Sheng, ZHANG Huyin, SONG Mengkai. An adaptive clustering strategy for MANET based on learning automata theory and stability control［J/OL］. Chinese Journal of Computers, 2017, 40: 1-20. ［4］王建平, 陈改霞, 孔德川, 等.一种基于学习自动机的WSN区域覆盖算法［J］.数据采集与处理, 2014, 29(6): 1016-1022. WANG Jianping, CHEN Gaixia, KONG Dechuan, et al. Learning automata-based area coverage algorithm for wireless sensor networks［J］. Journal of Data Acquisition and Processing, 2014, 29(6): 1016-1022. ［5］JIAO Lei, ZHANG Xuan, OOMMEN B J, et al. Optimizing channel selection for cognitive radio networks using a distributed Bayesian learning automata-based approach［J］. Applied Intelligence, 2016, 44(2): 307-321. ［6］REZVANIAN A, SAGHIRI A M, VAHIDIPOUR S M, et al. Learning automata for wireless sensor networks［M］. Cham: Springer, 2018: 91-219. ［7］DI C, SU Y, HAN Z, et al. Learning automata based SVM for intrusion detection［C］//International Conference in Communications, Signal Processing, and Systems. Singapore: Springer, 2017: 2067-2074. ［8］ZHANG Y, DI C, HAN Z, et al. An adaptive honeypot deployment algorithm based on learning automata［C］//2017 IEEE Second International Conference on Data Science in Cyberspace (DSC). Piscataway NJ: ［s.n.］, 2017: 521-527. ［9］GE H, HUANG J, DI C, et al. Learning automata based approach for influence maximization problem on social networks［C］//2017 IEEE Second International Conference on Data Science in Cyberspace (DSC). Piscataway, NJ: ［s.n.］, 2017: 108-117. ［10］TSETLIN M L. On behaviour of finite automata in random medium［J］. Avtomatika i Telemekhanika, 1961 (22): 1345-1354. ［11］VARSHAVSKII V I, VORONTSOVA I P. On the behavior of stochastic automata with a variable structure［J］. Avtomatika i Telemekhanika, 1963, 24(3): 353-360. ［12］OOMMEN B J, HANSEN E. The asymptotic optimality of discretized linear reward-inaction learning automata［J］. IEEE Transactions on Systems, Man, and Cybernetics, 1984 (3): 542-545. ［13］THATHACHAR M A L. A class of rapidly converging algorithms for learning automata［C］//IEEE International Conference on Cybernetics and Society. Piscataway, NJ: ［s.n.］, 1984: 602-606. ［14］OOMMEN B J, LANCTT J K. Discretized pursuit learning automata［J］. IEEE Transactions on Systems, Man, and Cybernetics, 1990, 20(4): 931-938. ［15］AGACHE M, OOMMEN B J. Generalized pursuit learning schemes: New families of continuous and discretized learning automata［J］. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), 2002, 32(6): 738-749. ［16］PAPADIMITRIOU G I, SKLIRA M, POMPORTSIS A S. A new class of/spl epsi/-optimal learning automata［J］. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), 2004, 34(1): 246-254. ［17］GE H, JIANG W, LI S, et al. A novel estimator based learning automata algorithm［J］. Applied Intelligence, 2015, 42(2): 262-275.

Options

文章导航

模态框（Modal）标题

摘要

本文引用格式

Abstract

参考文献