基于双重竞争策略的学习自动机算法

doi:10.16183/j.cnki.jsjtu.2018.10.018

上海交通大学学报 ›› 2018, Vol. 52 ›› Issue (10): 1292-1297.doi: 10.16183/j.cnki.jsjtu.2018.10.018

基于双重竞争策略的学习自动机算法

狄冲a，齐开悦b，吴越a，苏宇a，李生红a

上海交通大学 a. 网络空间安全学院； b. 电子信息与电气工程学院，上海 200240

发布日期:2025-07-02
通讯作者: 齐开悦，男，讲师，E-mail: tommy-qi@sjtu.edu.cn.
作者简介:狄冲（1995-），男，山东省滕州市人，博士生，主要从事机器学习、增强学习的研究. E-MAIL: dichong95@sjtu.edu.cn
基金资助:
国家电网公司总部科技项目(SGRIXTKJ［2017］133)

A Double Competitive Scheme Based Learning Automata Algorithm

DI Chong,QI Kaiyue,WU Yue,SU Yu,LI Shenghong

a. School of Cyber Security; b. School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, Shanghai 200240, China

Published:2025-07-02

摘要/Abstract

摘要： 学习自动机是增强学习理论体系中的重要组成部分，在应用数学的随机函数优化、信息安全的异常检测等理论和实际问题中发挥着重要作用.估计器算法是目前学习自动机中最为主流的一类算法，具有最高的算法性能.但是，由于估计器本身的局限性导致在学习初期估计值不准确，行为选择概率向量无法一直保持最优更新，且概率向量的更新完全依赖于固定步长，一次错误的更新需要大量额外的迭代来对其进行弥补，算法的收敛效率仍存在提升空间.针对上述问题，通过改进估计器算法的概率向量更新策略，提出一种基于双重竞争策略的学习自动机算法，并对其ε-收敛特性进行数学证明.实验结果显示，该算法提高了学习自动机的收敛效率，从而验证并确立了所提策略的有效性和算法的优越性.

关键词: 学习自动机, 增强学习, 估计器算法, 平稳环境, 双重竞争策略

Abstract: Learning automaton (LA) plays an essential part in the theoretical system of reinforcement learning. Not only does LA have important theoretical value, but also has extensive practical value, including many applications in security field. As classic family of LA, estimator algorithms have improved the performance of algorithms to a novel level. However, because of the limitation of estimators, the estimated values are not reliable at the beginning of learning process, which will lead to wrong updates of actions’ selection probability vector. Using current update strategy, the offset of wrong updates only relies on large number of extra iterations. Thus, there is still room for enhancement of convergence efficiency. In this paper, for improving the update strategy of probability vector, we proposed a double competitive scheme based learning automata algorithm and proved its ε-optimality. The simulation results showed that the proposed algorithms reached the highest convergence efficiency and confirmed the validity of proposed probability vector’s update strategy.

Key words: learning automata, reinforcement learning, stationary environments, estimator algorithms, double competitive scheme

中图分类号:

TP 181

狄冲a，齐开悦b，吴越a，苏宇a，李生红a. 基于双重竞争策略的学习自动机算法[J]. 上海交通大学学报, 2018, 52(10): 1292-1297.

DI Chong,QI Kaiyue,WU Yue,SU Yu,LI Shenghong. A Double Competitive Scheme Based Learning Automata Algorithm[J]. Journal of Shanghai Jiao Tong University, 2018, 52(10): 1292-1297.

参考文献

［1］NARENDRA K S, THATHACHAR M A L. Learning automata-a survey［J］. IEEE Transactions on Systems, Man, and Cybernetics, 1974, (4): 323-334. ［2］KAELBLING L P, LITTMAN M L, MOORE A W. Reinforcement learning: A survey［J］. Journal of Artificial Intelligence Research, 1996, (4): 237-285. ［3］郝圣, 张沪寅, 宋梦凯.基于学习自动机理论与稳定性控制的自适应移动无线AdHoc网络分簇策略［J/OL］. 计算机学报, 2017, 40: 1-20. HAO Sheng, ZHANG Huyin, SONG Mengkai. An adaptive clustering strategy for MANET based on learning automata theory and stability control［J/OL］. Chinese Journal of Computers, 2017, 40: 1-20. ［4］王建平, 陈改霞, 孔德川, 等.一种基于学习自动机的WSN区域覆盖算法［J］.数据采集与处理, 2014, 29(6): 1016-1022. WANG Jianping, CHEN Gaixia, KONG Dechuan, et al. Learning automata-based area coverage algorithm for wireless sensor networks［J］. Journal of Data Acquisition and Processing, 2014, 29(6): 1016-1022. ［5］JIAO Lei, ZHANG Xuan, OOMMEN B J, et al. Optimizing channel selection for cognitive radio networks using a distributed Bayesian learning automata-based approach［J］. Applied Intelligence, 2016, 44(2): 307-321. ［6］REZVANIAN A, SAGHIRI A M, VAHIDIPOUR S M, et al. Learning automata for wireless sensor networks［M］. Cham: Springer, 2018: 91-219. ［7］DI C, SU Y, HAN Z, et al. Learning automata based SVM for intrusion detection［C］//International Conference in Communications, Signal Processing, and Systems. Singapore: Springer, 2017: 2067-2074. ［8］ZHANG Y, DI C, HAN Z, et al. An adaptive honeypot deployment algorithm based on learning automata［C］//2017 IEEE Second International Conference on Data Science in Cyberspace (DSC). Piscataway NJ: ［s.n.］, 2017: 521-527. ［9］GE H, HUANG J, DI C, et al. Learning automata based approach for influence maximization problem on social networks［C］//2017 IEEE Second International Conference on Data Science in Cyberspace (DSC). Piscataway, NJ: ［s.n.］, 2017: 108-117. ［10］TSETLIN M L. On behaviour of finite automata in random medium［J］. Avtomatika i Telemekhanika, 1961 (22): 1345-1354. ［11］VARSHAVSKII V I, VORONTSOVA I P. On the behavior of stochastic automata with a variable structure［J］. Avtomatika i Telemekhanika, 1963, 24(3): 353-360. ［12］OOMMEN B J, HANSEN E. The asymptotic optimality of discretized linear reward-inaction learning automata［J］. IEEE Transactions on Systems, Man, and Cybernetics, 1984 (3): 542-545. ［13］THATHACHAR M A L. A class of rapidly converging algorithms for learning automata［C］//IEEE International Conference on Cybernetics and Society. Piscataway, NJ: ［s.n.］, 1984: 602-606. ［14］OOMMEN B J, LANCTT J K. Discretized pursuit learning automata［J］. IEEE Transactions on Systems, Man, and Cybernetics, 1990, 20(4): 931-938. ［15］AGACHE M, OOMMEN B J. Generalized pursuit learning schemes: New families of continuous and discretized learning automata［J］. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), 2002, 32(6): 738-749. ［16］PAPADIMITRIOU G I, SKLIRA M, POMPORTSIS A S. A new class of/spl epsi/-optimal learning automata［J］. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), 2004, 34(1): 246-254. ［17］GE H, JIANG W, LI S, et al. A novel estimator based learning automata algorithm［J］. Applied Intelligence, 2015, 42(2): 262-275.

[1]	王肖霞，杨风暴，吉琳娜，蔺素珍，史冬梅. 基于柔性相似度量和可能性歪度的尾矿坝风险评估方法[J]. 上海交通大学学报（自然版）, 2014, 48(10): 1440-1445.
[2]	吴文通1，李元祥1，韦邦合2，郑思龙1. 局部测地距离估计的增量等距特征映射算法[J]. 上海交通大学学报（自然版）, 2013, 47(07): 1082-1087.
[3]	雷菊阳. 基于中国餐馆过程的语音增强[J]. 上海交通大学学报（自然版）, 2013, 47(04): 635-639.
[4]	雷菊阳,黄克，许海翔，史习智. 过程混合的高斯过程模型混合采样推理[J]. 上海交通大学学报（自然版）, 2010, 44(02): 271-0275.
[5]	屠恩美，杨杰. 半监督学习理论及其研究进展概述[J]. 上海交通大学学报, 2018, 52(10): 1280-1291.
[6]	刘凯a，张立民b，周立军a. 随机受限玻尔兹曼机组设计[J]. 上海交通大学学报, 2017, 51(10): 1235-1240.
[7]	刘昊，丁进良，杨翠娥，柴天佑. 基于择优学习策略的差分进化算法[J]. 上海交通大学学报, 2017, 51(6): 704-708.

基于双重竞争策略的学习自动机算法

A Double Competitive Scheme Based Learning Automata Algorithm

PDF (PC)

可视化

摘要/Abstract

引用本文

使用本文

参考文献

相关文章 7

编辑推荐

Metrics

本文评价