上海交通大学学报 ›› 2018, Vol. 52 ›› Issue (10): 1292-1297.doi: 10.16183/j.cnki.jsjtu.2018.10.018
狄冲a,齐开悦b,吴越a,苏宇a,李生红a
发布日期:
2025-07-02
通讯作者:
齐开悦,男,讲师,E-mail: tommy-qi@sjtu.edu.cn.
作者简介:
狄冲(1995-),男,山东省滕州市人,博士生,主要从事机器学习、增强学习的研究. E-MAIL: dichong95@sjtu.edu.cn
基金资助:
DI Chong,QI Kaiyue,WU Yue,SU Yu,LI Shenghong
Published:
2025-07-02
摘要: 学习自动机是增强学习理论体系中的重要组成部分,在应用数学的随机函数优化、信息安全的异常检测等理论和实际问题中发挥着重要作用.估计器算法是目前学习自动机中最为主流的一类算法,具有最高的算法性能.但是,由于估计器本身的局限性导致在学习初期估计值不准确,行为选择概率向量无法一直保持最优更新,且概率向量的更新完全依赖于固定步长,一次错误的更新需要大量额外的迭代来对其进行弥补,算法的收敛效率仍存在提升空间.针对上述问题,通过改进估计器算法的概率向量更新策略,提出一种基于双重竞争策略的学习自动机算法,并对其ε-收敛特性进行数学证明.实验结果显示,该算法提高了学习自动机的收敛效率,从而验证并确立了所提策略的有效性和算法的优越性.
中图分类号:
狄冲a,齐开悦b,吴越a,苏宇a,李生红a. 基于双重竞争策略的学习自动机算法[J]. 上海交通大学学报, 2018, 52(10): 1292-1297.
DI Chong,QI Kaiyue,WU Yue,SU Yu,LI Shenghong. A Double Competitive Scheme Based Learning Automata Algorithm[J]. Journal of Shanghai Jiao Tong University, 2018, 52(10): 1292-1297.
[1]NARENDRA K S, THATHACHAR M A L. Learning automata-a survey[J]. IEEE Transactions on Systems, Man, and Cybernetics, 1974, (4): 323-334. [2]KAELBLING L P, LITTMAN M L, MOORE A W. Reinforcement learning: A survey[J]. Journal of Artificial Intelligence Research, 1996, (4): 237-285. [3]郝圣, 张沪寅, 宋梦凯.基于学习自动机理论与稳定性控制的自适应移动无线AdHoc网络分簇策略[J/OL]. 计算机学报, 2017, 40: 1-20. HAO Sheng, ZHANG Huyin, SONG Mengkai. An adaptive clustering strategy for MANET based on learning automata theory and stability control[J/OL]. Chinese Journal of Computers, 2017, 40: 1-20. [4]王建平, 陈改霞, 孔德川, 等.一种基于学习自动机的WSN区域覆盖算法[J].数据采集与处理, 2014, 29(6): 1016-1022. WANG Jianping, CHEN Gaixia, KONG Dechuan, et al. Learning automata-based area coverage algorithm for wireless sensor networks[J]. Journal of Data Acquisition and Processing, 2014, 29(6): 1016-1022. [5]JIAO Lei, ZHANG Xuan, OOMMEN B J, et al. Optimizing channel selection for cognitive radio networks using a distributed Bayesian learning automata-based approach[J]. Applied Intelligence, 2016, 44(2): 307-321. [6]REZVANIAN A, SAGHIRI A M, VAHIDIPOUR S M, et al. Learning automata for wireless sensor networks[M]. Cham: Springer, 2018: 91-219. [7]DI C, SU Y, HAN Z, et al. Learning automata based SVM for intrusion detection[C]//International Conference in Communications, Signal Processing, and Systems. Singapore: Springer, 2017: 2067-2074. [8]ZHANG Y, DI C, HAN Z, et al. An adaptive honeypot deployment algorithm based on learning automata[C]//2017 IEEE Second International Conference on Data Science in Cyberspace (DSC). Piscataway NJ: [s.n.], 2017: 521-527. [9]GE H, HUANG J, DI C, et al. Learning automata based approach for influence maximization problem on social networks[C]//2017 IEEE Second International Conference on Data Science in Cyberspace (DSC). Piscataway, NJ: [s.n.], 2017: 108-117. [10]TSETLIN M L. On behaviour of finite automata in random medium[J]. Avtomatika i Telemekhanika, 1961 (22): 1345-1354. [11]VARSHAVSKII V I, VORONTSOVA I P. On the behavior of stochastic automata with a variable structure[J]. Avtomatika i Telemekhanika, 1963, 24(3): 353-360. [12]OOMMEN B J, HANSEN E. The asymptotic optimality of discretized linear reward-inaction learning automata[J]. IEEE Transactions on Systems, Man, and Cybernetics, 1984 (3): 542-545. [13]THATHACHAR M A L. A class of rapidly converging algorithms for learning automata[C]//IEEE International Conference on Cybernetics and Society. Piscataway, NJ: [s.n.], 1984: 602-606. [14]OOMMEN B J, LANCTT J K. Discretized pursuit learning automata[J]. IEEE Transactions on Systems, Man, and Cybernetics, 1990, 20(4): 931-938. [15]AGACHE M, OOMMEN B J. Generalized pursuit learning schemes: New families of continuous and discretized learning automata[J]. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), 2002, 32(6): 738-749. [16]PAPADIMITRIOU G I, SKLIRA M, POMPORTSIS A S. A new class of/spl epsi/-optimal learning automata[J]. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), 2004, 34(1): 246-254. [17]GE H, JIANG W, LI S, et al. A novel estimator based learning automata algorithm[J]. Applied Intelligence, 2015, 42(2): 262-275. |
[1] | 王肖霞,杨风暴,吉琳娜,蔺素珍,史冬梅. 基于柔性相似度量和可能性歪度的尾矿坝风险评估方法[J]. 上海交通大学学报(自然版), 2014, 48(10): 1440-1445. |
[2] | 吴文通1,李元祥1,韦邦合2,郑思龙1. 局部测地距离估计的增量等距特征映射算法[J]. 上海交通大学学报(自然版), 2013, 47(07): 1082-1087. |
[3] | 雷 菊 阳. 基于中国餐馆过程的语音增强[J]. 上海交通大学学报(自然版), 2013, 47(04): 635-639. |
[4] | 雷菊阳,黄克,许海翔,史习智. 过程混合的高斯过程模型混合采样推理[J]. 上海交通大学学报(自然版), 2010, 44(02): 271-0275. |
[5] | 刘凯a,张立民b,周立军a. 随机受限玻尔兹曼机组设计[J]. 上海交通大学学报, 2017, 51(10): 1235-1240. |
[6] | 刘昊,丁进良,杨翠娥,柴天佑. 基于择优学习策略的差分进化算法[J]. 上海交通大学学报, 2017, 51(6): 704-708. |
[7] | 屠恩美,杨杰. 半监督学习理论及其研究进展概述[J]. 上海交通大学学报, 2018, 52(10): 1280-1291. |
阅读次数 | ||||||||||||||||||||||||||||||||||||||||||||||
全文 85
|
|
|||||||||||||||||||||||||||||||||||||||||||||
摘要 42
|
|
|||||||||||||||||||||||||||||||||||||||||||||