上海交通大学学报(自然版) ›› 2018, Vol. 52 ›› Issue (10): 1292-1297.doi: 10.16183/j.cnki.jsjtu.2018.10.018

• 学报(中文) • 上一篇    下一篇

基于双重竞争策略的学习自动机算法

狄冲a,齐开悦b,吴越a,苏宇a,李生红a   

  1. 上海交通大学 a. 网络空间安全学院; b. 电子信息与电气工程学院, 上海 200240
  • 通讯作者: 齐开悦,男,讲师,E-mail: tommy-qi@sjtu.edu.cn.
  • 作者简介:狄冲(1995-),男,山东省滕州市人,博士生,主要从事机器学习、增强学习的研究. E-MAIL: dichong95@sjtu.edu.cn
  • 基金资助:
    国家电网公司总部科技项目(SGRIXTKJ[2017]133)

A Double Competitive Scheme Based Learning Automata Algorithm

DI Chong,QI Kaiyue,WU Yue,SU Yu,LI Shenghong   

  1. a. School of Cyber Security; b. School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, Shanghai 200240, China

摘要: 学习自动机是增强学习理论体系中的重要组成部分,在应用数学的随机函数优化、信息安全的异常检测等理论和实际问题中发挥着重要作用.估计器算法是目前学习自动机中最为主流的一类算法,具有最高的算法性能.但是,由于估计器本身的局限性导致在学习初期估计值不准确,行为选择概率向量无法一直保持最优更新,且概率向量的更新完全依赖于固定步长,一次错误的更新需要大量额外的迭代来对其进行弥补,算法的收敛效率仍存在提升空间.针对上述问题,通过改进估计器算法的概率向量更新策略,提出一种基于双重竞争策略的学习自动机算法,并对其ε-收敛特性进行数学证明.实验结果显示,该算法提高了学习自动机的收敛效率,从而验证并确立了所提策略的有效性和算法的优越性.

关键词: 学习自动机, 增强学习, 估计器算法, 平稳环境, 双重竞争策略

Abstract: Learning automaton (LA) plays an essential part in the theoretical system of reinforcement learning. Not only does LA have important theoretical value, but also has extensive practical value, including many applications in security field. As classic family of LA, estimator algorithms have improved the performance of algorithms to a novel level. However, because of the limitation of estimators, the estimated values are not reliable at the beginning of learning process, which will lead to wrong updates of actions’ selection probability vector. Using current update strategy, the offset of wrong updates only relies on large number of extra iterations. Thus, there is still room for enhancement of convergence efficiency. In this paper, for improving the update strategy of probability vector, we proposed a double competitive scheme based learning automata algorithm and proved its ε-optimality. The simulation results showed that the proposed algorithms reached the highest convergence efficiency and confirmed the validity of proposed probability vector’s update strategy.

Key words: learning automata, reinforcement learning, stationary environments, estimator algorithms, double competitive scheme

中图分类号: