[1] GANZFRIED S, SANDHOLM T. Game theory-based opponent modeling
in large imperfect-information games [C]//The 10th International Conference on
Autonomous Agents and Multiagent Systems-Volume 2. Taipei: International
Foundation for Autonomous Agents and Multiagent Systems, 2011: 533-540.
[2] GANZFRIED S, SUN Q Y. Bayesian opponent exploitation in imperfect-information
games [C]//2018 IEEE Conference on Computational Intelligence and Games.
Maastricht: IEEE, 2018: 1-8.
[3] HERNANDEZ-LEAL P, KAISERS M, BAARSLAG T, et al. A survey of learning in
multiagent environments: Dealing with non-stationarity [DB/OL]. (2017-07-28).
[4] CESA-BIANCHI N, LUGOSI G. Prediction, learning, and games [M]. Cambridge:
Cambridge University Press, 2006.
[5] ZINKEVICH M, JOHANSON M, BOWLING M, et al. Regret minimization in games with
incomplete information [M]//Advances in neural information processing systems
20. Red Hook: Curran Associates, 2007: 1729-1736.
[6] HART S, MAS-COLELL A. A simple adaptive procedure leading to correlated
equilibrium [J]. Econometrica, 2000, 68(5): 1127-1150.
[7] SHALEV-SHWARTZ S. Online learning and online convex optimization [J].
Foundations and Trends in Machine Learning, 2011, 4(2): 107-194.
[8] HAGHTALAB N, NOOTHIGATTU R, PROCACCIA A. Weighted voting via no-regret
learning [J]. Proceedings of the AAAI Conference on Artificial Intelligence,
2018, 32(1): 1055-1062.
[9] CHEN Y L, VAUGHAN J W. A new understanding of prediction markets via
no-regret learning [C]//11th ACM Conference on Electronic Commerce. Cambridge:
ACM Press, 2010: 189-198.
[10] HARTLINE J, SYRGKANIS V, TARDOS E. No-regret learning in Bayesian games
[M]//Advances in neural information processing systems 28. Red Hook: Curran
Associates, 2015: 3061-3069.
[11] BILLINGS D, PAPP D, SCHAEFFER J, et al. Opponent modeling in poker
[C]//15th National Conference on Artificial Intelligence and 10th Innovative
Applications of Artificial Intelligence Conference. Madison: AAAI, 1998:
493-499.
[12] SCHADD F, BAKKES S, SPRONCK P. Opponent modeling in real-time strategy
games [C]//GAMEON’2007. Bologna: University of Bologna, 2007: 61-70.
[13] ALBRECHT S V, STONE P. Autonomous agents modelling other agents: A
comprehensive survey and open problems [J]. Artificial Intelligence, 2018, 258:
66-95.
[14] TANG Z T, ZHU Y H, ZHAO D B, et al. Enhanced rolling horizon evolution
algorithm with opponent model learning [J]. IEEE Transactions on Games, 2020.
[15] MCCRACKEN P, BOWLING M. Safe strategies for agent modelling in games
[C]//2004 AAAI Fall Symposium. Arlington: AAAI, 2004: 103-110.
[16] GANZFRIED S, SANDHOLM T. Safe opponent exploitation [J]. ACM Transactions
on Economics and Computation, 2015, 3(2): 1-28.
[17] ZHANG C J, LESSER V. Multi-agent learning with policy prediction [C]//24th
AAAI Conference on Artificial Intelligence. Atlanta: AAAI, 2010: 927-934.
[18] FOERSTER J, CHEN R Y, AL-SHEDIVAT M, et al. Learning with
opponent-learning awareness [DB/OL]. (2017-09-13).
[19] KIM D K, LIU M, RIEMER M, et al. A policy gradient algorithm for learning
to learn in multiagent reinforcement learning [C]//38th International
Conference on Machine Learning. Online: PMLR, 2021: 5541-5550.
[20] SODERSTROM T, STOICA P. System identification [M]. London: Prentice-Hall
International, 1989.
[21] LI L K. Approximation theory and recurrent networks [C]//International
Joint Conference on Neural Networks. Baltimore: IEEE, 1992: 266-271.
[22] FUNAHASHI K I, NAKAMURA Y. Approximation of dynamical systems by
continuous time recurrent neural networks [J]. Neural Networks, 1993, 6(6):
801-806.
[23] ZIMMERMANN H, NEUNEIER R. Modeling dynamical systems by recurrent neural
networks [M]//Data mining II. Southampton: WIT Press, 2000: 557-566.
[24] BILLINGS S A. Nonlinear system identification [M]. Chichester: John Wiley
& Sons, Ltd, 2013.
[25] CESA-BIANCHI N, FREUND Y, HAUSSLER D, et al. How to use expert advice [J].
Journal of the ACM, 1997, 44(3): 427-485.
[26] FREUND Y, SCHAPIRE R E. A decision-theoretic generalization of on-line
learning and an application to boosting [J]. Journal of Computer and System
Sciences, 1997, 55(1): 119-139.
[27] LITTLESTONE N, WARMUTH M K. The weighted majority algorithm [J].
Information and Computation, 1994, 108(2): 212-261.
[28] RAKHLIN A, SRIDHARAN K. Optimization, learning, and games with predictable
sequences [M]//Advances in neural information processing systems 26. Red Hook:
Curran Associates, 2013: 3066-3074.
[29] BROWN N, KROER C, SANDHOLM T. Dynamic thresholding and pruning for regret
minimization [J]. Proceedings of the AAAI Conference on Artificial
Intelligence, 2017, 31(1): 421-429.
[30] VON NEUMANN J. Zur Theorie der Gesellschaftsspiele [J]. Mathematische Annalen,
1928, 100(1): 295-320.
[31] AL-SHEDIVAT M, BANSAL T, BURDA Y, et al. Continuous adaptation via
meta-learning in nonstationary and competitive environments [DB/OL].
(2017-10-10).
[32] DASKALAKIS C, DECKELBAUM A, KIM A. Near-optimal no-regret algorithms for
zero-sum games [J]. Games and Economic Behavior, 2015, 92: 327-348.
[33] AUER P, CESA-BIANCHI N, FREUND Y, et al. Gambling in a rigged casino: The
adversarial multi-armed bandit problem [C]//IEEE 36th Annual Foundations of
Computer Science. Milwaukee: IEEE, 1995: 322-331.
[34] RUMELHART D E, HINTON G E, WILLIAMS R J. Learning representations by
back-propagating errors [J]. Nature, 1986, 323(6088): 533-536.
[35] ELMAN J L. Finding structure in time [J]. Cognitive Science, 1990, 14(2):
179-211.
[36] JIN L, NIKIFORUK P N, GUPTA M M. Approximation of discrete-time
state-space trajectories using dynamic recurrent neural networks [J]. IEEE
Transactions on Automatic Control, 1995, 40(7): 1266-1270.
[37] HORNIK K, STINCHCOMBE M, WHITE H. Multilayer feedforward networks are
universal approximators [J]. Neural Networks, 1989, 2(5): 359-366.
[38] LLANAS B, LANTARÓN S, SÁINZ F J. Constructive approximation of
discontinuous functions by neural networks [J]. Neural Processing Letters,
2008, 27(3): 209-226.
[39] HOCHREITER S, SCHMIDHUBER J. Long short-term memory [J]. Neural
Computation, 1997, 9(8): 1735-1780.
[40] JAEGER H. A tutorial on training recurrent neural networks, covering BPTT,
RTRL, EKF and the “echo state network” approach [R]. Bremen: German National Research
Center for Information Technology, 2002.
[41] BILLINGS D. The first international RoShamBo programming competition [J].
ICGA Journal, 2000, 23(1): 42-50.
[42] PASZKE A, GROSS S, MASSA F, et al. PyTorch: An imperative style,
high-performance deep learning library [M]//Advances in neural information
processing systems 32. Red Hook: Curran Associates, 2019: 8024-8035.
[43] KINGMA D P, BA J. Adam: A method for stochastic optimization [DB/OL].
(2014-12-22).
[44] VAJENTE G, HUANG Y, ISI M, et al. Machine-learning nonstationary noise out
of gravitational-wave detectors [J]. Physical Review D, 2020, 101(4): 042003.
[45] BERGER U. Fictitious play in 2×n games [J]. Journal of Economic Theory,
2005, 120(2): 139-154.
[46] BAILEY J P, PILIOURAS G. Fast and furious learning in zero-sum games:
Vanishing regret with non-vanishing step sizes [M]//Advances in neural information
processing systems 32. Red Hook: Curran Associates, 2019: 12977-12987.