J Shanghai Jiaotong Univ Sci, 2025, Vol. 30, Issue (2): 385-398. doi: 10.1007/s12204-023-2610-2
Automation & Computer Science
Li Kai¹ (李凯), Huang Wenhan¹ (黄文瀚), Li Chenchen² (李晨晨), Deng Xiaotie³ (邓小铁)
Accepted: 2021-10-14
Online: 2025-03-21
Published: 2025-03-21
Li Kai, Huang Wenhan, Li Chenchen, Deng Xiaotie. Exploiting a No-Regret Opponent in Repeated Zero-Sum Games[J]. J Shanghai Jiaotong Univ Sci, 2025, 30(2): 385-398.