J Shanghai Jiaotong Univ Sci, 2026, Vol. 31, Issue (1): 187-194. doi: 10.1007/s12204-025-2816-6

• Intelligent Robots •

Cooperative Pursuit of Unmanned Surface Vehicles Using Multi-Agent Reinforcement Learning

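Chinese title (translated): Cooperative Encirclement by Unmanned Surface Vehicle Swarms Based on Multi-Agent Reinforcement Learning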

曲星儒,李初,江雨泽,龙飞飞,张汝波   

  1. College of Mechanical and Electronic Engineering, Dalian Minzu University, Dalian 116600, Liaoning, China
  • Received: 2024-11-13 Accepted: 2024-12-02 Online: 2026-02-28 Published: 2026-02-12

Abstract: This paper addresses the cooperative pursuit of a dynamically escaping target by unmanned surface vehicles (USVs) using multi-agent reinforcement learning. The pursuit-evasion problem is formulated as a Markov game, and the success criteria for cooperative capture are defined through distance and angle constraints. Within a centralized training and decentralized execution framework, and with a long short-term memory network, cooperative pursuit is trained using multi-agent soft actor-critic reinforcement learning to optimize the capture performance of the USVs against the escaping target. In addition, to prevent lazy pursuers and increase the capture success rate, a multi-stage reward guidance method is developed: the training process is adapted to the current states of both sides, guiding the vehicles through the capture task from easy to difficult. Simulation results illustrate the effectiveness of the proposed reinforcement learning method for cooperative pursuit by USVs.

Key words: unmanned surface vehicle (USV), cooperative pursuit, multi-agent soft actor-critic, multi-stage reward

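Abstract (translated from Chinese): Considering a dynamically escaping target, this paper studies cooperative encirclement by an unmanned surface vehicle (USV) swarm based on multi-agent reinforcement learning. A Markov game is established for the pursuit-evasion setting, and capture success criteria in terms of distance and angle are given. Using the centralized training and distributed execution framework together with a long short-term memory network, a cooperative encirclement training method based on multi-agent soft actor-critic is designed to optimize the USVs' capture behavior against the escaping target. To avoid lazy pursuers and raise the capture success rate, a reward function with multi-stage guidance is designed, which optimizes the training process according to the pursuit-evasion state and effectively guides the USVs through the capture task from easy to difficult. Simulation results demonstrate the effectiveness of the proposed multi-agent reinforcement learning method for cooperative encirclement by USV swarms.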

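Key words (translated from Chinese): unmanned surface vehicle, cooperative encirclement, multi-agent soft actor-critic, multi-stage reward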
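
The abstract states that capture success is judged by distance and angle constraints but does not give the exact conditions. The following is a minimal sketch of one common way such a check could be written, not the authors' formulation. The function names, the `capture_radius` and `max_gap` parameters, and the rule itself (every pursuer within a radius of the target, and no angular gap between adjacent pursuers larger than a threshold) are assumptions made for illustration.

```python
import math


def angular_gaps(pursuers, target):
    """Gaps (radians) between adjacent pursuer bearings as seen from the target.

    Hypothetical helper: positions are (x, y) tuples in a common planar frame.
    """
    tx, ty = target
    bearings = sorted(math.atan2(py - ty, px - tx) for px, py in pursuers)
    gaps = [b - a for a, b in zip(bearings, bearings[1:])]
    # Close the circle: gap from the last bearing back around to the first.
    gaps.append(2 * math.pi - (bearings[-1] - bearings[0]))
    return gaps


def is_captured(pursuers, target, capture_radius, max_gap):
    """Assumed capture rule: all pursuers are close AND they surround the target.

    capture_radius and max_gap are illustrative thresholds, not values
    taken from the paper.
    """
    tx, ty = target
    within_range = all(
        math.hypot(px - tx, py - ty) <= capture_radius for px, py in pursuers
    )
    enclosed = max(angular_gaps(pursuers, target)) <= max_gap
    return within_range and enclosed
```

Under this assumed rule, three pursuers spaced evenly around the target at close range count as a capture, while three pursuers clustered on one side do not, because the open side leaves an angular gap larger than `max_gap`.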
