Cooperative Pursuit of Unmanned Surface Vehicles Using Multi-Agent Reinforcement Learning

doi:10.1007/s12204-025-2816-6

Abstract: This paper addresses the cooperative pursuit of a dynamic escaping target by unmanned surface vehicles (USVs) using multi-agent reinforcement learning. The pursuit-evasion problem is formulated as a Markov game, and the success criteria for cooperative capture are defined through distance and angle constraints. Cooperative pursuit training is carried out with multi-agent soft actor-critic reinforcement learning under a centralized training and decentralized execution framework with a long short-term memory network, which optimizes the capture performance of the USVs against the escaping target. In addition, to prevent lazy pursuers and raise the capture success rate, a multi-stage reward guidance method is developed, in which the training process is adjusted according to the current states of both sides, guiding the vehicles through the capture task from easy to difficult. Simulations illustrate the effectiveness of the proposed reinforcement learning method for cooperative pursuit of USVs.
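The abstract states that capture success is judged by distance and angle constraints but gives no formulas or thresholds. The sketch below shows one plausible reading, not the authors' criterion: every pursuer must lie within a capture radius of the target, and no angular gap between neighbouring pursuers (as seen from the target) may exceed a limit. The function name `is_captured` and the parameters `capture_radius` and `max_gap_deg`, including their default values, are assumptions made for illustration.

```python
import math


def is_captured(pursuers, target, capture_radius=5.0, max_gap_deg=150.0):
    """Hypothetical capture test combining a distance and an angle constraint.

    pursuers: list of (x, y) pursuer positions
    target:   (x, y) position of the escaping target
    Both thresholds are illustrative values, not taken from the paper.
    """
    if len(pursuers) < 2:
        return False  # a single vehicle cannot enclose the target

    tx, ty = target
    bearings = []
    for px, py in pursuers:
        dx, dy = px - tx, py - ty
        # Distance constraint: every pursuer must be close enough.
        if math.hypot(dx, dy) > capture_radius:
            return False
        bearings.append(math.atan2(dy, dx))

    # Angle constraint: sort bearings around the target and measure the gaps
    # between neighbours; the gaps always sum to a full circle.
    bearings.sort()
    gaps = [b2 - b1 for b1, b2 in zip(bearings, bearings[1:])]
    gaps.append(2.0 * math.pi - (bearings[-1] - bearings[0]))
    return max(gaps) <= math.radians(max_gap_deg)
```

Under these assumed thresholds, three pursuers spaced evenly on a circle of radius 3 around the target pass the check, while three pursuers bunched on one side fail it because the open side leaves an angular gap far larger than the limit.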

Key words: unmanned surface vehicle (USV), cooperative pursuit, multi-agent soft actor-critic, multi-stage reward

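Abstract (translated from the Chinese): Considering a dynamically escaping target, this paper studies the cooperative capture problem for a swarm of unmanned surface vehicles based on multi-agent reinforcement learning. A Markov game process is established for the pursuit-evasion game, and capture success criteria in terms of distance and angle are given. Using the centralized training and decentralized execution framework together with a long short-term memory network, a cooperative capture training method based on multi-agent soft actor-critic is designed to optimize the capture behavior of the unmanned surface vehicles against the escaping target. To avoid lazy pursuers and improve the capture success rate, a reward function with multi-stage guidance is designed, which optimizes the training process according to the pursuit-evasion state and effectively guides the unmanned surface vehicles to accomplish the capture task from easy to difficult. Simulation results demonstrate the effectiveness of the proposed multi-agent reinforcement learning-based cooperative capture method for unmanned surface vehicle swarms.

Key words (translated from the Chinese): unmanned surface vessel, cooperative capture, multi-agent soft actor-critic, multi-stage reward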

CLC Number: TP242

Qu Xingru, Li Chu, Jiang Yuze, Long Feifei, Zhang Rubo. Cooperative Pursuit of Unmanned Surface Vehicles Using Multi-Agent Reinforcement Learning[J]. J Shanghai Jiaotong Univ Sci, 2026, 31(1): 187-194.
