14 July 2024, Volume 29 Issue 4
Special Issue on Multi-Agent Collaborative Perception and Control
Multi-Agent Path Planning Method Based on Improved Deep Q-Network in Dynamic Environments
LI Shuyi (李舒逸), LI Minzhe (李旻哲), JING Zhongliang (敬忠良)
2024, 29 (4):  601-612.  doi: 10.1007/s12204-024-2732-1
The multi-agent path planning problem presents significant challenges in dynamic environments, primarily due to the ever-changing positions of obstacles and the complex interactions between agents’ actions. These factors cause the solution to converge slowly and, in some cases, to diverge altogether. To address this issue, this paper introduces a novel approach based on a double dueling deep Q-network (D3QN) tailored for dynamic multi-agent environments. A novel reward function based on multi-agent positional constraints is designed, and a training strategy based on incremental learning is adopted to achieve collaborative path planning of multiple agents. Moreover, a greedy and Boltzmann probability selection policy is introduced for action selection and to avoid convergence to local extrema. To match radar and image sensors, a convolutional neural network and long short-term memory (CNN-LSTM) architecture is constructed to extract features from multi-source measurements as the input of the D3QN. The algorithm’s efficacy and reliability are validated in a simulated environment built with the Robot Operating System (ROS) and Gazebo. The simulation results show that the proposed algorithm provides a real-time solution for path planning tasks in dynamic scenarios. In terms of average success rate and accuracy, the proposed method is superior to other deep learning algorithms, and the convergence speed is also improved.
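The dueling Q-value aggregation and the greedy and Boltzmann action selection named in the abstract above can be illustrated with a minimal sketch. The abstract does not say how the two selection rules are mixed, so the rule below (greedy by default, Boltzmann sampling with probability `epsilon`) and all function and parameter names are assumptions for illustration, not the authors' implementation.

```python
import math
import random


def dueling_q(value, advantages):
    """Combine a state value and per-action advantages into Q-values."""
    mean_adv = sum(advantages) / len(advantages)
    return [value + a - mean_adv for a in advantages]


def boltzmann_probs(q_values, temperature=1.0):
    """Softmax over Q-values; a higher temperature gives more exploration."""
    q_max = max(q_values)  # subtract the maximum for numerical stability
    exps = [math.exp((q - q_max) / temperature) for q in q_values]
    total = sum(exps)
    return [e / total for e in exps]


def select_action(q_values, epsilon=0.1, temperature=1.0, rng=random):
    """Assumed mixing rule: Boltzmann sampling with probability epsilon, else greedy."""
    if rng.random() < epsilon:
        probs = boltzmann_probs(q_values, temperature)
        return rng.choices(range(len(q_values)), weights=probs, k=1)[0]
    return max(range(len(q_values)), key=lambda i: q_values[i])
```

Sampling from the Boltzmann distribution instead of uniformly keeps exploration biased toward actions with higher estimated value, which is one common way to reduce the risk of settling in a local extremum.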
Fault-Tolerant Dynamical Consensus of Double-Integrator Multi-Agent Systems in the Presence of Asynchronous Self-Sensing Function Failures
WU Zhihai (吴治海), XIE Linbo (谢林柏)
2024, 29 (4):  613-624.  doi: 10.1007/s12204-024-2716-1
Double-integrator multi-agent systems (MASs) might not achieve dynamical consensus, even if only some agents suffer from self-sensing function failures (SSFFs). SSFFs might be asynchronous in real engineering applications. The existing fault-tolerant dynamical consensus protocol suitable for synchronous SSFFs cannot be directly used to tackle fault-tolerant dynamical consensus of double-integrator MASs with some agents subject to asynchronous SSFFs. Motivated by these facts, this paper explores a new fault-tolerant dynamical consensus protocol suitable for asynchronous SSFFs. First, multi-hop communication, together with the idea of treating asynchronous SSFFs as multiple piecewise synchronous SSFFs, is used to recover the connectivity of the network topology among all normal agents. Second, a fault-tolerant dynamical consensus protocol is designed for double-integrator MASs by utilizing the history information of an agent subject to SSFF to compute its own state information at the instants when its minimum-hop normal neighbor set changes. Then, it is theoretically proved that if the strategy of network topology connectivity recovery and the fault-tolerant dynamical consensus protocol with proper time-varying gains are used simultaneously, double-integrator MASs with all normal agents and all agents subject to SSFFs can reach dynamical consensus. Finally, comparative numerical simulations are given to illustrate the effectiveness of the theoretical results.
Event-Triggered Fixed-Time Consensus of Second-Order Nonlinear Multi-Agent Systems with Delay and Switching Topologies
XING Youjing (邢优靖), GAO Jinfeng∗ (高金凤), LIU Xiaoping (刘小平), WU Ping (吴平)
2024, 29 (4):  625-639.  doi: 10.1007/s12204-024-2695-2
To address fixed-time consensus problems of a class of leader-follower second-order nonlinear multi-agent systems with uncertain external disturbances, an event-triggered fixed-time consensus protocol is proposed. First, a virtual velocity is designed based on the backstepping control method to achieve consensus, with a bound on the convergence time that depends only on the system parameters. Second, an event-triggered mechanism is presented to reduce frequent communication between agents, and a triggering condition based on state information is given for each follower. This mechanism saves communication resources, and Zeno behavior is excluded. Then, the delay and switching topologies of the system are also discussed. Next, the stability of the system is analyzed using Lyapunov stability theory. Finally, simulation results demonstrate the validity of the presented method.
Leader-Following Consensus of Multi-Agent Systems via Fully Distributed Event-Based Control
GENG Zongsheng (耿宗盛), ZHAO Dongdong (赵东东), ZHOU Xingwen (周兴文), YAN Lei (闫磊), YAN Shi∗ (阎石)
2024, 29 (4):  640-645.  doi: 10.1007/s12204-024-2718-z
This paper studies the leader-following consensus of linear multi-agent systems on undirected graphs. Specifically, we construct an adaptive event-based protocol that can be implemented in a fully distributed way by using only local relative information. This protocol is also resource-friendly, as it is updated only when the agent violates the designed event-triggering condition. A sufficient condition is proposed for the leader-following consensus of linear multi-agent systems based on the Lyapunov approach, and Zeno behavior is excluded. Finally, two numerical examples are provided to illustrate the effectiveness of the theoretical results.
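The idea of updating only when a triggering condition is violated can be sketched with a toy simulation. This is not the adaptive, fully distributed leader-following protocol of the paper: it is a leaderless, single-integrator simplification with a fixed threshold `delta`, and every name and parameter value below is an assumption chosen for illustration.

```python
def simulate(x0, neighbors, delta=0.01, dt=0.01, steps=2000):
    """Event-triggered consensus for single-integrator agents on an undirected graph.

    Each agent rebroadcasts its state only when it has drifted more than
    `delta` from the value it last broadcast (static threshold, assumed).
    """
    x = list(x0)
    x_hat = list(x0)  # last broadcast state of every agent
    events = 0
    for _ in range(steps):
        # Trigger check: rebroadcast only when the local error is too large.
        for i in range(len(x)):
            if abs(x[i] - x_hat[i]) > delta:
                x_hat[i] = x[i]
                events += 1
        # The control law uses broadcast values only, never the true states.
        u = [sum(x_hat[j] - x_hat[i] for j in neighbors[i]) for i in range(len(x))]
        x = [xi + dt * ui for xi, ui in zip(x, u)]
    return x, events


# Three agents on a path graph 0 - 1 - 2.
final, events = simulate([0.0, 1.0, 3.0], {0: [1], 1: [0, 2], 2: [1]})
```

With a fixed threshold the agents only reach a small neighborhood of consensus, and the number of broadcasts stays below one per agent per step. Exact consensus and the exclusion of Zeno behavior require the kind of triggering design and analysis the paper provides.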
Reward Function Design Method for Long Episode Pursuit Tasks Under Polar Coordinate in Multi-Agent Reinforcement Learning
DONG Yubo (董玉博), CUI Tao (崔涛), ZHOU Yufan (周禹帆), SONG Xun (宋勋), ZHU Yue (祝月), DONG Peng∗ (董鹏)
2024, 29 (4):  646-655.  doi: 10.1007/s12204-024-2713-4
Multi-agent reinforcement learning has recently been applied to solve pursuit problems. However, the large number of time steps per training episode makes it hard for training to converge, resulting in low rewards and agents that fail to learn strategies. This paper proposes a deep reinforcement learning (DRL) training method that employs an ensemble segmented multi-reward function design approach to address this convergence problem. The ensemble reward function combines the advantages of two reward functions, which enhances the training effect of agents in long episodes. Then, we eliminate the non-monotonic behavior in the reward function introduced by the trigonometric functions in the traditional 2D polar coordinate observation representation. Experimental results demonstrate that this method outperforms the traditional single reward function mechanism in the pursuit scenario by improving agents’ policy scores on the task. These ideas offer a solution to the convergence challenges faced by DRL models in long episode pursuit problems, leading to improved model training performance.
Distributed Cooperative Anti-Disturbance Control for High-Order MIMO Nonlinear Multi-Agent Systems
JIN Feiyu (金飞宇), CHEN Longsheng (陈龙胜), LI Tongshuai (李统帅), SHI Tongxin (石童昕)
2024, 29 (4):  656-666.  doi: 10.1007/s12204-023-2673-0
To solve the synchronization and tracking problems, a cooperative control scheme is proposed for a class of high-order multi-input and multi-output (MIMO) nonlinear multi-agent systems (MASs) subject to uncertainties and external disturbances. First, the coupled relationships among the Laplacian matrix, the leader-following adjacency matrix and the consensus error are analyzed based on an undirected graph. Furthermore, nonlinear disturbance observers (NDOs) are designed to estimate compounded disturbances in MASs, and a distributed cooperative anti-disturbance control protocol is proposed for high-order MIMO nonlinear MASs based on the outputs of the NDOs and the dynamic surface control approach. Finally, the feasibility and effectiveness of the proposed scheme are demonstrated based on Lyapunov stability theory and simulation experiments.
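The relationship between the Laplacian matrix, the leader-following adjacency matrix and the consensus error can be made concrete with a small sketch. It assumes the standard scalar-state form e_i = Σ_j a_ij (x_i − x_j) + b_i (x_i − x_0), which in matrix form is e = (L + B)(x − 1·x_0). The paper's own definitions may differ, and the names `pins` and `x_leader` are hypothetical.

```python
def laplacian(adj):
    """Graph Laplacian L = D - A for an adjacency matrix given as nested lists."""
    n = len(adj)
    return [
        [(sum(adj[i]) if i == j else 0) - adj[i][j] for j in range(n)]
        for i in range(n)
    ]


def consensus_error(adj, pins, x, x_leader):
    """Local error e_i = sum_j a_ij (x_i - x_j) + b_i (x_i - x_leader)."""
    n = len(x)
    return [
        sum(adj[i][j] * (x[i] - x[j]) for j in range(n))
        + pins[i] * (x[i] - x_leader)
        for i in range(n)
    ]


# Path graph 0 - 1 - 2 in which only agent 0 receives the leader's state.
adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
pins = [1, 0, 0]
errors = consensus_error(adj, pins, [1.0, 2.0, 4.0], x_leader=0.0)
```

Each agent needs only its neighbors' states and, if pinned, the leader's state, which is what makes the error signal usable in a distributed controller.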
Multi-Objective Loosely Synchronized Search for Multi-Objective Multi-Agent Path Finding with Asynchronous Actions
DU Haikuo (杜海阔), GUO Zhengyu (郭正玉), ZHANG Lulu (章露露), CAI Yunze∗ (蔡云泽)
2024, 29 (4):  667-677.  doi: 10.1007/s12204-024-2744-x
In recent years, multi-agent path planning technology has gradually matured and made breakthrough progress. The main difficulties in multi-agent path planning are the large state space, long algorithm running time, multiple optimization objectives, and asynchronous actions of multiple agents. To solve these problems, this paper first introduces the main problem of the research, multi-objective multi-agent path finding with asynchronous actions, and proposes the algorithm framework of multi-objective loosely synchronized (MO-LS) search. By combining it with A∗ and M∗, the MO-LS-A∗ and MO-LS-M∗ algorithms are respectively proposed. The completeness and optimality of the algorithms are proved, and a series of comparative experiments are designed to analyze the factors affecting the performance of the algorithms, verifying that the proposed MO-LS-M∗ algorithm has certain advantages.
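Multi-objective search of the kind described above compares paths by cost vectors rather than by a single number. The sketch below shows only the generic Pareto-dominance test and front filtering that such searches rely on. It is not the MO-LS algorithm itself, and the function names are illustrative.

```python
def dominates(a, b):
    """True if cost vector a is no worse than b everywhere and better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(costs):
    """Keep only the cost vectors that no other vector dominates."""
    return [c for c in costs if not any(dominates(o, c) for o in costs)]


# Each tuple could be (path length, risk) for one candidate joint path.
candidates = [(1, 5), (2, 2), (3, 3), (5, 1)]
front = pareto_front(candidates)
```

Here `(3, 3)` is discarded because `(2, 2)` is better in both objectives, while the other three candidates are mutually incomparable and all remain on the front.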
CBF-Based Distributed Model Predictive Control for Safe Formation of Autonomous Mobile Robots
MU Jianbin (穆建彬), YANG Haili (杨海丽), HE Defeng (何德峰)
2024, 29 (4):  678-688.  doi: 10.1007/s12204-024-2747-7
A distributed model predictive control (DMPC) method based on the robust control barrier function (RCBF) is developed to achieve the safe formation target of multi-autonomous mobile robot systems in an uncertain disturbed environment. The first step is to analyze the safety requirements of the system during safe formation and categorize them into collision avoidance and distance connectivity maintenance. RCBF constraints are designed for the collision avoidance and connectivity maintenance requirements, and the overall safety constraints are obtained by combining them. Then, the specified safety constraints are integrated with the objective of forming a multi-autonomous mobile robot formation. To ensure safe control, the resulting optimization problem is integrated with the DMPC method. Finally, the RCBF-DMPC algorithm is proposed to ensure iterative feasibility and stability while meeting the constraints and expected objectives. Simulation experiments illustrate that the designed algorithm can achieve cooperative formation and ensure system safety.
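A collision-avoidance barrier constraint of the kind used inside a predictive controller can be sketched as follows. This uses a plain discrete-time control barrier function condition, h(x_{k+1}) ≥ (1 − γ) h(x_k), with a squared-distance barrier. The robustness margin of the paper's RCBF is not reproduced, and the barrier choice, function names and `gamma` are assumptions.

```python
def h_collision(p_i, p_j, d_min):
    """Barrier value: positive when robots i and j are farther apart than d_min."""
    dx = p_i[0] - p_j[0]
    dy = p_i[1] - p_j[1]
    return dx * dx + dy * dy - d_min ** 2


def satisfies_dcbf(h_now, h_next, gamma):
    """Discrete-time CBF condition with decay rate 0 < gamma <= 1."""
    return h_next >= (1.0 - gamma) * h_now


# Robot i moves from (0, 0) to (1, 0) while robot j stays at (3, 4).
h_now = h_collision((0.0, 0.0), (3.0, 4.0), d_min=1.0)
h_next = h_collision((1.0, 0.0), (3.0, 4.0), d_min=1.0)
```

A smaller `gamma` lets the barrier value shrink more slowly, so the same move can be accepted with `gamma=0.3` and rejected with `gamma=0.1`. In an MPC formulation, this inequality would be imposed at every step of the prediction horizon.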
Time-Varying Formation-Containment Tracking Control for Unmanned Aerial Vehicle Swarm Systems with Switching Topologies and a Non-Cooperative Target
WU Xiaojing (武晓晶), CAO Tongyao (曹童瑶), ZHEN Ran (甄然), LI Zhijie (李志杰)
2024, 29 (4):  689-701.  doi: 10.1007/s12204-024-2728-x
This paper studies the time-varying formation-containment tracking control problems for unmanned aerial vehicle (UAV) swarm systems with switching topologies and a non-cooperative target, where the UAV swarm systems consist of one tracking-leader, several formation-leaders, and followers. The formation-leaders are required to accomplish a predefined time-varying formation and track the desired trajectory of the tracking-leader, and the states of the followers should converge to the convex hull spanned by those of the formation-leaders. First, a formation-containment tracking protocol is proposed using the neighboring relative information, and the feasibility condition for formation-containment tracking and the algebraic Riccati equation are given. Then, the stability of the control system with the designed control protocol is proved by constructing a suitable Lyapunov function. Finally, simulation examples are used to verify the effectiveness of the theoretical results. The simulation results show that both the formation tracking error and the containment error converge, so the system can complete the formation-containment tracking control well. On a real battlefield, combat UAVs need to chase and attack hostile UAVs, and when multiple UAVs cooperate in such an interception, a formation-containment tracking control problem arises.
Data Augmentation of Ship Wakes in SAR Images Based on Improved CycleGAN
YAN Congqiang (鄢丛强), GUO Zhengyu (郭正玉), CAI Yunze∗ (蔡云泽)
2024, 29 (4):  702-711.  doi: 10.1007/s12204-024-2746-8
The study of ship wakes in synthetic aperture radar (SAR) images is of great importance for detecting ship targets in the ocean. In this study, we focus on the low quantity and insufficient diversity of ship wakes in SAR images, and propose a data augmentation method for ship wakes in SAR images based on an improved cycle-consistent generative adversarial network (CycleGAN). The improvements mainly include two aspects. First, to enhance the quality of the generated images and guarantee a stable training process of the model, the least-squares loss is employed as the adversarial loss function. Second, the decoder of the generator is augmented with the convolutional block attention module (CBAM) to address the issue of missing details in the generated ship wakes of SAR images at the microscopic level. The experimental results indicate that the improved CycleGAN model generates clearer ship wakes in SAR images and outperforms the traditional CycleGAN models in both subjective and objective aspects.
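The least-squares adversarial loss mentioned above replaces the usual cross-entropy objective with a squared error against target labels. The sketch below follows the common formulation with targets 1 for real and 0 for generated samples and a factor of 0.5. The exact labels and weights used in the paper are not stated in the abstract, so they are assumptions here, as are the function names.

```python
def lsgan_discriminator_loss(d_real, d_fake):
    """0.5 * mean((D(x) - 1)^2) + 0.5 * mean(D(G(z))^2)."""
    real_term = sum((d - 1.0) ** 2 for d in d_real) / len(d_real)
    fake_term = sum(d ** 2 for d in d_fake) / len(d_fake)
    return 0.5 * real_term + 0.5 * fake_term


def lsgan_generator_loss(d_fake):
    """0.5 * mean((D(G(z)) - 1)^2): the generator wants fakes scored as real."""
    return 0.5 * sum((d - 1.0) ** 2 for d in d_fake) / len(d_fake)


# Discriminator scores for a batch of real and generated images (made-up values).
d_loss = lsgan_discriminator_loss([0.9, 1.1], [0.2, -0.1])
g_loss = lsgan_generator_loss([0.2, -0.1])
```

Because the penalty grows with the distance from the target label, samples that are classified correctly but lie far from the decision boundary still produce gradients, which is the usual argument for the more stable training mentioned in the abstract.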
Iterative Model Predictive Control for Automatic Carrier Landing of Carrier-Based Aircrafts Under Complex Surroundings and Constraints
ZHANG Xiaotian (张啸天), HE Defeng∗ (何德峰), LIAO Fei (廖飞)
2024, 29 (4):  712-724.  doi: 10.1007/s12204-023-2690-z
This paper considers the automatic carrier landing problem of carrier-based aircraft subject to constraints, deck motion, measurement noise, and unknown disturbances. An iterative model predictive control (MPC) strategy with constraints is proposed for automatic landing control of the aircraft. First, a long short-term memory (LSTM) neural network is used to calculate the adaptive reference trajectories of the aircraft. Then the Sage-Husa adaptive Kalman filter and a disturbance observer are introduced to design the composite compensator. Second, an iterative optimization algorithm based on Lagrange’s theory is presented to quickly solve the receding horizon optimal control problem of MPC. Moreover, some sufficient conditions are derived to guarantee the stability of the landing system in closed loop with the MPC. Finally, the simulation results for the F/A-18A aircraft show that, compared with conventional MPC, the presented MPC strategy improves the computational efficiency by nearly 56% and satisfies the control performance requirements of carrier landing.
Multi-AGVs Scheduling with Vehicle Conflict Consideration in Ship Outfitting Items Warehouse
DONG Dejin (董德金), DONG Shiyin (董诗音), ZHANG Lulu (章露露), CAI Yunze∗ (蔡云泽)
2024, 29 (4):  725-736.  doi: 10.1007/s12204-024-2731-2
Path planning in complex wild environments with multiple elements still poses challenges. This paper designs an algorithm that integrates global and local planning for path planning in wild environments. The modeling process of the wild environment map is designed. Three optimization strategies are designed to improve A-Star by overcoming the problems of touching the edges of obstacles, redundant nodes and twisting paths. A new weighted cost function is designed to achieve different planning modes. Furthermore, an improved dynamic window approach (DWA) is designed to avoid local optima and improve time efficiency compared with the traditional DWA. For the necessary path re-planning in wild environments, the improved A-Star is integrated with the improved DWA to solve the re-planning problem caused by unknown and moving obstacles in wild environments with multiple elements. The improved fusion algorithm solves these problems effectively and consumes less time, and the simulation results verify the effectiveness of the improved algorithms.
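The local planner named above, the dynamic window approach, starts each control cycle by restricting the search to velocities the robot can actually reach. A minimal sketch of that first step is given below. It covers only the textbook window computation, not the paper's improved DWA or its cost function, and the dictionary keys and numeric limits are hypothetical.

```python
def dynamic_window(v, w, limits, dt):
    """Return (v_lo, v_hi, w_lo, w_hi) reachable within one control interval dt."""
    v_lo = max(limits["v_min"], v - limits["a_max"] * dt)
    v_hi = min(limits["v_max"], v + limits["a_max"] * dt)
    w_lo = max(-limits["w_max"], w - limits["alpha_max"] * dt)
    w_hi = min(limits["w_max"], w + limits["alpha_max"] * dt)
    return v_lo, v_hi, w_lo, w_hi


# Hypothetical robot limits: speeds in m/s and rad/s, accelerations per second.
limits = {"v_min": 0.0, "v_max": 1.0, "a_max": 0.5, "w_max": 1.0, "alpha_max": 2.0}
window = dynamic_window(v=0.9, w=0.0, limits=limits, dt=0.5)
```

Candidate velocity pairs are then sampled inside this window and scored, for example by heading, clearance and speed. The scoring step is where a global path from A-Star would typically enter as a heading or goal term.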