Integrated Optimization Method for Scheduling and Maintenance in Welding Flow Shops Based on Representation-Enhanced Multi-Agent Reinforcement Learning
Online published: 2025-12-31
Funding
National Natural Science Foundation of China (52005099); Fundamental Research Funds for the Central Universities (2232025G-14)
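Li Hongsen (a, b, c, d), Zhang Peng (b, c, d), Wang Ming (a, b, c, d), Zhang Jie (b, c, d), Xiang Wenbin (b, c, d, e). Integrated Optimization Method for Scheduling and Maintenance in Welding Flow Shops Based on Representation-Enhanced Multi-Agent Reinforcement Learning[J]. Journal of Shanghai Jiao Tong University, 0: 1. DOI: 10.16183/j.cnki.jsjtu.2025.227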
This paper addresses the scheduling of welding flow shops with preventive equipment maintenance, where the objective is to minimize the maximum completion time. The problem is complicated by equipment failure shocks, limited buffers, and the ambiguous representation caused by high-dimensional state spaces. To handle these difficulties, a joint optimization method for scheduling and maintenance based on representation-enhanced multi-agent reinforcement learning is proposed. The problem is decomposed into two sub-problems, processing scheduling and preventive maintenance, and a scheduling-maintenance dual-agent architecture is constructed. The two sub-problems are strongly coupled: scheduling influences equipment failure risk, and maintenance alters equipment availability. The dual agents therefore use the value-decomposition multi-agent actor-critics (VDAC) algorithm to decompose the global value function into their respective local value functions, so that each agent accounts for the other's sub-problem while optimizing its own local objective, which enables collaborative problem-solving. For representation enhancement, an autoencoder extracts key information from high-dimensional states, addressing the information redundancy and ambiguous representation of such state spaces, so that the agents make decisions based on key representational information and the performance of joint scheduling and maintenance optimization improves. Case studies show that the proposed method reduces the maximum completion time by an average of 4.13% compared with other algorithms and by an average of 13.34% compared with rule-based algorithms.
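The value-decomposition idea in the abstract can be illustrated with a minimal sketch. It assumes the simplest additive form, V_tot = V_sched + V_maint, with linear local critics and a single shared reward. The abstract does not give the actual mixing structure, network architecture, features, or reward, so every name and number below is hypothetical, and the feature vectors merely stand in for the compact representations an autoencoder would produce.

```python
# Hypothetical sketch: additive value decomposition for two cooperating agents.
# All names, features, and hyperparameters are illustrative assumptions,
# not the authors' implementation.

def local_value(weights, features):
    """Linear local value estimate V_i(o_i) for one agent."""
    return sum(w * x for w, x in zip(weights, features))


def total_value(w_sched, w_maint, obs_sched, obs_maint):
    """Global value under additive mixing: V_tot = V_sched + V_maint."""
    return local_value(w_sched, obs_sched) + local_value(w_maint, obs_maint)


def td_update(w_sched, w_maint, transition, gamma=0.95, lr=0.05):
    """One semi-gradient TD(0) step driven by the shared global reward."""
    (obs_s, obs_m), reward, (next_s, next_m), done = transition
    v_tot = total_value(w_sched, w_maint, obs_s, obs_m)
    v_next = 0.0 if done else total_value(w_sched, w_maint, next_s, next_m)
    td_error = reward + gamma * v_next - v_tot
    # With additive mixing, dV_tot/dw_i is agent i's own feature vector,
    # so the same global TD error updates both local critics.
    new_sched = [w + lr * td_error * x for w, x in zip(w_sched, obs_s)]
    new_maint = [w + lr * td_error * x for w, x in zip(w_maint, obs_m)]
    return new_sched, new_maint, td_error


def fit_single_transition(steps=200):
    """Repeatedly fit one terminal transition; V_tot approaches the reward."""
    w_sched, w_maint = [0.0, 0.0], [0.0, 0.0]
    obs = ([1.0, 0.5], [0.2, 1.0])       # made-up compact features per agent
    transition = (obs, -1.0, obs, True)  # reward -1.0: one unit of added completion time
    for _ in range(steps):
        w_sched, w_maint, _ = td_update(w_sched, w_maint, transition)
    return w_sched, w_maint, total_value(w_sched, w_maint, *obs)
```

Calling `fit_single_transition()` returns the two learned weight vectors and the fitted global value. Because both critics are trained from one global TD error, neither agent can lower its own loss without accounting for the other's contribution, which is the coupling the abstract describes.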