Integrated Optimization Method for Scheduling and Maintenance in Welding Flow Shops Based on Representation-Enhanced Multi-Agent Reinforcement Learning

  • Donghua University, Shanghai 201620, China: a. College of Mechanical Engineering; b. Engineering Research Center of Artificial Intelligence for Textile Industry, Ministry of Education; c. Institute of Artificial Intelligence; d. Shanghai Engineering Research Center of Industrial Big Data and Intelligent System; e. College of Information Science and Technology
LI Hongsen (李洪森) (born 2001), master's student, researching robot scheduling
ZHANG Peng (张朋), associate professor; Email: zhangp88@dhu.edu.cn

Published online: 2025-12-31

Funding

National Natural Science Foundation of China (52005099); Fundamental Research Funds for the Central Universities (2232025G-14)

Cite this article

LI Hongsen (李洪森) a,b,c,d; ZHANG Peng (张朋) b,c,d; WANG Ming (王明) a,b,c,d; ZHANG Jie (张洁) b,c,d; XIANG Wenbin (相文彬) b,c,d,e. Integrated Optimization Method for Scheduling and Maintenance in Welding Flow Shops Based on Representation-Enhanced Multi-Agent Reinforcement Learning[J]. Journal of Shanghai Jiao Tong University, 0 : 1. DOI: 10.16183/j.cnki.jsjtu.2025.227

Abstract

This paper addresses the welding flow shop scheduling problem with preventive equipment maintenance, with the objective of minimizing the maximum completion time (makespan). The difficulties considered are equipment failure shocks, limited buffers, and ambiguous state representation caused by a high-dimensional state space. A joint optimization method for scheduling and maintenance based on representation-enhanced multi-agent reinforcement learning is proposed. The problem is decomposed into two sub-problems, processing scheduling and preventive maintenance, and a scheduling-maintenance dual-agent architecture is constructed. The sub-problems are strongly coupled: scheduling influences equipment failure risk, and maintenance changes equipment availability. The two agents therefore use the value-decomposition multi-agent actor-critics (VDAC) algorithm to decompose the global value function into their local value functions, so that each agent, while optimizing its own local value, naturally accounts for the other agent's sub-problem, which enables collaborative solving. Representation enhancement uses an autoencoder to extract the key information from the high-dimensional state, addressing information redundancy and ambiguous representation, so that the agents decide on the basis of key representational information and the joint optimization performance improves. Computational experiments show that the maximum completion time is reduced by an average of 4.13% compared with other algorithms and by an average of 13.34% compared with rule-based algorithms.
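
The value-decomposition idea in the abstract can be illustrated with a minimal sketch. It assumes a simple linear mixer with non-negative weights and a one-step temporal-difference advantage computed on the global value; the paper's actual network structure, reward design, and hyperparameters are not given in this abstract, so the function names, weights, value estimates, and reward below are all hypothetical.

```python
def mix_local_values(local_values, raw_weights, bias=0.0):
    """Combine per-agent values into a global value (assumed linear mixer).

    Taking the absolute value of each weight keeps dV_tot/dV_i >= 0, so
    raising either agent's local value never lowers the global value.
    """
    weights = [abs(w) for w in raw_weights]
    return sum(w * v for w, v in zip(weights, local_values)) + bias


def shared_td_advantage(reward, v_tot, v_tot_next, gamma=0.99):
    """One-step TD advantage computed on the global value.

    In this sketch both actors (scheduling and maintenance) would be
    updated with this same signal, so each one feels the other's effect.
    """
    return reward + gamma * v_tot_next - v_tot


# Hypothetical local value estimates: [scheduling agent, maintenance agent]
v_now = mix_local_values([-12.0, -3.0], raw_weights=[1.0, -0.5])
v_next = mix_local_values([-11.0, -2.0], raw_weights=[1.0, -0.5])

# Hypothetical reward: negative elapsed time, so a shorter makespan scores higher
advantage = shared_td_advantage(reward=-1.0, v_tot=v_now, v_tot_next=v_next, gamma=1.0)
print(v_now, v_next, advantage)
```

Because the mixing weights are forced to be non-negative, an action that improves one agent's local value cannot reduce the global value, which is the property that lets each agent optimize locally while remaining consistent with the joint objective.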
