Journal of Shanghai Jiao Tong University ›› 2022, Vol. 56 ›› Issue (9): 1262-1275. doi: 10.16183/j.cnki.jsjtu.2021.215

• Mechanical and Power Engineering •

A Dual-System Reinforcement Learning Method for Flexible Job Shop Dynamic Scheduling

LIU Yahui1, SHEN Xingwang1, GU Xinghai1, PENG Tao2, BAO Jinsong1, ZHANG Dan1

  1. School of Mechanical Engineering, Donghua University, Shanghai 201620, China
  2. School of Mechanical Engineering, Zhejiang University, Hangzhou 310027, China
  • Received: 2021-06-22 Online: 2022-09-28 Published: 2022-10-09
  • Contact: BAO Jinsong E-mail: bao@dhu.edu.cn
  • About the author: LIU Yahui (1997-), female, born in Xuchang, Henan Province, master's student; research interests: cognitive manufacturing, knowledge graphs, and intelligent scheduling.
  • Funding:
    National Key Research and Development Program of China (2019YFB1706300)

Abstract:

In the production of aerospace structural parts, batch production tasks coexist with research and development (R&D) tasks, and personalized small-batch R&D tasks lead to frequent rush-order insertions. To ensure that tasks are completed on schedule and to solve the dynamic scheduling problem faced by flexible job shops, this paper takes minimizing the average equipment load and minimizing the total completion time as the optimization objectives and proposes a dual-loop deep Q network (DL-DQN) method driven by a perception-cognition dual system. The perception system uses a knowledge graph to represent workshop knowledge and to generate a multi-dimensional information matrix. The cognitive system abstracts the scheduling process into two stages, handled by a resource allocation agent and a process sequencing agent, which correspond respectively to the two optimization objectives. A workshop state matrix is designed to describe the problem and its constraints, and action instructions are introduced step by step during scheduling decisions. Finally, separate reward functions are designed to evaluate the resource allocation decisions and the process sequencing decisions. A case study on aerospace shell machining at an aerospace institute, together with a comparative analysis against other algorithms, verifies the superiority of the proposed method.

Key words: perception-cognition dual system, dual-loop deep Q network (DL-DQN), dynamic scheduling, knowledge graph, multi-agent

CLC number: