混合专家图强化学习驱动的多目标作业车间调度优化
Multi-objective Job Shop Scheduling Optimization Driven by Mixture-of-Experts Graph Reinforcement Learning
Affiliation:

1. South China University of Technology; 2. School of Intelligent Manufacturing and Future Technology, Fuyao University of Science and Technology

CLC number:

TH16; TH18; TP18

Fund Project:

Guangdong Basic and Applied Basic Research Foundation (2024A1515011048, 2025A1515010139)

    Abstract:

    Driven by Industry 5.0 and smart manufacturing, the scale of manufacturing orders keeps growing, yet multi-objective job shop scheduling remains an efficiency bottleneck because traditional algorithms are slow and adapt poorly. To resolve the conflict between solution performance and practicality in the multi-objective job shop scheduling problem, this paper proposes a graph reinforcement learning framework, MOE-GRL-AR (written GRL-MOE-AR in the Chinese-language abstract), that integrates dynamic heterogeneous graph modeling, a mixture-of-experts network, and an adaptive reward correction mechanism. First, a mathematical model is established with makespan and weighted earliness/tardiness as the optimization objectives. Second, to handle the complex shop floor environment, a dynamic heterogeneous graph models the shop floor state, and a two-stage embedding network composed of a heterogeneous graph neural network and a mixture-of-experts network is designed. Third, to balance the objectives and improve training stability, a reinforcement learning training algorithm based on the adaptive reward correction mechanism is proposed. Finally, comparisons with metaheuristic algorithms and other reinforcement learning methods, on synthetic instances of several scales and on public benchmark instances, show that the proposed algorithm is superior in solution set quality, convergence, and diversity. Ablation experiments and a sensitivity analysis further support its effectiveness and stability.
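
The two objectives named in the abstract, makespan and weighted earliness/tardiness, can be illustrated with a minimal Python sketch. It is not the authors' model: the `JobResult` structure, the per-job weights `w_early` and `w_tardy`, and the sample numbers are all assumptions made for illustration. The sketch uses the common textbook form, in which makespan is the largest job completion time and each job contributes `w_early * max(0, due - completion) + w_tardy * max(0, completion - due)` to the second objective.

```python
from dataclasses import dataclass


@dataclass
class JobResult:
    """Completion data for one job in a finished schedule (hypothetical structure)."""
    completion: float      # time at which the job's last operation finishes
    due: float             # due date of the job
    w_early: float = 1.0   # assumed per-unit earliness penalty
    w_tardy: float = 1.0   # assumed per-unit tardiness penalty


def makespan(jobs):
    """Objective 1: the latest completion time over all jobs."""
    return max(job.completion for job in jobs)


def weighted_earliness_tardiness(jobs):
    """Objective 2: the sum of weighted earliness and tardiness over all jobs."""
    total = 0.0
    for job in jobs:
        earliness = max(0.0, job.due - job.completion)
        tardiness = max(0.0, job.completion - job.due)
        total += job.w_early * earliness + job.w_tardy * tardiness
    return total


# Illustrative data only: three jobs with made-up completion times and due dates.
jobs = [
    JobResult(completion=10.0, due=12.0, w_early=1.0, w_tardy=2.0),  # 2 units early
    JobResult(completion=15.0, due=11.0, w_early=1.0, w_tardy=2.0),  # 4 units late
    JobResult(completion=9.0, due=9.0),                               # exactly on time
]

print(makespan(jobs))                      # 15.0
print(weighted_earliness_tardiness(jobs))  # 1*2 + 2*4 + 0 = 10.0
```

The two values generally pull in different directions: finishing every job as early as possible shortens the makespan but can raise the earliness penalty. That trade-off is why the problem is treated as multi-objective and evaluated on the quality of a solution set.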

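Cite this article

晏惠峰 (Yan Huifeng), 姚锡凡 (Yao Xifan). Multi-objective Job Shop Scheduling Optimization Driven by Mixture-of-Experts Graph Reinforcement Learning[J]. Science Technology and Engineering, , ():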

History
  • Received: 2026-02-01
  • Revised: 2026-04-21
  • Accepted: 2026-05-10