Abstract: Driven by Industry 5.0 and smart manufacturing, the growing scale of manufacturing orders has exposed efficiency bottlenecks in traditional multi-objective job shop scheduling algorithms, particularly excessive computational latency and poor adaptability. To resolve the conflict between optimization performance and practical utility, this paper proposes MOE-GRL-AR, a graph reinforcement learning framework that integrates dynamic heterogeneous graph modeling, a mixture-of-experts network, and an adaptive reward correction mechanism. First, a mathematical model is established with makespan and weighted earliness/tardiness as the optimization objectives. Second, to handle the complexity of the shop floor environment, the shop floor state is modeled as a dynamic heterogeneous graph, and a dual-stage embedding network consisting of a heterogeneous graph neural network and a mixture-of-experts network is designed. Third, to balance the multiple objectives and improve training stability, a reinforcement learning training algorithm based on the adaptive reward correction mechanism is proposed. Finally, comparisons with metaheuristic algorithms and other reinforcement learning methods on synthetic instances of different scales and on public benchmark instances validate the superiority of the proposed algorithm in solution set quality, convergence, and diversity. In addition, ablation experiments and sensitivity analysis further demonstrate the effectiveness and stability of the proposed algorithm.
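To make the two objectives named above concrete, the following minimal sketch evaluates makespan and weighted earliness/tardiness for a finished schedule. It assumes the textbook definitions (makespan as the largest job completion time, and a per-job penalty that weights earliness and tardiness against a due date), which may differ from the exact formulation in the paper's model. The `Job` class, its field names, and the sample values are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Job:
    """Hypothetical record of one scheduled job (not the paper's notation)."""
    completion: float     # completion time of the job's last operation
    due: float            # due date of the job
    w_early: float = 1.0  # assumed per-unit earliness weight
    w_tardy: float = 1.0  # assumed per-unit tardiness weight


def makespan(jobs):
    """Largest completion time over all jobs."""
    return max(j.completion for j in jobs)


def weighted_earliness_tardiness(jobs):
    """Sum of weighted earliness and tardiness penalties over all jobs."""
    total = 0.0
    for j in jobs:
        earliness = max(0.0, j.due - j.completion)  # finished before due date
        tardiness = max(0.0, j.completion - j.due)  # finished after due date
        total += j.w_early * earliness + j.w_tardy * tardiness
    return total


# Illustrative data only: one early job, one tardy job, one on-time job.
jobs = [
    Job(completion=10, due=12, w_early=1, w_tardy=2),
    Job(completion=15, due=14, w_early=1, w_tardy=2),
    Job(completion=20, due=20, w_early=1, w_tardy=2),
]

print(makespan(jobs))                      # 20
print(weighted_earliness_tardiness(jobs))  # 4.0
```

The two values generally conflict: compressing the schedule to reduce makespan tends to finish some jobs early and raise the earliness penalty, which is why the problem is treated as multi-objective rather than collapsed into a single score.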