Author affiliations: Xiao Wang (1, 2), Hongwei Wang (1), Chao Qi (1)
1. Institute of Systems Engineering/The State Key Laboratory of Education Ministry for Image Processing and Intelligent Control, Huazhong University of Science and Technology, Wuhan, 430074, People’s Republic of China
2. College of Safety Engineering, Shenyang Aerospace University, Shenyang, People’s Republic of China
Journal category: Business and Economics
Journal subjects: Economics; Production and Logistics; Manufacturing, Machines and Tools; Automation and Robotics
Publisher: Springer Netherlands
ISSN: 1572-8145
Abstract
This paper investigates the maintenance problem for a flow line system consisting of two machines in series separated by a finite intermediate buffer. Each machine deteriorates independently as it operates, which gives rise to multiple yield levels. Imperfect preventive maintenance actions, subject to resource constraints, may bring a machine back to a better state. The problem is modeled as a semi-Markov decision process. A distributed multi-agent reinforcement learning algorithm is proposed to solve it and to obtain a control-limit maintenance policy for each machine, defined over the observed state, which consists of the yield level and the buffer level. Because the state transitions of the two machines are not synchronous, an asynchronous updating rule is used in the learning process. An experimental study is conducted to evaluate the efficiency of the proposed algorithm.
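
The abstract does not state the update rule, so the following is only a minimal sketch of how asynchronous, per-machine learning over a (yield level, buffer level) state might look. It assumes a discounted tabular Q-learning variant for semi-Markov decision processes, in which the discount depends on the sojourn time of each transition; the paper's actual algorithm may differ (for example, it may use an average-reward criterion). The class `MachineAgent`, the function `control_limit`, the action names, the parameter values, and the sample events are all hypothetical.

```python
import math
from collections import defaultdict

ACTIONS = ("operate", "maintain")  # hypothetical action set


class MachineAgent:
    """Tabular learner for one machine; state = (yield_level, buffer_level)."""

    def __init__(self, alpha=0.1, beta=0.05):
        self.alpha = alpha  # learning rate (assumed value)
        self.beta = beta    # continuous-time discount rate (assumed value)
        self.q = defaultdict(float)  # (state, action) -> value estimate

    def greedy_action(self, state):
        # Ties resolve to the first action in ACTIONS ("operate").
        return max(ACTIONS, key=lambda a: self.q[(state, a)])

    def update(self, state, action, reward, sojourn_time, next_state):
        """One discounted SMDP Q-learning step for this machine only."""
        discount = math.exp(-self.beta * sojourn_time)
        best_next = max(self.q[(next_state, a)] for a in ACTIONS)
        target = reward + discount * best_next
        self.q[(state, action)] += self.alpha * (target - self.q[(state, action)])
        return self.q[(state, action)]


def control_limit(agent, buffer_level, yield_levels):
    """First yield level (0 = best) whose greedy action is 'maintain', else None."""
    for level in yield_levels:
        if agent.greedy_action((level, buffer_level)) == "maintain":
            return level
    return None


# Asynchronous updating: each agent learns only when its own transition ends.
upstream, downstream = MachineAgent(), MachineAgent()
agents = {"M1": upstream, "M2": downstream}

# Made-up events: (machine, state, action, reward, sojourn_time, next_state)
events = [
    ("M1", (2, 1), "maintain", 5.0, 2.0, (0, 1)),
    ("M2", (1, 1), "operate", 3.0, 0.5, (1, 0)),
]
for machine, s, a, r, tau, s_next in events:
    agents[machine].update(s, a, r, tau, s_next)

print(control_limit(upstream, buffer_level=1, yield_levels=range(3)))
```

Keeping one value table per machine mirrors the distributed setting described in the abstract: neither agent waits for the other, and the control limit for a given buffer level is read off the learned values after training.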