Multi-agent reinforcement learning based maintenance policy for a resource constrained flow line system

Authors: Xiao Wang; Hongwei Wang; Chao Qi
Keywords: Multiple yield deterioration; Semi-Markov decision process; Constrained resource; Multi-agent reinforcement learning; Two-machine flow line
Journal: Journal of Intelligent Manufacturing
Publication year: 2016
Publication date: April 2016
Volume: 27
Issue: 2
Pages: 325-333
Full-text size: 737 KB
Author affiliations: Xiao Wang (1) (2)
Hongwei Wang (1)
Chao Qi (1)

1. Institute of Systems Engineering/The State Key Laboratory of Education Ministry for Image Processing and Intelligent Control, Huazhong University of Science and Technology, Wuhan, 430074, People’s Republic of China
2. College of Safety Engineering, Shenyang Aerospace University, Shenyang, People’s Republic of China
Journal category: Business and Economics
Journal subjects: Economics
Production and Logistics
Manufacturing, Machines and Tools
Automation and Robotics
Publisher: Springer Netherlands
ISSN: 1572-8145

Abstract

This paper investigates the maintenance problem for a flow line system consisting of two machines in series with a finite intermediate buffer. Both machines deteriorate independently as they operate, resulting in multiple yield levels. Imperfect preventive maintenance actions, which are subject to resource constraints, may bring a machine back to a better state. The problem is modeled as a semi-Markov decision process. A distributed multi-agent reinforcement learning algorithm is proposed to solve the problem and to obtain a control-limit maintenance policy for each machine, based on the observed state represented by yield level and buffer level. An asynchronous updating rule is used in the learning process because the state transitions of the two machines are not synchronized. An experimental study is conducted to evaluate the efficiency of the proposed algorithm.
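
The abstract does not give the update rule itself, so the following is only an illustrative sketch of how one learner per machine with asynchronous updates might look. It uses a generic discounted Q-learning step for semi-Markov decision processes, which is an assumption and may differ from the algorithm in the paper. The class name `MachineAgent`, the action names, the state encoding `(yield_level, buffer_level)`, and the values of `alpha` and `beta` are all hypothetical.

```python
import math
from collections import defaultdict


class MachineAgent:
    """Hypothetical learner for one machine; state = (yield_level, buffer_level)."""

    def __init__(self, actions=("operate", "maintain"), alpha=0.1, beta=0.05):
        self.actions = actions
        self.alpha = alpha  # learning rate (assumed value)
        self.beta = beta    # continuous-time discount rate (assumed value)
        self.q = defaultdict(float)  # maps (state, action) to an estimated value

    def greedy_action(self, state):
        # Pick the action with the highest current estimate for this state.
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def update(self, state, action, reward, sojourn_time, next_state):
        # Generic discounted SMDP Q-learning step: the discount depends on
        # how long the machine stayed in `state` before the transition.
        discount = math.exp(-self.beta * sojourn_time)
        best_next = max(self.q[(next_state, a)] for a in self.actions)
        target = reward + discount * best_next
        self.q[(state, action)] += self.alpha * (target - self.q[(state, action)])


# One agent per machine. Asynchronous updating: only the machine that has
# just completed a transition updates its table; the other is left untouched.
agents = {"M1": MachineAgent(), "M2": MachineAgent()}
agents["M1"].update(
    state=(2, 3), action="maintain", reward=-4.0,
    sojourn_time=1.5, next_state=(0, 3),
)
```

In this sketch, a control-limit policy would correspond to the greedy action switching from "operate" to "maintain" once the yield level passes some threshold for a given buffer level; whether the learned values actually have that structure depends on the model parameters, which the abstract does not state.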
