Book title: Evolutionary Multi-Criterion Optimization
ISBN: 978-3-319-15891-4
Publication category: Computer Science
Publication subjects: Artificial Intelligence and Robotics; Computer Communication Networks; Software Engineering; Data Encryption; Database Management; Computation by Abstract Devices; Algorithm Analysis and Problem Complexity
Publisher: Springer Berlin / Heidelberg
ISSN: 1611-3349
Abstract
Multi-objective multi-armed bandits (MOMAB) are a variant of the multi-armed bandit problem in which rewards are stochastic vectors rather than scalars. In this paper, we propose three MOMAB algorithms. The first uses a fixed set of linear scalarization functions to identify the Pareto front. Two topological approaches identify the Pareto front using linear weighted combinations of the reward vectors. The weight hyper-rectangle decomposition algorithm explores a convex Pareto front by grouping scalarization functions that optimise the same arm into weight hyper-rectangles. It is well known, however, that linear scalarization cannot identify the entire Pareto front when the front is non-convex. The hierarchical PAC algorithm therefore iteratively decomposes the Pareto front into a set of convex regions in order to identify the entire front. We compare the performance of these algorithms on a bi-objective stochastic environment inspired by a real-life control application.
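The core ideas in the abstract — Pareto dominance between mean reward vectors, and identifying arms via linear scalarization — can be illustrated with a minimal sketch. This is not the paper's algorithm; the helper names (`pareto_front`, `scalarized_best_arms`) and the 4-arm bi-objective example are hypothetical, and true mean vectors are used directly in place of estimates learned from stochastic samples:

```python
import numpy as np

def pareto_front(means):
    """Return indices of arms whose mean reward vectors are non-dominated:
    no other arm is at least as good in every objective and strictly
    better in at least one."""
    front = []
    for i, m in enumerate(means):
        dominated = any(
            np.all(other >= m) and np.any(other > m)
            for j, other in enumerate(means) if j != i
        )
        if not dominated:
            front.append(i)
    return front

def scalarized_best_arms(means, weight_sets):
    """For each linear weight vector w, the optimal arm maximizes w . mu_i.
    Return the set of arms found optimal under at least one weight vector."""
    best = set()
    for w in weight_sets:
        best.add(int(np.argmax(means @ w)))
    return best

# Hypothetical bi-objective bandit with 4 arms (true mean reward vectors).
means = np.array([
    [0.9, 0.1],   # optimal when weights favour objective 0
    [0.1, 0.9],   # optimal when weights favour objective 1
    [0.6, 0.6],   # Pareto-optimal, lies on the convex hull of the front
    [0.3, 0.2],   # dominated by [0.6, 0.6]
])
weights = [np.array([w, 1.0 - w]) for w in np.linspace(0.0, 1.0, 11)]
```

On this convex front, a grid of linear weights recovers all three Pareto-optimal arms {0, 1, 2}. Replacing arm 2's mean with [0.45, 0.45] makes it non-dominated but interior to the hull, and no linear weight vector ever selects it — which is precisely the limitation of linear scalarization on non-convex fronts that motivates the hierarchical decomposition described above.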