Book title: Evolutionary Multi-Criterion Optimization
ISBN: 978-3-319-15891-4
Publication category: Computer Science
Publication subjects: Artificial Intelligence and Robotics; Computer Communication Networks; Software Engineering; Data Encryption; Database Management; Computation by Abstract Devices; Algorithm Analysis and Problem Complexity
Publisher: Springer Berlin / Heidelberg
ISSN: 1611-3349
Abstract
Multi-objective multi-armed bandits (MOMAB) are a variant of the multi-armed bandit problem in which rewards are stochastic vectors rather than scalars. In this paper, we propose three MOMAB algorithms. The first uses a fixed set of linear scalarization functions to identify the Pareto front. Two topological approaches identify the Pareto front using linear weighted combinations of the reward vectors. The weight hyper-rectangle decomposition algorithm explores a convex Pareto front by grouping scalarization functions that optimise the same arm into weight hyper-rectangles. It is well known, however, that linear scalarization cannot identify the entire Pareto front when the front is non-convex. The hierarchical PAC algorithm therefore iteratively decomposes the Pareto front into a set of convex regions in order to identify the entire front. We compare the performance of these algorithms on a bi-objective stochastic environment inspired by a real-life control application.
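The core ideas in the abstract — Pareto dominance between mean reward vectors, and identifying arms via linear scalarization — can be illustrated with a minimal sketch. This is not the paper's algorithm; the helper names (`pareto_front`, `scalarized_best_arms`) and the 4-arm bi-objective example are hypothetical, and true mean vectors are used directly in place of estimates learned from stochastic samples:

```python
import numpy as np

def pareto_front(means):
    """Return indices of arms whose mean reward vectors are non-dominated:
    no other arm is at least as good in every objective and strictly
    better in at least one."""
    front = []
    for i, m in enumerate(means):
        dominated = any(
            np.all(other >= m) and np.any(other > m)
            for j, other in enumerate(means) if j != i
        )
        if not dominated:
            front.append(i)
    return front

def scalarized_best_arms(means, weight_sets):
    """For each linear weight vector w, the optimal arm maximizes w . mu_i.
    Return the set of arms found optimal under at least one weight vector."""
    best = set()
    for w in weight_sets:
        best.add(int(np.argmax(means @ w)))
    return best

# Hypothetical bi-objective bandit with 4 arms (true mean reward vectors).
means = np.array([
    [0.9, 0.1],   # optimal when weights favour objective 0
    [0.1, 0.9],   # optimal when weights favour objective 1
    [0.6, 0.6],   # Pareto-optimal, lies on the convex hull of the front
    [0.3, 0.2],   # dominated by [0.6, 0.6]
])
weights = [np.array([w, 1.0 - w]) for w in np.linspace(0.0, 1.0, 11)]
```

On this convex front, a grid of linear weights recovers all three Pareto-optimal arms {0, 1, 2}. Replacing arm 2's mean with [0.45, 0.45] makes it non-dominated but interior to the hull, and no linear weight vector ever selects it — which is precisely the limitation of linear scalarization on non-convex fronts that motivates the hierarchical decomposition described above.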