Linear Scalarization for Pareto Front Identification in Stochastic Environments
详细信息    查看全文
  • 作者:Madalina M. Drugan (16)

    16. Artificial Intelligence Lab
    ; Vrije Universiteit Brussels ; Pleinlaan 2 ; 1050 ; Brussels ; Belgium
  • 关键词:Multi ; objective multi ; armed bandits ; Scalarization functions ; Pareto front identification ; Topological decomposition
  • 刊名:Lecture Notes in Computer Science
  • 出版年:2015
  • 出版时间:2015
  • 年:2015
  • 卷:9019
  • 期:1
  • 页码:156-171
  • 全文大小:385 KB
  • 参考文献:1. Auer, P, Cesa-Bianchi, N, Fischer, P (2002) Finite time analysis of the multiarmed bandit problem. Machine Learning 47: pp. 235-256 52" target="_blank" title="It opens in new window">CrossRef
    2. de Berg, M., van Kreveld, M., Overmars, M., Schwarzkopf, O.: Computational Geometry. Springer (2000)
    3. Drugan, M.M., Nowe, A.: Designing multi-objective multi-armed bandits: a study. In: Proc of International Joint Conference of Neural Networks (IJCNN) (2013)
    4. Drugan, M.M., Nowe, A.: Scalarization based pareto optimal set of arms identification algorithms. In: Proc of International Joint Conference of Neural Networks (IJCNN) (2014)
    5. Drugan, M.M., Nowe, A., Manderick, B.: Pareto upper confidence bounds algorithms: an empirical study. In: Proc of IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL) (2014)
    6. Even-Dar, E, Mannor, S, Mansour, Y (2006) Action elimination and stopping conditions for the multi-armed bandit and reinforcement learning problems. J. of Machine Learning Research 7: pp. 1079-1105
    7. Heidrich-Meisner, V., Igel, C.: Hoeffding and bernstein races for selecting policies in evolutionary direct policy search. In: Proc of International Conference on Machine Learning (ICML), p. 51 (2009)
    8. Kim, IY, Weck, OL (2006) Adaptive weighted sum method for multiobjective optimization: a new method for pareto front generation. Structural and Multidisciplinary Optimization 31: pp. 105-116 58-005-0557-6" target="_blank" title="It opens in new window">CrossRef
    9. Roijers, D., Scharpff, J., Spaan, M., Oliehoek, F., De Weerdt, M., Whiteson, S.: Bounded approximations for linear multi-objective planning under uncertainty. In: ICAPS 2014: Proceedings of the Twenty-Fourth International Conference on Automated Planning and Scheduling (2014)
    10. Van Vaerenbergh, K., Rodriguez, A., Gagliolo, M., Vrancx, P., Nowe, A., Stoev, J., Goossens, S., Pinte, G., Symens, W.: Improving wet clutch engagement with reinforcement learning. In: Proc of International Joint Conference of Neural Networks (IJCNN) (2012)
    11. Moffaert, K, Drugan, MM, Now茅, A Hypervolume-based multi-objective reinforcement learning. In: Purshouse, RC, Fleming, PJ, Fonseca, CM, Greco, S, Shaw, J eds. (2013) Evolutionary Multi-Criterion Optimization. Springer, Heidelberg, pp. 352-366 CrossRef
    12. Vo脽, T, Trautmann, H, Igel, C New uncertainty handling strategies in multi-objective evolutionary optimization. In: Schaefer, R, Cotta, C, Ko艂odziej, J, Rudolph, G eds. (2010) Parallel Problem Solving from Nature, PPSN XI. Springer, Heidelberg, pp. 260-269
    13. Yahyaa, S.Q., Drugan, M.M., Manderick, B.: The scalarized multi-objective multi-armed bandit problem: an empirical study of its exploration vs. exploitation tradeoff. In: Proc of International Joint Conference on Neural Networks (IJCNN), pp. 2290鈥?297 (2014)
  • 作者单位:Evolutionary Multi-Criterion Optimization
  • 丛书名:978-3-319-15891-4
  • 刊物类别:Computer Science
  • 刊物主题:Artificial Intelligence and Robotics
    Computer Communication Networks
    Software Engineering
    Data Encryption
    Database Management
    Computation by Abstract Devices
    Algorithm Analysis and Problem Complexity
  • 出版者:Springer Berlin / Heidelberg
  • ISSN:1611-3349
文摘
Multi-objective multi-armed bandits (MOMAB) is a multi-arm bandit variant that uses stochastic reward vectors. In this paper, we propose three MOMAB algorithms. The first algorithm uses a fixed set of linear scalarization functions to identify the Pareto front. Two topological approaches identify the Pareto front using linear weighted combinations of reward vectors. The weight hyper-rectangle decomposition algorithm explores a convex shape Pareto front by grouping scalarization functions that optimise the same arm into weight hyperrectangles. It is generally acknowledged that linear scalarization is not able to identify all the Pareto front for non-convex shapes. The hierarchical PAC algorithm iteratively decomposes the Pareto front into a set of convex shapes to identify the entire Pareto front. We compare the performance of these algorithms on a bi-objective stochastic environment inspired from a real life control application.

© 2004-2018 中国地质图书馆版权所有 京ICP备05064691号 京公网安备11010802017129号

地址:北京市海淀区学院路29号 邮编:100083

电话:办公室:(+86 10)66554848;文献借阅、咨询服务、科技查新:66554700