The K-armed dueling bandits problem

Authors: Yisong Yue (a; yisongyue@cmu.edu); Josef Broder (b; jbroder@cam.cornell.edu); Robert Kleinberg (c; rdk@cs.cornell.edu); Thorsten Joachims (c; tj@cs.cornell.edu)
Keywords: Online learning; Multi-armed bandits; Preference elicitation
Journal: Journal of Computer and System Sciences
Published: September 2012
Volume: 78
Issue: 5
Pages: 1538-1556
Full-text size: 330 K

Abstract

We study a partial-information online-learning problem in which actions are restricted to noisy comparisons between pairs of strategies (also known as bandits). Conventional approaches require the absolute reward of the chosen strategy to be quantifiable and observable; our setting assumes only that (noisy) binary feedback about the relative reward of two chosen strategies is available. This type of relative feedback is particularly appropriate in applications where absolute rewards have no natural scale or are difficult to measure (e.g., user-perceived quality of a set of retrieval results, taste of food, product attractiveness), but where pairwise comparisons are easy to make. We propose a novel regret formulation for this setting and present an algorithm that achieves information-theoretically optimal regret bounds (up to a constant factor).
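The relative-feedback setting described above can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the authors' algorithm or their exact definitions: it assumes a hypothetical 3-arm preference matrix `P`, where `P[i][j]` is the probability that arm i wins a comparison against arm j, and it scores each duel with one common dueling-bandit regret, ε(b*, b) + ε(b*, b′), where ε(i, j) = P(i beats j) − 1/2 and b* is the best arm. The names `duel`, `epsilon`, `pair_regret`, and `cumulative_regret` are illustrative only.

```python
import random

# Hypothetical preference matrix (an assumption for illustration):
# P[i][j] = probability that arm i beats arm j, with P[i][j] + P[j][i] == 1.
P = [
    [0.5, 0.6, 0.7],
    [0.4, 0.5, 0.6],
    [0.3, 0.4, 0.5],
]
BEST = 0  # arm 0 beats every other arm with probability > 1/2


def duel(i, j, rng):
    """Noisy binary feedback: True if arm i wins one comparison against arm j."""
    return rng.random() < P[i][j]


def epsilon(i, j):
    """Distinguishability eps(i, j) = P(i beats j) - 1/2."""
    return P[i][j] - 0.5


def pair_regret(i, j):
    """Regret of comparing arms i and j, measured against the best arm."""
    return epsilon(BEST, i) + epsilon(BEST, j)


def cumulative_regret(pairs):
    """Total regret over a sequence of (i, j) duels."""
    return sum(pair_regret(i, j) for i, j in pairs)


if __name__ == "__main__":
    rng = random.Random(0)
    wins = sum(duel(0, 1, rng) for _ in range(10_000))
    print("empirical P(0 beats 1):", wins / 10_000)  # close to 0.6
    print("regret:", cumulative_regret([(0, 0), (1, 2), (0, 1)]))  # ≈ 0.4
```

The learner in this setting only ever observes the boolean result of `duel`; the entries of `P` and the regret values are hidden from it and are used here only to evaluate a sequence of comparisons. Dueling the best arm against itself incurs zero regret.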