Consensus algorithms for biased labeling in crowdsourcing
Abstract
Although it has become an accepted lay view that non-expert annotators often exhibit biases when labeling objects through crowdsourcing systems, this claim lacks sufficient observational evidence and systematic empirical study. This paper first analyzes eight real-world datasets from different domains whose class labels were collected from crowdsourcing systems. Our analyses show that biased labeling is a systematic tendency in binary categorization; in other words, for a large number of annotators, labeling quality on the negative (majority) class is significantly higher than on the positive (minority) class. The paper then empirically studies the performance of four existing EM-based consensus algorithms, DS, GLAD, RY, and ZenCrowd, on these datasets. Our investigation shows that all of these state-of-the-art algorithms ignore the potential bias characteristics of the datasets and perform poorly despite modeling the complexity of the systems. To address the issue of handling biased labeling, the paper further proposes a novel consensus algorithm, adaptive weighted majority voting (AWMV), based on the statistical difference between the labeling qualities of the two classes. AWMV uses the frequency of positive labels in the multiple noisy label set of each example to obtain a bias rate and then assigns weights derived from the bias rate to negative and positive labels. Comparison results among the five consensus algorithms (AWMV and the four existing algorithms) show that the proposed AWMV algorithm has the best overall performance. Finally, the paper notes several related topics for future study.
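The abstract describes AWMV only at a high level, so the following Python sketch is one possible reading of the idea, not the paper's actual procedure: it assumes the bias rate is estimated as the overall frequency of positive labels across all noisy label sets, and that the positive/negative label weights are derived directly from that rate so the minority (positive) class is up-weighted. The function name, estimator, and weighting formula are illustrative assumptions.

from typing import List


def awmv_sketch(noisy_label_sets: List[List[int]]) -> List[int]:
    """Infer one integrated binary label (0 = negative, 1 = positive) per example.

    noisy_label_sets: one list of crowd-sourced 0/1 labels per example.
    This is a hedged illustration of bias-aware weighted majority voting,
    not the exact AWMV formulation from the paper.
    """
    # Assumed bias-rate estimator: the observed frequency of positive labels.
    all_labels = [label for labels in noisy_label_sets for label in labels]
    bias_rate = sum(all_labels) / len(all_labels)

    # Assumed weighting: positive votes get weight (1 - bias_rate) and
    # negative votes get weight bias_rate, so the rarer label counts more.
    w_pos, w_neg = 1.0 - bias_rate, bias_rate

    integrated = []
    for labels in noisy_label_sets:
        pos_score = w_pos * sum(1 for label in labels if label == 1)
        neg_score = w_neg * sum(1 for label in labels if label == 0)
        integrated.append(1 if pos_score > neg_score else 0)
    return integrated


# Example: three examples, each labeled by five annotators.
print(awmv_sketch([[1, 0, 0, 0, 1], [0, 0, 0, 0, 0], [1, 1, 0, 1, 0]]))

Under plain majority voting the first example would be labeled negative; with the assumed bias-rate weights, the two positive votes can outweigh the three negative ones, which is the kind of correction toward the minority class that AWMV is designed to make.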
