Abstract
Supervised learning techniques construct predictive models by learning from a large number of training examples, where each training example has a label indicating its ground-truth output. Although current techniques have achieved great success, it is noteworthy that in many tasks it is difficult to obtain strong supervision information, such as fully ground-truth labels, due to the high cost of the data-labeling process. Thus, it is desirable for machine-learning techniques to be able to work with weak supervision. This article reviews research progress on weakly supervised learning, focusing on three typical types of weak supervision: incomplete supervision, where only a subset of the training data is given with labels; inexact supervision, where the training data are given with only coarse-grained labels; and inaccurate supervision, where the given labels are not always ground-truth.
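To make the incomplete-supervision setting concrete, the following is a minimal self-training sketch, one of the simplest semi-supervised strategies: a model is fit on the small labeled subset, then repeatedly pseudo-labels its most confident unlabeled points and refits. The 1-D threshold "classifier" and the toy data are hypothetical, chosen purely for illustration; they are not from the article.

```python
# Self-training under incomplete supervision: only a few examples carry
# labels; the learner pseudo-labels its most confident unlabeled points
# (those farthest from the current decision boundary) and refits.

def fit_threshold(xs, ys):
    """Fit a 1-D threshold classifier: predict class 1 iff x >= threshold."""
    pos = [x for x, y in zip(xs, ys) if y == 1]
    neg = [x for x, y in zip(xs, ys) if y == 0]
    return (min(pos) + max(neg)) / 2.0

def self_train(labeled, unlabeled, rounds=3):
    """Iteratively pseudo-label the unlabeled point farthest from the boundary."""
    xs = [x for x, _ in labeled]
    ys = [y for _, y in labeled]
    pool = list(unlabeled)
    for _ in range(rounds):
        if not pool:
            break
        t = fit_threshold(xs, ys)
        confident = max(pool, key=lambda x: abs(x - t))  # most confident point
        pool.remove(confident)
        xs.append(confident)
        ys.append(1 if confident >= t else 0)  # pseudo-label it
    return fit_threshold(xs, ys)

labeled = [(0.0, 0), (1.0, 0), (9.0, 1), (10.0, 1)]  # small labeled subset
unlabeled = [2.0, 8.0, 4.5, 6.0]                     # the rest: no labels
threshold = self_train(labeled, unlabeled)
```

In the same spirit, inexact supervision would replace the per-instance labels with a single label per bag of instances, and inaccurate supervision would flip some entries of `ys`.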
References
1. Goodfellow I, Bengio Y and Courville A. Deep Learning. Cambridge: MIT Press, 2016.
2. Settles B. Active learning literature survey. Technical Report 1648, Department of Computer Sciences, University of Wisconsin at Madison, Madison, WI, 2010 [http://pages.cs.wisc.edu/~bsettles/pub/settles.activelearning.pdf].
3. Chapelle O, Schölkopf B and Zien A (eds). Semi-Supervised Learning. Cambridge: MIT Press, 2006.
4. Zhu X. Semi-supervised learning literature survey. Technical Report 1530, Department of Computer Sciences, University of Wisconsin at Madison, Madison, WI, 2008 [http://www.cs.wisc.edu/~jerryzhu/pub/ssl_survey.pdf].
5. Zhou Z-H and Li M. Semi-supervised learning by disagreement. Knowl Inform Syst 2010; 24: 415-39.
6. Huang SJ, Jin R and Zhou ZH. Active learning by querying informative and representative examples. IEEE Trans Pattern Anal Mach Intell 2014; 36: 1936-49.
7. Lewis D and Gale W. A sequential algorithm for training text classifiers. In 17th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Dublin, Ireland, 1994; 3-12.
8. Seung H, Opper M and Sompolinsky H. Query by committee. In 5th ACM Workshop on Computational Learning Theory, Pittsburgh, PA, 1992; 287-94.
9. Abe N and Mamitsuka H. Query learning strategies using boosting and bagging. In 15th International Conference on Machine Learning, Madison, WI, 1998; 1-9.
10. Nguyen HT and Smeulders AWM. Active learning using pre-clustering. In 21st International Conference on Machine Learning, Banff, Canada, 2004; 623-30.
11. Dasgupta S and Hsu D. Hierarchical sampling for active learning. In 25th International Conference on Machine Learning, Helsinki, Finland, 2008; 208-15.
12. Wang Z and Ye J. Querying discriminative and representative samples for batch mode active learning. In 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Chicago, IL, 2013; 158-66.
13. Dasgupta S, Kalai AT and Monteleoni C. Analysis of perceptron-based active learning. In 28th Conference on Learning Theory, Paris, France, 2005; 249-63.
14. Dasgupta S. Analysis of a greedy active learning strategy. In Advances in Neural Information Processing Systems 17. Cambridge, MA: MIT Press, 2005; 337-44.
15. Kääriäinen M. Active learning in the non-realizable case. In 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics, Sydney, Australia, 2006; 63-77.
16. Balcan MF, Broder AZ and Zhang T. Margin based active learning. In 20th Annual Conference on Learning Theory, San Diego, CA, 2007; 35-50.
17. Hanneke S. Adaptive rates of convergence in active learning. In 22nd Conference on Learning Theory, Montreal, Canada, 2009.
18. Wang W and Zhou ZH. Multi-view active learning in the non-realizable case. In Advances in Neural Information Processing Systems 23. Cambridge, MA: MIT Press, 2010; 2388-96.
19. Miller DJ and Uyar HS. A mixture of experts classifier with learning based on both labelled and unlabelled data. In Advances in Neural Information Processing Systems 9. Cambridge, MA: MIT Press, 1997; 571-7.
20. Nigam K, McCallum AK and Thrun S et al. Text classification from labeled and unlabeled documents using EM. Mach Learn 2000; 39: 103-34.
21. Dempster AP, Laird NM and Rubin DB. Maximum likelihood from incomplete data via the EM algorithm. J Roy Stat Soc B Stat Meth 1977; 39: 1-38.
22. Fujino A, Ueda N and Saito K. A hybrid generative/discriminative approach to semi-supervised classifier design. In 20th National Conference on Artificial Intelligence, Pittsburgh, PA, 2005; 764-9.
23. Blum A and Chawla S. Learning from labeled and unlabeled data using graph mincuts. In ICML, 2001; 19-26.
24. Zhu X, Ghahramani Z and Lafferty J. Semi-supervised learning using Gaussian fields and harmonic functions. In 20th International Conference on Machine Learning, Washington, DC, 2003; 912-9.
25. Zhou D, Bousquet O and Lal TN et al. Learning with local and global consistency. In Advances in Neural Information Processing Systems 16. Cambridge, MA: MIT Press, 2004; 321-8.
26. Carreira-Perpiñán MA and Zemel RS. Proximity graphs for clustering and manifold learning. In Advances in Neural Information Processing Systems 17. Cambridge, MA: MIT Press, 2005; 225-32.
27. Wang F and Zhang C. Label propagation through linear neighborhoods. In 23rd International Conference on Machine Learning, Pittsburgh, PA, 2006; 985-92.
28. Hein M and Maier M. Manifold denoising. In Advances in Neural Information Processing Systems 19. Cambridge, MA: MIT Press, 2007; 561-8.
29. Joachims T. Transductive inference for text classification using support vector machines. In 16th International Conference on Machine Learning, Bled, Slovenia, 1999; 200-9.
30. Chapelle O and Zien A. Semi-supervised learning by low density separation. In 10th International Workshop on Artificial Intelligence and Statistics, Barbados, 2005; 57-64.
31. Li YF, Tsang IW and Kwok JT et al. Convex and scalable weakly labeled SVMs. J Mach Learn Res 2013; 14: 2151-88.
32. Blum A and Mitchell T. Combining labeled and unlabeled data with co-training. In 11th Conference on Computational Learning Theory, Madison, WI, 1998; 92-100.
33. Zhou Z-H and Li M. Tri-training: exploiting unlabeled data using three classifiers. IEEE Trans Knowl Data Eng 2005; 17: 1529-41.
34. Zhou Z-H. When semi-supervised learning meets ensemble learning. In 8th International Workshop on Multiple Classifier Systems, Reykjavik, Iceland, 2009; 529-38.
35. Zhou Z-H. Ensemble Methods: Foundations and Algorithms. Boca Raton: CRC Press, 2012.
36. Cozman FG and Cohen I. Unlabeled data can degrade classification performance of generative classifiers. In 15th International Conference of the Florida Artificial Intelligence Research Society, Pensacola, FL, 2002; 327-31.
37. Li YF and Zhou ZH. Towards making unlabeled data never hurt. IEEE Trans Pattern Anal Mach Intell 2015; 37: 175-88.
38. Castelli V and Cover TM. On the exponential value of labeled samples. Pattern Recogn Lett 1995; 16: 105-11.
39. Wang W and Zhou ZH. Theoretical foundation of co-training and disagreement-based algorithms. arXiv:1708.04403, 2017.
40. Dietterich TG, Lathrop RH and Lozano-Pérez T. Solving the multiple-instance problem with axis-parallel rectangles. Artif Intell 1997; 89: 31-71.
41. Foulds J and Frank E. A review of multi-instance learning assumptions. Knowl Eng Rev 2010; 25: 1-25.
42. Zhou Z-H. Multi-instance learning from supervised view. J Comput Sci Technol 2006; 21: 800-9.
43. Zhou Z-H and Zhang M-L. Solving multi-instance problems with classifier ensemble based on constructive clustering. Knowl Inform Syst 2007; 11: 155-70.
44. Wei X-S, Wu J and Zhou Z-H. Scalable algorithms for multi-instance learning. IEEE Trans Neural Network Learn Syst 2017; 28: 975-87.
45. Amores J. Multiple instance classification: review, taxonomy and comparative study. Artif Intell 2013; 201: 81-105.
46. Zhou Z-H and Xu J-M. On the relation between multi-instance learning and semi-supervised learning. In 24th International Conference on Machine Learning, Corvallis, OR, 2007; 1167-74.
47. Zhou Z-H, Sun Y-Y and Li Y-F. Multi-instance learning by treating instances as non-i.i.d. samples. In 26th International Conference on Machine Learning, Montreal, Canada, 2009; 1249-56.
48. Chen Y and Wang JZ. Image categorization by learning and reasoning with regions. J Mach Learn Res 2004; 5: 913-39.
49. Zhang Q, Yu W and Goldman SA et al. Content-based image retrieval using multiple-instance learning. In 19th International Conference on Machine Learning, Sydney, Australia, 2002; 682-9.
50. Tang JH, Li HJ and Qi GJ et al. Image annotation by graph-based inference with integrated multiple/single instance representations. IEEE Trans Multimed 2010; 12: 131-41.
51. Andrews S, Tsochantaridis I and Hofmann T. Support vector machines for multiple-instance learning. In Advances in Neural Information Processing Systems 15. Cambridge, MA: MIT Press, 2003; 561-8.
52. Settles B, Craven M and Ray S. Multiple-instance active learning. In Advances in Neural Information Processing Systems 20. Cambridge, MA: MIT Press, 2008; 1289-96.
53. Jorgensen Z, Zhou Y and Inge M. A multiple instance learning strategy for combating good word attacks on spam filters. J Mach Learn Res 2008; 8: 993-1019.
54. Fung G, Dundar M and Krishnappuram B et al. Multiple instance learning for computer aided diagnosis. In Advances in Neural Information Processing Systems 19. Cambridge, MA: MIT Press, 2007; 425-32.
55. Viola P, Platt J and Zhang C. Multiple instance boosting for object detection. In Advances in Neural Information Processing Systems 18. Cambridge, MA: MIT Press, 2006; 1419-26.
56. Felzenszwalb PF, Girshick RB and McAllester D et al. Object detection with discriminatively trained part-based models. IEEE Trans Pattern Anal Mach Intell 2010; 32: 1627-45.
57. Zhu J-Y, Wu J and Xu Y et al. Unsupervised object class discovery via saliency-guided multiple class learning. IEEE Trans Pattern Anal Mach Intell 2015; 37: 862-75.
58. Babenko B, Yang MH and Belongie S. Robust object tracking with online multiple instance learning. IEEE Trans Pattern Anal Mach Intell 2011; 33: 1619-32.
59. Wei X-S and Zhou Z-H. An empirical study on image bag generators for multi-instance learning. Mach Learn 2016; 105: 155-98.
60. Liu G, Wu J and Zhou ZH. Key instance detection in multi-instance learning. In 4th Asian Conference on Machine Learning, Singapore, 2012; 253-68.
61. Xu X and Frank E. Logistic regression and boosting for labeled bags of instances. In 8th Pacific-Asia Conference on Knowledge Discovery and Data Mining, Sydney, Australia, 2004; 272-81.
62. Chen Y, Bi J and Wang JZ. MILES: multiple-instance learning via embedded instance selection. IEEE Trans Pattern Anal Mach Intell 2006; 28: 1931-47.
63. Weidmann N, Frank E and Pfahringer B. A two-level learning method for generalized multi-instance problems. In 14th European Conference on Machine Learning, Cavtat-Dubrovnik, Croatia, 2003; 468-79.
64. Long PM and Tan L. PAC learning axis-aligned rectangles with respect to product distributions from multiple-instance examples. Mach Learn 1998; 30: 7-21.
65. Auer P, Long PM and Srinivasan A. Approximating hyper-rectangles: learning and pseudo-random sets. J Comput Syst Sci 1998; 57: 376-88.
66. Blum A and Kalai A. A note on learning from multiple-instance examples. Mach Learn 1998; 30: 23-9.
67. Sabato S and Tishby N. Homogeneous multi-instance learning with arbitrary dependence. In 22nd Conference on Learning Theory, Montreal, Canada, 2009.
68. Frénay B and Verleysen M. Classification in the presence of label noise: a survey. IEEE Trans Neural Network Learn Syst 2014; 25: 845-69.
69. Angluin D and Laird P. Learning from noisy examples. Mach Learn 1988; 2: 343-70.
70. Blum A, Kalai A and Wasserman H. Noise-tolerant learning, the parity problem, and the statistical query model. J ACM 2003; 50: 506-19.
71. Gao W, Wang L and Li YF et al. Risk minimization in the presence of label noise. In 30th AAAI Conference on Artificial Intelligence, Phoenix, AZ, 2016; 1575-81.
72. Brodley CE and Friedl MA. Identifying mislabeled training data. J Artif Intell Res 1999; 11: 131-67.
73. Muhlenbach F, Lallich S and Zighed DA. Identifying and handling mislabelled instances. J Intell Inform Syst 2004; 22: 89-109.
74. Brabham DC. Crowdsourcing as a model for problem solving: an introduction and cases. Convergence 2008; 14: 75-90.
75. Sheng VS, Provost FJ and Ipeirotis PG. Get another label? Improving data quality and data mining using multiple, noisy labelers. In 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Las Vegas, NV, 2008; 614-22.
76. Snow R, O'Connor B and Jurafsky D et al. Cheap and fast, but is it good? Evaluating non-expert annotations for natural language tasks. In 2008 Conference on Empirical Methods in Natural Language Processing, Honolulu, HI, 2008; 254-63.
77. Raykar VC, Yu S and Zhao LH et al. Learning from crowds. J Mach Learn Res 2010; 11: 1297-322.
78. Whitehill J, Ruvolo P and Wu T et al. Whose vote should count more: optimal integration of labels from labelers of unknown expertise. In Advances in Neural Information Processing Systems 22. Cambridge, MA: MIT Press, 2009; 2035-43.
79. Raykar VC and Yu S. Eliminating spammers and ranking annotators for crowdsourced labeling tasks. J Mach Learn Res 2012; 13: 491-518.
80. Wang W and Zhou ZH. Crowdsourcing label quality: a theoretical analysis. Sci China Inform Sci 2015; 58: 1-12.
81. Dekel O and Shamir O. Good learners for evil teachers. In 26th International Conference on Machine Learning, Montreal, Canada, 2009; 233-40.
82. Urner R, Ben-David S and Shamir O. Learning from weak teachers. In 15th International Conference on Artificial Intelligence and Statistics, La Palma, Canary Islands, 2012; 1252-60.
83. Wang L and Zhou ZH. Cost-saving effect of crowdsourcing learning. In 25th International Joint Conference on Artificial Intelligence, New York, NY, 2016; 2111-7.
84. Karger DR, Oh S and Shah D. Iterative learning for reliable crowdsourcing systems. In Advances in Neural Information Processing Systems 24. Cambridge, MA: MIT Press, 2011; 1953-61.
85. Tran-Thanh L, Venanzi M and Rogers A et al. Efficient budget allocation with accuracy guarantees for crowdsourcing classification tasks. In 12th International Conference on Autonomous Agents and Multi-Agent Systems, Saint Paul, MN, 2013; 901-8.
86. Ho CJ, Jabbari S and Vaughan JW. Adaptive task assignment for crowdsourced classification. In 30th International Conference on Machine Learning, Atlanta, GA, 2013; 534-42.
87. Chen X, Lin Q and Zhou D. Optimistic knowledge gradient policy for optimal budget allocation in crowdsourcing. In 30th International Conference on Machine Learning, Atlanta, GA, 2013; 64-72.
88. Dawid AP and Skene AM. Maximum likelihood estimation of observer error-rates using the EM algorithm. J Roy Stat Soc C Appl Stat 1979; 28: 20-8.
89. Zhong J, Tang K and Zhou Z-H. Active learning from crowds with unsure option. In 24th International Joint Conference on Artificial Intelligence, Buenos Aires, Argentina, 2015; 1061-7.
90. Ding YX and Zhou ZH. Crowdsourcing with unsure option. arXiv:1609.00292, 2016.
91. Shah NB and Zhou D. Double or nothing: multiplicative incentive mechanisms for crowdsourcing. In Advances in Neural Information Processing Systems 28. Cambridge, MA: MIT Press, 2015; 1-9.
92. Rahmani R and Goldman SA. MISSL: multiple-instance semi-supervised learning. In 23rd International Conference on Machine Learning, Pittsburgh, PA, 2006; 705-12.
93. Yan Y, Rosales R and Fung G et al. Active learning from crowds. In 28th International Conference on Machine Learning, Bellevue, WA, 2011; 1161-8.
94. Sutton RS and Barto AG. Reinforcement Learning: An Introduction. Cambridge: MIT Press, 1998.
95. Schwenker F and Trentin E. Partially supervised learning for pattern recognition. Pattern Recogn Lett 2014; 37: 1-3.
96. Garcia-Garcia D and Williamson RC. Degrees of supervision. In Advances in Neural Information Processing Systems 17 Workshops. Cambridge, MA: MIT Press, 2011.
97. Hernandez-Gonzalez J, Inza I and Lozano JA. Weak supervision and other non-standard classification problems: a taxonomy. Pattern Recogn Lett 2016; 69: 49-55.
98. Kuncheva LI, Rodríguez JJ and Jackson AS. Restricted set classification: who is there? Pattern Recogn 2017; 63: 158-70.
99. Zhang M-L and Zhou Z-H. A review on multi-label learning algorithms. IEEE Trans Knowl Data Eng 2014; 26: 1819-37.
100. Sun YY, Zhang Y and Zhou ZH. Multi-label learning with weak label. In 24th AAAI Conference on Artificial Intelligence, Atlanta, GA, 2010; 593-8.
101. Li X and Guo Y. Active learning with multi-label SVM classification. In 23rd International Joint Conference on Artificial Intelligence, Beijing, China, 2013; 1479-85.
102. Qi GJ, Hua XS and Rui Y et al. Two-dimensional active learning for image classification. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Anchorage, AK, 2008.
103. Huang SJ, Chen S and Zhou ZH. Multi-label active learning: query type matters. In 24th International Joint Conference on Artificial Intelligence, Buenos Aires, Argentina, 2015; 946-52.