Abstract
Supervised learning techniques construct predictive models by learning from a large number of training examples, where each training example has a label indicating its ground-truth output. Although current techniques have achieved great success, it is noteworthy that in many tasks it is difficult to obtain strong supervision information, such as fully ground-truth labels, due to the high cost of the data-labeling process. Thus, it is desirable for machine-learning techniques to be able to work with weak supervision. This article reviews research progress on weakly supervised learning, focusing on three typical types of weak supervision: incomplete supervision, where only a subset of the training data is given with labels; inexact supervision, where the training data are given with only coarse-grained labels; and inaccurate supervision, where the given labels are not always ground-truth.
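To make the incomplete-supervision setting concrete, the following is a minimal self-training sketch, one of the simplest semi-supervised strategies: a model is fit on the small labeled subset, then repeatedly pseudo-labels its most confident unlabeled points and refits. The 1-D threshold "classifier" and the toy data are hypothetical, chosen purely for illustration; they are not from the article.

```python
# Self-training under incomplete supervision: only a few examples carry
# labels; the learner pseudo-labels its most confident unlabeled points
# (those farthest from the current decision boundary) and refits.

def fit_threshold(xs, ys):
    """Fit a 1-D threshold classifier: predict class 1 iff x >= threshold."""
    pos = [x for x, y in zip(xs, ys) if y == 1]
    neg = [x for x, y in zip(xs, ys) if y == 0]
    return (min(pos) + max(neg)) / 2.0

def self_train(labeled, unlabeled, rounds=3):
    """Iteratively pseudo-label the unlabeled point farthest from the boundary."""
    xs = [x for x, _ in labeled]
    ys = [y for _, y in labeled]
    pool = list(unlabeled)
    for _ in range(rounds):
        if not pool:
            break
        t = fit_threshold(xs, ys)
        confident = max(pool, key=lambda x: abs(x - t))  # most confident point
        pool.remove(confident)
        xs.append(confident)
        ys.append(1 if confident >= t else 0)  # pseudo-label it
    return fit_threshold(xs, ys)

labeled = [(0.0, 0), (1.0, 0), (9.0, 1), (10.0, 1)]  # small labeled subset
unlabeled = [2.0, 8.0, 4.5, 6.0]                     # the rest: no labels
threshold = self_train(labeled, unlabeled)
```

In the same spirit, inexact supervision would replace the per-instance labels with a single label per bag of instances, and inaccurate supervision would flip some entries of `ys`.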
References
1. Goodfellow I, Bengio Y and Courville A. Deep Learning. Cambridge: MIT Press, 2016.
2. Settles B. Active learning literature survey. Technical Report 1648, Department of Computer Sciences, University of Wisconsin at Madison, Madison, WI, 2010 [http://pages.cs.wisc.edu/~bsettles/pub/settles.activelearning.pdf].
3. Chapelle O, Schölkopf B and Zien A (eds). Semi-Supervised Learning. Cambridge: MIT Press, 2006.
4. Zhu X. Semi-supervised learning literature survey. Technical Report 1530, Department of Computer Sciences, University of Wisconsin at Madison, Madison, WI, 2008 [http://www.cs.wisc.edu/~jerryzhu/pub/ssl_survey.pdf].
5. Zhou Z-H and Li M. Semi-supervised learning by disagreement. Knowl Inform Syst 2010; 24: 415-39.
6. Huang SJ, Jin R and Zhou ZH. Active learning by querying informative and representative examples. IEEE Trans Pattern Anal Mach Intell 2014; 36: 1936-49.
7. Lewis D and Gale W. A sequential algorithm for training text classifiers. In 17th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Dublin, Ireland, 1994; 3-12.
8. Seung H, Opper M and Sompolinsky H. Query by committee. In 5th ACM Workshop on Computational Learning Theory, Pittsburgh, PA, 1992; 287-94.
9. Abe N and Mamitsuka H. Query learning strategies using boosting and bagging. In 15th International Conference on Machine Learning, Madison, WI, 1998; 1-9.
10. Nguyen HT and Smeulders AWM. Active learning using pre-clustering. In 21st International Conference on Machine Learning, Banff, Canada, 2004; 623-30.
11. Dasgupta S and Hsu D. Hierarchical sampling for active learning. In 25th International Conference on Machine Learning, Helsinki, Finland, 2008; 208-15.
12. Wang Z and Ye J. Querying discriminative and representative samples for batch mode active learning. In 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Chicago, IL, 2013; 158-66.
13. Dasgupta S, Kalai AT and Monteleoni C. Analysis of perceptron-based active learning. In 28th Conference on Learning Theory, Paris, France, 2005; 249-63.
14. Dasgupta S. Analysis of a greedy active learning strategy. In Advances in Neural Information Processing Systems 17. Cambridge, MA: MIT Press, 2005; 337-44.
15. Kääriäinen M. Active learning in the non-realizable case. In 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics, Sydney, Australia, 2006; 63-77.
16. Balcan MF, Broder AZ and Zhang T. Margin based active learning. In 20th Annual Conference on Learning Theory, San Diego, CA, 2007; 35-50.
17. Hanneke S. Adaptive rates of convergence in active learning. In 22nd Conference on Learning Theory, Montreal, Canada, 2009.
18. Wang W and Zhou ZH. Multi-view active learning in the non-realizable case. In Advances in Neural Information Processing Systems 23. Cambridge, MA: MIT Press, 2010; 2388-96.
19. Miller DJ and Uyar HS. A mixture of experts classifier with learning based on both labelled and unlabelled data. In Advances in Neural Information Processing Systems 9. Cambridge, MA: MIT Press, 1997; 571-7.
20. Nigam K, McCallum AK and Thrun S et al. Text classification from labeled and unlabeled documents using EM. Mach Learn 2000; 39: 103-34.
21. Dempster AP, Laird NM and Rubin DB. Maximum likelihood from incomplete data via the EM algorithm. J Roy Stat Soc B Stat Meth 1977; 39: 1-38.
22. Fujino A, Ueda N and Saito K. A hybrid generative/discriminative approach to semi-supervised classifier design. In 20th National Conference on Artificial Intelligence, Pittsburgh, PA, 2005; 764-9.
23. Blum A and Chawla S. Learning from labeled and unlabeled data using graph mincuts. In ICML, 2001; 19-26.
24. Zhu X, Ghahramani Z and Lafferty J. Semi-supervised learning using Gaussian fields and harmonic functions. In 20th International Conference on Machine Learning, Washington, DC, 2003; 912-9.
25. Zhou D, Bousquet O and Lal TN et al. Learning with local and global consistency. In Advances in Neural Information Processing Systems 16. Cambridge, MA: MIT Press, 2004; 321-8.
26. Carreira-Perpiñán MA and Zemel RS. Proximity graphs for clustering and manifold learning. In Advances in Neural Information Processing Systems 17. Cambridge, MA: MIT Press, 2005; 225-32.
27. Wang F and Zhang C. Label propagation through linear neighborhoods. In 23rd International Conference on Machine Learning, Pittsburgh, PA, 2006; 985-92.
28. Hein M and Maier M. Manifold denoising. In Advances in Neural Information Processing Systems 19. Cambridge, MA: MIT Press, 2007; 561-8.
29. Joachims T. Transductive inference for text classification using support vector machines. In 16th International Conference on Machine Learning, Bled, Slovenia, 1999; 200-9.
30. Chapelle O and Zien A. Semi-supervised learning by low density separation. In 10th International Workshop on Artificial Intelligence and Statistics, Barbados, 2005; 57-64.
31. Li YF, Tsang IW and Kwok JT et al. Convex and scalable weakly labeled SVMs. J Mach Learn Res 2013; 14: 2151-88.
32. Blum A and Mitchell T. Combining labeled and unlabeled data with co-training. In 11th Conference on Computational Learning Theory, Madison, WI, 1998; 92-100.
33. Zhou Z-H and Li M. Tri-training: exploiting unlabeled data using three classifiers. IEEE Trans Knowl Data Eng 2005; 17: 1529-41.
34. Zhou Z-H. When semi-supervised learning meets ensemble learning. In 8th International Workshop on Multiple Classifier Systems, Reykjavik, Iceland, 2009; 529-38.
35. Zhou Z-H. Ensemble Methods: Foundations and Algorithms. Boca Raton: CRC Press, 2012.
36. Cozman FG and Cohen I. Unlabeled data can degrade classification performance of generative classifiers. In 15th International Conference of the Florida Artificial Intelligence Research Society, Pensacola, FL, 2002; 327-31.
37. Li YF and Zhou ZH. Towards making unlabeled data never hurt. IEEE Trans Pattern Anal Mach Intell 2015; 37: 175-88.
38. Castelli V and Cover TM. On the exponential value of labeled samples. Pattern Recogn Lett 1995; 16: 105-11.
39. Wang W and Zhou ZH. Theoretical foundation of co-training and disagreement-based algorithms. arXiv:1708.04403, 2017.
40. Dietterich TG, Lathrop RH and Lozano-Pérez T. Solving the multiple-instance problem with axis-parallel rectangles. Artif Intell 1997; 89: 31-71.
41. Foulds J and Frank E. A review of multi-instance learning assumptions. Knowl Eng Rev 2010; 25: 1-25.
42. Zhou Z-H. Multi-instance learning from supervised view. J Comput Sci Technol 2006; 21: 800-9.
43. Zhou Z-H and Zhang M-L. Solving multi-instance problems with classifier ensemble based on constructive clustering. Knowl Inform Syst 2007; 11: 155-70.
44. Wei X-S, Wu J and Zhou Z-H. Scalable algorithms for multi-instance learning. IEEE Trans Neural Network Learn Syst 2017; 28: 975-87.
45. Amores J. Multiple instance classification: review, taxonomy and comparative study. Artif Intell 2013; 201: 81-105.
46. Zhou Z-H and Xu J-M. On the relation between multi-instance learning and semi-supervised learning. In 24th International Conference on Machine Learning, Corvallis, OR, 2007; 1167-74.
47. Zhou Z-H, Sun Y-Y and Li Y-F. Multi-instance learning by treating instances as non-i.i.d. samples. In 26th International Conference on Machine Learning, Montreal, Canada, 2009; 1249-56.
48. Chen Y and Wang JZ. Image categorization by learning and reasoning with regions. J Mach Learn Res 2004; 5: 913-39.
49. Zhang Q, Yu W and Goldman SA et al. Content-based image retrieval using multiple-instance learning. In 19th International Conference on Machine Learning, Sydney, Australia, 2002; 682-9.
50. Tang JH, Li HJ and Qi GJ et al. Image annotation by graph-based inference with integrated multiple/single instance representations. IEEE Trans Multimed 2010; 12: 131-41.
51. Andrews S, Tsochantaridis I and Hofmann T. Support vector machines for multiple-instance learning. In Advances in Neural Information Processing Systems 15. Cambridge, MA: MIT Press, 2003; 561-8.
52. Settles B, Craven M and Ray S. Multiple-instance active learning. In Advances in Neural Information Processing Systems 20. Cambridge, MA: MIT Press, 2008; 1289-96.
53. Jorgensen Z, Zhou Y and Inge M. A multiple instance learning strategy for combating good word attacks on spam filters. J Mach Learn Res 2008; 8: 993-1019.
54. Fung G, Dundar M and Krishnappuram B et al. Multiple instance learning for computer aided diagnosis. In Advances in Neural Information Processing Systems 19. Cambridge, MA: MIT Press, 2007; 425-32.
55. Viola P, Platt J and Zhang C. Multiple instance boosting for object detection. In Advances in Neural Information Processing Systems 18. Cambridge, MA: MIT Press, 2006; 1419-26.
56. Felzenszwalb PF, Girshick RB and McAllester D et al. Object detection with discriminatively trained part-based models. IEEE Trans Pattern Anal Mach Intell 2010; 32: 1627-45.
57. Zhu J-Y, Wu J and Xu Y et al. Unsupervised object class discovery via saliency-guided multiple class learning. IEEE Trans Pattern Anal Mach Intell 2015; 37: 862-75.
58. Babenko B, Yang MH and Belongie S. Robust object tracking with online multiple instance learning. IEEE Trans Pattern Anal Mach Intell 2011; 33: 1619-32.
59. Wei X-S and Zhou Z-H. An empirical study on image bag generators for multi-instance learning. Mach Learn 2016; 105: 155-98.
60. Liu G, Wu J and Zhou ZH. Key instance detection in multi-instance learning. In 4th Asian Conference on Machine Learning, Singapore, 2012; 253-68.
61. Xu X and Frank E. Logistic regression and boosting for labeled bags of instances. In 8th Pacific-Asia Conference on Knowledge Discovery and Data Mining, Sydney, Australia, 2004; 272-81.
62. Chen Y, Bi J and Wang JZ. MILES: multiple-instance learning via embedded instance selection. IEEE Trans Pattern Anal Mach Intell 2006; 28: 1931-47.
63. Weidmann N, Frank E and Pfahringer B. A two-level learning method for generalized multi-instance problems. In 14th European Conference on Machine Learning, Cavtat-Dubrovnik, Croatia, 2003; 468-79.
64. Long PM and Tan L. PAC learning axis-aligned rectangles with respect to product distributions from multiple-instance examples. Mach Learn 1998; 30: 7-21.
65. Auer P, Long PM and Srinivasan A. Approximating hyper-rectangles: learning and pseudo-random sets. J Comput Syst Sci 1998; 57: 376-88.
66. Blum A and Kalai A. A note on learning from multiple-instance examples. Mach Learn 1998; 30: 23-9.
67. Sabato S and Tishby N. Homogeneous multi-instance learning with arbitrary dependence. In 22nd Conference on Learning Theory, Montreal, Canada, 2009.
68. Frénay B and Verleysen M. Classification in the presence of label noise: a survey. IEEE Trans Neural Network Learn Syst 2014; 25: 845-69.
69. Angluin D and Laird P. Learning from noisy examples. Mach Learn 1988; 2: 343-70.
70. Blum A, Kalai A and Wasserman H. Noise-tolerant learning, the parity problem, and the statistical query model. J ACM 2003; 50: 506-19.
71. Gao W, Wang L and Li YF et al. Risk minimization in the presence of label noise. In 30th AAAI Conference on Artificial Intelligence, Phoenix, AZ, 2016; 1575-81.
72. Brodley CE and Friedl MA. Identifying mislabeled training data. J Artif Intell Res 1999; 11: 131-67.
73. Muhlenbach F, Lallich S and Zighed DA. Identifying and handling mislabelled instances. J Intell Inform Syst 2004; 22: 89-109.
74. Brabham DC. Crowdsourcing as a model for problem solving: an introduction and cases. Convergence 2008; 14: 75-90.
75. Sheng VS, Provost FJ and Ipeirotis PG. Get another label? Improving data quality and data mining using multiple, noisy labelers. In 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Las Vegas, NV, 2008; 614-22.
76. Snow R, O'Connor B and Jurafsky D et al. Cheap and fast, but is it good? Evaluating non-expert annotations for natural language tasks. In 2008 Conference on Empirical Methods in Natural Language Processing, Honolulu, HI, 2008; 254-63.
77. Raykar VC, Yu S and Zhao LH et al. Learning from crowds. J Mach Learn Res 2010; 11: 1297-322.
78. Whitehill J, Ruvolo P and Wu T et al. Whose vote should count more: optimal integration of labels from labelers of unknown expertise. In Advances in Neural Information Processing Systems 22. Cambridge, MA: MIT Press, 2009; 2035-43.
79. Raykar VC and Yu S. Eliminating spammers and ranking annotators for crowdsourced labeling tasks. J Mach Learn Res 2012; 13: 491-518.
80. Wang W and Zhou ZH. Crowdsourcing label quality: a theoretical analysis. Sci China Inform Sci 2015; 58: 1-12.
81. Dekel O and Shamir O. Good learners for evil teachers. In 26th International Conference on Machine Learning, Montreal, Canada, 2009; 233-40.
82. Urner R, Ben-David S and Shamir O. Learning from weak teachers. In 15th International Conference on Artificial Intelligence and Statistics, La Palma, Canary Islands, 2012; 1252-60.
83. Wang L and Zhou ZH. Cost-saving effect of crowdsourcing learning. In 25th International Joint Conference on Artificial Intelligence, New York, NY, 2016; 2111-7.
84. Karger DR, Oh S and Shah D. Iterative learning for reliable crowdsourcing systems. In Advances in Neural Information Processing Systems 24. Cambridge, MA: MIT Press, 2011; 1953-61.
85. Tran-Thanh L, Venanzi M and Rogers A et al. Efficient budget allocation with accuracy guarantees for crowdsourcing classification tasks. In 12th International Conference on Autonomous Agents and Multi-Agent Systems, Saint Paul, MN, 2013; 901-8.
86. Ho CJ, Jabbari S and Vaughan JW. Adaptive task assignment for crowdsourced classification. In 30th International Conference on Machine Learning, Atlanta, GA, 2013; 534-42.
87. Chen X, Lin Q and Zhou D. Optimistic knowledge gradient policy for optimal budget allocation in crowdsourcing. In 30th International Conference on Machine Learning, Atlanta, GA, 2013; 64-72.
88. Dawid AP and Skene AM. Maximum likelihood estimation of observer error-rates using the EM algorithm. J Roy Stat Soc C Appl Stat 1979; 28: 20-8.
89. Zhong J, Tang K and Zhou Z-H. Active learning from crowds with unsure option. In 24th International Joint Conference on Artificial Intelligence, Buenos Aires, Argentina, 2015; 1061-7.
90. Ding YX and Zhou ZH. Crowdsourcing with unsure option. arXiv:1609.00292, 2016.
91. Shah NB and Zhou D. Double or nothing: multiplicative incentive mechanisms for crowdsourcing. In Advances in Neural Information Processing Systems 28. Cambridge, MA: MIT Press, 2015; 1-9.
92. Rahmani R and Goldman SA. MISSL: multiple-instance semi-supervised learning. In 23rd International Conference on Machine Learning, Pittsburgh, PA, 2006; 705-12.
93. Yan Y, Rosales R and Fung G et al. Active learning from crowds. In 28th International Conference on Machine Learning, Bellevue, WA, 2011; 1161-8.
94. Sutton RS and Barto AG. Reinforcement Learning: An Introduction. Cambridge: MIT Press, 1998.
95. Schwenker F and Trentin E. Partially supervised learning for pattern recognition. Pattern Recogn Lett 2014; 37: 1-3.
96. Garcia-Garcia D and Williamson RC. Degrees of supervision. In Advances in Neural Information Processing Systems 17 Workshops. Cambridge, MA: MIT Press, 2011.
97. Hernandez-Gonzalez J, Inza I and Lozano JA. Weak supervision and other non-standard classification problems: a taxonomy. Pattern Recogn Lett 2016; 69: 49-55.
98. Kuncheva LI, Rodríguez JJ and Jackson AS. Restricted set classification: who is there? Pattern Recogn 2017; 63: 158-70.
99. Zhang M-L and Zhou Z-H. A review on multi-label learning algorithms. IEEE Trans Knowl Data Eng 2014; 26: 1819-37.
100. Sun YY, Zhang Y and Zhou ZH. Multi-label learning with weak label. In 24th AAAI Conference on Artificial Intelligence, Atlanta, GA, 2010; 593-8.
101. Li X and Guo Y. Active learning with multi-label SVM classification. In 23rd International Joint Conference on Artificial Intelligence, Beijing, China, 2013; 1479-85.
102. Qi GJ, Hua XS and Rui Y et al. Two-dimensional active learning for image classification. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Anchorage, AK, 2008.
103. Huang SJ, Chen S and Zhou ZH. Multi-label active learning: query type matters. In 24th International Joint Conference on Artificial Intelligence, Buenos Aires, Argentina, 2015; 946-52.