Object Recognition Based on Semi-Supervised Learning
Abstract
Object recognition is a fundamental problem in machine learning, covering the classification of text, image, and video data. When only a small amount of data is involved, traditional machine learning methods already achieve sound performance. As the volume of information grows exponentially, however, labeling such large amounts of data has become practically infeasible, leaving traditional methods ill-equipped for these problems. Semi-supervised learning arose in response: it takes the information in a small set of labeled data and extends it to unlabeled data, thereby addressing the severe quantity mismatch between labeled and unlabeled examples.
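As a point of reference (not taken from the thesis itself), the following minimal sketch shows the simplest form of this idea using scikit-learn's SelfTrainingClassifier: unlabeled examples are marked with -1, and a base classifier iteratively adopts its own most confident predictions. The dataset and all parameters are illustrative.

```python
# Minimal semi-supervised baseline (illustrative only, not the thesis method):
# a base classifier repeatedly adds its most confident predictions on the
# unlabeled pool to its training set. Assumes scikit-learn is installed.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.semi_supervised import SelfTrainingClassifier

X, y = make_classification(n_samples=1000, random_state=0)

# Keep true labels for only ~5% of the examples; mark the rest unlabeled (-1).
rng = np.random.RandomState(0)
y_partial = np.where(rng.rand(len(y)) < 0.05, y, -1)

model = SelfTrainingClassifier(LogisticRegression(max_iter=1000), threshold=0.9)
model.fit(X, y_partial)
print("accuracy on all data:", model.score(X, y))
```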
     This thesis addresses a semi-supervised learning problem in which hard-to-obtain accurate labels coexist with easily obtained coarse labels. It studies the robustness of co-training, that is, how errors in the given initial labeled data affect co-training performance.
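For context, the standard two-view co-training loop of Blum and Mitchell, which the robustness question concerns, can be sketched as follows; the helper names and parameters are illustrative assumptions, not the thesis's experimental setup. Note that an error in the initial labels is consumed by both classifiers, so it can propagate through every round.

```python
# Illustrative two-view co-training loop (Blum & Mitchell style). X1 and X2
# are two feature views of the same examples; labeled_idx indexes the initial
# labeled pool, whose labels may contain errors. Assumes scikit-learn.
import numpy as np
from sklearn.naive_bayes import GaussianNB

def co_train(X1, X2, y, labeled_idx, rounds=10, per_round=5):
    labeled = list(labeled_idx)
    unlabeled = [i for i in range(len(y)) if i not in set(labeled)]
    y_work = np.asarray(y).copy()  # initial labels may be noisy
    c1, c2 = GaussianNB(), GaussianNB()
    for _ in range(rounds):
        c1.fit(X1[labeled], y_work[labeled])
        c2.fit(X2[labeled], y_work[labeled])
        # Each view hands its most confident unlabeled examples to the pool.
        for clf, X in ((c1, X1), (c2, X2)):
            if not unlabeled:
                return c1, c2
            proba = clf.predict_proba(X[unlabeled])
            picks = np.argsort(proba.max(axis=1))[-per_round:]
            newly = [unlabeled[p] for p in picks]
            y_work[newly] = clf.classes_[proba[picks].argmax(axis=1)]
            labeled.extend(newly)
            unlabeled = [i for i in unlabeled if i not in set(newly)]
    return c1, c2
```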
     Building on this robustness problem, the thesis combines the information bottleneck algorithm with posterior-probability computation and proposes a novel unsupervised method for generating pseudo-labels. Compared with existing methods, it requires less label information and effectively reduces computational complexity.
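The full text specifies how the information bottleneck and posterior probabilities are combined; as a rough stand-in only, the sketch below swaps in k-means for the information bottleneck clustering step (a plain substitution for brevity, not the author's algorithm) and then labels each cluster by the empirical posterior of the few labeled points that fall into it.

```python
# Rough stand-in for the pseudo-label generator: cluster the data without
# labels, then give every point in a cluster the maximum-posterior label of
# the labeled points in that cluster. k-means replaces the information
# bottleneck clustering step here purely to keep the sketch short.
import numpy as np
from sklearn.cluster import KMeans

def pseudo_labels(X, y_partial, n_clusters=10):
    # y_partial uses -1 for unlabeled examples; labels are small non-negative ints.
    clusters = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(X)
    y_pseudo = np.full(len(X), -1)
    for c in range(n_clusters):
        members = np.where(clusters == c)[0]
        known = y_partial[members][y_partial[members] != -1]
        if len(known) > 0:
            # Empirical posterior p(label | cluster): take its argmax.
            y_pseudo[members] = np.bincount(known).argmax()
    return y_pseudo
```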
     Using these pseudo-labels, the thesis further proposes a pseudo-label-aided co-training method built around a reranking framework. Compared with existing methods, it is more robust to incorrect initial labels: even when the initial labels contain many errors, the improved method still trains classifiers with good performance.
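The reranking framework at the core of this method is specific to the thesis and is not reconstructed here; purely as a hypothetical illustration, the two sketches above could be composed so that unsupervised pseudo-labels seed the co-training loop.

```python
# Hypothetical glue code: unsupervised pseudo-labels seed co-training.
# X1 and X2 are the two feature views; y_partial holds the scarce true labels.
y_seed = pseudo_labels(np.hstack([X1, X2]), y_partial)
seed_idx = np.where(y_seed != -1)[0]
c1, c2 = co_train(X1, X2, y_seed, seed_idx)
```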
     Finally, the thesis analyzes pseudo-label-aided co-training from a statistical standpoint, studies mathematically how the method improves the robustness of co-training, and discusses the theoretical similarity between naive Bayes classification and the information bottleneck method.
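The similarity referred to here can be made concrete with two standard identities (textbook results, not the thesis's own derivation). For a document $d$ of length $N$ with empirical word distribution $\hat p(w\mid d)$, the multinomial naive Bayes decision is a KL-divergence minimization, and the information bottleneck's self-consistent assignment rule has the same form:

\[
\arg\max_c\Big[\log p(c)+\sum_w n(w,d)\,\log p(w\mid c)\Big]
=\arg\min_c\Big[D_{\mathrm{KL}}\big(\hat p(w\mid d)\,\big\|\,p(w\mid c)\big)-\tfrac{1}{N}\log p(c)\Big]
\]

\[
p(t\mid x)=\frac{p(t)}{Z(x,\beta)}\,\exp\!\Big(-\beta\,D_{\mathrm{KL}}\big(p(y\mid x)\,\big\|\,p(y\mid t)\big)\Big)
\]

In both rules, an item is assigned to the class or cluster whose conditional distribution is closest in KL divergence, with the prior $p(c)$ or $p(t)$ acting as a bias term; this shared form is the sense in which the two methods rest on similar foundations.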
