Research on Probability Mixture Models and Their Applications
Abstract
The probability mixture model is a widely used statistical analysis tool. Thanks to its representational flexibility, it has become one of the most popular tools for density estimation and clustering. However, its general form often cannot be applied directly to certain specialized tasks, such as adaptive learning, large-scale classification, and multi-task learning. This thesis extends the probability mixture model in these three directions.
     First, this thesis proposes a recursive mixture model based on a "generic to specific" learning strategy. Starting from a "generic model" learned offline, it "actively" detects potential positive samples within a specific sample domain and incrementally updates itself, eventually evolving into a "specific model" adapted to that domain. It is applied to building, online and adaptively, a skin-color model suited to the illumination conditions of a particular image; compared with traditional methods, the skin regions it detects show a significant improvement in precision.
     Second, this thesis proposes a discriminative mixture model based on a "maximum cluster margin" learning criterion, the support cluster machine. It combines the advantages of the Bayes-optimal classifier and the max-margin classifier: first, it uses Gaussian mixture models as the training units, reducing the sample size while retaining the distribution information of the original data; second, it maximizes the margin between clusters to strengthen the classifier's generalization ability. Applied to large-scale classification, it drastically reduces time complexity with essentially no loss of accuracy.
     Third, this thesis proposes a two-sided mixture model for "cross-domain knowledge sharing", the rating-matrix generative model. By jointly co-clustering the rating matrices from multiple related collaborative-filtering domains with this two-sided mixture model, the users and items in each rating matrix can be viewed as samples drawn from the rating-matrix generative model, which thereby becomes a bridge for knowledge transfer and sharing among the domains. It is applied to cross-domain collaborative filtering; experiments show that it indeed allows each task to obtain additional useful information from the other tasks.
     The three novel extensions of the probability mixture model proposed in this thesis each have distinctive features, and they are applied to three common machine learning problems of high practical significance. Compared with existing methods, they not only demonstrate clear performance advantages in empirical tests, but also offer learning frameworks and lines of attack different from traditional approaches for these problems, whose development has reached a bottleneck.
The probability mixture model is very popular in density estimation and clustering. However, its plain form is usually inadequate for some specific applications, such as adaptive learning, large-scale classification, and multi-task learning. In this thesis, we investigate its extensions along these three directions.
     First, a recursive mixture model based on a "generic to specific" learning strategy is proposed. It starts as a "generic model" learned offline, then "actively" detects potential positive samples in a specific domain to update itself, and finally evolves into a "specific model" for that domain. It is applied to learning an adaptive skin model under different illumination conditions. Compared with traditional methods, the skin regions detected by our method are more accurate.
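The "generic to specific" idea can be sketched as a guarded recursive update of a Gaussian mixture: a new sample updates the model only if the current model already judges it a likely positive. The sketch below is a minimal illustration, not the thesis's algorithm; the function name, the learning rate `lr`, and the acceptance threshold `tau` are all assumptions.

```python
import numpy as np

def incremental_gmm_update(weights, means, covs, x, lr=0.05, tau=1e-3):
    """One recursive update step: if x looks like a positive sample under
    the current (generic) model, shift the responsible components toward it.
    lr and tau are illustrative hyperparameters, not values from the thesis."""
    d = x.shape[0]
    # unnormalized responsibility of each Gaussian component for x
    dens = np.array([
        w * np.exp(-0.5 * (x - m) @ np.linalg.solve(c, x - m))
            / np.sqrt(((2 * np.pi) ** d) * np.linalg.det(c))
        for w, m, c in zip(weights, means, covs)
    ])
    total = dens.sum()
    if total < tau:              # not a confident positive: keep the model as-is
        return weights, means, covs
    r = dens / total             # normalized responsibilities
    for k in range(len(weights)):
        eta = lr * r[k]          # components responsible for x move the most
        means[k] = (1 - eta) * means[k] + eta * x
        diff = (x - means[k])[:, None]
        covs[k] = (1 - eta) * covs[k] + eta * (diff @ diff.T)
    weights = (1 - lr) * weights + lr * r   # mixing weights still sum to 1
    return weights, means, covs
```

Fed a stream of pixels from one image, such an update would gradually bend a generic skin model toward that image's illumination while ignoring samples the model cannot yet explain.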
     Second, a discriminative mixture model named the support cluster machine (SCM) is proposed. The SCM combines the advantages of both the Bayes-optimal and max-margin classifiers: 1) it adopts Gaussian mixture models as the training units to reduce the sample size while retaining the original data distribution; 2) it maximizes the margin between positive and negative clusters to improve generalization ability. The SCM can significantly reduce the time complexity of large-scale classification.
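The "summarize each class by clusters, then maximize the margin between clusters" idea can be illustrated as follows. This is only a rough stand-in for the SCM: plain k-means replaces GMM fitting, and a Pegasos-style subgradient solver on count-weighted cluster centers replaces the SCM's actual optimization; every function name and hyperparameter here is an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)

def reduce_by_clustering(X, k, iters=20):
    """Stand-in for GMM fitting: summarize a class by k cluster centers
    and the number of points each center represents."""
    centers = X[rng.choice(len(X), k, replace=False)].copy()
    for _ in range(iters):
        labels = ((X[:, None] - centers) ** 2).sum(-1).argmin(1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(0)
    return centers, np.bincount(labels, minlength=k)

def weighted_max_margin(X, y, counts, lam=0.01, epochs=200):
    """Pegasos-style subgradient descent on a count-weighted hinge loss
    (no bias term), standing in for margin maximization between clusters."""
    sw = counts / counts.mean()          # cluster weights, normalized to mean 1
    w = np.zeros(X.shape[1])
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(len(X)):
            t += 1
            eta = 1.0 / (lam * t)
            w *= (1 - eta * lam)         # regularization shrinkage
            if y[i] * (X[i] @ w) < 1:    # weighted hinge subgradient step
                w += eta * sw[i] * y[i] * X[i]
    return w

# Two well-separated classes, each compressed to 3 weighted cluster centers.
Xp = rng.normal([2.0, 2.0], 0.5, size=(300, 2))
Xn = rng.normal([-2.0, -2.0], 0.5, size=(300, 2))
cp, sp = reduce_by_clustering(Xp, 3)
cn, sn = reduce_by_clustering(Xn, 3)
X = np.vstack([cp, cn])                  # only 6 training "samples" remain
y = np.array([1, 1, 1, -1, -1, -1])
counts = np.concatenate([sp, sn]).astype(float)
w = weighted_max_margin(X, y, counts)
acc = (np.sign(np.vstack([Xp, Xn]) @ w) ==
       np.r_[np.ones(300), -np.ones(300)]).mean()
```

The point of the design is the 600-to-6 compression: the margin solver touches only the weighted cluster summaries, which is where the time-complexity savings on large data come from.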
     Third, a two-sided mixture model named the rating-matrix generative model (RMGM) is proposed. By co-clustering the rating matrices from multiple related domains, the users and items in each rating matrix can be viewed as being drawn from the same RMGM, which thus becomes a bridge among multiple domains for knowledge transfer and sharing. Experimental results validate that the RMGM can indeed gain additional useful knowledge from other domains for a given domain.
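The generative view of a two-sided mixture model can be made concrete with a toy sampler: each user and item carries a latent cluster, and a rating depends only on the (user-cluster, item-cluster) pair. All parameters below are made-up toy values, not learned ones; in the actual RMGM setting, multiple domains' matrices would share the cluster-level rating table while each domain fits its own memberships via EM.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy parameters: 2 user clusters, 2 item clusters, ratings 1..5.
P_uc = np.array([0.5, 0.5])              # P(user cluster)
P_ic = np.array([0.6, 0.4])              # P(item cluster)
# P(rating | user cluster, item cluster): the domain-independent core
P_r = np.array([[[.7, .2, .1, 0, 0], [0, .1, .2, .3, .4]],
                [[0, 0, .2, .4, .4], [.5, .3, .2, 0, 0]]])

def sample_rating_matrix(n_users, n_items):
    """Draw a rating matrix from the cluster-level model: every user and
    item gets a latent cluster, and each rating is sampled from the
    distribution attached to that cluster pair."""
    uc = rng.choice(2, size=n_users, p=P_uc)
    ic = rng.choice(2, size=n_items, p=P_ic)
    R = np.empty((n_users, n_items), dtype=int)
    for u in range(n_users):
        for i in range(n_items):
            R[u, i] = rng.choice(5, p=P_r[uc[u], ic[i]]) + 1
    return R
```

Because `P_r` mentions no particular users or items, any rating matrix whose rows and columns can be soft-assigned to these clusters is explained by the same core table, which is what lets the model act as a bridge between domains.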
     The three proposed novel extensions of the probability mixture model are used to solve three classical machine learning problems. Compared with existing methods, they not only obtain better results in empirical tests but also provide promising solutions to these problems.
