Abstract
In the big data era, the volume of data encountered in real-world applications has grown sharply. Because labeling every sample in detail is difficult and extremely costly, weakly labeled data has become the dominant form of data in practice. Data labeled with class proportions is an important type of weakly labeled data with a wide range of application scenarios, yet it has so far attracted little attention. Existing models for the learning with label proportions (LLP) problem tend to have high complexity and run slowly on large-scale problems. To improve learning speed, this paper proposes Lap-InvCal, a novel LLP model motivated by LapESVR and InvCal that incorporates the idea of manifold learning into LLP. Extensive experiments show that Lap-InvCal substantially speeds up training while maintaining high accuracy, indicating that it can be widely applied to large-scale LLP problems.