Hierarchical Video Semantic Annotation and Retrieval
Abstract
With the development of multimedia, computers, and networks, video data are growing rapidly. To store, manage, and index these massive video data, efficient content-based methods for video retrieval need to be investigated, and video annotation is the foundation of video indexing and video search. This dissertation studies how to use machine learning and video features to perform multi-level, content-based video annotation.
     A video is structured into four levels: video, scene, shot, and frame. Video annotation is usually performed at the video level and the shot level. Video-level annotation assigns genre attributes to an entire video. Shot-level annotation assigns the corresponding semantic concepts to a shot, mainly based on the key frames extracted from it. According to whether the annotated semantic concept corresponds to the image (frame) level or the object level, shot-level annotation can be further divided into image-level annotation and object-level annotation. This dissertation studies the key problems of video annotation at the video level, the image level, and the object level. The main work and contributions are summarized as follows:
     1. Existing work on video genre annotation usually labels only a few simple genres, or is limited to the sub-genres of one specific genre such as movies or sports, and the classifiers used are often too simple. This dissertation defines a relatively complete hierarchical representation of video genres, analyzes and extracts a series of genre-related spatial and temporal features, and proposes a locally and globally optimized multi-class SVM binary tree to improve classification accuracy. Experimental results show that the proposed locally and globally optimized SVM binary tree achieves higher accuracy than two other typical multi-class SVM algorithms and than the classifiers used in existing video classification work.
     2. Existing work on video genre annotation adopts passive supervised learning, which requires large amounts of training data and laborious manual labeling. This dissertation introduces active learning into video genre annotation and proposes to use posterior probabilities to compute the classifier's confidence on unlabeled samples; based on this confidence, the samples the classifier is least certain about, i.e., the most "useful" samples, are selected for the user to label, so that fewer training samples achieve classification performance close to that of a large training set and the user's labeling burden is reduced. Experimental results show that the proposed posterior-probability based sample selection strategy for active learning is slightly better than the existing version-space based active learning strategy and than passive sample selection.
     3. For image-level (key-frame) video annotation, this dissertation considers a frequently encountered practical setting: only a small number of relevant positive samples are available, and a model of the target concept has to be learned from them. Video annotation in this setting faces two main problems: first, with positive-only training data, traditional discriminative classifiers such as SVMs cannot be applied directly; second, the low-level features that discriminate different semantic concepts vary greatly, so a single fixed feature set cannot adapt to all concepts. This dissertation proposes a manifold-ranking based framework for key-frame image-level video annotation. For the first problem, manifold ranking compensates for having only positive samples and also exploits the distribution of the unlabeled data. For the second problem, a feature selection criterion is defined and feature selection is introduced to choose different features for different concepts. The framework supports newly defined target concepts and the introduction of new features.
     4. In object-level video annotation, the traditional multiple-instance learning formulation ignores the semantic correlation among concepts. This dissertation therefore proposes an existence-based multiple-instance formulation to describe this inter-concept correlation, and designs a new multiple-instance learning algorithm, MI-AdaBoost, based on it. The algorithm first applies a feature mapping to each bag in the training data, converting it into a feature vector in a bag-level feature space and thereby turning multiple-instance learning into conventional supervised learning. Since this mapping produces a high-dimensional, noisy feature vector for each bag, AdaBoost is used to perform feature selection and build the classifier.
     5. The effective low-level features differ greatly across semantic concepts, so feature selection is a crucial problem for video annotation. Previous work applying multiple-instance learning to video annotation has ignored how to perform feature selection under the multiple-instance setting. Since traditional single-instance feature selection algorithms usually cannot be applied directly to multiple-instance learning, this dissertation proposes EBMIL, a feature selection algorithm for the multiple-instance setting, which selects the mapped bag-level features and, at the same time, the raw feature sources (color, texture, etc.), thereby achieving better video annotation performance.
With the development of multimedia, computers, and the Internet, video data have grown explosively. To store, manage, and index these massive video data efficiently, we need to investigate efficient Content-Based Video Retrieval (CBVR) algorithms, and video annotation is a preliminary step for video retrieval and search. In this dissertation, we investigate how to utilize machine learning techniques and video features to perform content-based video annotation at different video levels.
     A video is structured into four levels: video, scene, shot, and frame. Video annotation is typically performed at the video level and the shot level. Video-level annotation assigns genre information to each video clip. Shot-level annotation assigns the corresponding semantic concepts to each shot, based on the key frames extracted from the shot. Shot-level annotation is further divided into image-level annotation and object-level annotation, according to whether the annotated concept corresponds to the image level or the object level. In this dissertation, we investigate several key problems of video annotation at the video level, the image level, and the object level. The main contributions and innovations can be summarized as follows:
     1. For video-level annotation, existing work usually annotates only a few genres, or the sub-genres within one specific genre, and the classifiers used are often too simple. We define a relatively comprehensive video genre ontology, and analyze and extract a series of spatial and temporal features related to video genres. Furthermore, we propose a locally and globally optimized SVM binary tree for multi-class classification to improve the classification accuracy (an illustrative sketch of such a tree is given after this summary).
     2. Existing work on video-level annotation usually adopts passive learning, which demands large-scale training data and time-consuming human labeling. We incorporate active learning into video genre classification and propose an SVM active learning algorithm based on posterior probability. We first use the posterior probabilities output by the SVM classifier to compute the confidence of each unlabeled sample, and then select the least confident samples, which are also the most valuable to the classifier, for users to label (see the second sketch after this summary). Through this active learning strategy, we can use fewer training samples to obtain classification accuracy comparable to that obtained with large-scale training data, thus alleviating users' labeling effort.
     3. For key-frame image-level video annotation, we discuss a typical case: learning a target concept from only a small number of positive samples. A novel manifold-ranking based scheme is proposed to tackle this problem (see the third sketch after this summary). However, video annotation needs large-scale video data and a large feature pool to achieve good performance, and in this situation applying manifold ranking induces two problems: intractable computational cost and the curse of dimensionality. We incorporate two modules, pre-filtering and feature selection, to tackle these two problems respectively. The scheme is extensible and flexible in terms of adding new features to the feature pool, introducing human interaction in selecting features, and defining new concepts.
     4. In object-level video annotation, the training data are usually labeled at the image level while the semantic concepts are at the region level, so typical single-instance supervised learning cannot learn the target concept directly. If we treat each image as a labeled bag of multiple instances, and the objects in the image as the instances in the bag, object-level video annotation becomes a typical multiple-instance learning (MIL) problem. However, conventional multiple-instance learning in video annotation neglects concept dependencies, i.e., the relationships between positive and negative concepts. Therefore, we propose the existence-based MIL formulation to model these concept dependencies, and present an MIL algorithm, MI-AdaBoost, based on this formulation. MI-AdaBoost first maps each training bag into a feature vector in a new bag-level feature space, thus translating the MIL problem into a standard single-instance problem. Since this feature mapping induces a high-dimensional, noisy feature vector for each bag, we utilize AdaBoost to perform feature selection and build the final classifier (see the fourth sketch after this summary).
     5. As the effective low-level features differ greatly across semantic concepts, feature selection is a key problem in video annotation. Typical feature selection algorithms for single-instance settings usually cannot be applied directly under multi-instance settings, and previous work on MIL-based video annotation has largely neglected feature selection under MIL. We propose a feature selection algorithm named EBMIL for MIL settings. EBMIL selects different raw feature sources (color, texture, etc.) while selecting the mapped bag-level features, thus achieving better video annotation performance (see the last sketch after this summary).
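     The following toy sketch (Python with NumPy and scikit-learn) illustrates the binary-tree multi-class SVM idea of contribution 1; like the sketches after it, it is a simplified illustration under stated assumptions, not the dissertation's implementation. Here the genres handled by a node are split into two groups by k-means clustering of their mean feature vectors, which is only an illustrative stand-in for the locally and globally optimized split described above.

    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.svm import SVC

    class Node:
        def __init__(self, classes):
            self.classes = list(classes)   # genre labels handled by this node
            self.svm = None                # binary SVM separating the two groups
            self.left = None
            self.right = None

    def build_tree(X, y, classes):
        # X: (n_clips, dim) ndarray of clip-level features; y: ndarray of genre labels
        node = Node(classes)
        if len(classes) == 1:
            return node                    # leaf: a single genre remains
        # split the genres into two groups by clustering their mean feature vectors
        centroids = np.vstack([X[y == c].mean(axis=0) for c in classes])
        groups = KMeans(n_clusters=2, n_init=10).fit_predict(centroids)
        left = [c for c, g in zip(classes, groups) if g == 0]
        right = [c for c, g in zip(classes, groups) if g == 1]
        mask = np.isin(y, classes)
        # binary SVM at this node: label 1 for the left group, 0 for the right group
        node.svm = SVC(kernel="rbf").fit(X[mask], np.isin(y[mask], left).astype(int))
        node.left, node.right = build_tree(X, y, left), build_tree(X, y, right)
        return node

    def predict(node, x):
        # descend the tree until a single genre remains
        while len(node.classes) > 1:
            node = node.left if node.svm.predict(x.reshape(1, -1))[0] == 1 else node.right
        return node.classes[0]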
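     The second sketch illustrates the posterior-probability based sample selection of contribution 2: an SVM with Platt-style probability outputs scores each unlabeled video, and the least confident samples are returned for manual labeling. The batch size and the RBF kernel are illustrative assumptions.

    import numpy as np
    from sklearn.svm import SVC

    def select_for_labeling(X_labeled, y_labeled, X_unlabeled, batch_size=10):
        clf = SVC(kernel="rbf", probability=True).fit(X_labeled, y_labeled)
        posteriors = clf.predict_proba(X_unlabeled)   # (n_unlabeled, n_genres)
        confidence = posteriors.max(axis=1)           # confidence = top class posterior
        return np.argsort(confidence)[:batch_size]    # indices of least confident samples

    # After the user labels the returned samples, add them to the labeled pool,
    # retrain, and repeat until the annotation budget is exhausted.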
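     The third sketch shows the standard manifold-ranking iteration (in the style of Zhou et al.) that underlies contribution 3, with a query vector marking only the few positive key-frames. The pre-filtering and feature selection modules described above are omitted, and the kernel width, alpha, and iteration count are illustrative assumptions.

    import numpy as np

    def manifold_rank(X, positive_idx, sigma=1.0, alpha=0.99, n_iter=100):
        # pairwise affinities with a Gaussian kernel, zero self-affinity
        d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
        W = np.exp(-d2 / (2 * sigma ** 2))
        np.fill_diagonal(W, 0.0)
        # symmetric normalization S = D^{-1/2} W D^{-1/2}
        d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1) + 1e-12)
        S = W * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
        # query vector: 1 for the few positive key-frames, 0 elsewhere
        y = np.zeros(len(X))
        y[positive_idx] = 1.0
        f = y.copy()
        for _ in range(n_iter):                       # f <- alpha*S*f + (1-alpha)*y
            f = alpha * S.dot(f) + (1 - alpha) * y
        return f                                      # higher score = more relevant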
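     The fourth sketch illustrates the bag-to-vector idea behind MI-AdaBoost in contribution 4: each key-frame is a bag of region feature vectors, each bag is mapped into a bag-level feature space, and AdaBoost with decision stumps then selects bag-level features while building the classifier. The minimum-distance embedding over all training instances used here is an illustrative stand-in, not the existence-based mapping itself.

    import numpy as np
    from sklearn.ensemble import AdaBoostClassifier

    def embed_bag(bag, prototypes):
        # bag: (n_regions, dim) features of one key-frame; one bag-level
        # coordinate per prototype = distance to the closest region in the bag
        d = np.linalg.norm(bag[:, None, :] - prototypes[None, :, :], axis=2)
        return d.min(axis=0)

    def train_bag_classifier(bags, labels, n_rounds=200):
        prototypes = np.vstack(bags)      # assumption: every training region is a prototype
        X = np.vstack([embed_bag(b, prototypes) for b in bags])
        # the default base learner is a depth-1 stump, so each boosting round
        # picks one bag-level feature: boosting doubles as feature selection
        clf = AdaBoostClassifier(n_estimators=n_rounds).fit(X, labels)
        return clf, prototypes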
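     The last sketch is a heavily simplified view of source-level feature selection for contribution 5. It assumes the bag-level embedding is computed separately for each raw feature source, so every bag-level column can be attributed to one source; the boosting importance falling on each source's columns is then accumulated and only the strongest sources are kept. This scoring rule is an assumption for illustration and is not the EBMIL algorithm itself.

    import numpy as np

    def rank_feature_sources(boosted_clf, source_of_column, source_names):
        # source_of_column[j]: index of the raw feature source (color, texture, ...)
        # whose instance features produced bag-level column j
        source_of_column = np.asarray(source_of_column)
        importances = boosted_clf.feature_importances_
        scores = {name: float(importances[source_of_column == i].sum())
                  for i, name in enumerate(source_names)}
        return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

    # One possible use: keep only the top-ranked sources and retrain the
    # bag-level classifier on the features mapped from those sources.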