Research on Key Techniques for Efficient Annotation and Interpretation of Web Images

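English title: Research of Large-Scale Web Image Annotation and Interpretation
Author: Xia Dingyin (夏丁胤)
Degree level: Doctoral
Discipline: Computer Science and Technology
Chinese keywords (translated): Image Annotation ; Image Interpretation ; Word Visibility ; Affinity Propagation Clustering ; Deep Learning ; Data-Driven
English keywords: Automatic Image Annotation ; Image Interpretation ; Word Visibility ; Data Clustering ; Deep Learning ; Data-Driven
Degree year: 2010
Advisor: Zhuang Yueting (庄越挺)
Discipline code: 081202
Degree-granting institution: Zhejiang University
Thesis submission date: 2010-04-01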

Abstract

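As an effective and practical way to support large-scale image retrieval on the Internet, automatic annotation and understanding of web images has become a widely studied topic in both academia and industry. This dissertation studies several key problems: mining the latent correlation between the visual content of an image and the semantics of its accompanying text, image interpretation, large-scale data clustering, and deep learning of visual image features. The main contributions are as follows:
     1. A data-driven framework for automatic web image annotation and understanding (Automatic Web Image Annotation and Interpretation, AWIAI) is proposed. During automatic annotation, AWIAI first computes the visibility attribute of the words in the text accompanying an image to build an "image-word" relation matrix, then applies latent semantic analysis to this matrix to extend the set of candidate annotation words, and finally obtains the annotation result through unsupervised learning of the visual content and by analyzing and ranking pairwise word co-occurrence.
     2. Building on the automatic annotation results, the concept of image interpretation and a concrete method for realizing it are proposed. Existing automatic annotation methods do not analyze the grammatical relations among annotation words, so their output is a set of discrete words that cannot describe the rich semantics of an image in natural language (for example, producing "a panda eats bamboo" for an image). After obtaining annotation words within the AWIAI framework, the method analyzes the sentence-level relations among them, produces syntactic groups, and interprets the content of the target image in natural language.
     3. For large-scale data with dense similarity relations, two improved affinity propagation clustering methods are proposed: one speeds up global message passing by first passing messages locally during clustering, and the other passes messages over locally sampled data and then embeds the remaining data to obtain a fast global approximation. Because AWIAI is centered on data-driven intelligent image processing, it must solve the difficult problem of clustering large-scale data efficiently.
     4. For the image understanding stage of AWIAI, a deep learning method that combines model-based and data-driven learning (Deep Model-based and Data-driven, DMD) is proposed to extract the most discriminative visual features. Recent findings in theoretical neuroscience suggest that the brain perceives external visual information through a layer-by-layer learning process. DMD extracts visual features through a deep learning procedure that proceeds from simple to complex: it first obtains features by unsupervised learning and sparsifies them, and then uses supervised learning for semantic understanding and annotation of images.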
As a practical and effective way to support large-scale web image retrieval, automatic web image annotation and understanding have become active research topics in both academia and industry. This dissertation focuses on four problems: mining the correlation between visual features and surrounding text, image interpretation, large-scale data clustering, and deep learning of image features.
     To address these problems, this dissertation proposes a data-driven framework for automatic web image annotation and understanding, called Automatic Web Image Annotation and Interpretation (AWIAI). To annotate an image with suitable words, AWIAI first computes the visibility of the words in the surrounding text to build an "image-word" matrix, then extends the initial annotation result by latent visual and semantic analysis, and finally obtains the annotation words through unsupervised learning of visual correlation and analysis of the co-occurrence of annotation words.
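     The final ranking step described above, which combines word visibility with pairwise co-occurrence, can be illustrated with a minimal Python sketch. It is not the AWIAI implementation: the visibility scores, the function names, and the scoring rule (visibility weight multiplied by one plus the co-occurrence support from the other candidates) are assumptions made for illustration, and the latent-analysis extension step is omitted.

```python
from collections import defaultdict
from itertools import combinations


def build_image_word_matrix(docs, visibility):
    """Build {image_id: {word: weight}}; weight sums the (assumed) visibility
    score of every occurrence of the word in the surrounding text."""
    matrix = {}
    for image_id, words in docs.items():
        row = defaultdict(float)
        for word in words:
            row[word] += visibility.get(word, 0.0)
        matrix[image_id] = dict(row)
    return matrix


def cooccurrence_counts(matrix):
    """Count in how many images each pair of visible words appears together."""
    counts = defaultdict(int)
    for row in matrix.values():
        present = sorted(w for w, weight in row.items() if weight > 0)
        for a, b in combinations(present, 2):
            counts[(a, b)] += 1
            counts[(b, a)] += 1
    return counts


def rank_words(image_id, matrix, counts):
    """Rank the visible words of one image (hypothetical scoring rule)."""
    row = matrix[image_id]
    candidates = [w for w, weight in row.items() if weight > 0]
    scores = {}
    for word in candidates:
        support = sum(counts.get((word, other), 0)
                      for other in candidates if other != word)
        scores[word] = row[word] * (1 + support)
    return sorted(scores, key=lambda w: (-scores[w], w))


# Toy surrounding text and made-up visibility scores.
docs = {
    "img1": ["panda", "eats", "bamboo", "today"],
    "img2": ["panda", "bamboo", "forest"],
    "img3": ["panda", "zoo", "today"],
}
visibility = {"panda": 0.9, "bamboo": 0.8, "forest": 0.7,
              "zoo": 0.6, "eats": 0.1, "today": 0.0}

matrix = build_image_word_matrix(docs, visibility)
counts = cooccurrence_counts(matrix)
ranking = rank_words("img1", matrix, counts)
print(ranking)
```

     In this toy data, "today" has zero visibility and is dropped, while "panda" and "bamboo" reinforce each other because they co-occur in two images.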
     Current image annotation approaches describe image semantics with only a few discrete words, because they neglect the statement-level syntactic correlation among the annotated words. As a result, they are unable to render a natural-language interpretation of an image, such as "pandas eat bamboo". To solve this problem, this dissertation proposes "Image Interpretation". The basic idea is to discover the statement-level syntactic correlation among annotated words and to produce interpretation results in natural language.
     AWIAI is a data-driven image processing pipeline, so it frequently has to cluster large-scale data. This dissertation presents two clustering approaches for large-scale data with a dense similarity matrix. Partition Affinity Propagation (PAP) first passes messages within subsets of the data and then merges all the data together, which effectively reduces the number of clustering iterations. Landmark Affinity Propagation (LAP) first passes messages among landmark data points and then clusters the remaining data; it is a global approximation method that speeds up clustering.
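     The landmark idea can be sketched in plain Python as follows. This is one plausible reading of the description above, not the dissertation's LAP algorithm: standard affinity propagation is run on a hand-picked landmark subset, and every point is then attached to its most similar exemplar. The similarity measure (negative squared distance), the preference value, the damping factor, the fixed iteration count, and all function names are assumptions.

```python
def similarity(x, y):
    """Negative squared distance between two scalar points (assumed measure)."""
    return -(x - y) ** 2


def affinity_propagation(S, damping=0.5, iterations=200):
    """Standard affinity propagation on a dense similarity matrix S.
    S[k][k] holds the preference of point k. Returns one exemplar index per point."""
    n = len(S)
    R = [[0.0] * n for _ in range(n)]  # responsibilities r(i, k)
    A = [[0.0] * n for _ in range(n)]  # availabilities a(i, k)
    for _ in range(iterations):
        # r(i,k) = s(i,k) - max over k' != k of (a(i,k') + s(i,k'))
        for i in range(n):
            scores = [A[i][k] + S[i][k] for k in range(n)]
            for k in range(n):
                best_other = max(scores[j] for j in range(n) if j != k)
                R[i][k] = damping * R[i][k] + (1 - damping) * (S[i][k] - best_other)
        # a(i,k) = min(0, r(k,k) + sum of positive r(i',k) for i' not in {i,k});
        # a(k,k) = sum of positive r(i',k) for i' != k
        for k in range(n):
            positive = [max(0.0, R[i][k]) for i in range(n)]
            total = sum(positive) - positive[k]
            for i in range(n):
                if i == k:
                    new = total
                else:
                    new = min(0.0, R[k][k] + total - positive[i])
                A[i][k] = damping * A[i][k] + (1 - damping) * new
    return [max(range(n), key=lambda k: A[i][k] + R[i][k]) for i in range(n)]


def landmark_affinity_propagation(points, landmark_ids, preference):
    """Cluster the landmarks only, then assign every point to the nearest exemplar."""
    m = len(landmark_ids)
    S = [[preference if a == b
          else similarity(points[landmark_ids[a]], points[landmark_ids[b]])
          for b in range(m)] for a in range(m)]
    local = affinity_propagation(S)
    exemplars = sorted({landmark_ids[e] for e in local})
    return [max(exemplars, key=lambda e: similarity(p, points[e])) for p in points]


# Two well-separated groups; indices 1 and 6 are not landmarks.
points = [0.0, 0.1, 0.2, 0.4, 10.0, 10.2, 10.3, 10.4]
landmark_ids = [0, 2, 3, 4, 5, 7]
labels = landmark_affinity_propagation(points, landmark_ids, preference=-1.0)
print(labels)
```

     Message passing here touches only the 6 x 6 landmark similarity matrix instead of the full 8 x 8 one, which is where the saving would come from on large data. The responsibility update above is written for clarity rather than speed.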
     Recent advances in neuroscience indicate that the human brain perceives the outside world through a hierarchical learning process. Motivated by this research, a hybrid model-based and data-driven architecture (Deep Model-based and Data-driven, DMD) is proposed in AWIAI to improve image annotation by learning discriminative features. DMD first uses a deep learning pipeline to learn visual features progressively, from simple to complex, and then integrates the deep model-based and data-driven learning pipelines. After discriminative image representations are obtained from both pipelines with sparse regularization in an unsupervised way, a supervised learning algorithm is applied to predict the objects in images.
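     The two-stage idea described above, sparsifying features obtained without labels and then training a supervised predictor on them, can be sketched in a few lines of Python. This is only an illustration under strong assumptions: the unsupervised deep feature learner is not implemented (the `raw_features` values are made up and stand in for its output), soft-thresholding is used as one common way to impose sparsity, and a nearest-centroid classifier stands in for the supervised learner. None of these choices is taken from the dissertation.

```python
def soft_threshold(vector, lam):
    """Shrink each entry toward zero by lam (proximal operator of the L1 norm)."""
    return [(abs(v) - lam) * (1.0 if v > 0 else -1.0) if abs(v) > lam else 0.0
            for v in vector]


def fit_centroids(features, labels):
    """Supervised stage (stand-in): average the sparse features of each class."""
    sums, counts = {}, {}
    for feature, label in zip(features, labels):
        acc = sums.setdefault(label, [0.0] * len(feature))
        for j, value in enumerate(feature):
            acc[j] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def predict(centroids, feature):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    return min(centroids,
               key=lambda label: sum((a - b) ** 2
                                     for a, b in zip(centroids[label], feature)))


# Made-up feature vectors standing in for the output of unsupervised learning.
raw_features = [
    [0.90, 0.05, -0.02],
    [0.80, -0.10, 0.03],
    [0.04, 0.70, 0.10],
    [-0.05, 0.95, 0.00],
]
train_labels = ["panda", "panda", "bamboo", "bamboo"]

lam = 0.1  # hypothetical sparsity threshold
sparse_features = [soft_threshold(f, lam) for f in raw_features]
centroids = fit_centroids(sparse_features, train_labels)
prediction = predict(centroids, soft_threshold([0.85, 0.08, 0.0], lam))
print(sparse_features[0], prediction)
```

     After thresholding, each training vector keeps a single non-zero entry, so the classifier only sees the dominant response of each example.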

引文

[1]Wang, J., et al. Real-world image annotation and retrieval:An introduction to the special section. IEEE Transactions on pattern analysis and machine intelligence,2008,30(11):1873-1876
    [2]Wu, L., et al. Flickr distance. Proceeding of the 16th ACM international conference on Multimedia,2008:31-40
    [3]Prasad, B., et al. A Microcomputer-based image database management system. IEEE Transactions on Industrial Electronics,1987:83-88
    [4]Blott, S., et al. What's wrong with high-dimensional similarity search? Proceedings of the VLDB Endowment archive,2008,1(1):3
    [5]庄越挺,潘云鹤,吴飞.网上多媒体信息分析与检索.清华大学出版社,2002
    [6]Hughes, A., et al. Text or pictures? An eyetracking study of how people view digital video surrogates. Lecture Notes in Computer Science,2003:271-280
    [7]Berg, T. and D. Forsyth. Automatic ranking of iconic images. Technical report, UC Berkeley,2007
    [8]Berg, T., A. Berg, and S. Brook. Finding iconic images. The 2nd Internet Vision Workshop at IEEE CVPR,2009
    [9]Datta, R., et al. Image retrieval:Ideas, influences, and trends of the new age. ACM Computing Surveys,2008,40(2):1-60
    [10]Marr, D. Vision:A computational investigation into the human representation and processing of visual information. Henry Holt and Co., Inc. New York, NY, USA,1982
    [11]Stern berg, R. and J. Mio. Cognitive psychology. Wadsworth Pub Co.,2008
    [12]Csurka, G., et al. Visual categorization with bags of keypoints. Workshop on Statistical Learning in Computer Vision, ECCV,2004
    [13]Fei-Fei, L., R. Fergus, and A. Torralba. Recognizing and learning object categories. Short Course CVPR: http://people.csail.mit.edu/torralba/shortCourseRLOC/index. html,2007
    [14]Zhu, S. and D. Mumford. A stochastic grammar of images. Foundations and Trends in Computer Graphics and Vision,2006,2(4):259-362
    [15]MacQueen, J. Some methods for classification and analysis of multivariate observations. Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability,1967
    [16]Shi, J. and J. Malik. Normalized cuts and image segmentation. IEEE Transactions on pattern analysis and machine intelligence,2000,22(8): 888-905
    [17]Grady, L. Random walks for image segmentation. IEEE Transactions on pattern analysis and machine intelligence,2006,28(11):1768-1783
    [18]Wu, Z. and R. Leahy. An optimal graph theoretic approach to data clustering: Theory and its application to image segmentation. IEEE Transactions on pattern analysis and machine intelligence,1993,15(11):1101-1113
    [19]Thackray, B. and A. Nelson. Semi-automatic segmentation of vascular network images using arotating structuring element (ROSE) with mathematical morphology and dual feature thresholding. IEEE transactions on medical imaging,1993,12(3):385-392
    [20]Zhu, L., Y. Chen, and A. Yuille. Unsupervised learning of probabilistic grammar-markov models for object categories. IEEE Transactions on pattern analysis and machine intelligence,2009,31(1):114-128
    [21]Lafferty, J., A. McCallum, and F. Pereira. Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In Proc.18th International Conf. on Machine Learning,2001
    [22]Fei-Fei, L., R. Fergus, and P. Perona. Learning generative visual models from few training examples:An incremental bayesian approach tested on 101 object categories. Computer Vision and Image Understanding,2007,106(1): 59-70
    [23]Wu, L., et al. Visual language modeling for image classification. Proceedings of the international workshop on Workshop on multimedia information retrieval,2007:115-124
    [24]Wu, L., et al. Scale-invariant visual language modeling for object categorization. IEEE Transactions on Multimedia,2009,11(2):286-294
    [25]Lowe, D. Towards a computational model for object recognition in IT cortex. Proceedings of the First IEEE International Workshop on Biologically Motivated Computer Vision,2000:20-31
    [26]Wallraven, C., B. Caputo, and A. Graf. Recognition with local features:the kernel recipe. Proceedings of the Ninth IEEE International Conference on Computer Vision,2003(2):257
    [27]Jamieson, M., et al. Using Language to Learn Structured Appearance Models for Image Annotation. IEEE Transactions on Pattern Analysis and Machine Intelligence,2010,32(1):148-164
    [28]Yang, J., et al. Evaluating bag-of-visual-words representations in scene classification. Proceedings of the international workshop on Workshop on multimedia information retrieval,2007:206
    [29]Agarwal, S., A. Awan, and D. Roth. Learning to detect objects in images via a sparse, part-based representation. IEEE Transactions on pattern analysis and machine intelligence,2004,26(11):1475-1490
    [30]Yuan, J., Y. Wu, and M. Yang. Discovery of collocation patterns:from visual words to visual phrases. IEEE Conference on Computer Vision and Pattern Recognition,2007:1-8
    [31]Lazebnik, S., C. Schmid, and J. Ponce. Beyond bags of features:Spatial pyramid matching for recognizing natural scene categories. Proceedings of the 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition,2006(2):2169-2178
    [32]Fergus, R., P. Perona, and A. Zisserman. A sparse object category model for efficient learning and exhaustive recognition. IEEE Computer Society Conference on Computer Vision and Pattern Recognition,2005(1):387
    [33]Cai, D., et al. Block-based web search. Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval,2004:456-463
    [34]Cai, D., et al. Block-level link analysis. Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval,2004:440-447
    [35]Chakrabarti, D., R. Kumar, and K. Punera. A graph-theoretic approach to webpage segmentation. Proceeding of the 17th international conference on World Wide Web,2008:377-386
    [36]Ng, T., et al. Web-data augmented language models for Mandarin conversational speech recognition. IEEE International Conference on Acoustics, Speech, and Signal Processing,2005
    [37]Katz, S. Estimation of probabilities from sparse data for the language model component of a speech recognizer. IEEE Transactions on Acoustics, Speech and Signal Processing,1987,35(3):400-401
    [38]Chen, S. and J. Goodman. An empirical study of smoothing techniques for language modeling. Computer Speech and Language,1999,13(4):359-394
    [39]Jin, Y., L. Wang, and L. Khan. Improving Image Annotations using WordNet. Advances in multimedia information systems:11th international workshop, MIS 2005:115-130
    [40]Miller, G. WordNet:a lexical database for English. Communications of the ACM,1995,38:11-39
    [41]Cilibrasi, R. and P. Vitanyi. The google similarity distance. IEEE Transactions on knowledge and data engineering,2007,19(3):370-383
    [42]Xia, D., F. Wu, and Y. Zhuang. Search-Based Automatic Web Image Annotation Using Latent Visual and Semantic Analysis. Proceedings of the 9th Pacific Rim Conference on Multimedia,2008:842-845
    [43]Ma, W. and B. Manjunath. Texture features and learning similarity. IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 1996,425-430
    [44]Ma, W. and B. Manjunath. Netra:A toolbox for navigating large image databases. Multimedia Systems,1999,7(3):184-198
    [45]Rui, Y., et al. Relevance feedback:A power tool for interactive content-based image retrieval. IEEE Transactions on Circuits and Systems for Video Technology,1998,8(5):644-655
    [46]He, J., et al. Manifold-ranking based image retrieval. Proceedings of the 12th annual ACM international conference on Multimedia,2004,9-16
    [47]Belkin, M. and P. Niyogi. Laplacian eigenmaps and spectral techniques for embedding and clustering. Advances in neural information processing systems,2002,1:585-592
    [48]Chung, F. Spectral graph theory. CBMS Regional Conference Series in Mathematics,1997:92
    [49]He, X., et al. Neighborhood preserving embedding. Tenth IEEE International Conference on Computer Vision,2005,2
    [50]He, X., D. Cai, and P. Niyogi. Tensor subspace analysis. Advances in neural information processing systems,2006,18:499
    [51]He, X., et al. Face recognition using laplacianfaces. IEEE Transactions on pattern analysis and machine intelligence,2005:328-340
    [52]Cai, D., X. He, and J. Han. Isometric projection. Proceedings of the 22nd national conference on Artificial intelligence,2007,1:528-533
    [53]He, X. and P. Niyogi. Locality preserving projections. Advances in neural information processing systems,2003,16:153-160
    [54]Yu, C., W. Luk, and T. Cheung. A statistical model for relevance feedback in information retrieval. Journal of the ACM (JACM),1976,23(2):273-286
    [55]Rui, Y., T. Huang, and S. Chang. Image Retrieval:Current Techniques, Promising Directions, and Open Issues. Journal of visual communication and image representation,1999,10(1):39-62
    [56]Cox, I., et al. The Bayesian image retrieval system, PicHunter:theory, implementation, and psychophysical experiments. IEEE transactions on image processing,2000,9(1):20-37
    [57]Wu, Y., Q. Tian, and T. Huang. Discriminant-EM algorithm with application to image retrieval. IEEE Conference on Computer Vision and Pattern Recognition,2000,1
    [58]Tong, S. and E. Chang. Support vector machine active learning for image retrieval. Proceedings of the ninth ACM international conference on Multimedia,2001:107-118
    [59]He, X., et al. Learning a semantic space from user's relevance feedback for image retrieval. IEEE Transactions on Circuits and Systems for Video Technology,2003,13(1):39-48
    [60]He, X., W. Ma, and H. Zhang. Learning an image manifold for retrieval. Proceedings of the 12th annual ACM international conference on Multimedia, 2004:17-23
    [61]Dorai, C. and S. Venkatesh. Computational media aesthetics:Finding meaning beautiful. IEEE multimedia,2001,8(4):10-12
    [62]Adams, W., et al. Semantic indexing of multimedia content using visual, audio, and text cues. EURASIP Journal on Applied Signal Processing,2003, 2:170-185
    [63]Inoue, M. On the need for annotation-based image retrieval. Proceedings of the Workshop on Information Retrieval in Context (IRiX),2004:44-46
    [64]Mori, Y., H. Takahashi, and R. Oka. Image-to-word transformation based on dividing and vector quantizing images with words. First International Workshop on Multimedia Intelligent Storage and Retrieval Management, 1999
    [65]Duygulu, P., et al. Object recognition as machine translation:Learning a lexicon for a fixed image vocabulary. Proceedings of the 7th European Conference on Computer Vision,2002:97-112
    [66]Jeon, J., V. Lavrenko, and R. Manmatha. Automatic image annotation and retrieval using cross-media relevance models. Proceedings of the 26th annual international ACM SIGIR conference on Research and development in informaion retrieval,2003:119-126
    [67]Gao, Y., et al. Automatic image annotation by incorporating feature hierarchy and boosting to scale up SVM classifiers. Proceedings of the 14th annual ACM international conference on Multimedia,2006:901-910
    [68]Li, J. and J. Wang. Real-time computerized annotation of pictures. Proceedings of the 14th annual ACM international conference on Multimedia, 2006:911-920
    [69]Goh, K., E. Chang, and K. Cheng. SVM binary classifier ensembles for image classification. Proceedings of the tenth international conference on Information and knowledge management,2001:395-402
    [70]Chang, E., et al. CBSA:content-based soft annotation for multimodal image retrieval using Bayes point machines. IEEE Transactions on Circuits and Systems for Video Technology,2003,13(1):26-38
    [71]Cusano, C., G. Ciocca, and R. Schettini. Image annotation using SVM. Society of Photo-Optical Instrumentation Engineers (SPIE) Conference Series,2004,5304:330-338
    [72]Osuna, E., R. Freund, and F. Girosit. Training support vector machines:an application to face detection. IEEE Computer Society Conference on Computer Vision and Pattern Recognition,1997:130-136
    [73]Forsyth, D. and M. Fleck. Body plans. Proceedings of the 1997 Conference on Computer Vision and Pattern Recognition,1997:678
    [74]Vailaya, A., A. Jain, and H. Zhang. On image classification:City images vs. landscapes. Pattern Recognition,1998,31(12):1921-1935
    [75]Kang, F., R. Jin, and R. Sukthankar. Correlated Label Propagation with Application to Multi-label Learning. Proceedings of the 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2006:1719-1726
    [76]Carneiro, G., et al. Supervised learning of semantic classes for image annotation and retrieval. IEEE Transactions on pattern analysis and machine intelligence,2007,29(3):394-410
    [77]Yang, C., M. Dong, and J. Hua. Region-based image annotation using asymmetrical support vector machine-based multiple-instance learning. Proceedings of the 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition,2006:2057-2063
    [78]Quelhas, P., et al. Modeling scenes with local descriptors and latent aspects. Proceedings of the Tenth IEEE International Conference on Computer Vision, 2005:883-890
    [79]Pentland, A., R. Picard, and S. Sclaroff. Photobook:Content-based manipulation of image databases. International. Journal of Computer Vision, 1996,18(3):233-254
    [80]Shen, H., B. Ooi, and K. Tan. Giving meanings to WWW images. Giving meanings to WWW images,2000:39-47
    [81]Letsche, T. and M. Berry. Large-scale information retrieval with latent semantic indexing. Information Sciences:an International Journal,1997, 100(1-4):105-137
    [82]Turney, P. Measuring semantic similarity by latent relational analysis. Proceedings of the 19th international joint conference on Artificial intelligence,2005:1136-1141
    [83]Hofmann, T. Probabilistic latent semantic indexing. Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval,1999:50-57
    [84]Kandola, J., J. Shawe-Taylor, and N. Cristianini, Learning semantic similarity. In Advances in Neural Information Processing Systems,2003
    [85]Brin, S. and L. Page. The anatomy of a large-scale hypertextual Web search engine. Computer networks and ISDN systems,1998,30(1-7):107-117
    [86]Kleinberg, J. Authoritative sources in a hyperlinked environment. Journal of the ACM (JACM),1999,46(5):604-632
    [87]Lempel, R. and A. Soffer. PicASHOW:Pictorial authority search by hyperlinks on the web. ACM Transactions on Information Systems (TOIS), 2002,20(1):1-24
    [88]He, X., et al. Clustering and searching WWW images using link and page layout analysis. ACM Transactions on Multimedia Computing, Communications, and Applications (TOMCCAP),2007,3(2):10
    [89]Wu, X., et al. Real-time near-duplicate elimination for web video search with content and context. IEEE Transactions on Multimedia,2009,11(2):196-207
    [90]O'Hare, N. and A. Smeaton. Context-aware person identification in personal photo collections. IEEE Transactions on Multimedia,2009,11(2):220-228
    [91]Liu, J., et al. Dual cross-media relevance model for image annotation. Proceedings of the 15th international conference on Multimedia,2007: 605-614
    [92]Rui, X., et al. Bipartite graph reinforcement model for web image annotation. Proceedings of the 15th international conference on Multimedia,2007: 585-594
    [93]Zhang, H., Y. Zhuang, and F. Wu. Cross-modal correlation learning for clustering on image-audio dataset. Proceedings of the 15th international conference on Multimedia,2007:273-276
    [94]Ma, Q., A. Nadamoto, and K. Tanaka. Complementary information retrieval for cross-media news content. Information Systems,2006,31(7):659-678
    [95]Jiang, T. and A. Tan. Discovering image-text associations for cross-media web information fusion. Knowledge Discovery in Databases:PKDD,2006: 561-568
    [96]Hardoon, D. and J. Shawe-Taylor. KCCA for different level precision in content-based image retrieval. Proceedings of Third International Workshop on Content-Based Multimedia Indexing,2003
    [97]Hardoon, D., S. Szedmak, and J. Shawe-Taylor. Canonical correlation analysis:an overview with application to learning methods. Neural Computation,2004,16(12):2639-2664
    [98]Yang, Y., et al. Harmonizing Hierarchical Manifolds for Multimedia Document Semantics Understanding and Cross-Media Retrieval. IEEE Transactions on Multimedia,2008,10(3):437-446
    [99]Zhuang, Y., Y. Yang, and F. Wu. Mining semantic correlation of heterogeneous multimedia data for cross-media retrieval. IEEE Transactions on Multimedia,2008,10(2):221-229
    [100]Xia, D., et al. Image interpretation:mining the visible and syntactic correlation of annotated words. Journal of Zhejiang University-Science A, 2009,10(12):1759-1768
    [101]Wu, F., et al. Web image interpretation:semi-supervised mining annotated words. Proceedings of the 2009 IEEE international conference on Multimedia and Expo,2009:1512-1515
    [102]Zhu, X., et al. A Text-to-Picture synthesis system for augmenting communication. Proceedings of the 22nd national conference on Artificial intelligence,2007:1590-1595
    [103]Yeh, T., J. Lee, and T. Darrell. Photo-based question answering. Proceeding of the 16th ACM international conference on Multimedia,2008: 389-398
    [104]Pehcevski, J. and J. Thom. Evaluating focused retrieval tasks. Held in Amsterdam, The Netherlands,2007:33
    [105]Dietterich, T. Machine-learning research. AI magazine,1997,18(4):97
    [106]Zeng, H., et al. Learning to cluster web search results. Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval,2004:210-217
    [107]Cai, D., et al. Hierarchical clustering of WWW image search results using visual, textual and link information. Proceedings of the 12th annual ACM international conference on Multimedia,2004:952-959
    [108]Zhang, T., R. Ramakrishnan, and M. Livny. BIRCH:an efficient data clustering method for very large databases. ACM SIGMOD Record,1996, 25(2):103-114
    [109]Guha, S., R. Rastogi, and K. Shim. CURE:An efficient clustering algorithm for large databases. Information Systems,2001,26(1):35-58
    [110]Karypis, G., E. Han, and V. Kumar. Chameleon:Hierarchical clustering using dynamic modeling. Computer,1999,32(8):68-75
    [111]Lucasius, C., A. Dane, and G. Kateman. On k-medoid clustering of large data sets with the aid of a genetic algorithm:background, feasiblity and comparison. Analytica Chimica Acta,1993,282(3):647-669
    [112]Ng, R. and J. Han. CLARANS:A method for clustering objects for spatial data mining. IEEE transactions on knowledge and data engineering,2002: 1003-1016
    [113]Maila, M. and J. Shi. Learning segmentation with random walk. Advances in Neural Information Processing Systems,2001:873-879
    [114]Hinneburg, A. and D. Keim. An efficient approach to clustering in large multimedia databases with noise. Knowledge Discovery and Data Mining, 1998:58-65
    [115]Ankerst, M., et al. OPTICS:Ordering points to identify the clustering structure. ACM SIGMOD Record,1999,28(2):49-60
    [116]Ester, M., et al. A density-based algorithm for discovering clusters in large spatial databases with noise. Proceedings of the 2nd International Conference on Knowledge Discovery in Databases and Data Mining,1996
    [117]Sheikholeslami, G., S. Chatterjee, and A. Zhang. Wavecluster:A multi-resolution clustering approach for very large spatial databases. Proceedings of the 24rd International Conference on Very Large Data Bases, 1998:428-439
    [118]Zaki, M., et al. New algorithms for fast discovery of association rules. In 3rd Intl.Conf. on Knowledge Discovery and Data Mining,1997
    [119]Wang, W., J. Yang, and R. Muntz. STING:A statistical information grid approach to spatial data mining. Proceedings of the 23rd International Conference on Very Large Data Bases,1997:186-195
    [120]Qin, T., et al. Learning to rank relational objects and its application to web search. Proceeding of the 17th international conference on World Wide Web,2008:407-416
    [121]Zhou, K., et al. Learning to rank with ties. Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval,2008:275-282
    [122]Zhou, D., et al. Ranking on data manifolds. Advances in Neural Information Processing Systems 16:Proceedings of the 2003 Conference, 2004
    [123]Mjolsness, E. and D. DeCoste. Machine learning for science:state of the art and future prospects. Science (New York, NY),2001,293(5537): 2051-2055
    [124]Zhu, X., Z. Ghahramani, and J. Lafferty. Semi-supervised learning using Gaussian fields and harmonic functions. The 20th International Conference on Machine Learning (ICML),2003:912-919
    [125]Zhu, X. and A. Goldberg. Introduction to Semi-Supervised Learning. Synthesis Lectures on Artificial Intelligence and Machine Learning,2009, 3(1):1-130
    [126]Sinha, K. and M. Belkin. Semi-supervised Learning using Sparse Eigenfunction Bases. Advances in Neural Information Processing Systems, 2009
    [127]Shi, T., M. Belkin, and B. Yu. Data spectroscopy:Eigenspace of convolution operators and clustering. The Annals of Statistics,2009,37(6B): 3960-3984
    [128]Yan, S. and H. Wang. Semi-supervised learning by sparse representation. In Proc. SIAM Data Mining Conference,2009:792-801
    [129]Graf, H., et al. Parallel support vector machines:The cascade svm. Advances in neural information processing systems,2005,17(2):521-528
    [130]Song, Y., et al. Parallel spectral clustering. Machine Learning and Knowledge Discovery in Databases,2008:374-389
    [131]Drineas, P. and M. Mahoney. Approximating a Gram matrix for improved kernel-based learning. Learning Theory,2005:323-337
    [132]Kleinberg, J. Challenges in mining social network data:processes, privacy, and paradoxes. Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining,2007:4-5
    [133]Chang, E., H. Bai, and K. Zhu. Parallel algorithms for mining large-scale rich-media data. Proceedings of the seventeen ACM international conference on Multimedia,2009:917-918
    [134]Smeulders, A., et al. Content-based image retrieval at the end of the early years. IEEE Transactions on pattern analysis and machine intelligence,2000, 22(12):1349-1380
    [135]Ke, Y. and R. Sukthankar. PCA-SIFT:a more distinctive representation for local image descriptors. Computer Vision and Pattern Recognition,2004
    [136]Chapelle, O., P. Haffner, and V. Vapnik. SVMs for histogram-based image classification. IEEE transactions on Neural Networks,1999,10(5):1055
    [137]Blei, D., A. Ng, and M. Jordan. Latent dirichlet allocation. The Journal of Machine Learning Research,2003,3:993-1022
    [138]Bengio, Y. Learning deep architectures for AI. Machine Learning,2009, 2(1):1-127
    [139]Hubel, D. and T. Wiesel. Receptive fields and functional architecture of monkey striate cortex. The Journal of Physiology,1968,195(1):215
    [140]Salakhutdinov, R., A. Mnih, and G. Hinton. Restricted Boltzmann machines for collaborative filtering. Proceedings of the 24th international conference on Machine learning,2007:791-798
    [141]Serre, T., et al. Robust object recognition with cortex-like mechanisms. IEEE Transactions on pattern analysis and machine intelligence,2007,29(3): 411-426
    [142]Lee, H., et al. Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations. Proceedings of the 26th Annual International Conference on Machine Learning,2009:609-616
    [143]Jeon, J. and R. Manmatha. Using Maximum Entropy for Automatic Image. Image and video retrieval:third international conference, CIVR 2004: 2040-2041
    [144]Jin, R., J. Chai, and L. Si. Effective automatic image annotation via a coherent language model and active learning. Proceedings of the 12th annual ACM international conference on Multimedia,2004:892-899
    [145]Metzler, D. and R. Manmatha. An inference network approach to image retrieval. Image and video retrieval:third international conference, CIVR, 2004:42-50
    [146]Liu, Y. and F. Wu. Multi-modality video shot clustering with tensor representation. Multimedia Tools and Applications,2009,41(1):93-109
    [147]Liu, Y., et al. Active post-refined multimodality video semantic concept detection with tensor representation. Proceeding of the 16th ACM international conference on Multimedia,2008:91-100
    [148]Liu, Y., et al. Automatic search engine performance evaluation with click-through data analysis. Proceedings of the 16th international conference on World Wide Web,2007:1133-1134
    [149]Pedersen, T., S. Patwardhan, and J. Michelizzi. Wordnet:: similarity-measuring the relatedness of concepts. Proceedings of the National Conference on Artificial Intelligence,2004:1024-1025
    [150]Fergus, R., et al. Learning object categories from google's image search. Proceedings of the Tenth IEEE International Conference on Computer Vision, 2005:1816-1823
    [151]Deerwester, S., et al. Indexing by latent semantic analysis. Journal of the American society for information science,1990,41(6):391-407
    [152]Yan, R., A. Hauptmann, and R. Jin. Multimedia search with pseudo-relevance feedback. Image and video retrieval:Second International Conference, CIVR,2003:238-247
    [153]Li, M., et al. Dual cross-media relevance model for image annotation. Proceedings of the 15th international conference on Multimedia,2007: 605-614
    [154]Rui, X., et al. A Search-Based Web Image Annotation Method. IEEE International Conference on Multimedia and Expo,2007:655-658
    [155]Zhu, X., et al. Improving diversity in ranking using absorbing random walks. Proceedings of NAACL HLT,2007:97-104
    [156]Doyle, P. and J. Snell. Random walks and electric networks. Mathematical Assoc. of America,1984
    [157]Grangier, D. and S. Bengio. A discriminative kernel-based model to rank images from text queries. IEEE transactions on pattern analysis and machine intelligence,2008,30(8):1371-1384
    [158]Ben-Hur, A., et al. Support vector clustering. The Journal of Machine Learning Research,2002,2:125-137
    [159]Donath, W. and A. Hoffman. Lower bounds for the partitioning of graphs. IBM Journal of Research and Development,1973,17(5):420-425
    [160]Fiedler, M. Algebraic connectivity of graphs. Czechoslovak Mathematical Journal,1973,23(2):298-305
    [161]Pothen, A., H. Simon, and K. Liou. Partitioning sparse matrices with eigenvectors of graphs. SIAM Journal on Matrix Analysis and Applications, 1990,11:430-452
    [162]Enright, A., S. Van Dongen, and C. Ouzounis. An efficient algorithm for large-scale detection of protein families. Nucleic acids research,2002,30(7): 1575-1584
    [163]Frey, B. and D. Dueck. Clustering by passing messages between data points. Science,2007,315(5814):972-976
    [164]Frey, B. and D. Dueck. Mixture modeling by affinity propagation. Advances in neural information processing systems,2006,18:379-386
    [165]Kschischang, F., B. Frey, and H. Loeliger. Factor graphs and the sum-product algorithm. IEEE Transactions on information theory,2001, 47(2):498-519
    [166]Bell, R., Y. Koren, and C. Volinsky. Modeling relationships at multiple scales to improve accuracy of large recommender systems. Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining,2007:95-104
    [167]Xia, D., et al. Local and global approaches of affinity propagation clustering for large scale data. Journal of Zhejiang University-Science A, 2008,9(10):1373-1381
    [168]Zhang, X., et al. Partition Affinity Propagation for Clustering Large Scale of Data in Digital Library. International Conferences on the Universal Digital Library,2007
    [169]De Silva, V. and J. Tenenbaum. Global versus local methods in nonlinear dimensionality reduction. Advances in neural information processing systems, 2003:721-728
    [170]De Silva, V. and J. Tenenbaum. Sparse multidimensional scaling using landmark points. Technical Report, Stanford University,2004
    [171]Silva, J., J. Marques, and J. Lemos. Selecting landmark points for sparse manifold learning. Advances in neural information processing systems,2006, 18:1241-1248
    [172]Wittman, T. MANIfold learning Matlab demo. http://www.math.umn.edu/~wittman/mani/index.html,2005
    [173]Kanade, T., J. Cohn, and Y. Tian. Comprehensive database for facial expression analysis. International conference on automatic face and gesture recognition,2000:46-53
    [174]Zhuang, Y., et al. Retrieval of Chinese calligraphic character image. Advances in Multimedia Information Processing-PCM,2004:17-24
    [175]Wu, J., Y. Zhuang, and Y. Pan. Technical features in the Portal to CADAL. Journal of Zhejiang University (Science),2005,6(11)
    [176]Lowe, D. Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision,2004,60(2):91-110
    [177]Belongie, S., J. Malik, and J. Puzicha. Shape matching and object recognition using shape contexts. IEEE Transactions on pattern analysis and machine intelligence,2002:509-522
    [178]Dalal, N. and B. Triggs. Histograms of oriented gradients for human detection. IEEE Computer Society Conference on Computer Vision and Pattern Recognition,2005
    [179]Grauman, K. and T. Darrell. The pyramid match kernel:Discriminative classification with sets of image features. Tenth IEEE International Conference on Computer Vision,2005
    [180]Goodfellow, I., et al. Measuring invariances in deep networks. Advances in neural information processing systems,2009,22:646-654
    [181]Yasuda, M., T. Banno, and H. Komatsu. Color selectivity of neurons in the posterior inferior temporal cortex of the macaque monkey. Cerebral Cortex,2009
    [182]Riesenhuber, M. and T. Poggio. Are cortical models really bound by the "binding problem"? Neuron,1999,24(1):87-93
    [183]Hinton, G., S. Osindero, and Y. Teh. A fast learning algorithm for deep belief nets. Neural Computation,2006,18(7):1527-1554
    [184]Bengio, Y. and Y. LeCun. Scaling learning algorithms towards AI. Large-Scale Kernel Machines,2007
    [185]Loosli, G., S. Canu, and L. Bottou. Training invariant support vector machines using selective sampling. Large-Scale Kernel Machines,2007:301-320
    [186]Riesenhuber, M. and T. Poggio. Hierarchical models of object recognition in cortex. Nature neuroscience,1999,2:1019-1025
    [187]Serre, T., et al. A theory of object recognition:computations and circuits in the feedforward path of the ventral stream in primate visual cortex. AI Memo No.2005-036, MIT,2005
    [188]Tsunoda, K., et al. Complex objects are represented in macaque inferotemporal cortex by the combination of feature columns. Nature neuroscience,2001,4(8):832-838
    [189]Yamane, Y., et al. Representation of the spatial relationship among object parts by neurons in macaque inferotemporal cortex. Journal of neurophysiology,2006,96(6):3147-3156
    [190]Hubel, D. and T. Wiesel. Receptive fields, binocular interaction and functional architecture in the cat's visual cortex. The Journal of Physiology,1962,160(1):106-154
    [191]Ranzato, M., et al. Unsupervised learning of invariant feature hierarchies with applications to object recognition. IEEE Conference on Computer Vision and Pattern Recognition,2007:1-8
    [192]Jones, J. and L. Palmer. An evaluation of the two-dimensional Gabor filter model of simple receptive fields in cat striate cortex. Journal of neurophysiology,1987,58(6):1233-1258
    [193]Lampl, I., et al. Intracellular measurements of spatial integration and the MAX operation in complex cells of the cat primary visual cortex. Journal of neurophysiology,2004,92(5):2704-2713
    [194]Gawne, T. and J. Martin. Responses of primate visual cortical V4 neurons to simultaneously presented stimuli. Journal of neurophysiology,2002,88(3): 1128-1135
    [195]Ojala, T., M. Pietikainen, and T. Maenpaa. Gray scale and rotation invariant texture classification with local binary patterns. Computer Vision-ECCV,2000,1842:404-420
    [196]Ojala, T., M. Pietikainen, and T. Maenpaa. Multiresolution gray scale and rotation invariant texture analysis with local binary patterns. IEEE Transactions on pattern analysis and machine intelligence,2002,24(7): 971-987
    [197]Coren, S., C. Porac, and L. Ward. Sensation and perception, 5th edition. Wiley,2003
    [198]Chang, E., B. Li, and C. Li. Toward perception-based image retrieval. IEEE Workshop on Content-based Access of Image and Video Libraries, 2000:101-105
    [199]Tamura, H., S. Mori, and T. Yamawaki. Textural features corresponding to visual perception. IEEE Transactions on Systems, Man and Cybernetics, 1978,8(6):460-473
    [200]Wu, P., et al. A texture descriptor for browsing and similarity retrieval. Signal Processing:Image Communication,2000,16(1-2):33-43
    [201]Deng, J., et al. ImageNet:a large-scale hierarchical image database. IEEE Conference on Computer Vision and Pattern Recognition,2009
    [202]Crammer, K. and Y. Singer. On the algorithmic implementation of multiclass kernel-based vector machines. The Journal of Machine Learning Research,2002,2:265-292
    [203]Chang, C. and C. Lin. LIBSVM:a library for support vector machines. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm/,2001
    [204]Fan, R., et al. LIBLINEAR:A library for large linear classification. The Journal of Machine Learning Research,2008,9:1871-1874
    [205]Dean, J. and Ghemawat, S. MapReduce:simplified data processing on large clusters. Communications of the ACM,2008,51(1):107-113
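    [206]Wu, F. and Y. Zhuang. Cross-media analysis and retrieval on the Web:theory and algorithms (in Chinese). Journal of Computer-Aided Design & Computer Graphics,2010,22(1):1-9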
