Video Semantic Understanding via Multimodal Feature Fusion and Variable Selection
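Abstract
With the rapid development of computer technology and Internet applications, multimedia data, and video data in particular, are growing massively. How to store, manage, transmit, retrieve, and use these data effectively is a major challenge and a pressing research problem. Video data carry rich semantics. Video is also time-series data: it contains three media, namely image, audio, and text, which exhibit temporally associated co-occurrence. Targeting the temporal association among the multiple modalities in video, this work performs video semantic analysis and understanding through feature fusion and variable selection.
     In understanding and mining video semantics, fully exploiting the interactions among multimodal media such as image, audio, and text is a very important research direction. Considering the multimodal and temporally associated co-occurrence characteristics of video, we propose a semantic concept detection method based on multimodal subspace correlation propagation to mine the semantic information in video. Starting from the multimodal low-level features extracted from video shots, the method propagates correlations across the multimodal subspaces through co-occurrence data embedding and similarity fusion to obtain the similarity relations between shots. It then reduces the dimensionality of the original data with locality preserving projections to obtain coordinates in a low-dimensional semantic space, and uses the annotations to train a classification model, so that semantic concepts can be detected for test data outside the training set. Experiments show that the method achieves high accuracy.
     Besides producing high-dimensional vectors that lead to the "curse of dimensionality", the vector model used in traditional video representation suffers from "over-compression" during dimension reduction: because the feature vectors are very high-dimensional and training samples are scarce, concatenating features of different types loses a great deal of information. Simple concatenation of different feature types also weakens or ignores, to some extent, the temporally associated co-occurrence among the modalities in video. To address these problems, we propose a video semantic analysis and understanding framework based on higher-order tensor representation. In this framework, a video shot is first represented as a 3rd-order tensor formed from the multimodal data it contains, such as text, visual, and audio data. Second, based on this 3rd-order tensor representation and the temporally associated co-occurrence of video, we design a subspace-embedding dimension reduction method called "TensorShot". Finally, because semi-supervised learning can learn and recognize specific unknown samples starting from known ones, we propose a TensorShot-based transductive support tensor machine algorithm and two active-learning-based post-refinement strategies. The approach preserves the intrinsic structure of the manifold on which the tensor shots lie, maps out-of-sample data directly into the manifold subspace, and makes full use of unlabeled samples to improve the classifier. Experimental results show that the method detects semantic concepts of video shots effectively.
     To use labeled samples more effectively, and building on compressive sensing and sparse representation theory combined with sparse coding, non-negative matrix factorization, and supervised learning, we propose a classification method for images and videos based on (non-negative) group sparse representation. The basic idea is to represent a test sample as a weighted linear combination of the training samples: under a non-negative l1 regularization constraint, a regression coefficient is computed for each training sample and a weighting coefficient is computed for each class, so that during training all samples of a class can be selected or discarded together on the basis of the sparse coefficients. In addition, the non-negative regression weights make video and image understanding more interpretable. The advantage of the (non-negative) group sparse representation classifier is that it uses class information effectively for variable selection on videos and images, which both improves semantic classification accuracy and makes the process more interpretable.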
With recent advances in computer technology and Internet applications, the number of multimedia files and archives has increased dramatically, and video data constitute the majority. Efficient content-based video storage, management, indexing, browsing, and retrieval have therefore become important research topics. Video data carry rich semantics, such as people, objects, events, and stories. In general, video data are composed of three low-level modalities, namely the image, audio, and text modalities. These modalities are in essence characterized by temporally associated co-occurrence (TAC). Considering the TAC of the multiple modalities in video data, this paper proposes effective feature fusion and variable selection schemes to better analyze video semantic content.
     Interaction and integration of multimodal media types such as visual, audio, and textual data are the essence of video content analysis, and a great deal of research has focused on using multimodal features to better understand video semantics. In this paper, we propose a new approach to detecting semantic concepts in video that applies Co-Occurrence Data Embedding (CODE), SimFusion, and Locality Preserving Projections (LPP) to the temporally associated co-occurring multimodal media data in video. CODE embeds objects of different types into the same low-dimensional Euclidean space based on their co-occurrence statistics. SimFusion is an effective algorithm for reinforcing or propagating the similarity relations between multiple modalities. LPP is a linear dimensionality reduction method that preserves local neighborhood structure in the manner of nonlinear techniques such as Laplacian Eigenmaps. Our experiments show that these techniques improve the performance of video semantic concept detection and yield better video semantics mining results.
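To make the idea of similarity propagation concrete, here is a minimal sketch in the spirit of SimFusion, not the implementation used in this work. It assumes a single row-normalized relationship matrix L and iterates S <- L S L^T starting from the identity. The relationship values, the diagonal reset, and all function names are hypothetical choices made for illustration.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*a)]


def row_normalize(a):
    """Scale each row to sum to 1 so it acts as a set of mixing weights."""
    out = []
    for row in a:
        total = sum(row)
        out.append([v / total for v in row])
    return out


def propagate_similarity(relations, iterations=5):
    """Iterate S <- L S L^T from S = I, keeping self-similarity fixed at 1."""
    n = len(relations)
    L = row_normalize(relations)
    S = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(iterations):
        S = matmul(matmul(L, S), transpose(L))
        for i in range(n):
            S[i][i] = 1.0  # assumption of this sketch: an object is fully similar to itself
    return S


# Hypothetical relationship strengths among four shots:
# shots 0 and 1 are strongly related, as are shots 2 and 3.
relations = [
    [1.0, 0.8, 0.1, 0.0],
    [0.8, 1.0, 0.0, 0.1],
    [0.1, 0.0, 1.0, 0.8],
    [0.0, 0.1, 0.8, 1.0],
]
S = propagate_similarity(relations)
```

After a few iterations, shots 0 and 2 acquire a nonzero similarity through their shared neighbors even though their direct relationship is weak, while the strongly related pair (0, 1) remains the most similar. The sketch uses only a handful of iterations on purpose, since the pairwise similarities move closer together with each pass.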
     Traditionally, the multimodal media features in video are represented simply by concatenated vectors, whose high dimensionality causes the "curse of dimensionality". In addition, an over-compression problem occurs when the sample vector is very long and the number of training samples is small, which results in loss of information during dimension reduction. This paper proposes a higher-order tensor framework for video analysis and understanding. In this framework, we represent the image frames, audio, and text of a video shot, the three modalities in video, as a data point given by a 3rd-order tensor. We then propose a novel video representation and dimension reduction method that explicitly considers the manifold structure of the tensor space arising from temporally sequenced, associated, co-occurring multimodal media data. We call it the TensorShot approach. Semi-supervised learning uses a large amount of unlabeled data together with the labeled data to build better classifiers. We propose a new transductive support tensor machine algorithm to train an effective classifier, and an active-learning-based contextual and temporal post-refining strategy to enhance detection accuracy. Our algorithm preserves the intrinsic structure of the submanifold from which the tensor shots are sampled, and can also map out-of-sample data points directly. Moreover, the use of unlabeled data yields better classifiers. Experimental results show that our method improves the performance of video semantic concept detection.
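The 3rd-order tensor representation can be illustrated with a small sketch. The abstract does not say how the tensor entries are filled, so the outer product of the three feature vectors used here is purely an assumption, as are the function names and the toy feature values. The mode-n unfolding shown is the standard operation that tensor dimension reduction methods apply to each mode in turn; the column ordering is one of several conventions.

```python
def outer3(image, audio, text):
    """Build a 3rd-order tensor with T[i][j][k] = image[i] * audio[j] * text[k]."""
    return [[[x * y * z for z in text] for y in audio] for x in image]


def unfold(tensor, mode):
    """Mode-n unfolding: rows follow the chosen mode, columns enumerate the other two."""
    dims = (len(tensor), len(tensor[0]), len(tensor[0][0]))
    first, second = [m for m in range(3) if m != mode]
    rows = []
    for r in range(dims[mode]):
        row = []
        for a in range(dims[first]):
            for b in range(dims[second]):
                idx = [0, 0, 0]
                idx[mode], idx[first], idx[second] = r, a, b
                i, j, k = idx
                row.append(tensor[i][j][k])
        rows.append(row)
    return rows


# Toy feature vectors for one shot (hypothetical values and sizes).
image_features = [1.0, 2.0]
audio_features = [3.0, 4.0, 5.0]
text_features = [6.0, 7.0, 8.0, 9.0]

shot = outer3(image_features, audio_features, text_features)
image_mode = unfold(shot, 0)  # 2 rows, 12 columns
audio_mode = unfold(shot, 1)  # 3 rows, 8 columns
text_mode = unfold(shot, 2)   # 4 rows, 6 columns
```

Unlike a concatenated vector of length 2 + 3 + 4, the tensor keeps one axis per modality, so a projection can be learned for each mode separately instead of for a single long vector.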
     Based on compressive sensing and sparse representation theory, together with ideas from nonnegative matrix factorization and supervised learning, this paper develops a novel approach to image and video representation, classification, and retrieval, which we call group sparse representation. The basic idea is to represent a test image as a weighted combination of all the training images. In particular, we introduce two sets of weight coefficients: one with a coefficient for each training image and another with a coefficient for each class, which performs variable selection at the class level. Moreover, because image and video features are nonnegative, we impose nonnegativity constraints on the coefficients, which makes the classifier more interpretable and the model additive. Specifically, we formulate the problem as a group nonnegative garrote model. The resulting representations are sparse and well suited to discriminant analysis.
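The following sketch shows the general idea of classifying a test sample from nonnegative sparse weights over the training samples. It is a simplified stand-in: it solves a plain nonnegative l1-penalized least-squares problem by coordinate descent and then picks the class whose samples reconstruct the test sample best. It does not implement the group nonnegative garrote model formulated in this paper, and the data, the penalty value, and the function names are hypothetical.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def nonneg_sparse_code(samples, target, lam=0.01, sweeps=200):
    """Coordinate descent for
    min_w 0.5 * ||target - sum_j w[j] * samples[j]||^2 + lam * sum(w), with w >= 0."""
    weights = [0.0] * len(samples)
    residual = list(target)
    for _ in range(sweeps):
        for j, atom in enumerate(samples):
            norm_sq = dot(atom, atom)
            if norm_sq == 0.0:
                continue
            # Correlation of sample j with the residual that excludes its own contribution.
            rho = dot(atom, residual) + weights[j] * norm_sq
            new_w = max(0.0, (rho - lam) / norm_sq)
            delta = new_w - weights[j]
            residual = [r - delta * a for r, a in zip(residual, atom)]
            weights[j] = new_w
    return weights


def classify(samples, labels, target, lam=0.01):
    """Return the class with the smallest class-wise reconstruction error, plus the weights."""
    weights = nonneg_sparse_code(samples, target, lam)
    errors = {}
    for label in set(labels):
        recon = [0.0] * len(target)
        for w, atom, lab in zip(weights, samples, labels):
            if lab == label:
                recon = [r + w * a for r, a in zip(recon, atom)]
        errors[label] = sum((t - r) ** 2 for t, r in zip(target, recon))
    return min(errors, key=errors.get), weights


# Toy nonnegative feature vectors: two training samples per class.
train = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 1.0, 0.0], [0.0, 0.9, 0.1]]
labels = [0, 0, 1, 1]
query = [0.95, 0.05, 0.0]

label, weights = classify(train, labels, query)
```

Because every weight is nonnegative, each training sample can only add to the reconstruction, so the nonzero weights can be read directly as the samples that explain the query.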
