Video Search and Semantic Extraction
Abstract
With the development of network and multimedia technologies, video data is expanding rapidly. How to effectively retrieve videos of interest from large-scale datasets has therefore become an urgent problem, and Content-Based Video Retrieval (CBVR) has received wide attention.
     In this paper, Content-Based Video Retrieval is studied at three levels: low-level visual feature extraction, high-level semantic feature extraction, and semantic video search, and several novel algorithms and frameworks are proposed. The major contributions are as follows:
     For low-level feature selection and extraction, a large number of visual features are extracted and analyzed, grouped into four categories: key-point, texture, edge, and color features. First, a Bag-of-Visual-Words algorithm is adopted to effectively quantize the high-dimensional key-point features. Then, a feature-level fusion strategy for SIFT and SURF features computed with different detectors is explored. Finally, experiments on TRECVID datasets evaluate the detection performance of the different visual features. The results show that the fused SIFT and SURF features significantly outperform the original features.
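The Bag-of-Visual-Words quantization step mentioned above can be sketched as follows. This is an illustrative toy, not the thesis implementation: it assumes a visual codebook has already been learned offline (e.g. by k-means over training descriptors), assigns each local descriptor to its nearest codeword, and represents the frame as a normalized word histogram. The function name and the 2-D toy data are hypothetical (real SIFT descriptors are 128-D).

```python
import numpy as np

def bovw_histogram(descriptors, codebook):
    """Quantize local key-point descriptors against a visual codebook and
    return an L1-normalized Bag-of-Visual-Words histogram."""
    # Squared Euclidean distance from every descriptor to every codeword.
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    words = d2.argmin(axis=1)            # nearest codeword per descriptor
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    return hist / hist.sum()             # normalize so frames are comparable

# Toy example: 2-D "descriptors" and a 3-word codebook.
codebook = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
descs = np.array([[0.1, 0.0], [0.9, 1.1], [5.2, 4.8], [0.0, 0.2]])
h = bovw_histogram(descs, codebook)      # histogram over the 3 visual words
```

Because the histogram has fixed length regardless of how many key points a frame contains, it can be fed directly to a fixed-dimension classifier such as an SVM.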
     For high-level semantic feature extraction, a novel framework for video semantic concept detection is proposed, in which color, Gabor wavelet, edge histogram, and SIFT features are used as visual descriptors and a support vector machine is trained for each feature as a classifier; concept detection results are obtained after decision-level fusion of the classifiers. Several decision-level fusion strategies are then proposed and evaluated in self-test experiments, which show that a mixed fusion strategy, combining the best-performing fusion strategy for each concept, yields the largest performance gain. The TRECVID 2008 high-level feature extraction evaluation shows that the system's overall detection performance is above the average of all participants.
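A minimal sketch of the decision-level fusion step, under stated assumptions: each feature-specific SVM has already produced a per-shot confidence score for one concept, and the fusion here is a simple weighted average (one of many possible strategies; the thesis evaluates several and picks the best per concept). The function name, weights, and score values are hypothetical.

```python
import numpy as np

def late_fuse(score_matrix, weights=None):
    """Decision-level (late) fusion: combine per-feature classifier scores
    for one concept into a single detection score per shot.
    score_matrix: shape (n_shots, n_features), one column per feature SVM."""
    n_feat = score_matrix.shape[1]
    if weights is None:
        weights = np.full(n_feat, 1.0 / n_feat)  # uniform fallback
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()            # keep fused scores in [0, 1]
    return score_matrix @ weights

# Hypothetical scores from four feature-specific SVMs
# (columns: color, Gabor wavelet, edge histogram, SIFT) for three shots.
scores = np.array([[0.9, 0.7, 0.8, 0.95],
                   [0.2, 0.4, 0.1, 0.30],
                   [0.6, 0.5, 0.7, 0.40]])
fused = late_fuse(scores, weights=[0.2, 0.2, 0.2, 0.4])
```

Ranking shots by the fused score then gives the concept detection result; per-concept weight selection is what the mixed strategy in the text optimizes.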
     For video search, a semantic-based video search framework is proposed, in which search by visual example and search by semantic concept are both analyzed. A semantic-similarity-based method and an example-correlation-based method are used, respectively, to establish mappings between concepts and semantic queries, so that semantic information is extracted automatically and user query requests are answered. In the TRECVID 2009 automatic video search evaluation, the framework ranked first among all participants, verifying the effectiveness of the proposed algorithms.
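The concept-to-query mapping step can be illustrated with a deliberately simple stand-in similarity: here, lexical overlap (Jaccard) between the query text and short concept descriptions ranks the detectors to invoke. The thesis uses a richer semantic-similarity measure; this toy, with hypothetical concept names and descriptions, only shows the shape of the mapping.

```python
def map_query_to_concepts(query, concept_descriptions):
    """Rank semantic concepts for a text query by Jaccard word overlap.
    Returns (concept, similarity) pairs, best match first."""
    q = set(query.lower().split())
    ranked = []
    for concept, desc in concept_descriptions.items():
        d = set(desc.lower().split())
        sim = len(q & d) / len(q | d) if (q | d) else 0.0
        ranked.append((concept, sim))
    ranked.sort(key=lambda pair: pair[1], reverse=True)
    return ranked

# Hypothetical concept lexicon with one-line descriptions.
concepts = {
    "boat_ship": "a boat or ship on water",
    "cityscape": "view of a city skyline with buildings",
    "road": "cars driving on a road or highway",
}
ranking = map_query_to_concepts("find shots of a ship on the water", concepts)
```

The top-ranked concepts' detection scores can then be fused to answer the query without any user-supplied example shot.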