Semantic Analysis and Extraction of Story Videos
Abstract
The advance of technology has sent digital video flooding into everyday life. The richness and diversity of video content, together with the high-dimensional spatio-temporal structure of its feature data, make the effective representation, storage, and management of massive video collections for fast browsing and retrieval a major and pressing problem. Traditional data management and retrieval techniques can no longer cope with such rapid change and demand, so content-based video retrieval (CBVR) has emerged, and related research has quickly spread around the world.
     CBVR has since made considerable progress in many respects; the extraction of semantic information from video has become a research hotspot, and a few prototype systems for semantic retrieval have already appeared. However, because the extraction of semantic objects and the analysis and understanding of semantics remain problematic, large-scale applications have not yet materialized. Targeting this hot and difficult problem of semantic extraction, this thesis conducts a systematic, step-by-step study from the perspectives of perception and cognition, drawing on cross-disciplinary material such as film theory and sociology, and proposes several new frameworks and algorithms. The main contributions are as follows:
     For the representation of visual content, since static features such as color and texture describe only the internal properties of a single image and cannot capture the temporal relations of an image sequence, a compressed-domain global-motion estimation method is proposed to describe how visual content changes over time and relates to its context. First, the global motion parameters are estimated with a simplified six-parameter motion model; next, a sliding-window motion segmentation algorithm partitions the video by global motion and annotates the segments with keywords, with the motion information described by a feature-point sequence; finally, to validate the effectiveness of the extracted motion features, a video retrieval framework based on global motion is proposed. Experimental results show that the algorithm segments global motion accurately, that global-motion retrieval achieves high precision, and that keyword queries are realized with XQuery.
     Shot boundary detection (SBD) is the foundation of CBVR and the lowest layer of video structure analysis, so its performance directly affects all subsequent analysis. An SBD algorithm based on multi-level feature description and SVM is therefore proposed. Among the many factors that influence SBD performance, this thesis identifies three: imperfect representation of visual content, weak contextual links between the frames of a sequence, and classifier performance that leaves room for improvement, and proposes a solution for each. For the first, the sensitivity and the invariance of features should both be respected, so a multi-level feature description from the pixel level up to the global level is adopted; for the second, a variable-length sliding window establishes the context between feature vectors; for the third, an SVM classifier is used, with active learning and cross-validation selecting the positive-to-negative sample ratio and the training parameters respectively. In addition, independent edge and motion detectors are proposed to correct false detections in the SVM results. In the TRECVID 2007 evaluation, our algorithm achieved satisfactory results among the 15 participating groups.
     For semantic-object extraction, a selective extraction algorithm for semantic objects based on a visual attention model is proposed. Object-based semantic extraction is a difficult problem in video analysis, yet effective object extraction clearly improves the accuracy of semantic-concept detection. It faces many obstacles, including color quantization, image segmentation, and deciding which objects are semantic. To address them, this thesis first proposes a vector quantization algorithm for quantizing color images; it then segments the image with a method based on graph models and region merging that considers both color and spatial distribution; next, a visual attention model determines the image's centers of attention and the order in which attention shifts; color, texture, and boundary features are then fused under Gestalt principles to describe image homogeneity; finally, multiple visually salient objects are extracted according to the order of attention shifts. Experiments on the Corel image database and TREC videos show that the extracted salient objects receive high subjective ratings.
     For video summarization, a hierarchical summary-generation framework based on a film-structure model and perceptual cues is proposed, together with a complete set of model algorithms. Existing summarization algorithms mainly target short, non-narrative video such as news and sports and do not suit full-length films. A story-structure model, the NP model, is therefore proposed first; it decomposes a film into the three levels of acts, plots, and scenes, and scene segmentation and classification algorithms are given. Next, a scene "importance" function based on affective arousal computes the importance of each scene, plot, and act, which in turn allocates the number and length of the extracted key frames and skims. An attention model is also built to quantify the important film elements and fuse them into a single attention curve. Finally, the film-structure, emotion, and attention models are combined into a multi-level summarization framework that produces both static key frames and dynamic video skims. Seven Hollywood films validate the effectiveness and generality of the framework; the experimental results surpass the representative algorithm of Ma in both informativeness and enjoyability.
     For video semantic extraction, a film-content understanding framework based on social network analysis (SNA) and a film ontology is proposed, together with a suite of semantic extraction algorithms. Current semantics research concentrates on news, sports, medical, and other videos with relatively simple scenes, while the automatic understanding of films lacks systematic study. Films are far more complex than news, and traditional semantic analysis struggles to narrow their semantic gap. From a new perspective, this thesis analyzes a film's story through SNA and a purpose-built film ontology. A film is treated as a special social network: SNA determines the community structure of the characters and the relations between them and, combined with the film-structure model, reveals the development of the story. Second, a film ontology is constructed, from which the characters' identities and occupations and the relations among government agencies are established. Third, a hierarchical, timeline-based method for detecting high-level action events and a semantic-graph-based summarization algorithm for dialogue events complete the semantic analysis. Two Hollywood films demonstrate the feasibility of the proposed framework, and the results largely satisfy the needs of semantic video retrieval.
With the development of computer, Internet, and telecommunication technologies and of audio/video compression, digital video is streaming into everyday life. However, because of the diversity of video content and the high-dimensional spatio-temporal structure of video data, the efficient organization, management, storage, and rapid retrieval and browsing of video have become crucial, while traditional data management and retrieval methods fall short. As a result, content-based video retrieval (CBVR) has emerged.
     Recently, CBVR has made great strides in many respects; the extraction of semantic information has become the research focus, and several prototype systems for semantic video retrieval have appeared. However, since important problems remain unresolved in areas such as the extraction of semantic objects and the understanding of video content, large-scale applications have not yet come true. In this thesis, we therefore propose several new frameworks and methods built on perceptual cues, film theory, and cross-domain analysis. The main contents follow:
     The representation of visual content is the basis of CBVR. Since static features such as color and texture can only represent the internal characteristics of an image and cannot capture the temporal cues of an image sequence, a compressed-domain global motion (GM) estimation algorithm is proposed to dynamically describe the context of visual content. The GM parameters are first extracted according to a six-parameter motion model; a motion segmentation method then segments and annotates videos according to their GMs, with the motion information described by a feature-point sequence; finally, to validate the effectiveness of the extracted motion feature, a GM-based video retrieval framework is proposed. Experimental results show that the algorithm segments videos into motion sub-segments accurately and that a high retrieval precision can be obtained; query-by-keyword is also realized on an XQuery engine.
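As a rough illustration of the six-parameter estimation step, an affine global-motion model can be fitted to compressed-domain block motion vectors by least squares. The sketch below is illustrative only: the function names and the synthetic pan data are assumptions, not the thesis's implementation.

```python
import numpy as np

def estimate_global_motion(positions, motion_vectors):
    """Least-squares fit of a 6-parameter affine global-motion model
    to block motion vectors, e.g. those parsed from an MPEG stream.

    positions:      (N, 2) array of block-centre coordinates (x, y)
    motion_vectors: (N, 2) array of motion vectors (dx, dy)
    Returns the parameters (a1..a6) of
        dx = a1 + a2*x + a3*y,   dy = a4 + a5*x + a6*y
    """
    x, y = positions[:, 0], positions[:, 1]
    A = np.column_stack([np.ones_like(x), x, y])   # design matrix, shared by dx and dy
    px, *_ = np.linalg.lstsq(A, motion_vectors[:, 0], rcond=None)
    py, *_ = np.linalg.lstsq(A, motion_vectors[:, 1], rcond=None)
    return np.concatenate([px, py])                # (a1, a2, a3, a4, a5, a6)

# A pure horizontal pan of 2 px/frame yields a1 = 2 and all other terms ~ 0.
params = estimate_global_motion(
    np.array([[0., 0.], [8., 0.], [0., 8.], [8., 8.]]),
    np.array([[2., 0.]] * 4))
```

In the compressed domain, the motion vectors would come from the bitstream rather than from optical flow, which is what makes this estimation fast.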
     Shot boundary detection (SBD) is the basic task of structure analysis in CBVR. This thesis develops a fast, high-performance SBD system around three key factors: the representation of visual content, the construction of a continuity feature signal, and pattern classification and recognition. A solution is proposed for each factor: for the first, we analyze the trade-off between the invariance and the sensitivity of various visual features; for the second, the context of the feature signal is taken into account; for the last, support vector machines (SVMs) are used to detect both cuts and gradual transitions. In addition, independent detectors, such as an edge detector and a motion detector, are developed to improve the overall performance. In the TRECVID 2007 SBD evaluation, our system achieved a satisfying result among the 15 participating groups.
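The idea of classifying each frame from the context of its neighbours in a sliding window can be sketched as follows. A simple dominance test stands in for the trained SVM stage here, purely for illustration; the window size, ratio, and names are assumptions.

```python
import numpy as np

def window_features(diffs, half_window=2):
    """Build a context feature vector for every frame from a sliding
    window over the frame-difference signal (edges are zero-padded)."""
    padded = np.pad(diffs, half_window)
    return np.array([padded[i:i + 2 * half_window + 1]
                     for i in range(len(diffs))])

def detect_cuts(diffs, half_window=2, ratio=3.0):
    """Toy stand-in for the SVM stage: declare a cut where the centre
    difference dominates the mean of the rest of its window."""
    cuts = []
    for i, win in enumerate(window_features(diffs, half_window)):
        rest = (win.sum() - win[half_window]) / (len(win) - 1)
        if win[half_window] > ratio * max(rest, 1e-9):
            cuts.append(i)
    return cuts

# One abrupt change at frame 5 in an otherwise smooth difference signal.
cuts = detect_cuts(np.array([.1, .1, .1, .1, .1, .9, .1, .1, .1, .1]))
```

In the actual system the windowed vectors would be fed to the trained SVM instead of a threshold, which is what allows gradual transitions as well as cuts to be recognized.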
     The extraction of semantic objects in video is another difficulty of CBVR. An algorithm for the selective extraction of visually salient objects from color images and videos is presented. Color quantization based on vector quantization (VQ) is performed first; the quantized image is then segmented according to its color and spatial distribution; thirdly, focuses of attention (FOAs) on objects are selected with a visual attention model; finally, following the shifts of the FOAs and the Gestalt principles, salient objects are extracted by merging color, texture, boundary, and homogeneity features. Subjective evaluations on the Corel image database and the TREC videos demonstrate the effectiveness of the proposed approach.
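A minimal stand-in for the vector-quantization step is plain k-means over RGB pixels; the thesis's actual quantizer is not specified here, so the algorithm choice, names, and data below are illustrative assumptions.

```python
import numpy as np

def quantize_colors(pixels, k=2, iters=20, seed=0):
    """Plain k-means vector quantization of an (N, 3) RGB pixel array.
    Returns (codebook, labels); each pixel maps to its nearest codeword."""
    rng = np.random.default_rng(seed)
    codebook = pixels[rng.choice(len(pixels), k, replace=False)].astype(float)
    for _ in range(iters):
        # Assign every pixel to its nearest codeword, then recentre.
        d = np.linalg.norm(pixels[:, None, :] - codebook[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                codebook[j] = pixels[labels == j].mean(axis=0)
    return codebook, labels

# Two well-separated colour clusters collapse to two codewords.
pix = np.array([[250, 0, 0], [255, 5, 0], [0, 0, 250], [5, 0, 255]], float)
codebook, labels = quantize_colors(pix, k=2)
```

Quantizing first keeps the subsequent graph-based segmentation tractable, since regions are built over a small palette rather than the full color space.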
     To generate video summaries for movies, we propose a computable film-structure model, the nine-plot (NP) model, which parses an entire film into three hierarchical semantic levels: act, NP, and scene. The model is motivated by a systematic analysis of the "Hollywood mode" and the generic narrative structure of a story. A set of modeling methods for film-making rules and grammars, based on scene segmentation and classification, is also proposed. As an important application of the model, a hierarchical video summarization framework covering static key-frame extraction and dynamic video skimming is established by combining a perceptual attention model with an emotion model. Concretely, an attention model is first set up by quantifying and integrating the visual-aural dramatic elements of the film; affective arousal is then extracted to estimate the importance of each scene and thereby adaptively allot each scene's share of the summary. Promising experimental results on seven full-length Hollywood movies demonstrate the effectiveness and generality of the proposed framework.
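The attention-curve fusion and the arousal-proportional allocation of the summary budget can be sketched like this. The weights, the moving-average smoother, and the allocation rule are assumed for illustration, not taken from the thesis's tuned model.

```python
import numpy as np

def attention_curve(motion, audio, w_motion=0.6, w_audio=0.4, smooth=3):
    """Fuse normalised per-scene dramatic elements into one attention
    curve, then smooth it with a short moving average."""
    def norm(x):
        x = np.asarray(x, float)
        span = x.max() - x.min()
        return (x - x.min()) / span if span else np.zeros_like(x)
    curve = w_motion * norm(motion) + w_audio * norm(audio)
    return np.convolve(curve, np.ones(smooth) / smooth, mode='same')

def allocate_keyframes(scene_arousal, budget):
    """Distribute a key-frame budget across scenes in proportion to
    their affective-arousal scores (at least one frame per scene)."""
    a = np.asarray(scene_arousal, float)
    return np.maximum(1, np.round(budget * a / a.sum())).astype(int)

curve = attention_curve([0, 2, 4, 6], [6, 4, 2, 0])
alloc = allocate_keyframes([0.1, 0.4, 0.5], budget=10)
```

Allocating by arousal is what lets a climactic scene receive a longer skim than a quiet transitional one, which drives the summary toward the dramatically important material.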
     Semantic analysis and extraction of video is a challenging problem. From a new viewpoint, we present a content-understanding framework for movies based on social network analysis and a film ontology. First, we summarize the difficulties of semantic analysis and find a latent solution: using social network analysis and constructing a film ontology to narrow the semantic gap in the automatic understanding of movies. Second, a set of modeling algorithms is established: a full-length movie is parsed into a series of causal action events and dialogue events; a hierarchical high-level action event detection method is proposed based on the temporal cues and context of the basic events; and a semantic graph of the dialogue is built to summarize dialogue events. At the same time, important semantic information such as social communities and occupation classes is extracted. Two Hollywood action movies demonstrate the feasibility of the proposed framework, and the basic semantic elements of "who", "when", "where", and "what", which are central to grasping visual information, can be obtained.
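Treating a film as a social network can be illustrated with a co-occurrence graph over per-scene cast lists, with degree centrality as a simple proxy for a character's position in the community. The scene data and function names below are hypothetical, not taken from the thesis.

```python
from collections import defaultdict
from itertools import combinations

def build_character_network(scenes):
    """Characters that appear in the same scene get an edge whose
    weight counts their co-occurrences across the film."""
    weights = defaultdict(int)
    for cast in scenes:
        for a, b in combinations(sorted(set(cast)), 2):
            weights[(a, b)] += 1
    return dict(weights)

def degree_centrality(weights):
    """Weighted degree per character; the highest-degree node is a
    crude indicator of the protagonist."""
    degree = defaultdict(int)
    for (a, b), w in weights.items():
        degree[a] += w
        degree[b] += w
    return dict(degree)

scenes = [["hero", "ally"], ["hero", "villain"], ["hero", "ally", "villain"]]
centrality = degree_centrality(build_character_network(scenes))
```

On real films the cast lists would come from face/speaker identification, and community detection over this graph would expose the factions that structure the story.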
References
[1] J. Lewis. Money Matters: Hollywood in the Corporate Era. In The New American Cinema, Duke University Press, 1998.
    [2] Lu Chunyan. The development trend of American genre films in the era of globalization. Beijing Social Science, pp: 109-112, 2006.
    [3] Cai Anni, Sun Jing'ao (eds.). Fundamentals of Multimedia Communication Technology. Beijing: Publishing House of Electronics Industry, 2000.
    [4] Zhong Yuzhuo, Wang Qi, Zhao Li, Yang Xiaoqin (eds. and trans.). The MPEG-2 International Standard for Moving Picture Compression Coding and New Progress in MPEG. Beijing: Tsinghua University Press, 2002.
    [5] Zhang Yujin. Content-Based Visual Information Retrieval. Beijing: Science Press, 2003.
    [6] R. Lienhart. Comparison of automatic shot boundary detection algorithms. in Proc. SPIE Image Video Process. VII, Jan. vol. 3656, pp: 290-301, 1999.
    [7] H.J. Zhang, C.Y. Low, S.W. Smoliar. Video parsing and browsing using compressed data. Multimedia Tools and Applications, vol. 1, pp: 89-111, March 1995.
    [8] B. Shahraray. Scene change detection and content-based sampling of video sequences. SPIE Digital Video Compression, Algorithms and Technologies, vol. 2419, pp: 2-13, 1995.
    [9] H.J. Zhang, A. Kankanhalli, S.W. Smoliar. Automatic partitioning of full-motion video. Multimedia Systems, vol. 1, pp: 10-28, 1993.
    [10] T. Truong, C. Dorai, S. Venkatesh. New enhancements to cut, fade, and dissolve detection processes in video segmentation. in Proc. ACM Multimedia, Los Angeles, CA, pp: 219-227, 2000.
    [11] C.W. Ngo. A robust dissolve detector by support vector machine. in Proc. ACM Multimedia, pp: 283-286, 2003.
    [12] T.S. Chua, H. Feng, A. Chandrashekhara. A unified framework for shot boundary detection via active learning. in Proc. ICASSP, Hong Kong, Apr. vol. 2, pp: 845-848, 2003.
    [13] H. Feng, W. Fang, S. Liu, Y. Fang. A new general framework for shot boundary detection and key-frame extraction. in Proc. 7th ACM SIGMM Multimedia, pp: 121-126, 2005.
    [14] Z. Liu, E. Zavesky, D. Gibbon. AT&T Research at TRECVID 2007. TRECVID Workshop at NIST, Gaithersburg, MD, 2007.
    [15] J.H. Yuan, H.Y. Wang, L. Xiao. A Formal Study of Shot Boundary Detection. IEEE Trans. Circuits Syst. Video Techn., vol. 17(2), pp: 168-186, 2007.
    [16] I. Koprinska, S. Carrato. Temporal video segmentation: A survey. EURASIP Sig. Proc. Image Communication, vol. 16(5), pp: 477-500, 2001.
    [17] H.J. Zhang, C.Y. Low, Y.H. Gong. Video parsing using compressed data. Proc. SPIE Conf. Image and Video Processing II, San Jose, CA, pp: 142-149, 1994.
    [18] J. Meng, Y. Juan, S.F. Chang. Scene change detection in a MPEG compressed video sequence. Proc. IS&T/SPIE Int. Symp. Electronic Imaging, vol. 2417, San Jose, pp: 14-25, 1999.
    [19] D. Lelescu, D. Schonfeld. Statistical sequential analysis for real-time video scene change detection on compressed multimedia bitstream. IEEE Trans. on Multimedia, vol. 5, pp: 106-117, 2003.
    [20] G. Boccignone, M. De Santo. Automated threshold selection for the detection of dissolves in MPEG videos. IEEE Int. Conf. Multimedia and Expo., New York, July, pp: 1535-1538, 2000.
    [21] M.R. Naphade, R. Mehrotra, T.S. Huang. A high-performance shot boundary detection algorithm using multiple cues. in IEEE Int. Conf. Image Process., pp: 884-887, 1998.
    [22] M. Cooper. Video segmentation combining similarity analysis and classification. in Proc. ACM Multimedia 2004, Oct., pp: 252-255, 2004.
    
    [23] Z.C. Zhao, A.N. Cai. Shot Boundary Detection Algorithm in Compressed Domain Based on Adaboost and Fuzzy Theory. LNCS vol. 4222, pp: 617-626, 2006.
    
    [24] H. J. Zhang, J. Wu, D. Zhong. An integrated system for content-based video retrieval and browsing. Pattern Recognit., vol. 30, no. 4, pp: 643-658,1997.
    
    [25] M. Mentzelopoulos, A. Psarrou. Key-frame extraction algorithm using entropy difference. Proc. of ACM SIGMM on Multimedia, US, pp: 39-45, 2004.
    
    [26] T.M. Liu, H.J. Zhang, F.H. Qi. A novel video key-frame extraction algorithm based on perceived motion energy model. IEEE Trans. Circuits Syst. Video Technol., vol. 13, pp: 1006-1013, 2003.
    
    [27] W. Wolf. Key frame selection by motion analysis. Proc. IEEE Int. Conf. Acoust., Speech, and Signal Proc., 1996.
    
    [28] Y. Zhuang, Y. Rui, T.S. Huang. Adaptive key frame extraction using unsupervised clustering. in Proc. IEEE Int. Conf. on Image Processing, Chicago, IL, Oct., pp: 866-870, 1998.
    
    [29] A. Hanjalic, H.J. Zhang. An integrated scheme for automated video abstraction based on unsupervised cluster-validity analysis. IEEE Trans. CSVT, vol.9, pp: 1280-1289,1999.
    
    [30] C. Kim, J.N. Hwang. An integrated scheme for object-based video abstraction. in Proc. 8th ACM Int. Conf. Multimedia, Los Angeles, CA, Oct. 30-Nov. 4, pp: 303-311, 2000.
    
    [31] L.J. Liu, G.L. Fan. Combined Key-Frame Extraction and Object-Based Video Segmentation. IEEE Trans. Circuits Syst. Video Technol., vol. 15, no. 7, pp: 869-884, 2005.
    
    [32] A. Hanjalic, R.L. Lagendijk, J. Biemond. Automated high-level movie segmentation for advanced video-retrieval systems. IEEE Trans. Circuits Syst. Video Technol., pp: 580-588, 1999.
    
    [33] M. Yeung, B. L. Yeo. Time-constrained clustering for segmentation of video into story units. in Proc. ICPR, vol. C, Vienna, Austria, Aug. pp: 375-380. 1996.
    
    [34] M. Yeung, B.-L. Yeo, and B. Liu. Segmentation of video by clustering and graph analysis. Comput. Vis. Image Understanding.vol. 71(1), pp: 94-109,1998.
    
    [35] C.W. Ngo, T.C. Pong, H.J. Zhang. Motion-based video representation for scene change detection. Int. J. Comput. Vis., vol. 50(2), pp: 127-142, 2002.
    
    [36] Y. Rui, T.S. Huang, S. Mehrotra. Constructing table-of-content for videos. ACM Multimedia Syst. J., Special Issue Multimedia Systems on Video Libraries. vol. 7(5), pp:359-368,1999.
    
    [37] W. Tavanapong, J.Y. Zhou. Shot clustering techniques for story browsing. IEEE Transaction on Multimedia. vol. 6(4), pp:517-527, 2005.
    
    [38] H. Sundaram, S.F. Chang. Computable Scenes and structures in Films. IEEE Trans. Multimedia, vol.4, no. 4, pp: 482-491, 2002.
    
    [39] M.A. Smith, T. Kanade. Video skimming and characterization through the combination of image and language understanding techniques. Proc. of Computer Vision and Pattern Recognition, 1997.
    
    [40] R. Lienhart. Dynamic video summarization of home video. SPIE, vol. 397, pp: 378-389, 2000.
    
    [41] A. Hanjalic. Adaptive Extraction of Highlights From a Sport Video Based on Excitement Modeling. IEEE Trans. Multimedia, vol.7, no. 6, pp: 1114-1122, 2005.
    
    [42] H. Luo, Y. Gao, X. Xue. Incorporating Feature Hierarchy and Boosting for Concept-Oriented Video Summarization and Skimming, ACM Trans. on Multimedia Computing Communications and Applications, vol. 4, no.1, pp: 1-25, 2008.
    
    [43] Y. F. Ma, X. S. Hua, L. Lu. A generic framework of user attention model and its application in video summarization. IEEE Trans. Multimedia, vol. 7, no. 5, Oct. pp: 907-919, 2005.
    [44] J.Y. You, G.Z. Liu. A Multiple Visual Models Based Perceptive Analysis Framework for Multilevel Video Summarization. IEEE Trans. Circuits Syst. Video Techn., vol. 17, pp: 273-285, 2007.
    
    [45] D. Dementhon, V. Kobla, D. Doermann. Video summarization by curve simplification. ACM Multimedia, pp: 211-218,1998.
    
    [46] A.W.M. Smeulders, M. Worring, S. Santini.Content based image retrieval at the end of the early years. IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 22, pp: 1349-1380, 2000.
    
    [47] J. Fan, Y. Gao, H. Luo. Integrating Concept Ontology and Multi-Task Learning to Achieve More Effective Classifier Training for Multi-Level Image Annotation. IEEE Trans. on Image Processing, vol. 17, no. 3, pp: 407-426, 2008.
    
    [48] Laura Hollink. Semantic Annotation for Retrieval of Visual Resources. SIKS Dissertation Series No. 2006-24.
    
    [49] D. Zhong, H.J. Zhang, S.F. Chang. Clustering Methods for Video Browsing and Annotation. SPIE, pp: 393-246, 2000.
    
    [50] S.L. Feng, R. Manmatha, V. Lavrenko. Multiple Bernoulli Relevance Models for Image and Video Annotation. CVPR, 2004.
    
    [51] [Online] Available: http://gmazars.info/conf/cvpr2008.html.
    
    [52] J. Fan, Y. Gao, H. Luo. Mining Multi-Level Image Semantics via Hierarchical Classification. IEEE Trans. on Multimedia, vol. 10, no.1, pp: 167-187, 2008.
    
    [53] A. Bosch, X. Munioz, R. Mart. A review: Which is the best way to organize/classify images by content? Image and Vision Computing, 2006.
    
    [54] K. Rapantzikos, Y. Avrithis. On the Use of Spatiotemporal Visual Attention for Video Classifiation. MMSP, 2005.
    
    [55] T. Giannakopoulos, A. Pikrakis. A Multi-Class Audio Classification Method With Respect To Violent Content in Movies Using Bayesian Networks. IEEE MMSP'07, pp: 90-93, 2007.
    
    [56] J. Fan, A.K. Elmagarmid, X. Zhu. ClassView: Hierarchical Video Shot Classification, Indexing, and Accessing. IEEE Trans. on Multimedia, vol. 6, pp: 648-666, 2004.
    
    [57] X. Zhu, J. Fan, W.G. Aref. ClassMiner: Mining Medical Video Content Structure and Events Towards Efficient Access and Scalable Skimming. Proc. ACM SIGMOD Workshop, pp: 9-16, 2002.
    
    [58] Y. Matsuo, K. Shirahama, K. Uehara. Video Data Mining: Extracting Cinematic Rules from Movie. Proc. Int'l Workshop Multimedia Data Management (MDM-KDD), 2003.
    
    [59] L. Xie, S.F. Chang, A. Divakaran. Unsupervised Mining of Statistical Temporal Structures in Video, Video Mining, 2003.
    
    [60] I. Laptev, P. Perez. Retrieving actions in movies, ICCV'07, 2007.
    
    [61] P. Chang, M. Han, Y. Gong. Extract highlights from baseball video game video with Hidden Markov Model. in Proc. IEEE ICIP, 2002.
    
    [62] Z. Xiong, R. Radhakrishnan, T.S. Huang. Audio events detection based highlights extraction from baseball, golf and soccer games in a unified framework. in Proc. IEEE Int. Conf. Acoustics, Speech and Signal Processing (ICASSP), vol. 5, pp: 632-635, 2003.
    
    [63] G. Xu, Y.F. Ma, H.J. Zhang. An HMM-Based Framework for Video Semantic Analysis. IEEE Trans. on Circuit sys. and Video Techno., vol.15, pp: 1422-1433, 2005.
    
    [64] X. Sun, G. Jin, M. Huang. Bayesian network based soccer video event detection and retrieval, in Multispectral Image Processing and Pattern Recognition, Beijing, China, Oct. 2003.
    
    [65] F. Wang, Y.F. Ma, H.J. Zhang. A Generic Framework for Semantic Sports Video Analysis Using Dynamic Bayesian Networks. International Conference on Multimedia Modelling. pp: 115-122. 2005.
    
    [66] C.L. Huang, H.C. Shih. Semantic Analysis of Soccer Video Using Dynamic Bayesian Network. IEEE Trans. Multimedia, vol. 8, pp: 749-760, 2006.
    
    [67] C.C. Chang, C.J. Lin. LIBSVM: a library for support vector machines. 2001. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm, 2008.
    
    [68] T. Joachims. SVMlight is an implementation of Support Vector Machines (SVMs). 2004. Software available at http://www.cs.cornell.edu/people/tj/svmlight/, 2008.
    
    [69] R.E. Schapire. The Boosting Approach to Machine Learning: An Overview. MSRI Workshop on Nonlinear Estimation and Classification, 2002.
    
    [70] Y. Freund, R.E. Schapire. A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting. Journal of Computer and System Science. pp: 119-139,1997.
    
    [71] Z.J. Zha, T. Mei, Z.F. Wang. Building a Comprehensive Ontology to Refine Video Concept Detection. ACM SIGMM Multimedia Information Retrieval, Augsburg, Germany, Sept. 2007.
    
    [72] X.Y. Wei, C.W. Ngo, Ontology-Enriched Semantic Space for Video Search, ACM Multimedia (MM'07), Augsburg, Germany, Sep. 2007.
    
    [73] M. Naphade, J. R. Smith, S.F. Chang. Large-scale concept ontology for multimedia. IEEE Multimedia, 2006.
    
    [74] C.G.M. Snoek, B. Huurnink, L. Hollin. Adding semantics to detectors for video retrieval. IEEE Trans. Multimedia, 2007.
    
    [75] M. Koskela, A. F. Smeaton, J. Laaksonen. Measuring concept similarities in multimedia ontologies: Analysis and evaluations. IEEE Trans. Multimedia, 2007.
    
    [76] M. Christel. Carnegie Mellon University Traditional Informedia Digital Video Retrieval System. Proc. CIVR'07, Amsterdam, pp: 647-653, 2007.
    
    [77] A.G. Hauptmann. Towards a large scale concept ontology for broadcast video. in Proc. CIVR, pp: 674-675. 2004.
    
    [78] A.B. Benitez, S.F. Chang, J.R. Smith. IMKA: A multimedia organization system combining perceptual and semantic knowledge. in ACM Multimedia, 2001.
    
    [79] C. Fellbaum. WordNet: an electronic lexical database. Cambridge, MA: The MIT Press, 1998.
    
    [80] H. Liu, P. Singh, Conceptnet: A practical commonsense reasoning toolkit, BT Technology Journal, vol. 22, no. 4, pp: 211-226, 2004.
    
    [81] X.Q. Zhu, X.D. Wu. Video Data Mining: Semantic Indexing and Event Detection from the Association Perspective. IEEE Trans. Knowledge and Data Engineer. vol. 17, pp: 665-677, 2005.
    
    [82] E. Izquierdo, K. Chandramouli, M. Grzegorzek. K-Space Content Management and Retrieval System 2007. ICIAPW 2007. Sept. 10-13. pp: 131-136. 2007.
    
    [83] I. Kompatsiaris, Y. Avrithis. Achieving integration of knowledge and content technologies: The AceMedia Project. Proc. European Workshop on the Integration of Knowledge, Semantics and Digital Media Technology, Royal Statistical Society, London, UK, Nov. 2004.
    
    [84] J. Calic, N. Campbell, S. Dasiopoulou. A survey on multimodal video representation for semantic retrieval. Proc. of IEEE Int. Conf. Computer as a Tool, vol. 1, pp: 135-138, 2005.
    
    [85] J. Calic, N. Campbell, M. Mirmehdi. ICBR - multimedia management system for intelligent content-based retrieval. In International Conference on Image and Video Retrieval, CIVR'04, LNCS 3115, pp: 601-609, 2004.
    
    [86] X.S. Zhou, T.S. Huang. Relevance feedback in image retrieval: a comprehensive review. Multimedia Syst. vol. 8(6), pp: 536-544, 2003.
    [87] M.E.J. Wood, N.W. Campbell, B.T. Thomas. Iterative refinement by relevance feedback in content-based digital image retrieval. In ACM Multimedia 98, pp: 13-20, 1998.
    
    [88] W. Zhou, A. Vellaikal, C.C. J. Kuo. Rule-based video classification system for basketball video indexing. in ACM Multimedia Conf., Los Angeles, CA, Nov. 2000.
    
    [89] J.A. Lay, G. Ling. Semantic retrieval of multimedia by concept languages: treating semantic concepts like words. IEEE Signal Processing Magazine. pp: 115-123, 2006.
    
    [90] C. Breiteneder, S. Gibbs, D. Tsichritzis. Modeling of audio/video data. Proc. International Conference on Entity-Relationship Approach. pp: 322-339,1992.
    
    [91] J. F. Allen. Maintaining knowledge about temporal intervals. Communications of the ACM. vol. 26 (11), pp: 832-843,1983.
    
    [92] E. Oomoto, K. Tanaka. OVID: design and implementation of a video-object database system. IEEE Knowledge and Data Engineering. vol. 5(4), pp: 629-643. 1993.
    
    [93] R. Weiss, A. Duda, D. Gifford. Content-based access to algebraic video. Proc. of Int. Conf. on Multimedia Computing and Systems, ICMCS, pp: 140-151, 1994.
    
    [94] H. Jiang, A. Elmagarmid. Spatial and temporal content-based access to hypervideo databases. VLDB Journal, No. 7, pp: 226-238,1998.
    
    [95] V. Gaede, O. Gunther. Multidimensional access methods. ACM Computing Surveys. vol. 30 (2), pp: 170-231,1998.
    
    [96] W. Niblack. The QBIC Project: Querying Images by Content Using Color, Texture and Shape. Proc. SPIE. on Storage and Retrieval for Image and Video Databases, San Jose, CA, 1993.
    
    [97] [Online] Available: http://www.ctr.columbia.edu/advent/, 2008.
    
    [98] S.F. Chang, W. Chen, H. Meng. VideoQ: An Automated Content-Based Video Search System Using Visual Cues. ACM 5th Multimedia Conference, Seattle, WA, Nov. 1997.
    
    [99] J.R. Smith, S.F. Chang. VisualSEEk: A Fully Automated Content-Based Image Query System. ACM Multimedia Conference, Boston, MA, November 1996.
    
    [100] [Online] Available: http://www.ctr.columbia.edu/webseek, 2008.
    
    [101] A. Pentland, R.W. Picard, S. Sclaroff. Photobook: Content-Based Manipulation of Image Databases. Computer Vision, vol. 18 (3), pp: 233-254,1995.
    
    [102] A. Amir, G. Ashour, S. Srinivasan. Towards automatic real time preparation of on- line video proceedings for conference talks and presentations. In: Hawaii Proc. on System Sciences, Los Alamitos, California, pp: 1-8, 2001.
    
    [103] [Online] http://www.informatik.uni-mannheim.de/informatik/pi4/projects/MoCA/, 2008
    
    [104] C.G.M. Snoek, M. Worring, D.C. Koelma. A learned lexicon-driven paradigm for interactive video retrieval. IEEE Trans. Multimedia, vol. 9(2), pp: 280-292, 2007.
    
    [105] [Online] Available: http://www.k-space.eu/, 2008.
    
    [106] [Online] Available: http://www.cost292.org/, 2008.
    
    [107] TREC Video Retrieval Evaluation (TRECVID): http://www-nlpir.nist.gov/projects/trecvid/.
    [1] V. Kobla, D. Doermann. Identifying sports videos using replay, text and camera motion features. SPIE, vol. 3972, pp: 332-345, 2000.
    [2] E. Sahouria, A. Zakhor. Content analysis of video using principal components. IEEE Trans. CSVT, vol. 9, pp: 1290-1298, 1999.
    [3] A. Smolic, T. Sikora. Long-term global motion estimation and its application for sprite coding, content description, and segmentation. IEEE Trans. on Circuits and Systems for Video Technology, pp: 1227-1242, 2002.
    [4] S. Cheo, S.H. Lim, B.K. Sin. Tracking non-rigid objects using probabilistic Hausdorff distance matching. Pattern Recognition, pp: 2373-2384, 2005.
    [5] B.C. Shen, H.C. Shih, C.L. Huang. Real-time human motion capturing system. IEEE Conference on Image Processing, pp: 1322-1325, 2005.
    [6] P.R. Giaccone, D. Greenhill. Creating virtual scenes using active contours and global motion estimation. In: Proc. Pattern Recognition, Australia, pp: 1505-1507, 1998.
    [7] Y. Lu, W. Gao, F. Wu. Efficient background video coding with static sprite generation and arbitrary-shape spatial prediction techniques. IEEE Trans. on Circuits and Systems for Video Techn., pp: 394-405, 2004.
    [8] J.F. Chen, H.Y.M. Liao. Fast video retrieval via the statistics of motion. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, March 18-23, USA, pp: 437-440, 2005.
    [9] Z. Li, A.K. Katsaggelos, B. Gandhi. Fast video shot retrieval based on trace geometry matching. IEE Proc.-Vis. Image Signal Process., vol. 152(3), pp: 367-373, 2005.
    [10] Yu Tianli, Zhang Yujin. Video retrieval technology based on global motion information. Acta Electronica Sinica, vol. 29, pp: 1794-1798, 2001.
    [11] H. Alzoubi, W.D. Pan. Very Fast Global Motion Estimation using Partial Data. in Proc. of the Conference on Acoustics, Speech, and Signal Processing (ICASSP), Hawaii, April, 2007.
    [12] F. Zhou, R. Wu, Z. Bao. Approach for single channel SAR ground moving target imaging and motion parameter estimation. IET Radar Sonar Navig., pp: 59-66, 2007.
    
    [13] P.K.Mbonye, F.P. Ferrie. Attentive Visual Servoing in the compressed domain for Un-calibrated Motion Parameter Estimation of Road Traffic. Proc. Pattern Recognition, pp: 908-911, 2006.
    [14] P.M. Kuhn. Camera motion estimation using feature points in MPEG compressed domain. Proc. ICIP'00, pp: 596-599, 2000.
    
    [15] Y.P. Tan, D.D. Saur, S.R. Kulkarni. Rapid Estimation of Camera Motion from Compressed Video with Application to Video Annotation. IEEE Trans. on Circuits and Systems for Video Technology, pp: 133-146, 2000.
    
    [16] T.L. Yu, Y.J. Zhang. Motion feature extraction for content-based video sequence retrieval. SPIE Proceedings vol. 4311, San Jose, CA, USA, Jun, pp: 378-388, 2001.
    
    [17] A. Smolic, J.R. Ohm. Robust global motion estimation using a simplified M-estimator approach. Proc. ICIP, pp: 868-871, 2001.
    
    [18] S.F. Chang, W. Chen. Semantic visual templates: linking visual features to semantics. Proc. ICIP'98,531-535,1998.
    
    [19] A. Divakaran, H. Sun. Descriptor for spatial distribution of motion activity for compressed video. SPIE, pp: 392-398, 2000.
    
    [20] G.B. Rath, A. Makur. Iterative least squares and compensation based estimations for a four-parameter linear global motion model. IEEE TCSVT, pp: 175-199, 1999.
    [21] J.Y.A. Wang, E.H. Adelson. Representing moving images with layers. IEEE Trans. Image Processing, vol. 3, pp: 625-638, 1994.
    [22] J.K. Paik, Y.C. Park. An edge detection approach to digital image stabilization based on tri-state adaptive linear neurons. IEEE Transactions on Consumer Electronics, pp: 521-530, 1991.
    [23] Y.W. He, B. Feng, S.Q. Yang. Fast global motion estimation for global motion compensation coding. In: Proce. IEEE International Symposium on Circuit and Systems, pp: 233-236, 2001.
    [24] P. Over, T. Ianeva, W. Kraaij, A. Smeaton. TRECVID 2005 an overview. In Proceedings of TRECVID 2005. NIST, 2005.
    
    [25] S. Dagtas. Models for Motion-Based Video Indexing and Retrieval. IEEE Trans. on Image Processing, pp: 1057-1071, 2000.
    
    [26] Y.P. Tan, S.R. Kulkarni, P.J. Ramadge. A new method for camera motion parameter estimation. In. IEEE International Conference on Image Processing, ICIP'95, pp: 406-409, 1995.
    [27] R. Ewerth, M. Schwalb. Estimation of Arbitrary Camera Motion in MPEG Videos. IEEE Proceedings of the 17th International Conference on Pattern Recognition, pp: 1051-1061, 2004.
    [28] [Online] Available: http://www.w3.org/XML/, 2008.
    [29] [Online] Available: http://www.w3.org/TR/xquery/, 2008.
    [30] Clarkware Consulting. Hunter Digital Ventures, LLC. Getting Started With BumbleBee. 2003.
    [1] http://www-nlpir.nist.gov/projects/trecvid/, 2008.
    [2] P. Over, G. Awad, W. Kraaij. TRECVID 2007 - Overview. In: TRECVID Workshop at NIST, Gaithersburg, MD, 2007.
    [3] R. Lienhart. Reliable transition detection in videos: a survey and practitioner's guide. Int. J. Image Graph., vol. 1, no. 3, pp: 469-486, 2001.
    [4] A. Hanjalic. Shot boundary detection: unraveled and resolved? IEEE Trans. Circuits Syst. Video Technol., vol. 12, no. 2, pp: 90-105, Feb. 2002.
    [5] U. Gargi, R. Kasturi, S.H. Strayer. Performance characterization of video-shot-change detection methods. IEEE Trans. Circuits Syst. Video Technol., vol. 10, no. 1, pp: 1-13, Feb. 2000.
    [6] J.S. Boreczky, L.A. Rowe. Comparison of video shot boundary detection techniques. in Proc. SPIE Storage Retrieval Image Video Databases IV, Jan. vol. 2664, pp: 170-179, 1996.
    [7] R. Lienhart. Comparison of automatic shot boundary detection algorithms. in Proc. SPIE Image Video Process. VII, Jan. vol. 3656, pp: 290-301, 1999.
    [8] S. Lefèvre, J. Holler, N. Vincent. A review of real-time segmentation of uncompressed video sequences for content-based search and retrieval. Real-Time Imag., vol. 9, no. 1, pp: 73-98, 2003.
    [9] T. Kikukawa, S. Kawafuchi. Development of an automatic summary editing system for the audio visual resources. Trans. IEICE, vol. J75-A, no. 2, pp: 204-212,1992.
    
    [10] S. K. Choubey, V. V. Raghavan. Generic and fully automatic content-based image retrieval using color. Pattern Recog. Lett., vol. 18, pp: 1233-1240,1997.
    
    [11] H.J. Zhang, C.Y. Low, S.W. Smoliar. Video parsing and browsing using compressed data. Multimedia Tools Appl., vol. 1, no. 1, pp: 89-111,1995.
    
    [12] J. Tesic, B.S. Manjunath. Nearest neighbor search for relevance feedback. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'03), 2003.
    
    [13] A.D. Doulamis, N.D. Doulamis, S.D. Kollias. Relevance feedback for content-based retrieval in video databases: a neural network approach. Proceedings of ICECS'99, vol. 3, pp: 1745-1748, 1999.
    
    [14] J.F. Chen, H.Y.M. Liao. Fast video retrieval via the statistics of motion. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing. Philadelphia, PA, USA, pp :437-440, 2005.
    
    [15] R. Zabih, J. Miller, K. Mai. A feature-based algorithm for detecting and classifying scene breaks. in Proc. ACM Multimedia, San Francisco, CA, Nov. pp: 189-200, 1995.
    
    [16] R. Zabih, J. Miler, K. Mai. A feature-based algorithm for detecting and classifying production effects. Multimedia Systems 7, pp: 119-128, 1999.
    
    [17] P. Bouthemy, M. Gelgon, F. Ganansia. A unified approach to shot change detection and camera motion characterization. IEEE Trans.Circuits Syst. Video Technol., vol. 9, pp: 1030-1044, 1999.
    
    [18] P. Aigrain, P. Joly. The automatic real-time analysis of film editing and transition effects and its applications. Computers and Graphics, vol. 18(1), pp: 93-103, 1994.
    
    [19] C.W. Ngo, T.C. Pong, R.T. Chin. Video partitioning by temporal slice coherency. IEEE Trans. Circuits Syst. Video Technol., vol. 11, no. 8, pp: 941-953, 2001.
    
    [20] M. J. Swain. Interactive indexing into image databases. Proc. SPIE Conf. Storage and Retrieval in Image and Video Databases, pp: 173-187, 1993.
    
    [21] M. Ahmed, A. Karmouch, S. Abu-Hakima. Key frame extraction and indexing for multimedia databases. in Proc. Vis. Interface Conf., pp: 506-511. 1999.
    
    [22] G. Pass, R. Zabih. Comparing images using joint histograms. Multimedia Systems, 1999.
    
    [23] U. Gargi, S. Oswald, D. Kosiba. Evaluation of video sequence indexing and hierarchical video indexing. SPIE Conf. Storage and Retrieval in Image and Video Databases, pp: 1522-1530, 1995.
    
    [24] R. Zabih, J. Miller, K. Mai. A feature-based algorithm for detecting and classifying production effects. Multimedia Systems, pp: 119-128, 1999.
    
    [25] W.J. Heng, K.N. Ngan. High accuracy flashlight scene determination for shot boundary detection. Signal Process.: Image Commun., vol. 18, no. 3, pp: 203-219, 2004.
    
    [26] B. Shahraray. Scene change detection and content-based sampling of video sequences. Proc. IS&T/SPIE 2419, pp: 2-13,1995.
    
    [27] M. Leszczuk, Z. Papir. Accuracy versus speed tradeoff in detecting of shots in video content for abstracting digital video libraries. in LNCS 2515, London, U.K., pp: 176-18, 2003.
    
    [28] S.C. Jun, S.H. Park. An automatic cut detection algorithm using median filter and neural network. in Proc. ITC-CSCC'00, pp: 1049-1052, 2000.
    
    [29] M. Cooper, "Video segmentation combining similarity analysis and classification," in Proc. ACM Multimedia 2004, Oct. pp: 252-255, 2004.
    
    [30] T. Volkmer, S.M.M. Tahaghoghi, H. Williams. RMIT University at TRECVID 2004. in Proc. TRECVID Workshop, 2004. [Online]. Available: http://www-nlpir.nist.gov/projects/tvpubs/tvpapers04/rmit.pdf
    
    [31] T. Truong, C. Dorai, S. Venkatesh. New enhancements to cut, fade, and dissolve detection processes in video segmentation. in Proc. ACM Multimedia, Los Angeles, CA, pp: 219-227, 2000.
    
    [32] B.L. Yeo, B. Liu. Rapid scene analysis on compressed video. IEEE Trans. Circuits Syst. Video Technol., vol. 5, no. 6, pp: 533-544, Dec. 1995.
    
    [33] N. Vasconcelos, A. Lippman. Statistical models of video structure for content analysis and characterization. IEEE Trans. Image Process., vol. 9, no. 1, pp: 3-19, Jan. 2000.
    
    [34] M.R. Naphade, R. Mehrotra, A. Ferman. A high-performance shot boundary detection algorithm using multiple cues. in IEEE Int. Conf. Image Process., pp: 884-887, 1998.
    
    [35] C.W. Ngo. A robust dissolve detector by support vector machine. in Proc. ACM Multimedia, pp: 283-286. 2003.
    
    [36] T.S. Chua, H. Feng, C.A. An unified framework for shot boundary detection via active learning. in Proc. ICASSP, Hong Kong, Apr., vol. 2, pp: 845-848, 2003.
    
    [37] H. Feng, W. Fang, S. Liu. A new general framework for shot boundary detection and key-frame extraction. in Proc. 7th ACM SIGMM Int. Workshop Multimedia. pp: 121-126, 2005.
    
    [38] Z. Liu, E. Zavesky, D. Gibbon. AT&T Research at TRECVID 2007. TRECVID Workshop at NIST, Gaithersburg, MD, 2007.
    
    [39] J.H. Yuan, H.Y. Wang, L. Xiao. A Formal Study of Shot Boundary Detection. IEEE Trans. Circuits Syst. Video Technol., vol. 17, no. 2, pp: 168-186, 2007.
    
    [40] K. Matsumoto, M. Naito, K. Hoashi. SVM-Based Shot Boundary Detection with Novel Feature. In Proc. IEEE Int. Conf. Multimedia and Expo, pp: 1837-1840, 2006.
    
    [41] Z.C. Zhao, A.N. Cai. Shot Boundary Detection Algorithm in Compressed Domain Based on Adaboost and Fuzzy Theory. LNCS vol. 4222, pp: 617-626, 2006.
    
    [42] J. Yuan, J. Li, F. Lin. A unified shot boundary detection framework based on graph partition model. in Proc. ACM Multimedia'05, Nov. pp: 539-542,2005.
    
    [43] H. Feng, W. Fang, S. Liu. A new general framework for shot boundary detection and key-frame extraction. in Proc. 7th SIGMM Multimedia. pp: 121-126, 2005.
    
    [44] R. Lienhart. Reliable dissolve detection. in Proc. SPIE Storage Retrieval Media Database, Jan. vol. 4315, pp:219-230, 2001.
    
    [45] W. Zheng, J. Yuan, H. Wang. A novel shot boundary detection framework. in Proc. SPIE Vis. Commun. Image Process., Jun., vol. 5960, pp: 410-420. 2005.
    
    [46] A. Hampapur, R. Jain, T. Weymouth. Digital video segmentation. in Proc. ACM Multimedia, pp: 357-364, 1994.
    
    [47] A.M. Alattar. Detecting, compressing dissolve regions in video sequences with a DVI multimedia image compression algorithm. in Proc. IEEE ISCAS, May, vol. 1, pp: 13-16, 1993.
    
    [48] H. Zhang, A. Kankanhalli, S.W. Smoliar. Automatic partitioning of full-motion video. Multimedia Syst., vol. 1, no. 1, pp: 10-28, Jun. 1993.
    
    [49] Y. Lin, M.S. Kankanhalli, T.S. Chua. Temporal multiresolution analysis for video segmentation. in Proc. SPIE Conf. Storage Retrieval Media Database, vol. 3972, pp: 494-505. 2000.
    
    [50] D.N. Bhat, S.K. Nayar. Ordinal measures for image correspondence. IEEE Trans. Pattern Anal. Machine Intell., vol. 20, no. 4, pp: 415-423, 1998.
    
    [51] C.S. Won, D.K. Park, S.J. Park. Efficient Use of MPEG-7 Edge Histogram Descriptor. ETRI Journal, vol.24, no.1, Feb. pp:23-30, 2002.
    
    [52] A.M. Ferman, A.M. Tekalp, R. Mehrotra. Robust color histogram descriptors for video segment retrieval and identification. IEEE Trans. Image Process. vol.11(5), pp: 497-508, May 2002.
    
    [53] C.C. Chang, C.J. Lin. LIBSVM: a library for support vector machines. 2001. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm, 2008.
    
    [54] V. Vapnik. The Nature of Statistical Learning Theory. New York: Springer-Verlag, 1995.
    [1] J. Jeon, R. Manmatha. Automatic Image Annotation of News Images with Large Vocabularies and Low Quality Training Data. ACM Multimedia, 2004.
    [2] R. Jin, J.Y. Chai. Effective Automatic Image Annotation Via A Coherent Language Model and Active Learning. ACM MM'04, 2004.
    [3] M.M. Tuffield, S. Harris. Image annotation with Photocopain. In Proceedings First International Workshop on Semantic Web Annotations for Multimedia (SWAMM 2006).
    [4] X.J. Wang, L. Zhang. Image Annotation Using Search and Mining Technologies. ACM Multimedia, 2006.
    [5] R.F. Zhang, Z.F. Zhang. A Probabilistic Semantic Model for Image Annotation and Multi-Modal Image Retrieval. IEEE International Conference on Computer Vision (ICCV'05), 2005.
    [6] S.L. Feng, R. Manmatha, V. Lavrenko. Multiple bernoulli relevance models for image and video annotation. In The International Conference on Computer Vision and Pattern Recognition, Washington, DC, June, 2004.
    [7] E. Cuevas, D. Zaldivar, R. Rojas. Kalman Filter for vision tracking. Technical Report B 05-12, Freie Universität Berlin, Fachbereich Mathematik und Informatik, 2005.
    [8] C. Stauffer, W.E.L. Grimson. Adaptive background mixture models for real-time tracking. Proc. Computer Vision and Pattern Recognition, CVPR'99, 1999.
    [9] M. Arulampalam, S. Maskell, N. Gordon. A Tutorial on Particle Filters for Online Nonlinear/Non-Gaussian Bayesian Tracking. IEEE Trans. on Signal Process., vol. 50, pp: 174-189, 2002.
    [10] D. Comaniciu, P. Meer. Mean shift: A robust approach toward feature space analysis. IEEE Trans. on Pattern Analysis and Machine Intelligence, pp: 603-619, 2002.
    [11] M. Treisman. A feature-integration theory of attention. Journal Psychol., pp: 27-34, 1969.
    [12] W. James. The Principles of Psychology. Harvard University Press, 1890.
    [13] D.E. Broadbent. Perception and communication. Pergamon Press, Oxford, 1958.
    [14] J. Deutsch, D. Deutsch. Attention: Some theoretical considerations. Psychological Review, vol. 70, pp: 80-90, 1963.
    [15] A.M. Treisman, G. Gelade. A Feature-Integration Theory of Attention. Cognitive Psychology, vol. 12, no. 1, pp: 97-136, Jan. 1980.
    [16] A. Treisman. Perception of features and objects. In: Visual Attention, Oxford University Press, New York, 1998.
    [17] C. Koch, S. Ullman. Shifts in selective visual attention: towards the underlying neural circuitry. Human Neurobiology, vol. 4, pp: 219-227, 1985.
    [18] F. Crick, C. Koch. Some reflections on visual awareness. In: Proc. of the Cold Spring Harbor Symposia on Quantitative Biology, vol. LV, Cold Spring Harbor Laboratory Press, 1990.
    [19] L. Itti, C. Koch, E. Niebur. A model of saliency-based visual attention for rapid scene analysis. IEEE Trans. Pattern Anal. Machine Intell., vol. 20, no. 11, pp: 1254-1259, Nov. 1998.
    
    [20] R. Carmi, L. Itti. Causal Saliency Effects During Natural Vision. In: Proc. ACM Eye Tracking Research and Applications, pp: 11-18, 2006.
    
    [21] L. Itti. Quantitative modelling of perceptual salience at human eye position. Visual Cognition, 14, pp: 959-984, 2006.
    
    [22] L. Itti. Modeling primate visual attention. In J. Feng (Ed.), Computational neuro-science: A comprehensive approach. Boca Raton, FL: CRC Press, pp: 635-655, 2003.
    
    [23] L. Itti.Automatic foveation for video compression using a neurobiological model of visual attention. IEEE Transactions on Image Processing, vol. 13(10), pp: 1304-1318, 2004.
    
    [24] Y.F. Ma, L. Lu, H.J. Zhang. A user attention model for video summarization. in Proc. ACM Int. Conf. Multimedia, France, Dec., pp: 533-542, 2002.
    
    [25] A.A. Salah, E. Alpaydin, L. Akarun. A selective attention-based method for visual pattern recognition with application to handwritten digit recognition and face recognition. IEEE Trans. Pattern Anal. Machine Intell., vol. 24, no. 3, pp: 420-425, Mar. 2002.
    
    [26] R. Mantiuk, K. Myszkowski. Attention Guided MPEG Compression for Computer Animations. Proceedings of Spring Conference in Computer Graphics (SCCG'03), pp: 262-267, 2003.
    
    [27] L. Chen, X. Xie, X. Fan. A visual attention model for adapting images on small displays. Multimedia Syst., pp: 353-364. 2003.
    
    [28] M. Kass, A. Witkin, D. Terzopoulos. Snakes: Active contour models. ICCV, pp: 259-267,1988.
    
    [29] P.L. Palmer, H. Dabis, J. Kittler. Performance measure for boundary detection algorithms. Comput. Vis. Image Understand., vol. 63, pp: 476-94, 1996.
    
    [30] S. Mahamud, L. R. Williams. Segmentation of multiple salient closed contours from real images. IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 25(4), pp: 433-444, 2003.
    
    [31] L.H. Staib, J.S. Duncan. Boundary finding with Parametric Deformable Models. IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 14, pp: 161-175, 1992.
    
    [32] Y. Deng, B.S. Manjunath. Unsupervised segmentation of color-texture regions in images and video. IEEE Transactions on Pattern Analysis and Machine Intell., vol.23, pp: 800-810, 2001.
    
    [33] V. Mezaris, I. Kompatsiaris, M.G. Strintzis. Still Image Segmentation Tools for Object-based Multimedia Applications. Pattern Recognition and Artificial Intelligence, pp: 701-725, 2004.
    
    [34] R. Adams, L. Bischof. Seeded region growing. IEEE Trans. Pattern Anal. Machine Intell., vol. 16, pp: 641-647, 1994.
    
    [35] J. P. Fan, D. K. Y. Yau. Automatic image segmentation by integrating color-edge extraction and seeded region growing. IEEE Trans. on Image Processing, vol. 10, pp: 1454-1466, Oct. 2001.
    
    [36] Y.L. Chang, X. Li. Adaptive image region growing. IEEE Trans. on Image Processing, pp: 868-872, 1994.
    
    [37] D. Geiger, A. Yuille. A Common Framework for Image Segmentation. Int. Journal of Computer Vision, vol. 6, pp: 227-243,1991.
    
    [38] S.A. Hojjatoleslami, J. Kittler. Region growing: A new approach. IEEE Trans. on Image Process., vol. 7, pp: 1079-1084, 1998.
    
    [39] I. Kompatsiaris, M.G. Strintzis. Content-based Representation of Colour Image Sequences. IEEE Int. Conf. on Acoustics, Speech and Signal Proc. 2001 (ICASSP), USA, 2001.
    [40] J.W. Han, M.J. Li, H.J Zhang. Unsupervised extraction of visual attention objects in color images. IEEE Trans. Circuits Syst. Video Techn. vol. 16(1): 141-145, 2006.
    
    [41] J. Luo, C. Guo. Perceptual grouping of segmented regions in color images. Pattern Recognit., vol. 36, pp:2781-2792, 2003.
    
    [42] J. Morris, J. Lee, A. G. Constantinides. Graph theory for image analysis: An approach based on the shortest spanning tree. Proc. Inst. Electr. Eng., F, vol. 133, no. 2, pp: 146-152,1986.
    
    [43] M. Wertheimer. Laws of organization in perceptual forms (partial translation), in Sourcebook of Gestalt Psychology, W. B. Ellis, Ed. Orlando, FL: Harcourt Brace Jovanovich, pp: 71-88. 1938.
    
    [44] J. Shi, J. Malik. Normalized cuts and image segmentation. IEEE Trans. Pattern Anal. Mach. Intell., vol. 22, no. 8, pp: 888-905, Aug. 2000.
    
    [45] S. Wang and J. M. Siskind. Image segmentation with ratio cut. IEEE Trans. Pattern Anal. Mach. Intell., vol. 25, no. 6, pp: 675-690, Jun. 2003.
    
    [46] B. Sumengen, B. S. Manjunath. Graph partitioning active contours (GPAC) for image segmentation. IEEE Trans. Pattern Anal. Mach. Intell., vol. 28, no. 4, pp: 509-521, Apr. 2006.
    
    [47] W.B. Tao, H. Jin, Y.M. Zhang. Color Image Segmentation Based on Mean Shift and Normalized Cuts. IEEE Transactions on Systems Man and Cybernetics Part B-Cybernetics, vol. 37, no. 5, pp: 1382-1389, October 2007.
    
    [48] P. Scheunders. A genetic approach towards optimal color image quantization. Image Processing, vol. 7(5), pp: 1031-1034, 1996.
    
    [49] M. Cicreo. Optimal Image Quantization, Perception and the Median Cut Algorithm. Acad Bras. Cienc, pp: 303-317, 2001.
    
    [50] M. Gervautz, W. Purgathofer. A simple method for color quantization: octree quantization. Proc. of ICG'98, vol. 8(6), pp: 219-230, 1998.
    
    [51] R. Arthur, G. Weeks. Color Segmentation in the HSI color space using the k-means algorithm. SPIE vol. 9(6): 143-154, 1997.
    
    [52] Y.W. Lim, S.U.Lee. On the Color Image Segmentation Algorithm Based on the Thresholding and the Fuzzy c-Means Techniques. Pattern Recognition, vol. 15 (9): 935-952,1990.
    
    [53] K.J. Yoon. Human perception based color image quantization. Proc. of ICPR, pp: 664-667, 2004.
    
    [54] X.R. Hu, T.Z. Wang. A new approach of color quantization based on ant colony clustering algorithm. Proc. of ITCC, vol. 6(1), pp: 102-108, 2005.
    
    [55] Y.N. Deng, K. Charles. Peer group filtering and perceptual color image quantization. Proc. of IEEE ISCAS, vol. 9(7): 21-24, 1999.
    
    [56] G. Sharma, H.J. Trussell. Digital Color Imaging. IEEE Transactions on Image Processing, vol. 6, no. 7, pp: 901-932, 1997.
    
    [57] D. Martin, C. Fowlkes, J. Malik. Learning to detect natural image boundaries using local brightness, color and texture cues. IEEE Trans. PAMI. 26, 2004.
    
    [58] A. Wardhani. Automatic Object Identification for Content Based Retrieval. [Online]. Available: http://www.int.gu.edu.au/research/MICTR/aster/aster.html.
    
    [59] Y. Sun, R. Fisher. Object-based visual attention for computer vision. Artificial Intelligence, vol. 146, pp: 77-123, 2003.
    [1] A. Smeaton, P. Over. TRECVID 2007 - An Overview. TRECVID Workshop, US, Nov., 2007.
    [2] Z.C. Zhao, A.N. Cai. Selective Extraction of Visual Saliency Objects in Images and Videos. Proc. of the IEEE Int. Conf. IIHMSP'07, Taiwan, November, 2007.
    [3] C. Yang, M. Dong, J. Hua. Region-based Image Annotation using Asymmetrical Support Vector Machine-based Multiple-Instance Learning. Proc. of the IEEE Int. Conf. CVPR'06, vol. 2, June 17-22, 2006.
    [4] F.F. Li, R. Fergus, P. Perona. One-Shot Learning of Object Categories. IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 28, no. 4, pp: 594-611, 2006.
    [5] Y.G. Jiang, C.W. Ngo, J.Yang. Towards Optimal Bag-of-Features for Object Categorization and Semantic Video Retrieval. ACM CIVR'07, Amsterdam, Netherlands, 2007.
    
    [6] H.J. Zhang, C.Y. Low, S. W. Smoliar. Video parsing and browsing using compressed data. Multimedia Tools Applicat., vol. 1, pp: 89-111,1995.
    
    [7] H.J. Zhang, J. Wu, D. Zhong. An integrated system for content-based video retrieval and browsing. Pattern Recognit, vol. 30, no. 4, pp: 643-658,1997.
    
    [8] M. Markos, P. Alexandra. Keyframe extraction algorithm using entropy difference. Proc. of ACM SIGMM on Multimedia, US, pp: 39-45, 2004.
    
    [9] T.M. Liu, H.J. Zhang, F.H. Qi. A novel video key-frame extraction algorithm based on perceived motion energy model. IEEE Trans. Circuits Syst. Video Technol., vol. 13, no. 10, pp: 1006-1013, 2003.
    
    [10] A. Hanjalic, H.J. Zhang. An integrated scheme for automated video abstraction based on unsupervised cluster-validity analysis. IEEE TCSVT, vol.9, pp: 1280-1289,1999.
    
    [11] Y. Zhuang. Adaptive key-frame extraction using unsupervised clustering. in Proc. IEEE Int. Conf. Image Processing, Chicago, IL, Oct. pp: 866-870,1998.
    
    [12] C. Kim, J.N. Hwang. An integrated scheme for object-based video abstraction. in Proc. 8th ACM Int. Conf. Multimedia, Los Angeles, CA, Oct. pp: 303-311, 2000.
    
    [13] L.J. Liu, G.L. Fan. Combined Key-Frame Extraction and Object-Based Video Segmentation. IEEE Trans. Circuits Syst. Video Technol., vol. 15, no. 7, pp: 869-884, 2005.
    
    [14] M.A. Smith. Video skimming and characterization through the combination of image and language understanding techniques. Proc. of Computer Vision and Pattern Recognition, 1997.
    
    [15] R. Lienhart. Dynamic video summarization of home video. SPIE, vol. 3972, pp: 378-389, 2000.
    
    [16] H. Sundaram, L. Xie, S.F. Chang. A utility framework for the automatic generation of audio-visual skims. in Proc. 10th ACM Int. Conf. Multimedia, pp: 189-198, 2002.
    
    [17] Y.F. Ma, H.J. Zhang. A model of motion attention for video skimming. in Proc. Int. Conf. Image Process., vol. 1, pp: 129-132, 2002.
    
    [18] Y.F. Ma, X.S. Hua, L. Lu. A generic framework of user attention model and its application in video summarization. IEEE Trans. Multimedia, vol. 7, no. 5, Oct. pp: 907-919, 2005.
    
    [19] J.Y. You, G.Z. Liu. A Multiple Visual Models Based Perceptive Analysis Framework for Multilevel Video Summarization. IEEE TCSVT, vol. 17, pp: 273-285, 2007.
    
    [20] C.W. Ngo, Y.F. Ma, H.J. Zhang. Video Summarization and Scene Detection by Graph Modeling. IEEE Trans. Circuits Syst. Video Technol., vol. 15, no. 2, pp: 296-305, 2005.
    
    [21] S. Field. Screenplay: The Foundations of Screenwriting. China Film Press, Beijing, 2002.

    [22] M. Martin. Le langage cinematographique. China Film Press, Beijing, 1995.
    
    [23] K. Bruce, How movies work, University of California Press, Berkeley, 1992.
    
    [24] L. Itti, C. Koch. Feature combination strategies for saliency-based visual attention systems. Journal of Electronic Imaging, vol.10, pp: 161-169, 2001.
    
    [25] S. Ahmad. VISIT: a neural model of covert attention. in Advances in Neural Information Processing Systems. San Mateo, CA: Morgan Kaufmann, vol. 4, pp: 420-427,1991.
    [26] C. Osgood, G. Suci. The Measurement of Meaning. Urbana, Univ. Illinois Press, 1957.
    
    [27] J. Russell, A. Mehrabian. Evidence for a three-factor theory of emotions. J. Res. Personality, vol. 11, pp: 273-294,1977.
    
    [28] A. Hanjalic, L.Q. Xu. Affective video content representation and modeling. IEEE Trans. Multimedia, vol. 7, no. 1, pp: 143-154, 2005.
    
    [29] H.L. Wang, L.F. Cheong. Affective understanding in film. IEEE Trans. Circuits Syst. Video Technol., vol. 16, no. 6, pp: 689-704, 2006.
    
    [30] A. Hanjalic. Adaptive Extraction of Highlights From a Sport Video Based on Excitement Modeling. IEEE Trans. Multimedia, vol.7, no. 6, pp: 1114-1122, 2005.
    
    [31] M. Yeung, B. L. Yeo. Time-constrained clustering for segmentation of video into story units. in Proc. ICPR, vol. C, Vienna, Austria, Aug. pp: 375-380,1996.
    
    [32] A. Hanjalic, R.L. Lagendijk, J. Biemond. Automated high-level movie segmentation for advanced video-retrieval systems. IEEE TCSVT, vol. 9, pp: 580-588, 1999.
    
    [33] H. Sundaram, S.F. Chang. Audio Scene Segmentation Using Multiple Features, Models And Time Scales. in Proc. ICASSP, Istanbul, Turkey, Jun. 2000.
    
    [34] S. Subramaniam. Toward robust features for classifying audio in the cuevideo system. in Proc. ACM Multimedia, Orlando, FL, Nov. pp: 393-400,1999.
    
    [35] J. R. Kender, B. L. Yeo. Video Scene Segmentation Via Continuous Video Coherence. in Proc. CVPR, Santa Barbara, CA, Jun. 1998.
    
    [36] H. Sundaram, S.F. Chang. Computable Scenes and structures in Films. IEEE Trans. Multimedia, vol.4, no. 4, pp: 482-491, 2002.
    
    [37] B. Adams, C. Dorai. Towards automatic extraction of expressive elements from motion pictures: tempo. IEEE Int. Conf. on Multimedia and Expo, ICME'00, vol. 2, pp: 641-644, 2000.
    
    [38] Y. Sun, R. Fisher. Object-based visual attention for computer vision. Artificial Intelligence, vol. 146, pp: 77-123, 2003.
    
    [39] J.P. Fan, K. David. Automatic image segmentation by integrating color-edge extraction and seeded region growing. IEEE Trans. on Image Processing, vol.10, no.10, pp: 1454-1466, 2001.
    
    [40] C. Koch, S. Ullman. Shifts in selective visual attention: towards the underlying neural circuitry. Human Neurobiology, vol. 4, pp: 219-227,1985.
    
    [41] A.M. Treisman, G. Gelade. A feature-integration theory of attention. Cognitive Psychology, vol. 12, Jan, pp: 97-136,1980.
    
    [42] Z.C. Zhao, A.N. Cai. A video retrieval scheme based on global motion for scenery videos. Journal of BUPT, vol.29, pp: 18-23, 2006.
    
    [43] Z.C. Zhao, X. Zeng, T. Liu, A.N. Cai. BUPT at TRECVID 2007: Shot boundary detection. In TRECVID 2007 Workshop, Gaithersburg, MD, US, November, 2007.
    
    [44] M. G. Christel. Evolving video skims into useful multimedia abstractions. in Proc. SIGCHI Conf. Human Factors in Computing Systems,Los Angeles, CA, pp: 171-178,1998.
    
    [45] P. Ward. Picture Composition for Film and Television (2nd edition). Focal Press, Oct., 2002.
    [1] Z.C. Zhao, A.N. Cai. Semantic Extraction of Hollywood Movie Based on Social Network Analysis and Film Ontology. Submitted to IEEE Trans. on Multimedia.
    [2] J. Lewis. Money Matters: Hollywood in the Corporate Era. The New American Cinema, Duke University Press, 1998.
    [3] J.T. Farrell. The language of Hollywood in commercial culture. World Cinema, 1996.
    [4] Lu Chunyan. Development trends of American genre films in the age of globalization. Beijing Social Sciences, pp: 109-112, 2006.
    [5] A.W.M. Smeulders, M. Worring, S. Santini, A. Gupta, R. Jain. Content based image retrieval at the end of the early years. IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 22, no. 12, pp: 1349-1380, 2000.
    [6] J. Fan, Y. Gao, H. Luo. Integrating Concept Ontology and Multi-Task Learning to Achieve More Effective Classifier Training for Multi-Level Image Annotation. IEEE Trans. on Image Processing, vol. 17, no. 3, pp: 407-426, 2008.
    [7] L. Hollink. Semantic Annotation for Retrieval of Visual Resources. SIKS Dissertation, 2006.
    [8] D. Zhong, H.J. Zhang, S.F. Chang. Clustering Methods for Video Browsing and Annotation. SPIE, pp: 283-296, 2000.
    [9] S.L. Feng, R. Manmatha, V. Lavrenko. Multiple Bernoulli Relevance Models for Image and Video Annotation. CVPR'04, 2004.
    [10] M.R. Naphade, L. Kennedy, J.R. Kender, S.-F. Chang, J.R. Smith, P. Over, A. Hauptmann. A Light Scale Concept Ontology for Multimedia Understanding for TRECVID 2005 (LSCOM-Lite). IBM Research Technical Report, 2005.
    [11] A. Dorado, J. Calic, E. Izquierdo. A Rule-Based Video Annotation System. IEEE TCSVT, vol. 14, no. 5, pp: 622-633, 2004.
    [12] J.P. Fan, H.Z. Luo, Y.L. Gao. Incorporating Concept Ontology for Hierarchical Video Classification, Annotation, and Visualization. IEEE Trans. Multimedia, vol. 9, pp: 939-957, 2007.
    
    [13] D. Saur, Y.-P. Tan, S. Kulkarni. Automated analysis and annotation of basketball video. In Symp. Electronic Imaging: Science and Technology: Storage and Retrieval for Image and Video Databases, vol.3022, pp: 176-187,1997.
    
    [14] W.L. Zhang, Y. Yaginuma, M. Sakauchi. A Video Movie Annotation System: Annotation Movie with its Script. ICSP'00, pp: 1362-1366, 2000.
    
    [15] R.F. Zhang, Z.F. Zhang. A Probabilistic Semantic Model for Image Annotation and Multi-Modal Image Retrieval. IEEE International Conference on Computer Vision (ICCV'05), 2005.
    
    [16] D.T. Chen, K. Shearer. Video OCR for Sport Video Annotation and Retrieval. 2005.
    
    [17] TREC Video Retrieval Evaluation (TRECVID): http://www-nlpir.nist.gov/projects/trecvid/.
    
    [18] [Online]. Available: http://gmazars.info/conf/cvpr2008.html.
    
    [19] [Online]. Available: http://gmazars.info/conf/eccv2008.html.
    
    [20] J. Pan and C. Faloutsos, "VideoCube: A Novel Tool for Video Mining and Classification," Proc. Int'l Conf. Asian Digital Libraries (ICADL), pp: 194-205, 2002.
    
    [21] J. Fan, Y. Gao, H. Luo. "Mining Multi-Level Image Semantics via Hierarchical Classification", IEEE Trans. on Multimedia, special issue on Multimedia Data Mining, vol. 10, pp:167-187, 2008.
    
    [22] A. Bosch, X. Munoz, R. Martí. A review: Which is the best way to organize/classify images by content? Image and Vision Computing, 2006.
    
    [23] K. Rapantzikos, Y. Avrithis, S. Kollias. On the Use of Spatiotemporal Visual Attention for Video Classification. MMSP, 2005.
    
    [24] T. Giannakopoulos, D. Kosmopoulos, A. Aristidou. Violence Content Classification using Audio Features. Hellenic Artificial Intelligence Conference, LNAI 3955, pp: 502-507, 2006.
    
    [25] J. Pan, C. Faloutsos, "GeoPlot: Spatial Data Mining on Video Libraries," Proc. Int. Conf. Information and Knowledge Management, pp: 405-412, 2002.
    
    [26] Theodoros Giannakopoulos, Aggelos Pikrakis. A Multi-Class Audio Classification Method With Respect To Violent Content in Movies Using Bayesian Networks. IEEE MMSP'07, pp: 90-93, 2007.
    
    [27] Y. Wu, L. Tseng. Ontology-based Multi-Classification Learning for Video Concept Detection, in Proc. IEEE ICME'04, pp: 1003-1006, 2004.
    
    [28] C.G.M. Snoek, M. Worring, J. Geusebroek, D.C. Koelma. "The semantic pathfinder: Using an authoring metaphor for generic multimedia indexing," IEEE Trans. Pattern Anal. Machine Intell., vol. 28, pp: 1678-1689, 2006.
    
    [29] C. Jorgensen, A. Jaimes, A. B. Benitez, S.F. Chang, A conceptual framework and research for classifying visual descriptors, J. Amer. Soc. Information Science (JASIS), pp: 938-947, 2001.
    
    [30] W. Zhou, A. Vellaikal, C.C. J. Kuo, "Rule-based video classification system for basketball video indexing," in ACM Multimedia Conf., Los Angeles, CA, Nov. 2000.
    
    [31] Y. Gong, M. Han, W. Hua. "Maximum entropy model-based baseball highlight detection and classification," Comput. Vis. Image Understand., vol. 26, pp: 181-199, 2004.
    
    [32] J. Fan, A. K. Elmagarmid, X. Zhu, "ClassView: Hierarchical Video Shot Classification, Indexing, and Accessing," IEEE Trans. on Multimedia, vol. 6, pp: 648-666, 2004.
    
    [33] Y. Matsuo, K. Shirahama, and K. Uehara, "Video Data Mining: Extracting Cinematic Rules from Movie," Proc. Int'l Workshop Multimedia Data Management (MDM-KDD), 2003.
    
    [34] L. Xie, S.F. Chang, A. Divakaran. Unsupervised Mining of Statistical Temporal Structures in Video, Video Mining, 2003.
    
    [35] I. Laptev, P. Perez. Retrieving actions in movies. ICCV'07, 2007.
    [36] M. Blank, L. Gorelick. Actions as space-time shapes. ICCV'05, pp: 1395-1402, 2005.
    
    [37] C. Schuldt, I. Laptev, B. Caputo. Recognizing Human Actions: A Local SVM Approach, ICPR'04,2004.
    
    [38] Y. Ke, R. Sukthankar, and M. Hebert. Efficient visual event detection using volumetric features. In Proc. ICCV'05, pp: 166-173, 2005.
    
    [39] P. Scovanner, S. Ali, M. Shah. A 3-Dimensional SIFT Descriptor and its Application to Action Recognition. MM'07, September 23-28, 2007, Augsburg, Bavaria, Germany.
    
    [40] J.C. Niebles et al. "Unsupervised Learning of Human Action Categories Using Spatial-Temporal Words," BMVC, 2006.
    
    [41] A. Yilmaz. Actions Sketch: A Novel Action Representation. CVPR'05, 2005.
    
    [42] J.C. Niebles, F.F. Li. A Hierarchical Model of Shape and Appearance for Human Action Classification. CVPR'07, 2007.
    
    [43] K.Shirahama, K. Ideno, K. Uehara. Video Data Mining: Mining Semantic Patterns with temporal constraints from Movies. IEEE International Symposium on Multimedia (ISM'05). 2005.
    
    [44] A. Hanjalic. Adaptive Extraction of Highlights from a Sport Video Based on Excitement Modeling. IEEE Trans. on Multimedia, vol. 7, pp: 1114-1122, 2005.
    
    [45] Z. Xiong, R. Radhakrishnan, T. S. Huang, "Audio events detection based highlights extraction from baseball, golf and soccer games in a unified framework," in Proc. IEEE Int. Conf. Acoustics, Speech and Signal Processing (ICASSP), vol. 5, pp: 632-635,2003.
    
    [46] G. Xu, Y.F. Ma, H.J. Zhang. An HMM-Based Framework for Video Semantic Analysis. IEEE Trans. Circuits Syst. Video Technol., vol. 15, pp: 1422-1433, 2005.
    
    [47] F. V. Jensen. An Introduction to Bayesian Networks. New York, Springer, 1996.
    
    [48] X. Sun, G. Jin, M. Huang, and G. Xu, Bayesian network based soccer video event detection and retrieval, in Multispectral Image Processing and Pattern Recognition, Beijing, China, Oct. 2003.
    
    [49] F. Wang, Y.F. Ma, H.J. Zhang. A Generic Framework for Semantic Sports Video Analysis Using Dynamic Bayesian Networks. International Conference on Multimedia Modelling. 2005, pp: 115-122.
    
    [50] C.L. Huang, S. H. Chi, Semantic Analysis of Soccer Video Using Dynamic Bayesian Network. IEEE Trans. Multimedia, vol. 8, pp: 749-760,2006.
    
    [51] K. Yu, A. Schwaighofer, V. Tresp. Collaborative ensemble learning: Combining content-based information filtering via hierarchical Bayes. Int. Conf. Uncertainty in Artificial Intelligence, 2003.
    
    [52] T. Giannakopoulos, A. Pikrakis, S. Theodoridis. Violent Content in Movies Using Bayesian Networks. MMSP'07, pp: 90-93, 2007.
    
    [53] G. Guo, H.J. Zhang, S.Z. Li. A Multi-Class Audio Classification Method. ICME'01, 2001.
    
    [54] C.C. Chang, C.J. Lin. LIBSVM: a library for support vector machines. 2001. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.
    
    [55] T. Joachims. SVMlight is an implementation of Support Vector Machines (SVMs). 2004. Software available at http://www.cs.cornell.edu/people/tj/svmlight/.
    
    [56] Y.Freund, R.E. Schapire. A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting. Journal of Computer and System Science. pp: 119-139,1997.
    
    [57] R.E. Schapire. The Boosting Approach to Machine Learning an Overview. MSRI. Workshop on Nonlinear Estimation and Classification, 2002.
    
    [58] S. Chang, W. Jiang, A. Yanagawa. Analysis of Cross-Domain Learning Methods for High-Level Visual Concept Detection. in Proc. TRECVID Workshop, NIST Special Publication, Gaithersburg, 2007.
    
    [59] M. Wang, X.S. Hua, Y. Song. Multi-Concept Multi-Modality Active Learning for Interactive Video Annotation. IEEE International Conference on Semantic Computing (ICSC), Irvine, California, USA, September, 2007.
    
    [60] Ontology Alignment [Online]. Available: http://oaei.ontologymatching.org/.
    
    [61] A.D. Maedche. Ontology Learning for the Semantic Web. New York, Springer-Verlag, 2002.
    
    [62] Z.J. Zha, T. Mei, Z.F. Wang. "Building a Comprehensive Ontology to Refine Video Concept Detection," ACM SIGMM, Augsburg, Germany, Sept. 2007.
    
    [63] X.Y. Wei, C.W. Ngo. Ontology-Enriched Semantic Space for Video Search. ACM Multimedia (MM'07), Augsburg, Germany, Sep. 2007.
    
    [64] M. Naphade, J. R. Smith, J. Tesic, S.-F. Chang, W. Hsu, L. Kennedy, A. Hauptmann, and J. Curtis, "Large-scale concept ontology for multimedia," IEEE Multimedia, 2006.
    
    [65] A. Jaimes, J. R. Smith, "Semi-automatic, data-driven construction of multimedia ontologies," in Proc. IEEE ICME, 2003.
    
    [66] C.A. Lindley, "A multiple-interpretation framework for modeling video semantics," in ER-97 Workshop on Conceptual. Modeling in Multimedia Information Seeking, 1997.
    
    [67] J. Hunter, "Enhancing the semantic interoperability of multimedia through a core ontology," IEEE Trans. Circuits Syst. Video Technol., vol. 13, pp: 49-58, Jan. 2003.
    
    [68] C. G. M. Snoek, B. Huurnink, L. Hollink, M. de Rijke, G. Schreiber, and M. Worring, "Adding semantics to detectors for video retrieval," IEEE Trans. Multimedia, 2007.
    
    [69] M. Koskela, A. F. Smeaton, and J. Laaksonen, "Measuring concept similarities in multimedia ontologies: Analysis and evaluations," IEEE Trans. Multimedia, 2007.
    
    [70] A.G. Hauptmann, "Towards a large scale concept ontology for broadcast video," in Proc. CIVR, 2004, pp:674-675.
    
    [71] Y.Wu, B. Tseng, and J. R. Smith, "Ontology-based multi-classification learning for video concept detection," in Proc. IEEE ICME, 2004.
    
    [72] L. Hollink, M.Worring, and G. Schreiber, "Building a visual ontology for video retrieval," in Proc. ACM Multimedia, 2005.
    
    [73] M. Christel. Carnegie Mellon University Traditional Informedia Digital Video Retrieval System. Proc. CIVR'07, Amsterdam, pp: 647-653, 2007.
    
    [74] M.A. Smith, T. Kanade, "Video skimming and characterization through the combination of image and language understanding techniques," Proc. of Computer Vision, 1997.
    
    [75] A.G. Hauptmann, "Towards a large scale concept ontology for broadcast video," in Proc. CIVR, 2004, pp: 674-675.
    
    [76] Informedia. [Online] Available: http://www.informedia.cs.cmu.edu/.
    
    [77] A.B. Benitez, S.-F. Chang, and J. R. Smith, "IMKA: A multimedia organization system combining perceptual and semantic knowledge," in ACM Multimedia, 2001.
    
    [78] A.B. Benitez, J. R. Smith, and S.-F. Chang, "MediaNet: A multimedia information network for knowledge representation," Proc. SPIE, vol. 4210, 2000.
    
    [79] C.G.M. Snoek, M. Worring, J.C. van Gemert, J.-M. Geusebroek, and A.W.M. Smeulders, "The challenge problem for automated detection of 101 semantic concepts in multimedia," in Proc. ACM Multimedia, Santa Barbara, CA, 2006.

    [80] R. Nevatia, T. Zhao, S. Hongeng, "Hierarchical language-based representation of events in video streams," in Proc. IEEE CVPR Workshop on Event Mining, 2003.

    [81] M. Naphade, J. R. Smith, "On the detection of semantic concepts at TRECVID," in Proc. ACM Multimedia, 2004.
    
    [82] [Online]. Available: http://dublincore.org/.

    [83] C. G. M. Snoek, M. Worring, and A. G. Hauptmann, "Learning rich semantics from news video archives by style analysis," ACM Trans. Multimedia Comput., Commun., Applicat., vol. 2, 2006.
    
    [84] Y.Wu, B. Tseng, and J. R. Smith, "Ontology-based multi-classification learning for video concept detection," in Proc. IEEE ICME, 2004.
    
    [85] [Online]. Available: http://www.science.uva.nl/research/mediamill/demo/index.php.
    
    [86] C.G.M. Snoek, I. Everts. The MediaMill TRECVID 2007 semantic video search engine. In Proceedings of the 5th TRECVID Workshop, Gaithersburg, USA, November 2007.

    [87] C.G.M. Snoek, M. Worring, J. Geusebroek. The semantic pathfinder: Using an authoring metaphor for generic multimedia indexing. IEEE Trans. Pattern Anal. Machine Intell., vol. 28, pp: 1678-1689, 2006.
    
    [88] M. R. Naphade, I. Kozintsev, and T. S. Huang, Factor graph framework for semantic video indexing, IEEE Trans. Circuits Syst. Video Technol., vol. 12, no. 1, pp: 40-52, 2002.
    
    [89] M. Naphade, T. S. Huang, A probabilistic framework for semantic video indexing, filtering and retrieval, IEEE Trans. Multimedia, vol. 3, no. 1, pp: 141-151, 2001.
    
    [90] C. Fellbaum, Ed. WordNet: An Electronic Lexical Database. Cambridge, MA: The MIT Press, 1998.
    
    [91] D. Lenat, R. Guha, Building Large Knowledge-based Systems: Representation and Inference in the Cyc Project. Reading, MA: Addison-Wesley, 1990.
    
    [92] H. Liu, P. Singh. ConceptNet: A practical commonsense reasoning toolkit. BT Technology Journal, vol. 22, no. 4, pp: 211-226, 2004.

    [93] X.Q. Zhu, X.D. Wu. Video Data Mining: Semantic Indexing and Event Detection from the Association Perspective. IEEE Trans. Knowledge and Data Engineering, vol. 17, pp: 665-677, 2005.

    [94] E. Izquierdo, K. Chandramouli, M. Grzegorzek. K-Space Content Management and Retrieval System. ICIAPW'07, Sept., pp: 131-136, 2007.

    [95] J. Calic, N. Campbell, S. Dasiopoulou. A survey on multimodal video representation for semantic retrieval. IEEE Int. Conf. Computer as a Tool (EUROCON), vol. 1, pp: 135-138, 2005.
    
    [96] D.J. Watts, Small Worlds: The Dynamics of Networks between Order and Randomness. Princeton Univ. Press, 1999.
    
    [97] D.J. Watts, S.H. Strogatz. Collective dynamics of small-world networks. Nature, pp: 440-442, 1998.
    
    [98] A.-L. Barabási, R. Albert. Emergence of scaling in random networks. Science, pp: 509-512, 1999.

    [99] Z. Shen, K.L. Ma. Visual Analysis of Large Heterogeneous Social Networks by Semantic and Structural Abstraction. IEEE Trans. Visualization and Computer Graphics, pp: 1427-1439, 2006.
    
    [100] S. Hettich, S.D. Bay, The UCI KDD Archive, http://kdd.ics.uci.edu, Univ. of California, Irvine, Dept. of Information and Computer Science, 1999.
    
    [101] MIPT Terrorism Knowledge Base, http://www.tkb.org/, 2008.
    
    [102] J. Scripps, P.N. Tan, A.H. Esfahanian. Exploration of Link Structure and Community-based Node Roles in Network Analysis. ICDM. pp: 649-654, 2007.
    
    [103] M. Huisman, M.A.J. van Duijn. Software for Social Network Analysis. in Models and Methods in Social Network Analysis, P.J. Carrington et al., eds., Cambridge Univ. Press, pp: 270-316, 2005.
    
    [104] B.W. Herr, W. Ke. Movies and Actors: Mapping the Internet Movie Database. IV'07, 2007.
    
    [105] C.Y. Weng. Movie Analysis Based on Roles' Social Network. ICME'07, pp: 1403-1406, 2007.
    
    [106] J. Golbeck, J. Hendler. FilmTrust: movie recommendations using trust in web-based social networks. ICCNC. pp: 282-286, 2006.
    
    [107] Z. Rasheed, M. Shah. Movie genre classification by exploiting audio-visual features of previews. ICPR'02, pp: 1086-1089, 2002.

    [108] Z. Rasheed, Y. Sheikh, M. Shah. On the Use of Computable Features for Film Classification. IEEE TCSVT, vol. 1, pp: 1-11, 2003.
    
    [109] S. Fischer, R. Lienhart, W. Effelsberg. Automatic Recognition of Film Genres. ACM Multimedia, pp: 295-304, 1995.

    [110] X. Y, W. Lai. Automatic Video Genre Categorization Using Hierarchical SVM. ICIP, 2006.

    [111] S. Moncrieff, S. Venkatesh, C. Dorai. Horror film genre typing and scene labeling via audio analysis. ICME'03, pp: 193-196, 2003.

    [112] M. Yeung, B.L. Yeo. Time-constrained clustering for segmentation of video into story units. in Proc. ICPR, vol. C, Vienna, Austria, Aug. 1996, pp: 375-380.

    [113] A. Hanjalic, R.L. Lagendijk. Automated high-level movie segmentation for advanced video-retrieval systems. IEEE Trans. Circuits Syst. Video Technol., vol. 9, pp: 580-588, 1999.
    
    [114] Z. Rasheed, M. Shah. Scene detection in Hollywood movies and TV shows. in Proc. CVPR'03, vol. 2, pp: 343-348, 2003.
    
    [115] H. Sundaram, S. F. Chang. Computable Scenes and structures in Films. IEEE Trans. Multimedia, vol.4, pp: 482-491, 2002.
    
    [116] A.F. Smeaton, B. Lehane, N.E. Connor. Automatically Selecting Shots for Action Movie Trailers. MIR'06, October 26-27, Santa Barbara, California, USA, 2006.
    
    [117] B. Ionescu, P. Lambert, D. Coquin. Animation Movies Trailer Computation. MM'06, October 23-27, Santa Barbara, California, USA. 2006.
    
    [118] Y. Li, S.H. Lee, C.H Yeh. Techniques for Movie Content Analysis and Skimming. IEEE Signal Processing, pp: 79-89, 2006.
    
    [119] B. Adams, C. Dorai, S. Venkatesh. Towards automatic extraction of expressive elements from motion pictures: Tempo. IEEE Transactions on Multimedia, vol. 4, no. 4, Dec. 2002.
    
    [120] B. Adams, S. Venkatesh, C. Dorai. Finding the beat: an analysis of the rhythmic elements of motion pictures. International Journal of Image and Graphics, vol. 2, no. 2, pp: 215-245, 2002.
    
    [121] H.W. Chen, J.H. Kuo, W.T. Chu. Action Movies Segmentation and Summarization Based on Tempo Analysis. ACM SIGMM 2004.
    
    [122] H.B. Kang. Affective Content Detection using HMMs. Proc. of the 11th ACM Multimedia, Berkeley, CA, pp: 259-262, 2003.
    
    [123] A. Salway. Extracting Information about Emotions in Films. ACM Multimedia 2003.
    
    [124] A. Hanjalic, L.Q. Xu. Affective video content representation and modeling. IEEE Trans. Multimedia, vol. 7, pp: 143-154, 2005.

    [125] H.L. Wang, L.F. Cheong. Affective understanding in film. IEEE Trans. Circuits Syst. Video Technol., vol. 16, pp: 689-704, 2006.

    [126] C.Y. Wei, N. Dimitrova, S.F. Chang. Color-mood analysis of films based on syntactic and psychological models. ICME 2004.
    
    [127] J.A. Lay, G. Ling. Semantic retrieval of multimedia by concept languages: treating semantic concepts like words. IEEE Signal Processing Magazine. pp: 115-123, 2006.
    
    [128] W.H. Adams, G. Iyengar, C.Y. Lin. Semantic indexing of multimedia content using visual, audio and text cues. EURASIP J. Appl. Signal Processing, pp: 170-185, 2003.

    [129] Y. Li, S. Narayanan, C.-C. J. Kuo. Content-Based Movie Analysis and Indexing Based on Audiovisual Cues. IEEE Trans. on CSVT, vol. 14, pp: 1073-1085, 2004.
    
    [130] M. Huijbregts, R. Ordelman, F. Jong. Speech-based Annotation of Heterogeneous Multimedia Content Using Automatic Speech Recognition. CTIT-technical Report, version 1.0, May 2007.
    
    [131] G.J. Qi, X.S. Hua, Y. Rui. Correlative Multi-Label Video Annotation, ACM Multimedia 2007.
    [132] [Online]. Available: http://baike.baidu.com/view/1417318.htm.

    [133] [Online]. Available: http://www.filmsite.org/Film Genres.htm.

    [134] J.T. Farrell. The Language of Hollywood in Commercial Culture. World Cinema, China Film Association, 1996.

    [135] L. Althusser. Ideology and Ideological State Apparatuses. Contemporary Cinema, 1987.

    [136] E. Stewart. American Cultural Patterns: A Cross-Cultural Perspective. Tianjin: Baihua Literature and Art Publishing House, 2000.

    [137] Standard Occupational Classification. Bureau of Labor Statistics, Survey Reports, 2006.

    [138] List of Common English and American Names. [Online]. Available: http://bbs.nfan.org/read.php?tid=12118.

    [139] NoteTab Light. [Online]. Available: http://www.notetab.com/ntl.php.

    [140] ICTCLAS. [Online]. Available: http://www.nlp.org.cn/project/project.php.

    [141] J.F. Bonastre, N. Scheffer, D. Matrouf. ALIZE/SpkDet: a state-of-the-art open source software for speaker recognition. in Proc. Odyssey: the Speaker and Language Recognition Workshop, 2008.

    [142] W.M. Campbell, D.E. Sturim, D.A. Reynolds. Support vector machines using GMM supervectors for speaker verification. IEEE Signal Processing Letters, vol. 13, 2006.

    [143] A.U. Khan, S. Khan. A New Technique for Information Summarization. Trans. on Engineering, Computing and Technology, pp: 25-27, 2005.

    [144] E. Filatova, V. Hatzivassiloglou. Event-based Extractive Summarization. In Proc. of ACL Workshop on Summarization, Barcelona, Spain, 2004.

    [145] A. Lavie, K. Zechner. Increasing the Coherence of Spoken Dialogue Summaries by Cross-Speaker Information Linking. Carnegie Mellon University, Sept. 2000.

    [146] K. Zechner. Automatic Summarization of Spoken Dialogs in Unrestricted Domains. Carnegie Mellon University, November 2001.

    [147] J.X. Wu, S.C. Brubaker, M.D. Mullin. Fast Asymmetric Learning for Cascade Face Detection. IEEE Trans. PAMI, no. 3, pp: 369-382, 2008.

    [148] J.W. Lu, K.N. Plataniotis, A.N. Venetsanopoulos. Ensemble-Based Discriminant Learning With Boosting for Face Recognition. IEEE Trans. Neural Networks, vol. 17, pp: 166-178, 2006.

    [149] IMDB. [Online]. Available: http://www.imdb.com, 2008.

    [150] [Online]. Available: http://images.google.com, 2008.

    [151] D.T. Chen, J.M. Odobez, H. Bourlard. Text detection and recognition in images and video frames. Pattern Recognition, pp: 595-608, 2004.

    [152] Tesseract-OCR. [Online]. Available: http://code.google.com/p/tesseract-ocr/.

    [153] H. Yu, M.J. Li, H.J. Zhang. Color Texture Moments for Content-Based Image Retrieval. ICME'03, 2003.

    [154] D.G. Lowe. Distinctive image features from scale-invariant keypoints. IJCV, pp: 91-110, 2004.

    [155] C.C. Chang, C.J. Lin. LIBSVM: a library for support vector machines. 2001. [Online]. Available: http://www.csie.ntu.edu.tw/~cjlin/libsvm.

    [156] [Online]. Available: http://www.w3.org/XML/, 2008.
