Study on Key Techniques in Automatically Extracting Semantic Information from Visual Media

Chinese title: 视觉媒体语义自动提取关键技术研究
Author: Jiang Shuqiang (蒋树强)
Degree level: Doctoral
Discipline: Computer Applications
Keywords: semantic extraction; visual media; sports video analysis; image classification; support vector machine; ontology; art image; Gaussian mixture models
Degree year: 2005
Advisor: Gao Wen (高文)
Discipline code: 081203
Degree-granting institution: Graduate School of the Chinese Academy of Sciences (Institute of Computing Technology)
Submission date: 2005-09-01

Abstract

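In recent years, computer and network technologies have developed rapidly, digital video and image content has become increasingly abundant, and an information era built on multimedia information services is arriving. Demand for visual media content such as video and images keeps growing in both volume and breadth, which calls for effective techniques that can meet users' diverse needs. The "semantic gap", however, is a major obstacle to harmonious human-computer interaction, because the criteria the human brain uses to judge visual media differ greatly from those used by computer systems. Although much research has addressed the semantic analysis and understanding of visual media, this closely watched technology still falls far short of users' general needs: users need more automatically extracted semantic information.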
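This thesis studies several key techniques in the automatic extraction of semantics from visual media. It proposes a four-layer technical framework for semantic extraction, consisting of an object semantic layer, a scene semantic layer, a knowledge and emotion semantic layer, and a semantic application layer, and investigates the corresponding key techniques: object detection, scene classification, high-level semantic concept extraction, and ontology-based semantic applications. Because a single general-purpose semantic extraction technique is very hard to find, specific visual media content is usually analyzed and understood with a divide-and-conquer strategy that targets a given application and exploits domain knowledge. Sports video analysis and understanding has become a popular research direction in recent years because of its large audience and great market potential, and with the Beijing Olympic Games approaching, the semantic analysis and understanding of sports video has particular practical significance for China. On the other hand, using computers to analyze digitized art images and extract semantic information such as their category, style, and content is an important and urgent problem that is receiving growing attention. Traditional Chinese painting is a treasure of Chinese art, so research on digitized art images such as Chinese paintings is also important. This thesis therefore addresses both kinds of visual media, video and images, by studying semantic extraction from sports video and from art images respectively. Finally, it presents a scene classification method for night-scene images, a technique that also has significant application value. Specifically, the main contributions of the thesis are: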
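1) A high-level analysis of the system framework for automatic semantic extraction from visual media. This analysis is necessary because it gives a global view of the whole problem and guides the implementation of specific semantic extraction techniques. The thesis identifies the semantic information contained at each level and systematically analyzes both the application framework and the solutions for visual media semantic extraction.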
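2) A robust playfield segmentation and detection method for sports video. In the automatic analysis of many kinds of sports video, the playfield region plays a fundamental role, since many semantic cues can be derived from the playfield segmentation result. Gaussian mixture models (GMMs) are used to build a color model of the playfield, because GMMs can model complex, nonlinear color distributions and therefore generalize well enough for pixel-level playfield detection. After GMM-based pixel detection, region analysis, consisting mainly of morphological operations and region growing, joins the detected pixels into regions to produce the final segmentation. Experiments show that the proposed method detects the playfield effectively across different sports videos. The thesis also studies sports video scene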
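The detection pipeline described in item 2) (a GMM color model, pixel-level detection, then region analysis) can be sketched in pure Python as below. This is a minimal illustration under stated assumptions, not the thesis's implementation: the diagonal-covariance GMM and its parameters, the log-likelihood threshold, the 3x3 structuring element, 4-connectivity, and the rule "keep the largest region" are all hypothetical choices, and the function names are illustrative. The GMM is assumed to have been trained beforehand (for example with EM). The region-growing step is a simple flood fill standing in for the thesis's unspecified growth criterion.

```python
import math
from collections import deque

# Hypothetical sketch of GMM playfield detection plus region analysis.
# Assumptions (not from the thesis): RGB pixels as 3-tuples, a GMM with
# diagonal covariances trained offline, a hand-picked log-likelihood
# threshold, 3x3 morphology, and 4-connected flood-fill region growing.

def gmm_log_likelihood(pixel, weights, means, variances):
    """log p(pixel) under a diagonal-covariance Gaussian mixture."""
    comps = []
    for w, mu, var in zip(weights, means, variances):
        log_p = math.log(w)
        for x, m, v in zip(pixel, mu, var):
            log_p -= 0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
        comps.append(log_p)
    top = max(comps)  # log-sum-exp for numerical stability
    return top + math.log(sum(math.exp(c - top) for c in comps))

def detect_pixels(image, gmm, threshold):
    """Mark pixels whose color is likely under the playfield GMM."""
    weights, means, variances = gmm
    return [[gmm_log_likelihood(px, weights, means, variances) > threshold
             for px in row] for row in image]

def _morph(mask, erode):
    """3x3 erosion (erode=True) or dilation; out-of-image neighbors are ignored."""
    h, w = len(mask), len(mask[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            nbrs = [mask[j][i]
                    for j in range(max(0, y - 1), min(h, y + 2))
                    for i in range(max(0, x - 1), min(w, x + 2))]
            row.append(all(nbrs) if erode else any(nbrs))
        out.append(row)
    return out

def opening(mask):   # erosion then dilation: removes isolated false detections
    return _morph(_morph(mask, True), False)

def closing(mask):   # dilation then erosion: fills small holes (players, lines)
    return _morph(_morph(mask, False), True)

def largest_region(mask):
    """Grow 4-connected regions from unvisited seed pixels; keep the largest."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    best = set()
    for sy in range(h):
        for sx in range(w):
            if not mask[sy][sx] or seen[sy][sx]:
                continue
            region, queue = set(), deque([(sy, sx)])
            seen[sy][sx] = True
            while queue:
                y, x = queue.popleft()
                region.add((y, x))
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(region) > len(best):
                best = region
    return [[(y, x) in best for x in range(w)] for y in range(h)]

def segment_playfield(image, gmm, threshold):
    mask = detect_pixels(image, gmm, threshold)
    return largest_region(closing(opening(mask)))

if __name__ == "__main__":
    # Hypothetical two-component "grass" model and a synthetic 8x10 frame.
    gmm = ((0.6, 0.4), ((60, 140, 60), (90, 170, 80)),
           ((400, 400, 400), (400, 400, 400)))
    green, blue, red = (60, 140, 60), (100, 100, 200), (200, 30, 30)
    frame = [[blue if y < 3 else green for x in range(10)] for y in range(8)]
    frame[0][5] = green   # stray grass-colored pixel in the stands
    frame[5][4] = red     # a "player" inside the field
    for row in segment_playfield(frame, gmm, threshold=-20.0):
        print("".join("#" if v else "." for v in row))
```

Run as a script, the demo prints three rows of dots (stands) above five rows of "#" (playfield): opening removes the stray grass-colored pixel in the stands, and closing fills the one-pixel "player" hole. A mixture rather than a single Gaussian is used because grass color under shadows, sunlight, and mowing patterns is typically multimodal, which matches the abstract's motivation for GMMs.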
English Abstract

In the past few years, computer and Internet technologies have advanced very rapidly. As a result, the amount of image and video content has increased drastically, and more and more people can conveniently use various devices to obtain the visual information they want. An information era centered on multimedia information services is arriving. Techniques for processing and analyzing visual information are needed to meet users' varied application demands on images and video clips. However, the "semantic gap" is a major obstacle to harmonious human-computer interaction, because the low-level features used by computers cannot always be mapped to the high-level concepts that people commonly use. Although there is much research on the semantic analysis and understanding of visual media, results in this important area are far from satisfactory, as users need more automatically extracted semantic information.
In this thesis, we study several key techniques for automatically extracting semantic information from visual media. A four-level technical framework for semantic extraction is proposed, comprising an object semantic layer, a scene semantic layer, a knowledge and emotion semantic layer, and a semantic application layer. Four kinds of key techniques are investigated accordingly: object detection, scene classification, high-level concept extraction, and ontology-based semantic applications. Since it is hard to provide a general solution that extracts all semantic concepts from visual media, the problem is best approached with a divide-and-conquer strategy. Sports video appeals to large audiences, so automatically extracting useful semantic information from it to facilitate retrieval and organization is an important problem, and it has recently emerged as a hot research area. With the Beijing 2008 Olympic Games approaching, research on the semantic understanding of sports video has special significance for China. On the other hand, automatically analyzing and understanding digitized art images and extracting their type, style, and other semantic information is an important and urgent problem. Traditional Chinese painting is a gem of traditional Chinese art, so research on this kind of art image is also important. This thesis investigates semantic extraction from both video and image content; in particular, we concentrate on sports video and art images. Finally, we propose a technique to classify night-scene images, which is also an important problem in the semantic processing of images. The contributions of the thesis are as follows:
1) First, we perform a global analysis of the system framework for automatically extracting semantic information from visual media. This analysis is necessary, as on

引文

[Agnihotri99] Agnihotri, L., Dimitrova, N.. "Text Detection for Video Analysis", Workshop on Content Based Image and Video Libraries, held in conjunction with CVPR, Colorado, pp.109-113, 1999.
    [Ancona01] Ancona, N., Cicirelli, G., Branca, A., Distante, A., "Goal detection in football by using support vector machines for classification", IJCNN '01. International Joint Conference on Neural Networks, Volume: 1 , 15-19 July 2001
    [Ancona03] N. Ancona, G. Cicirelli, E. Stella and A. Distante, "Ball detection in static images with Support Vector Machines for classification". Image and Vision Computing, Volume 21, Issue 8, pp. 675-692, Elsevier 2003
    [Andrade03] Andrade E.L., Khan E., Woods J.C., Ghanbari M., "Player classification in interactive sport scenes using prior information region space analysis and number recognition", International Conference on Image Processing, Volume: 3 , Sept. 14-17, 2003
    [Ariki03] Yasuo Ariki, Masahito Kumano, Kiyoshi Tsukada,"Highlight scene extraction in real time from baseball live video", Proceedings of the 5th ACM SIGMM international workshop on Multimedia information retrieval,November 2003
    [Arnheim54] Arnheim R., "Art and Visual Perception: A Psychology of the Creative Eye", Regents of the University of California,Palo Alto, Calif., 1954
    [Aslandogan97] Y. Alp Aslandogan, Chuck Thier, Clement T. Yu, Jon Zou, Naphtali Rishe, "Using Semantic Contents and WordNet in Image Retrieval", SIGIR 1997: 286-295
    [Assfalg03] J. Assfalg, Marco Bertini, Carlo Colombo, Alberto Del Bimbo, Walter Nunziati, "Semantic annotation of soccer videos: automatic highlights identification", Computer Vision and Image Understanding, Special isssue on video retrieval and summarization, Volume 92,Issue 2/3,November/ December 2003
    [Babaguchi99] Babaguchi, N., Sasamori, S., Kitahashi, T., Jains, R., "Detecting events from continuous media by intermodal collaboration and knowledge use",Multimedia Computing and Systems, 1999. IEEE International Conference on , Volume: 1 , 7-11 June 1999
    [Barnard02] Kobus Barnard, Pinar Duygulu, Nando de Freitas, David Forsyth, David Blei, and Michael I. Jordan,"Matching Words and Pictures", Journal of Machine Learning Research, Vol 3, pp 1107-1135,2001
    [Benitez02A] A. B. Benitez and S.-F. Chang, "Semantic Knowledge Construction from Annotated Image Collections", Proceedings of the 2002 International Conference On Multimedia & Expo (ICME-2002), Lausanne, Switzerland, Aug 26-29, 2002
    [Benitez02B] Ana B. Benitez, Hawley Rising, Corinne Jorgensen, Ricardo Leonardi, lesandro Bugatti, Koiti Hasida, Rajiv Mehrotra, A. Murat Tekalp, Ahmet Ekin, Toby Walker, “Semantics of Multimedia in MPEG-7”, IEEE ICIP 2002
    [Bertini03] M. Bertini, A. Del Bimbo, W. Nunziati, "Model checking for detection of sport highlights", Proceedings of the 5th ACM SIGMM international workshop on Multimedia information retrieval ,2003
    [Bilms98] JA. Bilms, "A gentle tutorial of the EM algorithm and its application to parameter estimation for Gaussian mixture and hidden Markov models". 1998. http://ssli.ee.washington.edu/people/bilmes/mypapers/em.ps.gz.
    [Bonzanini01] Bonzanini, A., Leonardi, R., Migliorati, P., "Event recognition in sport programs using low-level motion indices", IEEE International Conference on Multimedia and Expo, 22-25 Aug. 2001
    [Brand00] M. Brand and V. Kettnaker, "Discovery and segmentation of activities in video", IEEE Trans. PAMI, 22(8): 844–851,August 2000.
    [Breen02A] C. Breen, L. Khan, A. Ponnusamy, and L. Wang, "Ontology-based Image Classification Using Neural Networks", Proc. of SPIE Internet Multimedia Management Systems III, pp. 198-208, Boston, MA, July, 2002.
    [Breen02B] Casey Breen, Latifur Khan, Arunkumar Ponnusamy, "Image Classification Using Neural Networks and Ontologies", 13th International Workshop on Database and Expert Systems Applications (DEXA'02) September 02-06, 2002, France
    [Breiman84] L.Breiman, J.Friedman, R.Olshen, and C. Stone, "Classification and Regression Trees", Montery, CA: Wadsworth, 1984
    [Bobick95] A. Bobick and A. Wilson, "A state-based technique for the summarization and recognition of gesture," in Proc. Int. Conf. on Computer Vision, 1995, pp. 382–388.
    [Canny86] J.F. Canny, "A computational approach to edge detection", IEEE Transactions on Pattern Analysis and Machine Intelligence, 8(6): 679-698,1986
    [Carson99] C.Carson, M.Thomas, S.Belongie, J.M.Hellerstein, and J.Malik, “Blobworld: A System for Region-Based Image Indexing and Retrieval”, Proc. Visual Information Systems, June 1999
    [Cavazza98] Cavazza M, Green R J and Palmer I J., "Multimedia Semantic Features and Image Content Description", In Proceedings of the 1998 MultiMedia Modeling
    [Chang87] Chang S K, Shi Q Y, and Yan C W. "Iconic Indexing by 2D Strings", IEEE Trans.Pattern Analysis and Machine Intelligence, Vol.9, No.3, pp.413-428, 1987
    [Chang02] Peng Chang, Mei Han, Yihong Gong, "Extract highlights from baseball game video with hidden Markov models",International Conference on Image Processing, Volume: 1 , 22-25 Sept. 2002
    [Chen01] Datong Chen, Kim Shearer, and Hervé Bourlard, "Video OCR for Sport Video Annotation and Retrieval", Proceedings of the 8th IEEE International Conference on Mechatronics and Machine Vision in Practice, Aug 2001
    [Chen02] C.-c. Chen, A.Del Bimbo, G.Amato, N.Boujemaa, et al., "Report of the DELOS-NSF Working Group on Digital Imagery for Significant Cultural and Historical Materials," DELOS-NSF Reports, December, 2002
    [Chen03] T. Chen, M. Han, W. Hua, Y. Gong, and T. S. Huang, "A New Tracking Technique: Object Tracking and Identification from motion," in Proc. CAIP'03, LNCS2756, pp. 157-164.
    [Cheng04] Yong Cheng, Zhongzhi Shi, "Enriching Domain Ontology from Domain-Specific Documents with HowNet. Advanced Workshop on Content Computing", AWCC 2004, Chi Chi-Hung, Lam Kwok-Yan (Eds.) Lecture Notes in Computer Science, Vol. 3309, 2004
    [Colombo99] Colombo C, DelBimbo A, and Pala P., "Semantics in visual information retrieval", IEEE Multimedia, 6(3):38-- 53, 1999
    [Dagtas01] Dagtas, S., Abdel-Mottaleb, M., "Extraction of TV highlights using multimedia features", IEEE Fourth Workshop on Multimedia Signal Processing,, 3-5 Oct. 2001
    [Duan03] Ling-Yu Duan, Min Xu, Tat-Seng Chua, Qi Tian, Chuang-Sheng Xu, "A mid-level representation framework for semantic sports video analysis", ACM conference on Multimedia 2003
    [Duygulu02] Pinar Duygulu, Kobus Barnard, Nando de Freitas, and David Forsyth, "Object recognition as machine translation: Learning a lexicon for a fixed image vocabulary", Seventh European Conference on Computer Vision, pp IV: 97-112, 2002
    [Eakins96] Eakins J P., "Automatic image content retrieval - are we getting anywhere?" In Proc. of Third International Conference on Electronic Library and Visual Information Research, pages 123--135, May 1996
    [Effelsberg99] W. Effelsberg, R. Lienhart, S. Pfeiffer, "Generating video abstracts automatically", the Tenth DELOS Workshop on Audio-Visual Digital Libraries, Greece, June 1999
    [Ekin03A] A. Ekin, A. Murat Tekalp, and R. Mehrotra, “Automatic soccer video analysis and summarization,” IEEE Trans. on Image Process,12 (7) (2003) 796-807
    [Ekin03B] Ekin, and A. Murat Tekalp, “Robust dominant color region detection and color-based applications for sports video”, International Conference on Image Processing 2003
    [Ekin03C] Ekin, A., Tekalp, A.M., "Shot type classification by dominant color for sports video segmentation and summarization", IEEE International Conference on Acoustics, Speech, and Signal Processing, Proceedings. (ICASSP '03). 2003 , Volume: 3 , 6-10 April 2003
    [Ekin03D] A.Ekin and A. M. Tekalp, "Generic play-break event detection for summarization and hierarchical sports video analysis," Proc. IEEE ICME, Aug. 2003, Baltimore
    [Fleck96] M.Fleck, D.A. Forsyth, and C.Bregler, "Finding Naked People", Proc. European Conf. Computer Vision, vol.2, 1996
    [Flickner95] M.Flickner, H. Sawhney, W.Niblack, J.Ashley, Q.Huang, B.Dom, et al. “Query by Image and Video Content: The QBIC System. ” IEEE Computer, vol. 28, no. 9, 1995
    [Forsyth96] Forsyth D A, Malik J, Fleck M M, Leung T, Bregler C, Carson C, and Greenspan H. "Finding pictures of objects in large collections of images", In Proceedings, International Workshop on Object Recognition. IS&T/SPIE, April 1996
    [Frain04] Dirk Farin, Susanne Krabbe, Peter H.N. de With, Wolfgang Effelsberg, "Robust Camera Calibration for Sport Videos using Court Models," Storage and Retrieval Methods and Applications for Multimedia 2004
    [Fung99] Fung C Y and Loe K F., "Learning primitive and scene semantics of images for classification and retrieval", In Proceedings of the seventh ACM international conference (part 2) on Multimedia October 30 - November 5, 1999, Orlando, FL USA
    [Gevers96] T.Gevers and A.W.M. Smeulders. "A Comparative Study of Several Color Models for Color Image Invariant Retrieval.", In First International Workshop on Image Database and Multimedia Search, Amsterdam, Holland, 1996
    [Gong95] Y. Gong, T.S. Lim, and H.C. Chua, "Automatic Parsing of TV Soccer Programs", IEEE International Conference on Multimedia Computing and Systems, May, 1995, pp. 167-174
    [Gorkani94] Monika Gorkani and Rosalind W.Picard, "Texture orientation for sorting photos at a glance", In Proc. Int. Conf. Pat. Rec., vol. I, Jerusalem, Israel, Oct. 1994
    [Gudivada98] Gudivada V N. “ΘR-String:A Geometry-Based Representation for Efficient and Effective Retrieval of Images by Spatial Similarity” IEEE Transactions on Knowledge And Data Engineering,Vol.10, No.3 May/June 1998
    [Guo03] C.E Guo, S. C. Zhu, Ying Nian Wu, “Modeling Visual Patterns by Integrating Descriptive and Generative Models”, International Journal of Computer Vision, 53(1), 2003
    [Gupta97] A. Gupta and R.Jain, "Visual Information Retrieval", Comm. ACM, vol.40, no.5, pp. 70-79, May 1997
    [Haering00] Haering N., Qian R.J., Sezan M.I., "A semantic event-detection approach and its application to detecting hunts in wildlife video", Circuits and Systems for Video Technology, IEEE Transactions on , Volume: 10 , Issue: 6 , Sept. 2000
    [Hanjalic02] A. Hanjalic, "Shot-boundary detection: unraveled and resolved? ", IEEE Transactions on Circuits and Systems for Video Technology, Volume 12(2) , Feb. 2002
    [Hanjalic03] Alan Hanjalic, "Generic approach to highlights extraction from a sports video," IEEE International Conference on Image Processing, ICIP 2003
    [Hilton95] G. Hilton, M. Revon, P. Dayan, “Recognition handwritten digits using mixtures of linear models”, Tesauro, Touretzky, Leen eds. Advances in Neural Information Processing Systems, Cambridge, Massachusettes, The MIT Press, 1995
    [Hollink03] L.Hollink, A.Th.Schreiber, J.Wielemaker, B.Wielinga. "Semantic Annotation of Image Collections", In proceedings of the KCAP'03 Workshop on Knowledge Capture and Semantic Annotation, Florida, October 2003.
    [Hongeng00] S. Hongeng, F. Bremond, and R.Nevatia, "Bayesian Framework for Video Surveillance Application", in IEEE Proceedings of ICPR, Barccelona, Spain, 2000
    [Hsu96] Hsu C C, Chu W W, and Taira R K. "A Knowledge-Based Approach for Retrieving Images By Content", IEEE Trans.Knowledge and Data Eng.,vol.8, no.4, pp522-532, Aug.1996
    [Hu03] Bo Hu, S. Dasmahapatra, P. Lewis, N. Shadbolt, "Ontology-Based Medical Image Annotation with Description Logics", IEEE ICTAI’03, November 03-05, 2003.
    [Hunt66] E.B. Hunt, J. Marin, P.T. Stone, "Experiments in Induction", Academic Press, 1966
    [Hunter03] Jane Hunter, "Enhancing the Semantic Interoperability of Multimedia Through a Core Ontology", IEEE Tran. On Circuits and Systems for Video Technology, Vol.13, No.1, Jan. 2003
    [Hyv?nen03A] E. Hyv?nen, S. Saarela, K. Viljanen, "Ontology Based Image Retrieval", Proceedings of WWW 2003, Budapest, poster papers, 2003.
    [Hyv?nen03B] E. Hyvonen, S. Saarela, and K. Viljanen, "Intelligent Image Retrieval and Browsing Using Semantic Web Techniques- A Case Study", International SEPIA Conference at the Finnish Museum of Photography, Helsinki, September, 2003
    [Itten61] Itten J., "Art of Color (Kunst der Farbe)",Otto Maier Verlag, Ravensburg,Germany, 1961
    [Jaimes00] Alejandro Jaimes and Shih-Fu Chang, "A Conceptual Framework for Indexing Visual Information at Multiple Levels", Internet Imaging 2000, IS&T/SPIE. San Jose, CA, January 2000
    [Jaimes03] A. Jaimes, J.R. Smith, "Semi-Automatic, Data-Driven Construction of Multimedia Ontologies", IEEE ICME 2003
    [Jain95] R.Jain, S.N.J. Murthy, P.L.-J.Chen, and S.Chatterjee, "Similarity Measures for Image Databases", Proc. SPIE, vol.2420, Feb. 1995
    [Knag03] Yu-Lin Kang Joo-Hwee Lim, Qi Tian, Kankanhalli, M.S., "Soccer video event detection for visual keywords", Fourth International Conference on Fourth Pacific Rim Conference on Multimedia, Volume: 2, 15-18 Dec. 2003
    [Kato92] Kato T., "Database architecture for content-based image retrieval", In Proceedings of the SPIE, Image Storage and Retrieval Systems, San Jose, CA, February, 1992, volume 1662, pp.112--123.
    [Khatib99] AI-Khatib W, Day Y F, Ghafoor A, and Berra B., "Semantic Modeling and Knowledge Representation in Multimedia Databases", IEEE Transactions on Knowledge and Data Engineering, Vol.11, No.1 Jan/JFeb 1999
    [Kim98] Taeone Kim, Yongduek Seo, and Ki sang Hong, "Physics-based 3D position analysis of a soccer ball from monocular image sequences," in Proc. 6th int'l conference on Computer vision, Bombay, pages 721-726, 1998
    [Kobla99] V. Kobla, and D.S. Doermann, "Detection of slow-motion replays for identifying sports videos", In Proceedings of IEEE Third Workshop on Multimedia Signal Processing (MMSP), pp. 135-140, Sept., 1999
    [Kokaram01] Kokaram, A., Delacourt, P.,"A new global motion estimation algorithm and its application to retrieval in sports events", IEEE Fourth Workshop on Multimedia Signal Processing, 3-5 Oct. 2001
    [Leykin02] A. Leykin, F.Cutzu and R. Hammoud, "Visual Properties Differentiating Art from Real Scenes," Technical Report No. 565, Computer Science De-partment, Indiana University,2002
    [Li00] Jia Li, James Wang, Gio Wiederhold, "Classification of textured and non-textured images using region segmentation," Proc. IEEE International Conference on Image Processing (ICIP), Vancouver, BC, Canada, pp. 754-757, IEEE, September 2000
    [Li01] Li, B., Ibrahim Sezan, M., "Event detection and summarization in sports video", IEEE Workshop on Content-Based Access of Image and Video Libraries, , 14 Dec. 2001
    [Li03] Jia Li and James Z. Wang, "Automatic Linguistic Indexing of Pictures by a Statistical Modeling Approach", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 25, no. 9, pp. 1075-1088, 2003
    [Li04] Jia Li, Wang, J.Z.,"Studying digital imagery of ancient paintings by mixtures of stochastic models," IEEE Transactions on Image Processing, Volume: 13, Issue: 3, March 2004, Pages: 340-353
    [Lienhart02] R. Lienhart, A. Hartmann, "Classifying images on the web automatically", Journal of Electronic Imaging 11(4), Oct. 2002
    [Liu03] Tianming Liu, Hong-Jiang Zhang, Feihu Qi, "A novel video key-frame-extraction algorithm based on perceived motion energy model", IEEE Transactions on Circuits and Systems for Video Technology, 13(10), 2003
    [Lu2004] Wenmiao Lu, Yap-Peng Tan,"A vision-based approach to early detection of drowning incidents in swimming pools",Circuits and Systems for Video Technology, IEEE Transactions on , Volume: 14 , Issue: 2 , Feb. 2004
    [Luo03] M Luo, Y.F. Ma, H.J. Zhang, “Pyramid wise Structuring for Soccer Highlight Extraction”, The 4th International Conference on Information, Communications & Signal Processing - 4th IEEE Pacific-Rim Conference On Multimedia (ICICS-PCM2003), Singapore, Dec.15-18, 2003
    [Ma97] M.Y. Ma and B. Manjunath, "Natra: A Toolbox for Navigating Large Image Databases", Proc. IEEE Int’l Conf. Image Processing, 1997
    [Masumitsu01] K. Masumitsu, T. Echigo, "Meta-Data Framework for Constructing Individualized Video Digest", IEEE ICIP 2001, Vol.2, pp 390-393
    [Mezaris03] V. Mezaris, I. Kompastsiaris, and M. G. Strintzis, "An Ontology Approach to Object-Based Image Retrieval", IEEE ICIP 2003.
    [Minka97] T.P.Minka and R.W.Picard, "Interactive Learning Using a Society of Models", Pattern Recognition, vol.30, no.3, 1997
    [MPEG01] International Organization for Standardization Organization International De Normalization ISO/IEC JTC1/SC29/WG11 Coding of Moving Pictures and Audio, ISO/IEC JTC1/SC29/WG11 N4031, Singapore, March 2001
    [Mukherjea99] S. Mukherjea, K.Hirata, and Y. Hara, "AMORE: A World Wide Web Image Retrieval Wngine", Proc. World Wide Web, vol.2, no.3, pp.115-132, 1999
    [Naphade03] M. Naphade, C.Y. Lin, A.Natsev, B.Tseng and J.Smith, "A Framework for Moderate Vocabulary Semantic Visual Concept Detection", IEEE ICME 2003
    [Needham01] Needham C J, Boyle, R. D. "Tracking multiple sports players through occlusion, congestion and scale," British Machine Vision Conference, vol. 1, pp. 93-102, 2001
    [Nepal01] Surya Nepal, Uma Srinivasan, Graham Reynolds,"Automatic detection of 'Goal' segments in basketball videos", Proceedings of the ninth ACM international conference on Multimedia, October 2001
    [Nitta02] Nitta, N., Babaguchi, N., Kitahashi, T., "Story based representation for broadcasted sports video and automatic story segmentation", IEEE International Conference on Multimedia and Expo, Volume: 1 , 26-29 Aug. 2002
    [Ogle95] Virginia E. Ogle, Michael Stonebraker, "Chabot: Retrieval from a Relational Database of Images", Computer, v.28 n.9, p.40-48, September 1995
    [Ohno00] Y. Ohno, J. Miura and Y. Shirai, "Tracking Players and Estimation of the 3D Position of a Ball in Soccer Games," in Proc. 15th International Conf. on Pattern Recognition, vol.1, 3-7 Sep. 2000, pp. 145-148
    [Ok02] OK H. W, Seo Y, Hong, K. S. “Multiple Soccer Players Tracking by Condensation with Occlusion Alarm Probability,” In Proc. Europ. Conf. Computer Vision, 2002
    [Okuma04] Okuma K, Taleghani A, Freitas N. "A Boosted Particle: Multitarget Detection and Tracking," In Proc. Europ. Conf. Computer Vision, pp. 28-39, 2004.
    [Orazio02] T. D'Orazio, N. Ancona, G. Cicirelli and M. Nitti, "A Ball Detection Algorithm for Real Soccer Image Sequences", Proceedings of 16th International Conference on Pattern Recognition, Quebec City, Canada, 11-15 August, 2002
    [Ouyang03]Jian-Quan Ouyang, Jin-Tao Li, Yong-Dong Zhang, "Replay boundary detection in MPEG compressed video", Machine Learning and Cybernetics, 2003 International Conference on , Volume: 5 , 2-5 Nov. 2003
    [Pass96] G.Pass, R.Zabih, and J.Miller, “Comparing images using color coherence vectors”, in Proceedings of Fourth ACM Conference on Multimedia, November 1996
    [Pan01] H. Pan, P. Van Beek and M.I.Sezan, "Detection of slowmotion replay segments in sports video for highlights generation", in Proc. IEEE International Conf. On Acoustics, Speech and Signal Processing, 2001
    [Pease02] A. Pease, I. Niles, and J. Li, "The Suggested Upper Merged Ontology: A Large On-tology for the Semantic Web and its Applica-tions", In Working Notes of the AAAI-2002 Work-shop on Ontologies and the Semantic Web, Edmon-ton, Canada, July 28-August 1, 2002
    [Pentland94] A.Pentland, R.W. Picard, and S.Sclaroff, “Photobook: Tools for Content-Based Manipulation of Image Databases, Proc. SPIE, vol.2185, Feb.1994
    [Petrakis97] E.G.M Petrakis and A.Faloutsos, "Similarity Searching in Medical Image Databases", IEEE Trans. Knowledge and Data Eng., vol.9, no.3, May/June 1997
    [Pietikainen02] M. Pietikainen, T. Maenpaa, Jaakko Viertola, "Color Texture Classification with Color Histogram and Local Binary Patterns", Proc. 2nd International Workshop on Texture Analysis and Synthesis, June 1, Copenhagen, Denmark, 2002
    [Prabhakar02] S. Prabhakar, Hui Cheng, John C. Handley, Zhigang Fan and Ying-wei Lin, "Picture-Graphics Color Image Classification", IEEE International Conference on Image Processing, 2002
    [Priese93] L. Priese and V. Rehrmann, "On Hierarchical Color Segmentation and Applications", In Computer Vision and Pattern Recognition, IEEE, Los Alamitos, 1993
    [Quinlan79] J.R. Quinlan, “Discovering rules from large collections of examples: A case study”, Expert Systems in the Micro Electronic Age, Edinburgh University Press, 1979
    [Quinlan93] J. R. Quinlan, "C4.5: Programs For Machine Learning", Morgan Kaufmann, Los Altos 1993
    [Rabbitti89] Rabbitti F and Stanchev P. "GRIM_DBMS: a graphical image database management system", In Visual Database System (Kunii, T, ed.) Elsevier, Amsterdan, 415,530,1989
    [Rea04] Niall Rea, Rozenn Dahyot, and Anil Kokaram, "Modeling High Level Structure in Sports with Motion Driven HMMs", IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2004
    [Reid98] I. Reid, and A. North, “3D Trajectories from a Single Viewpoint using Shadows,” in Proc. BMVC, 1998, pp. 863-872.
    [Reynolds95] D. A. Reynolds, “Speaker identification and verification using Gaussian mixture speaker models”, Speech Communication, 1995, Vol. 17, 91-108
    [Rodden03] K. Rodden and K. Wood, "How do People Manage Their Digital Photographs?", ACM Con-ference on Human Factors in Computing Systems, April 2003
    [Rubner97] Y. Rubner, L.J. Guibas, and C. Tomasi, "The Earth Mover’s Distance, Multi-Dimensional Scaling, and Color-Based Image Retrieval", Proc. DARPA Image Understanding Workshop, May 1997
    [Rui98] Y.Rui, T. S. Huang, and S. Mehrotra, "Exploring video structure beyond the shots”, Proc. IEEE Conf. on Multimedia Computing and Systems, 1998. 237~240
    [Rui00] Yong Rui, Anoop Gupta, and Alex Acero, "Automatically Extracting Highlights for TV Baseball Programs", ACM Multimedia 2000
    [Sato98] Sato, T., Kanade, T., Hughes, E.K., Smith, M.A.. "Video OCR for Digital News Archives", IEEE International Workshop on Content-Based Access of Image and video Databases, January,1998, pp.52 - 60
    [Schettini93] R Schettini, "A segmentation algorithm for color images", Pattern Recognition Letters, 14:499-506, 1993
    [Schreiber01] A.T. Schreiber, B. Dubbeldam, etc. "Ontology-based photo annotation", IEEE Intelligent Systems, May/June 2001
    [Schreiber02] A. Schreiber, I.Blok, etc., "A Mini-experiment in Semantic Annotation", The Semantic Web-ISWC, LNCS 2342, pages 404-408, Berlin, 2002.
    [Seo97] Y.Seo, S. Choi, H.Kim, and K.Hong, "Where are the Ball and Players? Soccer Game Analysis with Color Based Tracking and Image Mosaick", ICIAP, Florence, Italy, Sept.17-19, 1997
    [Serrano02] N. Serrano, A. Savakis, and J. Luo, "A Computationally Efficient Approach to Indoor/Outdoor Scene Classification" , ICPR'02, Québec City, Canada, Aug. 2002
    [Shahraray95] B. Shahraray, "Scene Change Detection and Content-Based Sampling of Video Sequences", IS\&T/SPIE’95 Digital Video Compression: Algorithm and Technologies, San Jose, February 1995, Vol.2419, pp2-13.
    [Shih03] Huang-Chia Shih, Chung-Lin Huang,"A semantic network modeling for understanding baseball video", IEEE International Conference on Acoustics, Speech, and Signal Processing, Volume: 5 , 6-10 April 2003
    [Shim98] Shim, J. C., Dorai, C., Bolle, R., "Automatic Text Extraction from Video for Content-based Annotation and Retrieval", In Proc. of 14th Int. Conf. on Pattern Recognition(ICPR), pp. 618-620, 1998.
    [Shim02] Seong-O Shim, Tae-Sun Choi, "Edge Color Histogram for Image Retrieval", IEEE ICIP 2002
    [Shook00] F. Shook, "Television Field Production and Reporting", 3rd Ed. Allyn&Bacon Pub., 2000
    [Smeulders00] Arnold W. M. Smeulders, Marcel Worring, Amarnath Gupta and Ramesh Jain, "Content-Based Image Retrieval at the End of the Early Years", IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 22, no. 12, 2000
    [Smith96] J.R.Smith and S.-F.Chang, “VisualSEEK: A Fully Automated Content-Based Image Query System”, Proc. ACM Multimedia, Nov. 1996
    [Smith98] John R Smith and Li C S. “Decoding Image Semantics Using Composite Region Templates” In IEEE Workshop on Content-based Access of Image and Video Libraries (CBAIVL-98), June’98
    [Smith99] J.R.Smith and C.S.Li, “Image Classification and Querying Using Composite Region Templates”, Int’l J. Computer and Image Understanding, vol.75, nos.1-2,pp.165-174, 1999
    [Snoek03] Snoek, C.G.M., Worring, M., "Time interval maximum entropy based event indexing in soccer video", International Conference on Multimedia and Expo, Volume: 3, 6-9 July 2003
    [Sobottka98] Sobottka, K., Bunke, H., Kronenberg, H., "Identification of Text on Colored Book and Journal Covers", Proceedings of the Fifth International Conference on Document Analysis and Recognition, 1998.
    [Song03] Yuqing Song, Wei Wang, Aidong Zhang, "Automatic Annotation and Retrieval of Images", World Wide Web 6(2): 209-231 (2003)
    [Soo02] Von-Wun Soo, Chen-Yu Lee, Jaw Jium Yeh, Ching-chih Chen, "Using sharable ontology to retrieve historical images", JCDL 2002: 197-198.
    [Soo03] Von-Wun Soo, Chen-Yu Lee, Chung-Cheng Li, Shu Lei Chen, Ching-chih Chen, "Automated Semantic Annotation and Retrieval Based on Sharable Ontology and Case-Based Learning Techniques", JCDL 2003
    [Starner95] T. Starner and A. Pentland, "Visual recognition of american sign language using hidden markov model," in Proc. Int. Workshop on Automatic Face and Gesture Recognition, 1995, pp. 189–194.
    [Stevens94] S. Stevens, M.Christel, and H.Wactlar, "Informedia: Improving Access to Digital Video", Interactions, vol.1, no. 4, 1994
    [Sun2003] H. Sun, J.H. Lim, Q. Tian, M. Kankanhalli, “ Semantic Labeling of Soccer Video”, The 4th International Conference on Information, Communications & Signal Processing - 4th IEEE Pacific-Rim Conference On Multimedia (ICICS-PCM2003), Singapore, Dec.15-18, 2003.
    [Sung02] Si-Hun Sung, Woo-Sung Chun, "Knowledge-based numeric open caption recognition for live sportscast", 16th International Conference on Pattern Recognition, Volume: 2, 11-15 Aug. 2002
    [Szummer98] M.Szummer and R.W.Picard,"Indoor-Outdoor Image Classification", IEEE International Workshop on Content-based Access of Image and Video Databases, in conjunction with ICCV’98. Bombay, India, 1998
    [Tomita00] Tomita, A., Echigo, T., Knrokawa, M., Miyamori, H., Iisaku, S.,"A visual tracking system for sports video annotation in unconstrained environments", International Conference on Image Processing, Volume: 3, 10-13 Sept. 2000
    [Tong04] X. Tong, H. Lu, and Q. Liu, "An Effective and Fast Soccer Ball Detection and Tracking Method," in Proc. ICPR’04, vol. 4, Aug. 23-26, 2004, Pages: 795-798.
    [Tzouvaras03] Tzouvaras V., Tsechpenakis G., Stamou G., Kollias, S., "Adaptive rule-based recognition of events in video sequences", International Conference on Image Processing, Volume: 2, 14-17 Sept. 2003
    [Tu05] Zhuowen Tu, Xiangrong Chen, Alan L. Yuille, and Song-Chun Zhu, “Image Parsing: Unifying Segmentation, Detection, and Recognition”, International Journal of Computer Vision, 2005
    [Utsumi02] Utsumi, O., Miura, K., Ide, I., Sakai, S., Tanaka, H.,"An object detection method for describing soccer games from video", ICME '02, Proceedings International Conference on Multimedia and Expo, Volume: 1 , 26-29 Aug. 2002
    [Vailaya99] A. Vailaya, M. Figueiredo, A. Jain, and H.-J. Zhang, "A Bayesian Framework for Semantic Classification of Outdoor Vacation Images", in Proc. SPIE: Storage and Retrieval for Image and Video Databases VII, vol. 3656, pp. 415-426, San Jose, CA, January, 1999
    [Vailaya01] A. Vailaya, M.A.T. Figueiredo, A.K. Jain and H.J. Zhang, "Image Classification for Content-Based Indexing", IEEE Trans. Image Process., vol. 10, Jan. 2001.
    [Vandenbroucke00] Vandenbroucke, N., Macaire, L., Postaire, J.-G.,"Color image segmentation by supervised pixel classification in a color texture feature space. Application to soccer image segmentation", 15th International Conference on Pattern Recognition, Volume: 3 , 3-7 Sept. 2000
    [Vermaak03] J. Vermaak, A. Doucet, P. Pérez, "Maintaining Multi-Modality through Mixture Tracking", in Proc. International Conference on Computer Vision, 2003.
    [Vapnik95] V. N. Vapnik, "The Nature of Statistical Learning Theory", New York: Springer-Verlag, 1995.
    [Wang98] J.Z. Wang, G. Wiederhold, O. Firschein, and X.W. Sha, "Content-Based Image Indexing and Searching Using Daubechies' Wavelets", Int'l J. Digital Libraries, vol. 1, 1998.
    [Wang01] James Z. Wang, Jia Li, Gio Wiederhold, "SIMPLIcity: Semantics-sensitive Integrated Matching for Picture Libraries," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 23, no. 9, pp. 947-963, 2001
    [Wang02] Weiqiang Wang, Wen Gao, "Automatic Segmentation of News Items Based on Video and Audio Features", Journal of Computer Science & Technology, vol. 17, no. 2, pp. 189-195, 2002.
    [Wang04] Lei Wang, Boyi Zeng, Steve Lin, Guangyou Xu, Heung-Yeung Shum, "Automatic Extraction of Semantic Colors in Sports Video", IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP, 2004
    [Wielinga01] Bob Wielinga, Guus Schreiber, J. Wielemaker, and J. A. C. Sandberg, "From thesaurus to ontology", International Conference on Knowledge Capture, Victoria, Canada, Oct. 2001.
    [Wu97] V. Wu, R. Manmatha, E. Riseman, "Automatic Text Detection and Recognition", in Proceedings of Image Understanding Workshop, pp. 707-712, 1997.
    [Wu02] Chuan Wu, Yu-Fei Ma, Hong-Jiang Zhang, Yu-Zhuo Zhong, "Events recognition by semantic inference for sports video", IEEE International Conference on Multimedia and Expo, vol. 1, Aug. 26-29, 2002.
    [Xiang03] Tao Xiang, Shaogang Gong, "Discovering Bayesian causality among visual events in a complex outdoor scene", IEEE Conference on Advanced Video and Signal Based Surveillance, 21-22 July 2003
    [Xie04] L. Xie, P. Xu, S.-F. Chang, A. Divakaran and H. Sun, "Structure Analysis of Soccer Video with Domain Knowledge and Hidden Markov Models", Pattern Recognition Letters, no. 7, May 2004, pp. 767-775.
    [Xiong03] Ziyou Xiong, R. Radhakrishnan, A. Divakaran, T.S. Huang, "Audio events detection based highlights extraction from baseball, golf and soccer games in a unified framework", International Conference on Multimedia and Expo, vol. 3, July 6-9, 2003.
    [Xu03] Min Xu, Ling-Yu Duan, Changsheng Xu, M. Kankanhalli, Qi Tian, "Event detection in basketball video using multiple modalities", Joint Conference of the Fourth International Conference on Information, Communications and Signal Processing and the Fourth Pacific Rim Conference on Multimedia, vol. 3, Dec. 15-18, 2003.
    [Xugu03] Gu Xu, Yu-Fei Ma, Hong-Jiang Zhang, Shiqiang Yang, "A HMM based semantic analysis framework for sports game event detection", International Conference on Image Processing, vol. 1, Sept. 14-17, 2003.
    [Yamada02] A. Yamada, Y. Shirai, and J. Miura, "Tracking Players and a Ball in Video Image Sequence and Estimating Camera Parameters for 3D Interpretation of Soccer Games", 16th International Conference on Pattern Recognition, vol. 1, Aug. 11-15, 2002, pp. 303-306.
    [Yang01] Jun Yang, Liu Wenyin, Hongjiang Zhang, Yueting Zhuang, "Thesaurus-Aided Approach for Image Retrieval and Browsing", Proceedings of 2nd IEEE International Conference on Multimedia and Expo (ICME 2001), pp. 313-316, Tokyo, Japan, 2001.
    [Ye2003] Qixiang Ye, Wen Gao, Wei Zeng, "Color Image Segmentation Using Density-Based Clustering", International Conference on Acoustics, Speech and Signal Processing (ICASSP 2003), 2003.
    [Yeo96] B.L. Yeo and B. Liu, "Visual Content Highlighting via Automatic Extraction of Embedded Captions on MPEG Compressed Video", in SPIE Digital Video Compression: Algorithms and Technologies, vol. 2668, Feb. 1996, pp. 38-47.
    [Yeung97] M. M. Yeung and B. L. Yeo, "Video Visualization for Compact Presentation and Fast Browsing of Pictorial Content", in IEEE Transactions on Circuits and Systems for Video Technology, Vol. 7, No. 5, pp. 771-785, Oct. 1997
    [Yiu96] Elaine C. Yiu, "Image Classification Using Color Cues and Texture Orientation", Master's Thesis, Dept. of EECS, MIT, 1996.
    [Yow95] D. Yow, B. L. Yeo, M. Yeung, and B. Liu, "Analysis and presentation of soccer highlights from digital video", in Proc. ACCV'95, 1995.
    [Yu03A] Xinguo Yu, Qi Tian, Kong Wah Wan,"A novel ball detection framework for real soccer video", ICME '03, Proceedings International Conference on Multimedia and Expo, Volume: 2 , 6-9 July 2003
    [Yu03B] X. Yu, C. Xu, Q. Tian, and H. W. Leong, "A Ball Tracking Framework for Broadcast Soccer Video", in Proc. ICME'03, vol. 2, July 6-9, 2003, pp. 273-276.
    [Zabih99] R. Zabih, J. Miller, and K. Mai, "A Feature-Based Algorithm for Detecting and Classifying Production Effects", Multimedia Systems, vol. 7, pp. 119-128, 1999.
    [Zhang95] H.J. Zhang, S.Y. Tan, S.W. Smoliar and Y. Gong, "Automatic Parsing and Indexing of News Video", Multimedia Systems, vol. 2, pp. 256-266, 1995.
    [Zhang02] Dongqing Zhang, Shih-Fu Chang, "Event detection in baseball video using superimposed caption recognition", Proceedings of the tenth ACM international conference on Multimedia, December 2002
    [Zhao00] Zhao L., Qi W., Li S. Z., Yang S. Q., and Zhang H. J., "Key-frame extraction and shot retrieval using Nearest Feature Line (NFL)", International Workshop on Multimedia Information Retrieval, in conjunction with ACM Multimedia Conference 2000, Los Angeles, USA, 2000.
    [Zheng04] Qingfang Zheng, Wei Zeng, Wen Gao, Wei-Qiang Wang, "Shape-based Adult Images Detection", Third International Conference on Image and Graphics, Hong Kong, China, Dec. 18-20, 2004, pp. 150-153.
    [Zhong99] Zhong, Y., Zhang, H.J., Jain, A. K., "Automatic Caption Localization in Compressed Video", IEEE Int. Conf. on Image Processing, 1999.
    [Zhu03] Song-Chun Zhu, “Statistical modeling and conceptualization of visual patterns”, IEEE Trans. on PAMI, vol.25, no.6, 2003
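    [Castleman98] Kenneth R. Castleman, Digital Image Processing (Chinese edition), Publishing House of Electronics Industry, Beijing, 1998.
    [Sonka02] Milan Sonka, Vaclav Hlavac, Roger Boyle, "Image Processing, Analysis, and Machine Vision" (English reprint edition), Posts & Telecom Press / Thomson Learning, 2002.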
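    [邦 02] Jens Bangsbo, Birger Peitersen, Soccer Systems and Strategies (Chinese edition), People's Sports Publishing House, 2002.
    [崔 02] Cui Wei, Tan Nenghuo, Color Composition (in Chinese), China Textile Press, 2002.
    [邸 00] Di Kaichang, Li Deren, Li Deyi, "Remote sensing image classification based on spatial data mining" (in Chinese), Journal of Wuhan Technical University of Surveying and Mapping, vol. 25, no. 1, pp. 42-48, 2000.
    [段 03] Duan Lijuan, Research on Key Techniques of Content-Based Image Retrieval and Filtering (in Chinese), PhD dissertation, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, 2003.
    [黄 01] Huang Tiejun, Convergence Technologies for Computing, Communications, Consumer Electronics, Content and Community (in Chinese), postdoctoral research report, Institute of Computing Technology, Chinese Academy of Sciences, 2001.
    [李 95] Li Deyi, Meng Haijun, Shi Xuemei, "Membership clouds and membership cloud generators" (in Chinese), Journal of Computer Research and Development, vol. 32, no. 6, pp. 16-21, 1995.
    [马 99] Ma Jiyong, Gao Wen, "A method for estimating Gaussian mixture model parameters based on minimum misclassification rate" (in Chinese), vol. 22, no. 8, Aug. 1999.
    [阮 01] Ruan Qiuqi (ed.), Digital Image Processing (in Chinese), Publishing House of Electronics Industry, 2001.
    [史 00] Shi Zhongzhi, Intelligent Agents and Their Applications (in Chinese), Science Press, 2000.
    [史 02] Shi Zhongzhi, Knowledge Discovery (in Chinese), Tsinghua University Press, 2002.
    [王 01] Wang Weiqiang, Gao Wen, "A fast caption text detection algorithm in the compressed domain and its applications" (in Chinese), Chinese Journal of Computers, vol. 24, no. 6, pp. 620-626, 2001.
    [向 03] Xiang Rihua, Wang Runsheng, "A range image segmentation algorithm based on Gaussian mixture models" (in Chinese), Journal of Software, vol. 14, no. 7, 2003.
    [徐 99] Xu Xu, Research on Image Retrieval Systems Based on Visual Features (in Chinese), PhD dissertation, Zhejiang University, Hangzhou, 1999.
    [章 98] Zhang Yujin, Image Engineering (Vol. 1): Image Processing and Analysis (in Chinese), Tsinghua University Press, 1998.
    [章 01] Zhang Yujin, Image Segmentation (in Chinese), Science Press, 2001.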