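Research on Sports Video Content Analysis Methods Based on Player Behavior Information

English title: Research on Sports Video Content Analysis Using Player Behavior Information
Author: 朱光宇
Degree level: Doctoral
Discipline: Computer Application Technology
Chinese keywords: 对象跟踪; 动作识别; 精彩排序; 战术分析; 体育视频
English keywords: object tracking; action recognition; highlight ranking; tactic analysis; sports video
Degree year: 2009
Advisor: 高文
Discipline code: 081203
Degree-granting institution: Harbin Institute of Technology (哈尔滨工业大学)
Thesis submission date: 2009-04-01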

Abstract

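With the rapid development of computer, network, and multimedia technologies, multimedia data is growing exponentially. Video is an important component of multimedia data, with a complex structure and an enormous data volume. Because sports video has a broad audience and great market potential, research aimed at sports video content analysis has become a hot topic in video analysis.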
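     This dissertation focuses on content analysis techniques for broadcast sports video. To address the problem in current sports video analysis research that low-level video features cannot accurately reflect high-level human semantic concepts, it proposes a multimodal-fusion method for semantic and tactic analysis of sports video that is based on the analysis of player behavior (trajectory and action) and combined with audio analysis. It discusses several key technical problems: trajectory tracking and action recognition of players in broadcast sports video; construction of semantic/tactic mid-level representations of video content from player trajectory and action information, using multimodal fusion and domain knowledge; and semantic and tactic content analysis of broadcast sports video based on these mid-level representations. The specific research content is as follows: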
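     A method for detecting and tracking players in broadcast sports video, based on support vector machines and particle filtering, is proposed. First, support vector classification is combined with playfield segmentation to give an automatic player detection algorithm for sports video, which initializes the subsequent tracking of visual objects. Second, support vector regression is combined with the sequential Monte Carlo framework to give an improved particle filtering algorithm for visual object tracking, which lets particle filtering track visual objects robustly even with a small particle set and effectively improves the running efficiency of the tracking system.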
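     A method for recognizing player actions in broadcast sports video, based on support vector machines and optical flow analysis, is proposed. To cope with the poor image quality, moving cameras, and low resolution of player images in broadcast sports video, the method takes a motion analysis perspective: based on the spatial distribution of the optical flow field in the tracked player region, it extracts descriptive features for action recognition using a grid partition that follows the idea of local analysis. Unlike traditional optical flow analysis, this feature extraction treats the optical flow vector field within the tracked region as spatial distribution information about a motion pattern, which makes the optical flow features more robust. A support vector machine is used as the pattern classifier, combined with a temporal voting strategy, to recognize the type of player action. Compared with existing recognition methods based on appearance features, the proposed motion descriptor and the recognition algorithm built on it achieve better recognition results.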
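     A method for highlight ranking of sports video summaries is proposed, based on multimodal fusion of player behavior information and sport-specific audio keywords. First, player trajectory and action information in racket-sport videos is fused with audio keywords to construct a "trajectory-action-audio" mid-level representation of the video content. From this representation, computable affective features are extracted to describe the subjective affective process by which users rank sports video summary clips by how exciting they are. In view of the current state of physiological and psychological research on human affective thinking, a method for constructing a nonlinear highlight ranking model based on kernel statistical learning is proposed. This construction method not only makes the model more robust to noisy data but also extends its validity and generality. In addition, an objective evaluation criterion for highlight ranking is proposed to evaluate how well automatic assessment results match subjective perception. With this criterion, the effectiveness of the ranking model construction can be evaluated; combined with a forward search algorithm, the criterion also guides the extraction of affective features and the selection of effective features.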
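     A method for tactic analysis of broadcast sports video based on player trajectory information is proposed. The purpose of tactic content analysis of sports video is to discover the tactic patterns or match strategies that individual players, or groups of players, use while completing a match action (or task) in a sports event. Based on the multi-object trajectories of the players and the ball in a match event, an algorithm for local temporal/spatial interaction analysis based on temporal segmentation is first proposed. Using shape and distance metrics between trajectories within each temporal segment, and velocity and distance metrics between trajectories across segments, a graph model is used to construct a tactic representation of the event video, namely the aggregate trajectory. By analyzing the component segments of the aggregate trajectory, the tactic patterns of attack events in soccer video are recognized hierarchically from coarse to fine: in coarse recognition, interaction patterns are divided into cooperative attacks and individual attacks; in the finer recognition, cooperative attacks are subdivided into attacks with and without interception, and individual attacks into direct attacks and dribbling attacks.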

English abstract

With the rapid development of computer, network, and multimedia technologies, the amount of available multimedia data is growing explosively. Video is one of the most important components of multimedia data; it is large in volume and complex in structure. As an important video genre, sports video has attracted increasing attention in automatic video analysis because of its wide viewership and tremendous commercial potential.
     This dissertation focuses on content analysis of broadcast sports video. To address the problem that low-level video features cannot represent high-level human semantic concepts, it proposes a novel approach to semantic and tactic analysis of sports video that is based on player behavior (trajectory and action) and integrates audio analysis. It concentrates on several key techniques: player trajectory tracking and action recognition; construction of semantic/tactic mid-level representations from player behavior information, using multimodal fusion and domain knowledge; and semantic/tactic content analysis of broadcast sports video based on these mid-level representations. The research content is described in detail as follows:
     A new approach to player detection and tracking in broadcast sports video, based on support vector machines and particle filtering, is proposed. Support vector classification combined with playfield segmentation is used to detect the players in sports video automatically. Then an improved particle filter, called the support vector regression particle filter, is proposed as the player tracker by integrating support vector regression into the sequential Monte Carlo framework. The improved filter not only enhances the performance of the classical particle filter when the sample set is small but also improves the efficiency of the tracking system.
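The sequential Monte Carlo framework mentioned above can be illustrated with a minimal sketch of the classical bootstrap particle filter on a one-dimensional position. This is only the baseline that the dissertation improves on; the support vector regression step is not described in this abstract and is not reproduced here. The function name, the random-walk motion model, the Gaussian likelihood, and all parameter values are assumptions chosen for illustration.

```python
import math
import random


def bootstrap_particle_filter(observations, n_particles=200,
                              motion_std=1.0, obs_std=1.0, seed=0):
    """Track a 1-D position with a classical bootstrap particle filter.

    Hypothetical illustration: random-walk motion model and Gaussian
    observation likelihood are assumptions, not the thesis model.
    """
    rng = random.Random(seed)
    # Initialise the particle set around the first observation.
    particles = [observations[0] + rng.gauss(0.0, obs_std)
                 for _ in range(n_particles)]
    estimates = []
    for z in observations:
        # Predict: propagate each particle through the motion model.
        particles = [p + rng.gauss(0.0, motion_std) for p in particles]
        # Weight: likelihood of the observation given each particle.
        weights = [math.exp(-0.5 * ((z - p) / obs_std) ** 2)
                   for p in particles]
        total = sum(weights)
        if total == 0.0:
            # All likelihoods underflowed; fall back to uniform weights.
            weights = [1.0 / n_particles] * n_particles
        else:
            weights = [w / total for w in weights]
        # Estimate: weighted mean of the particle set.
        estimates.append(sum(w * p for w, p in zip(weights, particles)))
        # Resample: draw a new particle set in proportion to the weights.
        particles = rng.choices(particles, weights=weights, k=n_particles)
    return estimates
```

For example, `bootstrap_particle_filter([float(t) for t in range(20)])` returns one position estimate per observation. With fewer particles the estimates of this baseline become noisier, which is the regime the improved filter targets.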
     A novel approach to player action recognition in broadcast sports video, based on support vector machines and optical flow analysis, is proposed. Unlike existing appearance-based methods, the approach is based on motion analysis and extracts a motion descriptor from the spatial distribution of the optical flow field within the player region, using a grid partition. The optical flow is treated as a spatial pattern of noisy measurements rather than as precise pixel displacements, which makes the motion descriptor more robust. A support vector machine and a temporal voting strategy are used to recognize the type of player action in a video clip. The proposed motion descriptor and action recognition approach significantly outperform the existing appearance-based method.
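The grid partition and temporal voting described above can be sketched as follows. The abstract does not give the exact descriptor, so this sketch assumes the simplest variant: the mean flow vector of each grid cell, concatenated into one feature vector. The per-frame classifier (a support vector machine in the dissertation) is left out; `temporal_vote` only shows how per-frame labels could be combined. All names and the 2 x 2 grid are hypothetical.

```python
from collections import Counter


def grid_flow_descriptor(flow, rows=2, cols=2):
    """Summarise a flow field by the mean (u, v) of each grid cell.

    flow is an H x W nested list of (u, v) tuples covering the player
    region. The per-cell mean is an assumed statistic for illustration.
    """
    height, width = len(flow), len(flow[0])
    descriptor = []
    for r in range(rows):
        y0, y1 = r * height // rows, (r + 1) * height // rows
        for c in range(cols):
            x0, x1 = c * width // cols, (c + 1) * width // cols
            cell = [flow[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            n = max(len(cell), 1)  # guard against empty cells
            descriptor.append(sum(u for u, _ in cell) / n)
            descriptor.append(sum(v for _, v in cell) / n)
    return descriptor


def temporal_vote(frame_labels):
    """Return the most frequent per-frame label in a clip."""
    return Counter(frame_labels).most_common(1)[0][0]
```

Averaging within cells means a single noisy flow vector shifts the descriptor only slightly, which matches the idea of treating flow as a spatial pattern of noisy measurements rather than as exact displacements.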
     A novel multimodal approach to highlight ranking of sports video summaries in an affective context is proposed, based on player behavior information and sport-specific audio keywords. A mid-level representation, "trajectory-action-audio", is constructed for the video content by fusing player trajectory, action, and audio keyword information. From this representation, computable affective features are extracted to describe the subjective process by which users rank the highlights of sports video summaries. A kernel-based nonlinear probabilistic method for constructing the ranking model is proposed; it is robust to noisy data and extends well. In addition, a new objective evaluation criterion is proposed, which measures how well automatic rankings match subjective perception and, together with a forward search algorithm, guides model construction and feature extraction.
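The role of the forward search algorithm can be illustrated with a generic greedy forward feature selection loop. This is a minimal sketch, not the dissertation's procedure: the `score` callable stands in for the evaluation criterion, whose definition is not given in this abstract, and the function name and stopping rule are assumptions.

```python
def forward_select(features, score, max_features=None):
    """Greedy forward selection.

    Repeatedly add the feature that most improves score(subset) and
    stop when no candidate improves it. score is a caller-supplied
    function (hypothetical stand-in for an evaluation criterion).
    """
    selected = []
    best_score = float("-inf")
    remaining = list(features)
    limit = len(remaining) if max_features is None else max_features
    while remaining and len(selected) < limit:
        # Score every one-feature extension of the current subset.
        candidates = [(score(selected + [f]), f) for f in remaining]
        top_score, top_feature = max(candidates, key=lambda pair: pair[0])
        if top_score <= best_score:
            break  # no extension improves the criterion
        selected.append(top_feature)
        remaining.remove(top_feature)
        best_score = top_score
    return selected, best_score
```

In use, `score` would train a ranking model on the candidate feature subset and return how well its rankings agree with the subjective ground truth.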
     A new method for tactic analysis of broadcast sports video is proposed, based on player trajectory information. Tactic analysis of sports video aims to recognize and discover the tactic patterns and match strategies that teams and individual players use in games. First, an algorithm for local temporal-spatial interaction analysis is proposed, based on the trajectories of the players and the ball. From these multi-object trajectories, a weighted graph is constructed by analyzing the temporal-spatial interaction among the players and the ball, using distance and shape metrics within each temporal interval and velocity and linking-distance metrics between intervals. The aggregate trajectory, a new tactic representation of sports video, is computed from the weighted graph. The interaction relationships in the aggregate trajectory, together with hypothesis testing on the temporal-spatial distribution of the trajectory, are used to discover the tactic patterns of attack events in soccer video within a hierarchical coarse-to-fine framework.
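The coarse-to-fine hierarchy for soccer attack events (cooperative or individual at the coarse level; with or without interception, and direct or dribbling, at the fine level) can be sketched as a two-level decision. The sketch only shows the structure of the label hierarchy. The inputs `num_ball_handlers`, `intercepted`, and `dribbled` are hypothetical cues that would have to be derived from the aggregate trajectory; the decision rules actually used in the dissertation are not given in this abstract.

```python
def classify_attack(num_ball_handlers, intercepted=False, dribbled=False):
    """Return (coarse, fine) tactic labels for one attack event.

    All three inputs are hypothetical cues assumed to be available
    from trajectory analysis; the rules below are illustrative only.
    """
    # Coarse level: several players handling the ball -> cooperative.
    if num_ball_handlers > 1:
        coarse = "cooperative attack"
        # Fine level for cooperative attacks.
        fine = "with interception" if intercepted else "without interception"
    else:
        coarse = "individual attack"
        # Fine level for individual attacks.
        fine = "dribbling attack" if dribbled else "direct attack"
    return coarse, fine
```

Splitting the decision into two levels means the fine-level cues are only consulted after the coarse class is fixed, so each fine-level test can be specific to its branch.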

