Tennis Video Retrieval Based on Audio-Visual Fusion
Abstract
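This paper takes tennis video as its research object and detects highlight events in tennis match broadcasts, such as aces and net approaches. It proposes a highlight detection framework for tennis video with three parts: semantic analysis of the video stream, semantic analysis of the audio stream, and highlight detection based on the fusion of audio and visual features.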
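     Video stream semantic analysis covers shot classification, player detection, and player tracking. Shot classification is the foundation of tennis video analysis and directly determines the accuracy of highlight detection. Building on existing shot classification methods and the characteristics of tennis broadcasts, the paper proposes a shot classification method based on Hough line detection that divides shots into game shots and non-game shots. Within game shots, frame differencing extracts the region containing each player, and the CamShift algorithm tracks the players.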
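As a rough illustration of the two detection steps named above, the sketch below implements a plain Hough line vote over edge points and a frame-difference bounding box in pure Python. It is not the thesis implementation, which was built on OpenCV. The function names, the vote threshold, the duplicate-suppression gaps, and the rule that at least three dominant lines indicate a game shot are all assumptions made for this example. CamShift tracking is not shown.

```python
import math


def hough_accumulate(edge_points, n_theta=180):
    """Vote in (rho, theta) space; theta is sampled in n_theta steps over [0, pi)."""
    acc = {}
    for x, y in edge_points:
        for t in range(n_theta):
            theta = math.pi * t / n_theta
            rho = round(x * math.cos(theta) + y * math.sin(theta))
            acc[(rho, t)] = acc.get((rho, t), 0) + 1
    return acc


def dominant_lines(edge_points, min_votes, n_theta=180, rho_gap=3, theta_gap=10):
    """Return accumulator peaks with at least min_votes, merging near-duplicates."""
    acc = hough_accumulate(edge_points, n_theta)
    strong = sorted((cell for cell in acc.items() if cell[1] >= min_votes),
                    key=lambda cell: -cell[1])
    peaks = []
    for (rho, t), _votes in strong:
        duplicate = False
        for p_rho, p_t in peaks:
            dt = abs(t - p_t)
            near = dt <= theta_gap and abs(rho - p_rho) <= rho_gap
            # theta wraps around at pi, where the sign of rho flips
            wrapped = n_theta - dt <= theta_gap and abs(rho + p_rho) <= rho_gap
            if near or wrapped:
                duplicate = True
                break
        if not duplicate:
            peaks.append((rho, t))
    return peaks


def is_game_shot(edge_points, min_votes=15, min_lines=3):
    """Assumed rule: several long straight lines suggest a view of the court."""
    return len(dominant_lines(edge_points, min_votes)) >= min_lines


def moving_region(prev_frame, curr_frame, threshold):
    """Bounding box (left, top, right, bottom) of pixels whose gray level
    changed by more than threshold between two frames, or None if none did."""
    changed = [(x, y)
               for y, (row_a, row_b) in enumerate(zip(prev_frame, curr_frame))
               for x, (a, b) in enumerate(zip(row_a, row_b))
               if abs(a - b) > threshold]
    if not changed:
        return None
    xs = [x for x, _ in changed]
    ys = [y for _, y in changed]
    return min(xs), min(ys), max(xs), max(ys)
```

Here `edge_points` is an iterable of `(x, y)` pixel coordinates from an edge map, and frames are lists of rows of gray levels. In practice the peak thresholds would need tuning against real broadcast footage.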
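     Audio stream semantic analysis covers frame-based feature extraction and segment-based audio classification. The audio stream is first divided into segments, and each segment is split into frames. Feature parameters are then extracted from each frame, including short-time average energy, short-time zero-crossing rate, MFCC, and delta MFCC. A continuous hidden Markov model classifies the audio segments into five classes: ball hits, cheering, excited commentary, calm commentary, and background noise.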
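The two simplest frame-level features listed above can be sketched in a few lines of pure Python. This is a minimal example under assumed conventions: the frame length and hop size are free parameters, samples are plain numbers, and a zero sample is treated as non-negative when counting sign changes. MFCC extraction and the hidden Markov model classifier are left out, and all function names are hypothetical.

```python
def split_frames(samples, frame_len, hop):
    """Cut a segment into overlapping frames; a trailing partial frame is dropped."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]


def short_time_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(s * s for s in frame) / len(frame)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def frame_features(samples, frame_len, hop):
    """One (energy, zero-crossing rate) pair per frame of the segment."""
    return [(short_time_energy(f), zero_crossing_rate(f))
            for f in split_frames(samples, frame_len, hop)]
```

The per-frame pairs returned by `frame_features` would form part of the observation sequence passed to a segment classifier.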
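     Finally, ace events, baseline rally events, and net approach events are detected from features such as the length of the game shot, player position, changes in player movement, ball-hit sounds, and cheering.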
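The abstract does not give the actual decision rules, so the following is only a sketch of what a rule-based fusion step over these features could look like. Every field name, threshold, and rule ordering here is hypothetical and chosen for illustration. None of it is taken from the thesis.

```python
from dataclasses import dataclass


@dataclass
class GameShotFeatures:
    duration_s: float        # length of the game shot in seconds
    hit_count: int           # ball-hit sounds detected in the shot
    cheer_after: bool        # cheering detected at the end of the shot
    player_near_net: bool    # tracked player position reaches the net area


def classify_event(f):
    """Map fused audio-visual features of one game shot to an event label.

    All thresholds below are illustrative assumptions, not published values.
    """
    if f.hit_count <= 1 and f.duration_s < 5.0 and f.cheer_after:
        return "ace"
    if f.player_near_net:
        return "net_approach"
    if f.hit_count >= 4:
        return "baseline_rally"
    return "other"
```

A short shot with a single hit followed by cheering is labeled an ace, a shot in which a player reaches the net is a net approach, and a longer exchange of hits from the back of the court is a baseline rally.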
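     In summary, the paper uses audio-visual fusion to automatically analyze and extract highlight events from tennis matches. A prototype system for automatic tennis video analysis was implemented with the Intel OpenCV Library, using Visual C++ 6.0 and MATLAB 7.0 as development platforms. Experiments show that the proposed semantic analysis algorithms for tennis video give satisfactory results.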