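Visual Content Analysis Towards Video Mining

English title: A Research on Visual Content Analysis Towards Video Mining
Author: Luo Qingshan (罗青山)
Thesis level: Doctoral
Discipline: Communication and Information Systems
Chinese keywords: 视频信息挖掘 (video information mining) ; 视频内容理解 (video content understanding) ; 计算机视觉 (computer vision) ; 模式识别 (pattern recognition) ; 智能视频监控 (intelligent video surveillance)
English keywords: video mining ; video content understanding ; computer vision ; pattern recognition ; intelligent visual surveillance
Degree year: 2009
Advisor: Zeng Guihua (曾贵华)
Discipline code: 081001
Degree-granting institution: Shanghai Jiao Tong University (上海交通大学)
Thesis submission date: 2009-05-01
Defense committee chair: Fan Jianping (范建平)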

Abstract

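Video mining has broad application prospects: by analyzing the content of raw video data, it carries out data mining tasks for different purposes and uses. With video mining, we can discover interesting patterns hidden in video content and obtain useful knowledge to support intelligence analysis and decision-making. However, the difficulty computers have in understanding video content has greatly limited the development of video mining technology.
     This thesis addresses the problem of video content understanding in video mining, seeks solutions for the syntactic segmentation and semantic extraction of video content, and builds a bridge between video mining and video data. It proposes and improves several algorithms for visual content analysis in order to understand video content. The specific work includes: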
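     First, shot detection is the first step of video content understanding; it achieves the syntactic segmentation of video content. For describing and matching image frames, the thesis proposes the concept of a continuous color histogram, which builds the color histogram by distance interpolation and overcomes the "interval effect" of simple quantization. It also introduces the spatial pyramid matching algorithm, which adds geometric constraints to color-histogram-based image matching. For deciding shot boundaries, the thesis proposes a similarity evolution matrix to describe the characteristics of shot boundaries. With a small number of matrix templates, combined with the well-established dynamic time warping algorithm, it obtains a unified algorithm that detects both abrupt cuts and gradual transitions.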
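     The distance-interpolation idea behind a continuous color histogram can be illustrated with a minimal sketch. It assumes a single channel with values in [0, 256) and linear interpolation between the two nearest bin centers; the function names `soft_histogram`, `hard_histogram`, and `intersection` are illustrative and not taken from the thesis, whose exact formulation may differ.

```python
def hard_histogram(values, n_bins, lo=0.0, hi=256.0):
    """Plain quantization: each sample votes for exactly one bin."""
    width = (hi - lo) / n_bins
    hist = [0.0] * n_bins
    for v in values:
        idx = min(int((v - lo) / width), n_bins - 1)
        hist[idx] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total else hist


def soft_histogram(values, n_bins, lo=0.0, hi=256.0):
    """Interpolated binning: each sample splits its vote between the two
    nearest bin centers, weighted linearly by distance (an assumption)."""
    width = (hi - lo) / n_bins
    hist = [0.0] * n_bins
    for v in values:
        # Position in units of bins, measured from the first bin center.
        pos = (v - lo) / width - 0.5
        pos = min(max(pos, 0.0), n_bins - 1.0)
        left = int(pos)
        frac = pos - left
        hist[left] += 1.0 - frac
        if frac > 0.0:
            hist[left + 1] += frac
    total = sum(hist)
    return [h / total for h in hist] if total else hist


def intersection(h1, h2):
    """Histogram intersection similarity of two normalized histograms."""
    return sum(min(a, b) for a, b in zip(h1, h2))
```

     With 4 bins, the values 63 and 64 land in different bins under hard quantization, so their intersection is 0 even though the colors are nearly identical. Under the interpolated version the intersection is about 0.98, which is the kind of boundary artifact the "interval effect" refers to.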
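     Second, to automatically annotate visual objects in video streams, the thesis proposes a grid-based mean-shift search algorithm, treating video recognition as a problem of detecting and tracking histogram features. The method represents each object with a set of image exemplars, and the detection algorithm scans the whole frame at different scales and rotation angles. Objects are tracked at the same time: the state and features of objects obtained in the previous frame are updated in the current frame. Fusing the detection and tracking information yields continuous recognition of visual objects in video.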
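     Third, the thesis adopts a holistic approach based on spatio-temporal volumes to automatically annotate visual actions. The detection problem studied is not restricted to conditions such as a static background or stable illumination; human actions are studied in real scenes. To find a more effective representation while overcoming the influence of background motion and varying appearance, the thesis uses only motion information to describe human actions. Based on the computation and statistics of the optical flow field, it designs three types of local motion histograms to describe an action's spatio-temporal volume. In addition, it uses the GentleAdaBoost method to select discriminative features for learning action models, which yields effective classification of action volumes.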
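     As a rough illustration of a motion histogram computed from optical flow, the sketch below bins flow vectors by direction and weights each vote by its magnitude. This is one generic descriptor, not one of the three histogram types designed in the thesis; the function name `flow_orientation_histogram`, the bin count, and the magnitude threshold are all assumptions.

```python
import math


def flow_orientation_histogram(flow, n_bins=8, min_mag=1e-6):
    """Direction histogram of flow vectors, weighted by magnitude.

    flow: iterable of (dx, dy) displacement vectors from a local region.
    Returns an L1-normalized list of n_bins weights (all zeros if the
    region contains no measurable motion).
    """
    hist = [0.0] * n_bins
    two_pi = 2.0 * math.pi
    for dx, dy in flow:
        mag = math.hypot(dx, dy)
        if mag < min_mag:
            continue  # ignore near-static pixels
        angle = math.atan2(dy, dx) % two_pi  # map to [0, 2*pi)
        idx = int(angle / two_pi * n_bins) % n_bins
        hist[idx] += mag
    total = sum(hist)
    return [h / total for h in hist] if total else hist
```

     Because the histogram is normalized and uses only motion, it does not depend on the appearance of the moving person, which matches the motivation stated above for relying on motion information alone.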
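     In addition, the thesis adopts a part-based approach built on spatio-temporal cuboids to automatically annotate visual actions. It designs a fast algorithm for extracting the spatio-temporal cuboid parts of actions from video streams. By setting different combinations of frequency parameters as the application requires, the density and number of cuboid parts can be controlled, which makes the action description scalable. To make full use of the temporal and spatial structure of visual actions, the thesis proposes the concept of a "part triplet" (部件三链环), which builds an explicit shape model describing the relative positions of an action's spatio-temporal cuboid parts. Combined with conventional pLSA, this achieves part-based detection of human actions in video.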
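     Finally, based on low-level feature analysis, the thesis proposes a method for localizing "abnormal" behavior in surveillance video. Without predefining or learning explicit models of abnormal and normal behavior, it formulates abnormal behavior detection as querying newly observed spatio-temporal cuboids against a database built from a few existing video clips that contain normal behavior. It proposes a feature descriptor for spatio-temporal cuboid parts that fuses three kinds of information (appearance, motion, and position) into a comprehensive description. To infer "abnormal" behavior, it proposes a "K-best" probabilistic inference algorithm that performs maximum likelihood estimation for each cuboid and thereby decides whether the current part of the behavior is abnormal. Experiments on real-life surveillance video confirm the effectiveness of the "K-best" algorithm.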
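     The query-against-a-database formulation can be illustrated with a much simpler stand-in than the thesis's "K-best" probabilistic inference, whose details are not given here. The sketch below swaps in a plain k-nearest-neighbor distance score: a cuboid descriptor is flagged as irregular when it lies far from everything seen in the normal clips. The names `knn_irregularity` and `is_irregular`, the Euclidean metric, and the threshold are all assumptions.

```python
import math


def knn_irregularity(query, database, k=3):
    """Mean Euclidean distance from `query` to its k nearest descriptors
    in `database` (descriptors extracted from normal-behavior clips)."""
    if not database:
        raise ValueError("database must contain at least one descriptor")
    dists = sorted(math.dist(query, d) for d in database)
    nearest = dists[:k]
    return sum(nearest) / len(nearest)


def is_irregular(query, database, threshold, k=3):
    """Flag a descriptor whose neighborhood score exceeds `threshold`."""
    return knn_irregularity(query, database, k) > threshold
```

     No model of abnormal behavior is trained in this sketch: only examples of normal behavior are stored, and anything poorly explained by them is reported.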

English Abstract

Video mining has bright application prospects: it carries out data mining for different goals and tasks by automatically analyzing the content of raw videos. In particular, hidden patterns of interest can be discovered and useful knowledge obtained, which is meaningful and helpful for information analysis and decision-making. However, the difficulties of video content understanding limit the development of video mining.
     This thesis aims to solve the key problems of video content understanding for video mining. We seek solutions for video syntax segmentation and semantic information extraction, bridging the gap between data mining and video sequences. Several algorithms are proposed or improved to achieve video content understanding through visual information analysis. The main contributions include:
     First, automatic shot detection is the first step toward video content understanding; it achieves syntax segmentation. A continuous color histogram is proposed, based on the idea of distance interpolation, and the resulting histogram avoids the interval effect. In addition, Spatial Pyramid Matching is introduced to add geometric constraints to frame matching. For determining shot boundaries, a similarity evolution matrix is proposed to characterize potential shot boundaries, and Dynamic Time Warping is introduced to match it against several matrix templates. The result is a unified shot detection method that can detect both abrupt and gradual boundaries.
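     Dynamic Time Warping, used above to compare an observation against templates, can be sketched on plain 1-D sequences. The thesis applies it to similarity evolution matrices, which this sketch does not reproduce; the function name `dtw_distance` and the absolute-difference local cost are illustrative assumptions.

```python
def dtw_distance(a, b, dist=lambda x, y: abs(x - y)):
    """Classic dynamic-programming DTW between sequences a and b.

    cost[i][j] holds the cheapest cumulative cost of aligning the first
    i items of a with the first j items of b.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(a[i - 1], b[j - 1])
            # Extend the best of: insertion, deletion, or a match step.
            cost[i][j] = d + min(cost[i - 1][j],
                                 cost[i][j - 1],
                                 cost[i - 1][j - 1])
    return cost[n][m]
```

     A sequence and a time-stretched copy of it, such as `[1, 2, 3]` and `[1, 2, 2, 3]`, get distance 0. That tolerance to stretching is what lets one small set of templates cover gradual transitions of different durations as well as abrupt cuts.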
     Second, to achieve automatic annotation of visual objects in videos, a grid-based Mean-Shift method is proposed that treats video recognition as a problem of detecting and tracking histogram features. In this method, a set of exemplars represents each object, and an efficient detector scans whole video frames at multiple scales and rotations. Detection runs together with tracking, and previously obtained objects are updated in the current frame. Finally, continuous video recognition is achieved by combining the results of detection and tracking.
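     The core mean-shift iteration can be shown in a few lines. This is the generic mode-seeking procedure with a flat kernel on point coordinates, not the grid-based variant over histogram features proposed in the thesis; the function name `mean_shift_mode` and its parameters are assumptions.

```python
import math


def mean_shift_mode(points, start, bandwidth, max_iter=100, tol=1e-6):
    """Move `start` uphill to a local density mode of `points`.

    Each step replaces the current estimate with the mean of all points
    within `bandwidth` of it (flat kernel) until the shift is below tol.
    """
    x = tuple(start)
    for _ in range(max_iter):
        neighbors = [p for p in points if math.dist(p, x) <= bandwidth]
        if not neighbors:
            return x  # nothing in range, so there is no direction to move
        new_x = tuple(sum(c) / len(neighbors) for c in zip(*neighbors))
        if math.dist(new_x, x) < tol:
            return new_x
        x = new_x
    return x
```

     In a tracker, the starting point would typically be the object's position in the previous frame, so the search stays local and converges in a few iterations.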
     Third, a holistic approach based on spatio-temporal volumes is proposed for the automatic annotation of visual actions. The detection problem is not limited to controlled settings such as a stationary background or invariant illumination, but is studied in real scenarios. To develop an effective representation that remains resistant to background motion, only motion information is exploited to define descriptors for action volumes. Based on the computation of optical flow, three types of local motion histograms are designed to describe the action inside a spatio-temporal volume. Action models are then learned with boosting techniques that select discriminative features for efficient classification.
     Additionally, a part-based approach based on spatio-temporal cuboids is proposed for the automatic annotation of visual actions. To ensure that enough cuboids can be extracted, an improved detector is used to detect interest points at multiple frequencies. The density and number of interest points can be adjusted through different combinations of frequencies as required, which yields a scalable description of an action. To make full use of the structural information among cuboids, the concept of a word triplet is presented, which builds an explicit shape model to describe the relative positions of cuboids. Classic probabilistic Latent Semantic Analysis is then used to achieve part-based action detection.
     Finally, using low-level features, an approach capable of detecting and localizing irregularities in surveillance video is proposed. Without predefining rules or learning explicit models of regularities and irregularities, we formulate detection as querying newly observed cuboids against a database built from several video clips that contain only regular behaviors. A descriptor is designed to characterize a spatio-temporal cuboid by fusing appearance, motion, and spatio-temporal configuration. To infer irregular cuboids, a "K-best" probabilistic inference algorithm finds the ML estimate for each cuboid and checks whether the current part of the behavior is irregular. Experiments on real-world videos have validated the approach quantitatively.

