Study on Online Learning Algorithms in Video Concept Detection

Original title: 视像概念检测中在线学习算法研究
Author: Wu Jun (吴俊)
Degree level: Doctoral
Discipline: Computer Science and Technology
Keywords: Online Learning ; Content-Based Video Retrieval (CBVR) ; Finite Mixture Model (FMM) ; Temporal Multi-granularity ; TREC Video Retrieval Evaluation (TRECVID)
Degree year: 2008
Advisor: Zhang Bo (张钹)
Discipline code: 081203
Degree-granting institution: Tsinghua University
Thesis submission date: 2008-06-01

Abstract

In semantic concept detection on online video streams, the underlying distribution of a given semantic concept in the visual feature space generally evolves over time. This thesis addresses two key questions: (i) how does the underlying distribution of different semantic concepts evolve over time under different conditions, and (ii) how can concept models be updated dynamically when only a limited number of training samples is available?
     The major contributions of this thesis are:
     1) A set of tracking measures based on Finite Mixture Models (FMMs) is proposed to quantify how the underlying distribution of online data changes, and is used to study how the distributions of different semantic concepts evolve over time. The measures give a quantitative basis for judging whether the distribution of a semantic concept in an online data stream has changed, and they supply reasonable prior knowledge for building the overall online learning system.
     2) The Multi-granularity Adaptive (MGA) online learning algorithm is proposed for the supervised setting. It exploits the fact that different semantic concepts favor different temporal granularities, and it fuses classifiers trained at multiple temporal granularities. Two classifier selection strategies, incremental and flat, are compared in detail. The relationship between the performance gain and the tracking measures defined above is also examined, showing how the measures can guide the algorithm. MGA is suited to cases where the target concept evolves relatively quickly over time.
     3) The Online-optimized Incremental Learning (OOIL) algorithm is proposed for the semi-supervised setting. It makes as much use as possible of the abundant unlabeled samples in newly arriving online data streams. A Local Adaptation (LA) step first derives the latest local concept model, which is then used in the Global Model Incremental Updating (GMIU) step to dynamically update the initial global concept model. Under appropriate conditions, the algorithm solves the model updating problem in semi-supervised learning. OOIL is suited to cases where the target concept evolves relatively slowly over time.
     4) Experimental results show that the proposed FMM-based tracking measures describe the evolution of the underlying distribution of semantic concepts in online data streams effectively, provide reasonable reference information for the two online learning algorithms, and are general enough to apply to online learning in other domains. Experiments on sports video and on the large-scale TRECVID data collections further show that the two proposed algorithms (MGA and OOIL) are more effective than comparable existing methods.
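As a rough illustration of contribution 1, the sketch below scores the change between two fitted mixture models of the same concept at different times. The abstract does not define the tracking measures, so everything here is an assumption made for illustration: the choice of Kullback-Leibler divergence, the Monte Carlo estimate, the one-dimensional Gaussian components, and the names `mixture_pdf`, `sample_mixture`, and `kl_divergence_mc`.

```python
import math
import random

# A 1-D Gaussian mixture is represented as a list of (weight, mean, std)
# tuples. This representation is a hypothetical choice for the sketch.

def mixture_pdf(x, mixture):
    """Density of the mixture at point x."""
    return sum(
        w * math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))
        for w, mu, sd in mixture
    )

def sample_mixture(mixture, n, rng):
    """Draw n samples: pick a component by weight, then sample from it."""
    weights = [w for w, _, _ in mixture]
    samples = []
    for _ in range(n):
        _, mu, sd = rng.choices(mixture, weights=weights, k=1)[0]
        samples.append(rng.gauss(mu, sd))
    return samples

def kl_divergence_mc(p, q, n=20000, seed=0):
    """Monte Carlo estimate of KL(p || q) using samples drawn from p."""
    rng = random.Random(seed)
    tiny = 1e-300  # floor that avoids log(0) far out in the tails
    total = 0.0
    for x in sample_mixture(p, n, rng):
        total += math.log(max(mixture_pdf(x, p), tiny))
        total -= math.log(max(mixture_pdf(x, q), tiny))
    return total / n

# Hypothetical concept models fitted at an earlier time and at two later times.
earlier = [(0.6, 0.0, 1.0), (0.4, 4.0, 1.0)]
slow_drift = [(0.6, 0.2, 1.0), (0.4, 4.2, 1.0)]
fast_drift = [(0.6, 2.0, 1.0), (0.4, 6.0, 1.0)]

no_change = kl_divergence_mc(earlier, earlier)
slow_score = kl_divergence_mc(earlier, slow_drift)
fast_score = kl_divergence_mc(earlier, fast_drift)
print(no_change, slow_score, fast_score)
```

Under these assumptions, a larger score would point to a faster-evolving concept (the regime the abstract assigns to MGA) and a smaller score to a slower one (the regime assigned to OOIL). Any threshold separating the two would have to be chosen empirically.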

