Research on Content-Based Multimodal Retrieval Methods for Video Databases
Abstract
The multimodal video retrieval method proposed in this thesis builds a semantic feature library of video data from the perspective of video semantic features. Information features related to video semantics, such as sound, subtitles, music, plot scripts and news manuscripts, are integrated, and a combination of portrait recognition, subtitle recognition, speech recognition, video shot recognition and plot script analysis is used to establish a multimodal extraction model for the semantic features of video data; a speech recognition engine and an OCR engine are integrated into the retrieval platform. The thesis proposes using the critical point between speech and music as the scene segmentation point, and the critical point of change in the speaker's timbre as the shot segmentation point.
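The following sketch illustrates, in a very reduced form, how such audio critical points might be located: short-time spectral statistics are computed over the soundtrack, and a boundary is proposed wherever the statistics of the adjacent windows diverge sharply. The features, window sizes and threshold here are illustrative assumptions, not the thesis's actual segmentation algorithm.

```python
import numpy as np

def short_time_features(signal, sr, frame_len=0.025, hop=0.010):
    """Frame the signal and return per-frame log-energy and spectral centroid."""
    n, h = int(frame_len * sr), int(hop * sr)
    feats = []
    for start in range(0, len(signal) - n, h):
        frame = signal[start:start + n] * np.hanning(n)
        spec = np.abs(np.fft.rfft(frame))
        energy = np.log(np.sum(spec ** 2) + 1e-10)
        freqs = np.fft.rfftfreq(n, 1.0 / sr)
        centroid = np.sum(freqs * spec) / (np.sum(spec) + 1e-10)
        feats.append([energy, centroid])
    return np.array(feats)

def critical_points(feats, win=50, threshold=2.0):
    """Return frame indices where the mean features of the left and right
    windows differ strongly -- candidate scene/shot boundary points."""
    boundaries = []
    scale = feats.std(axis=0) + 1e-10
    for t in range(win, len(feats) - win):
        left = feats[t - win:t].mean(axis=0)
        right = feats[t:t + win].mean(axis=0)
        if np.linalg.norm((right - left) / scale) > threshold:
            boundaries.append(t)
    return boundaries
```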
The thesis further proposes matching the description information in the plot script against features extracted from subtitles, speech and portraits, so that video data can be retrieved by items such as character name, line content and leading actor. In addition, content-based retrieval of shots is carried out on the representative scene image frames using the improved nearest feature line (NFL) algorithm of reference [46], which has a certain degree of novelty.
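For orientation, the sketch below shows the basic nearest feature line distance that underlies the NFL family of methods: a query feature vector is projected onto the line through each pair of prototype vectors of a shot class, and the class with the smallest point-to-line distance is returned. This is only the generic NFL idea, not the improved algorithm of reference [46].

```python
import numpy as np
from itertools import combinations

def nfl_distance(query, x1, x2):
    """Distance from `query` to the feature line passing through x1 and x2."""
    direction = x2 - x1
    mu = np.dot(query - x1, direction) / (np.dot(direction, direction) + 1e-12)
    projection = x1 + mu * direction          # foot of the perpendicular
    return np.linalg.norm(query - projection)

def nfl_classify(query, class_prototypes):
    """class_prototypes: dict mapping class label -> list of feature vectors.
    Returns the label whose nearest feature line is closest to the query."""
    best_label, best_dist = None, np.inf
    for label, protos in class_prototypes.items():
        for a, b in combinations(protos, 2):
            d = nfl_distance(query, np.asarray(a, float), np.asarray(b, float))
            if d < best_dist:
                best_label, best_dist = label, d
    return best_label, best_dist
```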
Regarding the modeling, synchronization, compression and security mechanisms of video data streams, the thesis gives a four-dimensional-matrix representation of motion images and audio data that captures spatiality, temporality and descriptive diversity. The video stream is regarded as a whole in which image and audio data are combined in a continuous, tightly coupled form. This representation improves the data compression ratio and the QoS of video (or multimedia) database systems and playback systems. The thesis also proposes a method for solving the network security of video data streams under the active network architecture.
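As a rough illustration of such a representation (the axis ordering and the way audio is attached are assumptions, not the thesis's definition), the sketch below packs decoded frames into a four-dimensional array indexed by time, row, column and channel, with the audio samples of each frame interval aligned to the same time axis.

```python
import numpy as np

def pack_video_4d(frames, audio, sr, fps):
    """Pack decoded video into a 4-D matrix of shape (T, H, W, C) and align
    the audio as one block of PCM samples per frame interval.

    frames : list of H x W x C uint8 arrays (one per video frame)
    audio  : 1-D array of PCM samples
    """
    video = np.stack(frames, axis=0)                    # (T, H, W, C)
    samples_per_frame = int(round(sr / fps))
    t = video.shape[0]
    audio = audio[:t * samples_per_frame]
    audio_blocks = audio.reshape(t, samples_per_frame)  # one row per frame
    return video, audio_blocks

# Example: 30 gray frames of 64x48 at 25 fps with 8 kHz audio
frames = [np.zeros((48, 64, 1), dtype=np.uint8) for _ in range(30)]
audio = np.zeros(30 * 320, dtype=np.int16)
video4d, audio_blocks = pack_video_4d(frames, audio, sr=8000, fps=25)
print(video4d.shape, audio_blocks.shape)   # (30, 48, 64, 1) (30, 320)
```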
Content-based video data retrieval is becoming more and more popular in the computer multimedia domain. Its methods include various technical strategies such as image retrieval, key-frame retrieval, graphics retrieval, audio retrieval and so on, and most of them obtain retrieval results by extracting feature descriptions from the video data. The strategies mentioned above can be divided into three kinds of methods. The first is manual selection or provision of feature descriptions, that is, selecting retrieval keywords for the retrieval system from the various objective attributes or the feature description library of the video data; the system then carries out the retrieval operations according to the designed retrieval strategies. The second method is to extract physical features from the images of the video data; retrieval is carried out according to the user's selections, making use of describable content such as color, texture, shape and camera motion. The last method is to extract logical features from the video data.
Currently, most video data retrieval methods rely on only one processing technique, such as image retrieval, key-frame retrieval, or timbre and tone techniques, and there are few retrieval technologies based on the semantic content of video data. Research that integrates video data with related data of other forms and carries out multi-pattern retrieval in a safe and efficient network environment is rarely seen. This thesis therefore proposes a retrieval method focused on that strategy: under the standard framework of MPEG-7, it builds a speech feature library of the video data according to the content of the video's audio information and other related multimodal data.
With the multimodal video retrieval technique presented in this thesis, a semantic feature library of video data can be built from the point of view of semantic features. Audio, subtitles, music, plot scripts, news manuscripts and other related information are integrated, and by combining portrait, subtitle and speech recognition with shot recognition and script analysis, a multimodal model for extracting the semantic features of video data is created. The retrieval platform also integrates a speech recognition engine and an OCR engine.

Based on the information characteristics of video data such as conference recordings, television news, TV dramas and movies, this thesis proposes taking the critical point between speech and music as the scene segmentation point, and taking the critical point of change in the speaker's timbre as the shot segmentation point. Extraction of timbre and other audio features thus supports scene segmentation, and audio frequency analysis also assists the segmentation of video scenes. Making use of speech correlation measures, speaker adaptation, subtitle frame selection, edge extraction, binarization of the subtitle region, wavelet packet decomposition, kernel function techniques and other related knowledge libraries, a video data annotation feature library based on semantic annotation and video scene segmentation is created.

This thesis also provides some original methods: retrieval operations on items such as character names, actors' lines and leading actors are implemented by matching the description information of the plot script against features extracted from subtitles, speech and portrait analysis. Content-based retrieval operations on shots can be executed on the representative key image frames using the nearest feature line algorithm mentioned in reference [46].

The research in this thesis implements access and retrieval operations on video data based on semantic feature extraction, and provides a method for building a data feature library with a descriptive structure; the library can be created automatically, semi-automatically or manually. The video data feature library framework discussed here thereby becomes a traditional relational model. One characteristic of this thesis is that the content of the video data and the feature description library are kept separate when the structure of the video feature library is designed. The content of the multimodal video feature library is the constraint condition under which a retrieval system carries out its various operations, and the video data semantic feature library takes content and time constraints as checking conditions when it is built.

Concerning the construction, synchronization and compression of the video stream description model, this thesis describes the representation of motion images and audio data in a four-dimensional (4-D) matrix form that captures spatial, temporal and descriptive features. A video stream can be viewed as the integration of image and audio data interwoven in a temporally close-coupled fashion. This representation can be used to improve the compression rate and the QoS of video (or multimedia) database systems and transmission systems.

Security is a very important factor when a video stream is transmitted over the network, so an efficient video data retrieval system requires a safe network environment. Currently, the main factors that affect research on video stream applications are the usability of the content, the processing ability of the terminal, network performance, and the characteristics and environment of the consumer: available bandwidth, error rate, content scale, adaptability, interactivity and so on. The active network architecture is an effective solution to this problem. In an active network, some operations are delegated to every node, so DoS/DDoS (Denial-of-Service/Distributed Denial-of-Service) packets can be identified and discarded. Upstream nodes can discard these useless packets after being notified by downstream nodes, so that the flow of useful packets obtains more bandwidth. This thesis provides a system framework, together with its implementation strategy, for identifying and controlling distributed attacks. The framework is built on the active network environment and mainly includes three parts: an automatic authentication and control strategy based on an aggregation architecture, an active notification and traceback strategy based on an aggregation architecture, and a control and cooperation strategy based on the management domain environment.

The goal of this thesis is to improve the practicability, efficiency, security, stability and interactivity of content-based multimodal video database retrieval methods. Its major contributions and innovations are as follows:
1. Under a network security safeguard mechanism, the thesis advances multimodal, semantic-annotation-based video database retrieval methods. The retrieval method is derived from a systematic analysis of the data structure features of video, audio, text and other data, and of the correlations among them. It saves retrieval time and storage space, reduces communication cost, and correlates all kinds of data.
2. Through the technical analysis and combination of image processing, audio processing and text processing, the thesis establishes a video semantic extraction model. The model facilitates multimodal annotation and correlates complex data attributes of various kinds so that the relevant data can be retrieved efficiently.
3. Through the analysis and extraction of feature information highly correlated with the semantics of the video data, such as captions, speech, portraits and scripts, the thesis advances multimodal annotation methods.
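A minimal sketch of the downstream-to-upstream notification idea described for the active network defense is given below: a downstream node that classifies a flow as DoS traffic notifies its upstream neighbor, which installs a temporary filter, propagates the notification further toward the source, and drops matching packets. The node model, flow identifier and timeout are illustrative assumptions rather than the framework's actual protocol.

```python
import time

class ActiveNode:
    """Toy model of an active-network node that can filter flows on request."""

    def __init__(self, name, upstream=None):
        self.name = name
        self.upstream = upstream          # the next node toward the source
        self.blocked = {}                 # flow id -> filter expiry timestamp

    def notify_upstream(self, flow_id, ttl=60):
        """Called when this (downstream) node classifies a flow as DoS traffic."""
        if self.upstream is not None:
            self.upstream.install_filter(flow_id, ttl)

    def install_filter(self, flow_id, ttl):
        self.blocked[flow_id] = time.time() + ttl
        # Propagate further upstream so useless packets are dropped near the source.
        self.notify_upstream(flow_id, ttl)

    def forward(self, packet):
        """Drop packets of blocked flows; forward everything else."""
        expiry = self.blocked.get(packet["flow_id"])
        if expiry is not None and time.time() < expiry:
            return None                   # discarded, freeing bandwidth
        return packet                     # handed to the next hop / application

# Usage: the edge node detects an attack flow and the core node starts dropping it.
core = ActiveNode("core")
edge = ActiveNode("edge", upstream=core)
edge.notify_upstream("10.0.0.5->victim:80")
print(core.forward({"flow_id": "10.0.0.5->victim:80"}))   # None (dropped)
```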
References
[1] M. T. Ozsu, P. Iglinski, D. Szafron, S. El-Medani and M. Junghanns, "An Object-Oriented SGML/HYTIME Compliant Multimedia Database Management System", ACM Multimedia 97, pp.239-249, Seattle, Washington, USA, 1997.
    [2] L. Rutledge, L. Hardman and J. van Ossenbruggen, Evaluating SMIL: three user case studies; Proceedings of the seventh ACM international conference (part 2) on Multimedia 1999, pp.171-174, Orlando, FL, USA, 1999.
    [3] S. Adali, M. L. Sapino and V. S. Subrahmanian: "A Multimedia Presentation Algebra", Proceedings of the 1999 ACM SIGMOD international conference on Management of Data, pp.121-132, May 31 - June 3, 1999, Philadelphia, PA, USA.
    [4] J. Z. Li, "Modeling and Querying Multimedia Data (Synchronization)", PhD thesis, 1998, Univ. of Alberta (Canada).
    [5] S. Radev: "Spatio-Temporal Synchronization and Semantic Modeling for Video and Multimedia Database Systems", PhD thesis, 1998, Univ. of Southwestern Louisiana.
    [6] P. Hoschka (ed.), “Synchronized Multimedia Integration Language”, World Wide Web Consortium Recommendation. June 1998. Available: HTTP: http://www.w3.org/TR/REC-smil
    [7] Lee Garber. Denial-of-Service Attacks Rip the Internet. IEEE Computer. April 2000.
    [8] Raul Mahajan and Sally Floyd. Controlling High-Bandwidth Flows at the Congested Router. November 2000.
    [9] Vern Paxson. An analysis of using reflectors to defeat DoS traceback. August 2000.
    [10] Robert Stone. CenterTrack: An IP Overlay Network for Tracking DoS Floods. August 2000.
    [11] D.L.Tennenhouse, J.M.Smith, W.David Sincoskie, David J.Wetherall and Gary J.Minden, A Survey of Active Network Research. IEEE Communications Magazine, January 1997.
    [12] V. Srinivasan and George Varghese. Faster IP Lookups using Controlled Prefix Expansion.
    [13] Niblack, W et al.“The QBIC project: querying images by color, texture and shape”. IBM Research Report RJ-9203.1993
    [14] Sutcliffe, A et al. "Empirical studies in multimedia information retrieval". Intelligent Multimedia Information Retrieval. AAAI Press. 1997. Menlo Park, CA.
    [15] Swain.M J ,Ballard, D H. “Color indexing”. International Journal of Computer Vision. 1991. 7(1). P11-32
    [16] Vellaikal.A, Kuo, C C J. “Hierarchical clustering techniques for image database organization and summarization”. Multimedia Storage and Archiving Systems III,Proc SPIE 3527.1998.P 68-79
    [17] R.M.Bolle,B.L.Yeo,M.M. Yeung.“Video query: Research directions”.IBM Journal of Research and Development.1998.42(2).P233–252.
    [18] R.Brunelli,O.Mich,C.M.Modena.“A survey on the automatic indexing of video data”. Journal of Visual Communication and Image Representation. 1999.10(2)P78-112.
    [19] Jose M,Martinez. “Overview of the MPEG7 standard”. Research Report.N4031,ISO/IEC JTC1/SC29/WG11.Singapore,SG.2001
    [20] Howard D. Wactlar, M. G. Christel, Y. Gong, A. G. Hauptmann. "Lessons learned from building a Terabyte digital video library", IEEE Computer, 1999.42(2).P66-73
    [21] Howard D.Wactlar et al. “Intelligent access to digital video: the Informedia project”. IEEE Computer 1996.29(5).P46-52
    [22] M.Brown,J.Foote,G.Jones,K.Sparck-Jones,S.Young.“Automatic content-based retrieval of broadcast news”.ACM Multimedia.1995.SanFrancisco,USA.
    [23] M.Bertini,A.Del Bimbo,P.Pala.“Content-based indexing and retrieval of TV news”.Pattern Recognition Letters.2001.22(5),P503–516
    [24] W. Zhu, C. Toklu, S.-P. Liou. “Automatic news video segmentation and categorization based on closed-captioned text”. IEEE International Conference on Multimedia and Expo. 2001. Tokyo, Japan.
    [25] J.Vendrig,M.Worring.“Evaluation measurement for logical story unit segmentation in video story sequences”,Technical Report.Intelligent Sensory Information Systems Group,University of Amsterdam.2001
    [26] A.G.Hauptmann,M.A.Smith,“Text, speech, and vision for video segmentation:The Informedia Project”.Fall Symposium on Computer Models for Integrating Language and Vision.1995.
    [27] M. Cai, J.Q. Song and M.R. Lyu,"A New Approach for Video Text Detection". International Conference On Image Processing. 2002.Rochester, New York, USA
    [28] Howard D. Wactlar, "New Directions in Video Information Extraction and Summarization", the 10th DELOS Workshop. 1999. Santorini, Greece
    [29] Howard D.Wactlar.”Multi-Document Summarization and Visualization in the Informedia digital Video Library”.New Information Technology 2001 Conference.2001.Tsinghua University,Beijing
    [30] M. A. Smith and T. Kanade,“Video skimming and characterization through the combination of image and language understanding techniques”.IEEE Computer Vision and Pattern Recognition.1997.P775-781
    [31] Multimedia Description Schemes group. Text of 15938-5 FCD Information Technology-Multimedia Content Description Interface-Part 5 Multimedia Description Schemes. ISO/IEC JTC 1/SC29/WG11/N3966. Singapore, March 2001.
    [32] Sato, T., Kanade, T., Hughes, E.K., Smith, M.A.. Video OCR for Digital News Archives. IEEE International Workshop on Content-Based Access of Image and video Databases, January,1998, pp.52 -60.
    [33] Agnihotri, L., Dimitrova, N.. Text Detection for Video Analysis. Workshop on Content Based Image and Video Libraries, held in conjunction with CVPR, Colorado, pp.109-113, 1999.
    [34] Sobottka, K., Bunke, H., Kronenberg, H.. Identification of Text on Colored Book and Journal Covers. Proceedings of the Fifth International Conference on Document Analysis and Recognition,1998.
    [35] Shim, J. C., Dorai, C., Bolle, R.. Automatic Text Extraction from Video for Content-based Annotation and Retrieval. In Proc. of 14th Int. Conf. on Pattern Recognition(ICPR), pp. 618-620, 1998.
    [36] Zhong, Y., Zhang, H.J., Jain, A. K.. Automatic Caption Localization in Compressed Video. IEEE Int. Conf. on Image Processing, 1999.
    [37] Wu, V.,Manmatha, R., Riseman, E.. Automatic Text Detection and Recognition. In Proceedings of Image Understanding Workshop, pp. 707--712, 1997.
    [38] Yeo, B. L., Liu, B.. Visual Content Highlighting via Automatic Extraction of Embedded Captions on MPEG Compressed Video. SPIE Digital Video Compression: Algorithms and Technologies, Vol.268, Feb. 1996, pp.38-47.
    [39] Wang, Wei-qiang, Gao, Wen. A fast caption detection algorithm on MPEG compressed video and its application. Journal of Computers (in Chinese). Vol.24, No.6, pp. 620-626. 2001.
    [40] Zhang, Yin, Pan, Yun-he. Design of a new color edge detector for text extraction under complex background. Journal of Software (in Chinese). Vol.12, No. 8, pp.1229-1235, 2001.
    [41] Linde, Y., Buzo, A., Gray, R. M.. An Algorithm for Vector Quantizer Design. IEEE Trans. On Communications 28(1): 84-95, January 1980.
    [42] Wang,Wei-qiang, Gao, Wen. Automatic segmentation of news items based on video and audio features. Proceedings of the Second IEEE Pacific Rim Conference on Multimedia. October 2001, pp.498-505.
    [43] 肖冬梅:垂直搜索引擎研究,图书馆学研究,2003(2)
    [44] 张卫丰,徐宝文等:元搜索引擎结果生成技术研究,小型微型计算机系统2003
    [45] 林通,张宏江,封举富,石青云.镜头内容分析及其在视频检索中的应用.软件学报,2002,13(8):1577~1585.
    [46] 赵黎,祁卫,李子青,杨士强,张宏江.利用改进NFL 算法对镜头进行基于内容的检索.软件学报,2002,13(4):586~590.
    [47] 庄越挺,刘小明,吴翌,潘云鹤. 通过例子视频进行视频检索的新方法. 计算机学报,2000,23(3):300~305.
    [48] 李国辉,曹莉华,柳伟:《基于内容的多媒体数据查询和检索》小型微型计算机系统1998.4
    [49] 朱学芳:《多媒体信息处理与检索技术》电子工业出版社2002.11
    [50] 柳沼良知,坂内正夫:《感性关键字在电视剧视频检索中的应用》日本电子信息通信学会1995 年春季大会1995.3
    [51] 俞铁城, 周健来, 宋岩涛: 基于神经网络/隐马尔可夫模型的混合语音识别方法的研究现状. 第五届全国人机语音通讯学术会议论文集, 哈尔滨, 1998 年7 月. 18~21
    [52] 袁春,陈意云: 约束检查的最弱及增量前条件方法. 计算机研究与发展, 第40卷第7期, 2003年7月
    [53] 邱洋,岳昆: 利用缓存优化关系数据的XML发布. 计算机研究与发展, 第41卷第10期, 2004年10月
    [54] 王伟强,高文. 一种压缩域上的快速标题文字探测算法及其应用. 计算机学报, 2001,第24 卷第六期,pp.620-626.
    [55] 张引,潘云鹤. 复杂背景下文本提取的彩色边缘检测算子设计. 软件学报, 2001,第12 卷第8 期,pp.1229-1235
    [56] 杨友庆,高隽,鲍捷. 基于视频的字幕检索与提取. 计算机学报, 2000,23(3): 318~323
    [57] Mika, S., G. Rätsch, J. Weston, B. Schölkopf and K.-R. Müller, "Fisher discriminant analysis with kernels", Neural Networks for Signal Processing IX, 41-48, IEEE (1999)
    [58] Mika, S., B. Schölkopf, A. Smola, K.-R. Müller, M. Scholz and G. Rätsch, "Kernel PCA and de-noising in feature spaces", Advances in Neural Information Processing Systems 11, Cambridge, MA, 536-542, MIT Press (1999)
    [59] S. Mika, G. Rätsch, K.-R. Müller, "A mathematical programming approach to the kernel Fisher algorithm", Advances in Neural Information Processing Systems, 13, 2001
    [60] Young.S J. The HTK Book [M]. Version 2.1, 1997.72-75.
    [61] Hsu, P.R., Harashima, H. Detecting scene changes and activities in video databases. In: Proceedings of the ICASSP’94 IEEE International Conference on Acoustics, Speech, and Signal Processing. 1994. 33~36.
    [62] Lee K, Hon H, Reddy D. An Overview of the SPHINX Speech Recognition System. IEEE Trans. Acoustics, Speech, Signal Proc., 1990, 38: 600~610
    [63] Lee K, Automatic Speech Recognition . the Development of the SPHINX System. Boston Kluwer Academic Publishers, 1989
    [64] Joe Tebelskis. Speech Recognition Using Neural Networks. CMU-CS-95-142, May 1995.
    [65] M. Cohen, H. Franco, N. Morgan, et al. Combining Neural Networks and Hidden Markov Models for Continuous Speech Recognition, Proceedings of the DARPA Speech and Natural Language Workshop, Harriman, NY, 1992.
    [66] H. Bourlard, N. Morgan. Continuous Speech Recognition: A Hybrid Approach. Kluwer Academic Publishers, 1994 17(1): 127~136
    [67] Huang X, Acero A, Alleva F, et al. Microsoft Windows Highly Intelligent Speech Recognizer: Whisper. In: IEEE, eds. Proc. ICASSP-95. New York: IEEE, 1995, I: 93~96
    [68] Hwang M, Rosenfeld R, Thayer E, et al, Improving Speech Recognition Performance viaPhone-dependent VQ codebooks and Adaptive Language Models in SPHINX-II. In: IEEE,eds. Proc. ICASSP-94. Adelaide: IEEE, 1994, I: 549~552
    [69] 朱凌云(责任编辑). 语音识别. 个人电脑. 2000, 6(2): 81~91
    [70] Chris J Leggetter. Improved Acoustic Modeling for HMMs Using Linear Transformations:[PhD Thesis]. University of Cambridge, February 1995. 80~137
    [71] 杨行峻, 迟惠生等. 语音信号数字处理. 北京: 电子工业出版社. 1995. 334~335.
    [72] 戴礼荣. 人机语声对话特点及系统设计. NCMMSC-96, 1996. 22~26
    [73] X.D.Huang, A.Acero, H.Hon, et al. Spoken Language Processing. New Jersey: Prentice Hall PTR, 2000, 401~403, 429~437
    [74] Ron Cole, Lynette Hirschman, et al. The Challenge of Spoken Language Systems: Research Directions for the Nineties. IEEE Trans. on Speech and Audio Processing, 1995, 3(1): 1~7
    [75] 张帆. 语音识别话者自适应谱变换分块变换: [硕士学位论文]. 北京: 清华大学电子工程系, 1997
    [76] F. Macro, M.M. Anna. Fast Speaker Adaptation: Some Experiments on Different Techniques for Codebook Adaptation and HMM Parameters Estimation. In: IEEE, eds.Proc. ICASSP. May 1991. 849~852
    [77] B.S. Atal. Automatic Recognition of Speakers from Their Voices. Proc. IEEE, 1976, 64(4):460~475
    [78] F. Fallside, W.A. Woods. Computer Speech Processing. Prentice-Hall, London, 1985
    [79] F. Nolan. The Phonetic Bases of Speaker Recognition. Cambridge University Press,Cambridge, 1983
    [80] 牛小川, 徐波. 说话人自适应策略与方法的研究与实验. 第五届全国人机语音通讯会议论文集, 哈尔滨. 1998. 181~186
    [81] Zheng Rong, Wang Zuoying. Speaker Adaptation: An Overview. Chinese Journal of Electronics, 1998, 7(2): 121~127
    [82] Qiguang Lin, Chiwei Che. Normalizing the Vocal Tract Length for Speaker IndependentSpeech Recognition. IEEE Signal Processing Letters, 1995, 2(11): 201~203
    [83] 陈景东, 徐波, 黄泰翼. 基于Mellin 变换的语音新特征与说话人自适应技术的比较.第五届全国人机语音通讯会议论文集, 哈尔滨. 1998. 86~91
    [84] Chen Jingdong, Xu Bo, Huang Taiyi. A New Speech Feature Insensitive to the Variation of Different Speakers. Chinese Journal of Electronics, 1999, 8(1): 67~72
    [85] Chen Jingdong, Xu Bo, Huang Taiyi. A Novel Robust Speech Feature Based on the Mellin Transform and Speaker Normalization. Proc. ISCSLP98. Singapore: 1998. 191~195
    [86] A.Imamura. Speaker-Adaptive HMM-Based Speech Recognition with A Stochastic Speaker Classifier. In Proc. IEEE Int. Conf. Acoustic, Speech, Signal Proc., 1991, 841~844
    [87] L. Mathan, L. Miclet. Speaker Hierarchical Clustering for Improving Speaker-Independent HMM Word Recognition. In: IEEE, eds. Proc. ICASSP90. 1990, 149~152
    [88] T. Kosaka, S. Sagayama. Tree-Structured Speaker Clustering for Fast Speaker Adaptation.In: IEEE, eds. Proc. ICASSP94, 1994, 1: 245~248
    [89] Chin-Hui Lee. Learning from Surprises-Statistics and Speech/Speaker Recognition. Speech Lab 20th Anniversary Celebration, Tsinghua University, Beijing, 1999
    [90] Y. L. Chow, et al. BYBLOS: The BBN Continuous Speech Recognition System. In: IEEE,eds. Proc. ICASSP87. 1987, 89~92
    [91] G. Rigoll. Speaker Adaptation for Large Vocabulary Speech Recognition Systems using Speaker Markov Models. In: IEEE, eds. Proc. ICASSP89. 1989, 5~8
    [92] C.J. Leggetter, P.C. Woodland. Maximum Likelihood Linear Regression for Speaker Adaptation of Continuous Density Hidden Markov Model. Computer Speech and Language,1995, 9:171~185
    [93] C.J. Leggetter, P.C. Woodland. Speaker Adaptation of HMMs Using Linear Regression.Technical Report CUED/F-INFENG/TR181, Cambridge Univ., Jun. 1994
    [94] Heidi Christensen. Speaker Adaptation of Hidden Markov models using Maximum Likelihood Linear Regression: [MSc Thesis]. Aalborg University, Jun. 1996
    [95] M.J.F. Gales, P.C. Woodland. Mean and Variance Adaptation within the MLLR Framework.Computer Speech and Language, 1996, 10:249~264
    [96] Rui Y, Thomas S H, and Chang S F. “Image Retrieval:Past,Present,and Future.”Symposium on Multimedia Information Processing,Dec 1997.
    [97] Wang J Z, Li J, Chan D, Wiederhold G. “Semantics-sensitive Retrieval for Digital Picture Libraries”D-Lib Magazine 1999,5(11)
    [98] Eakins J P. “Automatic image content retrieval -are we getting anywhere ?”In Proc. of Third International Conference on Electronic Library and Visual Information Research, pages 123--135, May 1996.
    [99] Gudivada V N, Raghavan V V. “Content-based Image Retrieval System”. IEEE Computer 1995,28(9),18-22
    [100] Al-Khatib W, Day Y F, Ghafoor A, and Berra B. "Semantic Modeling and Knowledge Representation in Multimedia Databases", IEEE Transactions on Knowledge And Data Engineering, Vol.11, No.1, Jan/Feb 1999
    [101] Jain A K, Zhong Y, and Lakshmanan S. "Object Matching Using Deformable Templates", IEEE Trans. Pattern Analysis and Machine Intelligence, vol.18, no.3, pp.267-278, Mar.1996
    [102] Hermes T et al. "Image retrieval for information systems" in Storage and Retrieval for Image and VideoDatabases III (Niblack, W R & Jain, R C, eds), Proc SPIE 2420, pp 394-405,1995
    [103] Voorhees E. "Using WordNet to Disambiguate Word Senses for Text Retrieval", in Proc. 16th Annual ACM SIGIR Conference on Research and Development in Information Retrieval, Pittsburgh, pp. 171-180. 1993
    [104] Zhuang Y, Mehrotra S, and Huang T S. “A multimedia information retrieval model based on semantic and visual content”In Proceedings of the 5th International ICYCS Conference, Nanjing, China, 1999. 6
    [105] Colombo C, DelBimbo A, and Pala P. “Semantics in visual information retrieval”IEEE Multimedia, 6(3):38--53, 1999
    [106] Meghini C, Sebastiani F, and Straccia U. “The Terminological Image Retrieval Model”In Proceedings of ICIAP'97, 9th International Conference On Image Analysis And Processing, volume II, pages 156--163, Florence, I, September 1997
    [107] Cavazza M, Green R J and Palmer I J. “Multimedia Semantic Features and Image Content Description”, In Proceedings of the 1998 MultiMedia Modeling
    [108] Requirements Group, MPEG-7 requirements document, ISO/IEC JTC1/SC29/WG11 MPEG99/N2727, Seoul, Korea, March 1999
    [109] Jean-Luc Gauvain, Chin-Hui Lee. Maximum a Posteriori Estimation for Multivariate Gaussian Mixture Observations of Markov Chains. IEEE Trans. on Speech and Audio Proc., 1994, 2(2): 291~298
    [110] C-H. Lee, C-H. Lin, B-H. Juang. A Study on Speaker Adaptation of the Parameters of Continuous Density Hidden Markov Models. IEEE Trans. on Signal Proc., 1991, 39(4):806~814
    [111] Seyed Mohammad Ahadi-Sarkani, Bayesian and Predictive Techniques for Speaker Adaptation: [PhD Thesis]. Cambridge Univ., 1996
    [112] 李虎生, 杨明杰, 刘润生. 汉语数码语音识别自适应算法. 电路与系统学报, 1999, 4(2):1~6
    [113] Qiang Huo. Adaptive learning and Compensation of Hidden Markov Model for Robust Speech Recognition. Proc. ISCSLP98. Singapore: 1998. 31~43
    [114] Qiang Huo, Chin-Hui Lee. On-Line Adaptive Learning of Continuous Density Hidden Markov Model Based on Approximate Recursive Bayes Estimate. IEEE Trans. on Speech and Audio Proc., 1997, 5(2): 161~171
    [115] Qiang Huo, Chorkin Chan, Chin-Hui Lee. On-Line Adaptation of SCHMM Parameters Based on the Segmental Quasi-Bayes Learning for Speech Recognition. IEEE Trans. On Speech and Audio Proc., 1996, 4(2): 141~144
    [116] Qiang Huo, Chin-Hui Lee. On-Line Adaptive Learning of Correlated Continuous Density Hidden Markov Models for Speech Recognition. IEEE Trans. on Speech and Audio Proc.,1998, 6(4): 386~397
    [117] Jun-ichi Takahashi, Shigeki Sagayama. Vector-field-smoothed Bayesian Learning for Fast and Incremental Speaker/Telephone-channel Adaptation. Computer Speech and Language,1997, 11: 127~146
    [118] Masahiro Tonomura, Tetsuo Kosaka, Shoichi Matsunaga. Speaker Adaptation Based on Transfer Vector Field Smoothing Using Maximum a Posteriori Estimation. Computer Speech and Language, 1996, 10: 117~132
    [119] Arnheim R. “Art and Visual Perception: A Psychology of the Creative Eye”Regents of the University of California,Palo Alto, Calif.,1954
    [120] Itten J. “Art of Color (Kunst der Farbe)”,Otto Maier Verlag, Ravensburg,Germany, 1961
    [121] 王昱. 语音识别自适应技术的研究与实现[硕士学位论文]. 北京: 清华大学计算机科学与技术系, 2000
    [122] 罗希平,田捷等. “图像分割方法综述”. 模式识别与人工智能,1999,12(3):300-312
    [123] Meirav A and Michael S L. “IRUS:Image Retrieval Using Shape”. In IEEE Multimedia System’99
    [124] Smith J R and Chang S -F. “VisualSEEk: a fully automated content-based image query system”In Proc. ACM Intern. Conf. Multimedia (ACMMM), page 87-98, Boston, MA, November 1996
    [125] Belongie S, Carson C, Greenspan H and Malik J. “Color-and texture-based image segmentation using EM and its application to content-basedimage retrieval”. In Proc. Int. Conf. Comp. Vis., 1998.
    [126] Samadani R and Han C. “Computer-assisted extraction of boundaries from images”In Proc.SPIE Storage and Retrieval for Image and Video Databases,1994
    [127] Anil K J and Aditya V. “Image retrieval using color and shape”. Pattern Recognition, 29(8):1233--1244, August 1996
    [128] Ciocca G, Schettini R. “A relevance feedback mechanism for content-based image retrieval”. Information Processing & Management,Volume: 35, Issue: 5, September, 1999, pp. 605-632
    [129] Cox I J, Miller M L, Minka T P, Papathomas T V and Yianilos P N. "The Bayesian Image Retrieval System, PicHunter: Theory, Implementation, and Psychophysical Experiments", IEEE Trans. On Image Processing, Volume 9, Issue 1, pp. 20-37, Jan. 2000
    [130] Li W S, Candan K S, Hirata K, and Hara Y. “IFQ: A Visual Query Interface and Query Generator for Object-based Media Retrieval”.In Proceedings of the 1997 IEEE Multimedia Computing and Systems Conference, pp. 353-361, Ottawa, Ontario, Canada, June 1997
    [131] ISO/IEC JTC1/SC29/WG11, “Overview of the MPEG-7 Standard”, La Baule, October 2000
    [132] ISO/IEC JTC 1/SC 29/WG 11/N3705,“Multimedia Description Schemes”, October 2000, La Baule, FR
    [133] M. Abdel-Mottaleb, et al. "MPEG-7: A Content Description Standard Beyond Compression", in IEEE 42nd Midwest Symposium on Circuits and Systems, MWSCAS'99, 1999.
    [134] Requirements Group, MPEG-7 DDL development document V.2, ISO/IEC JTC1/SC29/WG11 MPEG99/N2997, Melbourne, Australia, October 1999
    [135] 徐义芳,张金杰,姚开盛,曹志刚,王勇前“语音增强用于抗噪声语音识别”模式识别与人工智能,2001,12
    [136] Document Database System in A Distributed Environment. IEEE Trans. Multimedia. 4(2), 2002,215-234.
    [137]吕凝,陈贺新,肖军:《多媒体数据流描述模型及同步化实现》2003 年5 月吉林大学学报信息版第5 期
    [138] Ning Lu, Hexin Chen, Jun Xiao: "AGAINST DOS ATTACKS USING NOTIFY MECHANISM", Proceedings of the IASTED International Conference on Communication, Network and Information Security, December 10-12, 2003, New York, USA
    [139] Ning Lu, Hexin Chen, Jun Xiao: "A UNICAST-BASED MODEL FOR MANY-TO-MANY GROUP COMMUNICATION", Proceedings of the IASTED International Conference on Communication, Network and Information Security, December 10-12, 2003, New York, USA
    [140] Ning Lu, Hexin Chen, Jun Xiao: "VIDEO IMAGE AND AUDIO REPRESENTED IN 4-D MATRIX FOR VIDEO STREAMS", The 8th World Multi-Conference on Systemics, Cybernetics and Informatics, July 18-21, 2004, Orlando, Florida, USA
    [141] Ning Lu, Hexin Chen, Jun Xiao: "DEFENDING DOS/DDOS ATTACKS USING NETWORK NEW TECHNOLOGY", Proceedings of the 16th International Conference on Computer Communication, Sept 15-17, 2004, Beijing, CHINA
    [142] Jun Xiao, Hexin Chen, Zhonghua Sun, Ning Lu: "THE PERFORMANCE EVALUATION OF A H.323-BASED SECURITY VIDEO CONFERENCING SYSTEM", Proceedings of the Second IASTED International Conference on Communication and Computer Networks, November 8-10, 2004, Cambridge, MA, USA
    [143] Jun Xiao, Hexin Chen, Ning Lu: "SDCS: A SMART DISTRIBUTED CONTROL SYSTEM", The 8th World Multi-Conference on Systemics, Cybernetics and Informatics, July 18-21, 2004, Orlando, Florida, USA
    [144] ISO/IEC JTC1/SC29/WG11, “Overview of the MPEG-7 Standard”, La Baule, October 2000
    [145] ISO/IEC JTC 1/SC 29/WG 11/N3705,“Multimedia Description Schemes”, October 2000, LaBaule, FR
    [146] M. Abdel-Mottaleb, et al. "MPEG-7: A Content Description Standard Beyond Compression", in IEEE 42nd Midwest Symposium on Circuits and Systems, MWSCAS'99, 1999.
