Chinese Video Question Answering System

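English title: Research on Chinese Video Question Answering
Author: Liu Yanfang (刘艳芳)
Thesis level: Master's
Discipline: Signal and Information Processing
Keywords (Chinese, translated): question answering system; video segmentation; information retrieval; natural language processing; HowNet
Keywords (English): Question Answering (QA); Information Retrieval (IR); HowNet; Video Segmentation; Natural Language Processing (NLP)
Degree year: 2007
Advisors: Ding Tianchang (丁天昌); Feng Huamin (封化民)
Discipline code: 081002
Degree-granting institution: Yanshan University (燕山大学)
Submission date: 2007-03-01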

Abstract

A question answering (QA) system lets a user pose a question in natural language and, through retrieval, returns a short and accurate answer: a sentence, a summary, or a single word. QA over text documents has made considerable progress. With the development of the internet, multimedia data beyond text has become increasingly important on the web, which presents both opportunities and challenges for QA. Video is one of the most effective media for capturing real-world information, so this thesis studies question answering over news video. Of the features available in video, the transcript is the most important and the most readily obtained; moreover, the input to a video question answering (VideoQA) system is a plain-text question. The framework therefore relies mainly on transcripts produced by automatic speech recognition (ASR).
     This thesis proposes a framework for a Chinese video question answering system. The system consists of six modules: video segmentation, speech recognition, question classification, transcript retrieval, answer extraction, and video output. Because the transcripts contain numerous speech recognition errors, some of the errors were corrected manually. In the question classification module, HowNet is used to improve classification accuracy. The goal of VideoQA is to return the video clip that most precisely answers the question rather than a long story unit, so the retrieved story units undergo finer-grained answer extraction. Candidate sentences are scored on keyword density, the answer type determined during question classification, and other cues, and the video corresponding to the highest-scoring sentence is returned as the output.
     The main contributions of this thesis are: (1) the use of HowNet in question classification; (2) the extension of text-based QA to Chinese news video, which the thesis presents as an advance for QA research. Experiments on Chinese CCTV4 news video show that the proposed method is feasible.
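
The answer extraction step described in the abstract (scoring each sentence by keyword density and expected answer type, then returning the clip for the top sentence) can be illustrated with a minimal sketch. The abstract does not give the actual scoring formula, so everything here is an assumption made for illustration: the `Sentence` fields, the `type_bonus` weight, the entity type labels, and the additive combination of the two cues. Sentences are assumed to be already word-segmented and time-aligned to the video.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Sentence:
    """A transcript sentence aligned to a video time span (hypothetical structure)."""
    tokens: List[str]                                       # word-segmented transcript text
    entity_types: Set[str] = field(default_factory=set)    # e.g. {"PERSON", "LOCATION"}
    start_sec: float = 0.0
    end_sec: float = 0.0


def keyword_density(tokens: List[str], keywords: Set[str]) -> float:
    """Fraction of the sentence's tokens that are question keywords."""
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token in keywords)
    return hits / len(tokens)


def score_sentence(sentence: Sentence, keywords: Set[str],
                   answer_type: str, type_bonus: float = 0.5) -> float:
    """Combine keyword density with a bonus for matching the expected answer type.

    The additive form and the value of type_bonus are illustrative assumptions.
    """
    score = keyword_density(sentence.tokens, keywords)
    if answer_type in sentence.entity_types:
        score += type_bonus
    return score


def best_sentence(sentences: List[Sentence], keywords: Set[str],
                  answer_type: str) -> Sentence:
    """Return the highest-scoring sentence; its time span selects the output clip."""
    return max(sentences, key=lambda s: score_sentence(s, keywords, answer_type))


candidates = [
    Sentence(["总理", "今天", "访问", "日本"], {"PERSON", "LOCATION"}, 10.0, 14.0),
    Sentence(["访问", "很", "成功"], set(), 14.0, 17.0),
]
answer = best_sentence(candidates, {"总理", "访问"}, "LOCATION")
print(answer.start_sec, answer.end_sec)
```

In this sketch the first sentence wins because half of its tokens are question keywords and it contains an entity of the expected type; a real system would tune how the cues are weighted against each other.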
