Research on Chinese Query-Focused Multi-Document Automatic Summarization Based on the Cloud Model
Abstract
With the spread of the Internet, the amount of information online is vast and grows by the moment. For a simple query, a search engine typically returns a ranked list of web pages the user may need; the list contains a large amount of irrelevant and duplicated data, so the user must spend considerable effort locating useful results. Query-focused multi-document summarization distills and reorganizes the content of many query-relevant documents into a short summary of a given length, speeding up information access; the summary is generally expected to be concise, well organized, low in redundancy, and tailored to the user's needs. The technique lowers the difficulty of extracting information from massive data, accelerates information acquisition and comprehension, and thereby improves the efficiency with which users obtain and exploit information, strengthening their competitiveness in the information society.
The cloud model, proposed by Academician Li Deyi, is a qualitative–quantitative transformation model that handles the fuzziness and randomness of uncertain concepts, and the association between the two. Starting from the uncertainty of natural-language concepts, the cloud model opens up research on artificial intelligence with uncertainty. Although the model originated from concepts in natural language, to the best of our knowledge work that applies it directly to natural language processing itself remains rare.
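The qualitative–quantitative transformation at the heart of the cloud model is usually realised by a pair of generators. The sketch below implements the standard forward and backward normal cloud generators from the commonly published formulas (Ex: expectation, En: entropy, He: hyper-entropy); it is illustrative background, not code from this thesis:

```python
import math
import random

def forward_cloud(Ex, En, He, n):
    """Forward normal cloud generator: produce n cloud drops (x, mu)
    from a concept's numerical characteristics Ex (expectation),
    En (entropy) and He (hyper-entropy)."""
    drops = []
    for _ in range(n):
        En_i = random.gauss(En, He)             # second-order randomness
        x = random.gauss(Ex, abs(En_i))         # position of the drop
        mu = math.exp(-(x - Ex) ** 2 / (2 * En_i ** 2))  # certainty degree
        drops.append((x, mu))
    return drops

def backward_cloud(xs):
    """Backward normal cloud generator: estimate (Ex, En, He)
    from a sample of cloud-drop positions."""
    n = len(xs)
    Ex = sum(xs) / n
    En = math.sqrt(math.pi / 2) * sum(abs(x - Ex) for x in xs) / n
    S2 = sum((x - Ex) ** 2 for x in xs) / max(n - 1, 1)
    He = math.sqrt(abs(S2 - En ** 2))
    return Ex, En, He
```

Running `backward_cloud` on drops produced by `forward_cloud` recovers Ex and En to within sampling error, which is exactly the qualitative-to-quantitative round trip the model describes.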
This thesis studies query-focused multi-document summarization on Chinese corpora. We first build an evaluation corpus and human-written reference summaries suitable for open evaluation. On this basis, the cloud model is applied to content selection, sentence compression, and sentence ordering, aiming to generate coherent summaries that are well focused, concise, readable, and matched to the user's needs. Finally, a modified version of the ROUGE toolkit is used for automatic evaluation of Chinese summaries.
The main contributions of this thesis are as follows:
First, we propose a summarization-unit selection method based on the cloud model, which accounts for both the randomness and the fuzziness of summarization units to improve the performance of query-focused multi-document summarization. The relevance between a summarization unit and the query is computed first: the relevance scores between the unit and the individual query words are treated as cloud drops, and by measuring the uncertainty of the resulting cloud we identify the units that are genuinely relevant to the query. The query-relevance results are then adjusted by importance within the document set: the similarities between a candidate sentence and the other candidate sentences are treated as cloud drops, and the numerical characteristics of this cloud are used to compute sentence importance, selecting sentences that cover as much of the document set as possible and avoiding summaries that answer the query from only one aspect. To demonstrate the effectiveness of the method, we ran experiments on large-scale public English corpora and participated in TAC (Text Analysis Conference) 2010, obtaining good results.
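The query-relevance step can be sketched as follows. The backward-cloud estimation uses the standard published formulas, but the final scoring function and the weight `alpha` are hypothetical, since the abstract does not give the exact combination of the cloud's numerical characteristics:

```python
import math

def cloud_characteristics(drops):
    """Backward cloud: estimate (Ex, En, He) of the cloud formed by drops."""
    n = len(drops)
    Ex = sum(drops) / n
    En = math.sqrt(math.pi / 2) * sum(abs(d - Ex) for d in drops) / n
    S2 = sum((d - Ex) ** 2 for d in drops) / max(n - 1, 1)
    He = math.sqrt(abs(S2 - En ** 2))
    return Ex, En, He

def query_relevance_score(unit_term_relevances, alpha=0.5):
    """Score a summarization unit from its relevance scores against the
    individual query words, treated as cloud drops.  `alpha` is a
    hypothetical trade-off between expected relevance (Ex) and the
    cloud's uncertainty (En + He)."""
    Ex, En, He = cloud_characteristics(unit_term_relevances)
    return Ex - alpha * (En + He)   # reward relevance, penalise uncertainty
```

A unit that is moderately relevant to every query word forms a tight cloud and outscores one that is strongly relevant to a single word but irrelevant to the rest, which is the intended behaviour of "genuinely relevant to the query".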
Second, we built a Chinese summarization evaluation corpus and an automatic evaluation tool for Chinese, and on this basis constructed a Chinese query-focused multi-document summarization system based on the cloud model. The corpus consists of 1,000 documents, 100 document sets with queries, and 400 human-written reference summaries. By modifying the source code of the English evaluation tool ROUGE, we enabled automatic ROUGE evaluation of Chinese summaries. Fifty document sets serve as training data; sentence splitting and word segmentation are performed with the latest shared release of HIT's Language Technology Platform (LTP); the system's parameters are then tuned on the training sets using the Chinese evaluation tool; finally, Chinese summaries are generated with the cloud-based unit selection method, yielding the Chinese cloud summarization system.
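The core ROUGE-N computation that such a Chinese adaptation rests on can be sketched as below. This is a simplified multi-reference recall over segmented tokens, not the modified tool itself:

```python
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def rouge_n(candidate, references, n=2):
    """ROUGE-N recall: overlapping n-grams between the candidate summary
    and the reference summaries, divided by the total number of n-grams
    in the references.  For Chinese, tokens are words after segmentation
    (single characters also work)."""
    cand = Counter(ngrams(candidate, n))
    match = total = 0
    for ref in references:
        ref_counts = Counter(ngrams(ref, n))
        total += sum(ref_counts.values())
        match += sum(min(c, cand[g]) for g, c in ref_counts.items())
    return match / total if total else 0.0
```

The real ROUGE toolkit adds jackknifing over references, stemming and stopword options for English, and precision/F variants; the modification for Chinese chiefly concerns tokenization, which this sketch delegates to the caller.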
Third, we propose a Chinese sentence compression method based on multi-dimensional clouds and dependency parsing to further improve summary quality. Pruning rules based on dependency analysis are first formulated and applied to each candidate summary sentence, producing multiple compressed candidates. A multi-dimensional cloud then scores each candidate, jointly considering the distribution of its words within sentences and across the document set as well as their relevance to the query; superposing the clouds propagates the uncertainty effectively. Finally, the compressed candidate that carries the most information in the shortest length replaces the original candidate sentence, so the summary packs in more useful information.
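A minimal sketch of rule-based pruning over a dependency parse is given below. The removable relation labels (`ADV`, `ATT`, in LTP's tag set) are illustrative only; the thesis's actual rule set is not specified in the abstract:

```python
def prune_candidates(tokens, heads, rels, removable=("ADV", "ATT")):
    """Generate compressed candidates by deleting, one at a time, each
    subtree whose root attaches to its head via a removable dependency
    relation.  heads[i] is the 0-based head index of token i (-1 for
    the root); rels[i] is its relation label (LTP-style tags)."""
    def subtree(i):
        keep = {i}
        changed = True
        while changed:                       # gather all descendants
            changed = False
            for j, h in enumerate(heads):
                if h in keep and j not in keep:
                    keep.add(j)
                    changed = True
        return keep

    candidates = []
    for i, rel in enumerate(rels):
        if rel in removable:
            drop = subtree(i)
            cand = [t for j, t in enumerate(tokens) if j not in drop]
            if cand and cand != tokens:
                candidates.append(cand)
    return candidates
```

Each pruning site yields one candidate, so a sentence with several adverbial or attributive modifiers produces several compressions; the multi-dimensional cloud scoring then picks among them.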
Fourth, we propose a cloud-template sentence ordering method that makes the generated Chinese summaries more coherent. The method treats every document in the set as an ordering template and uses the cloud model to combine the orderings induced by the individual documents, avoiding both the single-template approach's dependence on one document and the majority-ordering approach's restriction to pairwise comparisons. Sentences in the document set are first clustered into sub-topics with an adaptive incremental clustering method based on complex networks, and the sub-topics containing one or more summary sentences are identified. Each document is then taken as a template, and the cloud formed by these templates determines the relative positions of the sub-topics and of the summary sentences within them. Finally, the sub-topics, and then the sentences within each sub-topic, are sorted in turn, producing summaries that are more coherent and readable.
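The template idea can be sketched as follows; here the combination across templates is reduced to a plain mean of normalised positions, a simplification of the cloud-based aggregation (taking the expectation Ex of the cloud formed by the position drops):

```python
def order_by_templates(items, templates):
    """Order summary items (e.g. sub-topics) by their mean relative
    position across template documents.  Each template is one source
    document, given as a list of item ids in document order; an item
    absent from a template simply contributes no position drop."""
    positions = {it: [] for it in items}
    for doc in templates:
        for pos, it in enumerate(doc):
            if it in positions and len(doc) > 1:
                positions[it].append(pos / (len(doc) - 1))  # normalise to [0, 1]

    def mean_position(it):
        drops = positions[it]
        return sum(drops) / len(drops) if drops else 1.0    # unseen items go last

    return sorted(items, key=mean_position)
```

Because every document votes, an item that one document happens to place last but most documents place first still ends up early, which is what frees the method from dependence on any single template.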
