Research on Techniques for Temporal Multi-Document Summarization
Abstract
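The growth of the Internet has produced an explosion of multimedia information such as text, images, audio, and video. In an era where information is abundant but knowledge is comparatively scarce, people find themselves caught in information anxiety. Moreover, related media content keeps updating and evolving over time. Acquiring and organizing information effectively has therefore become a major challenge in information processing. With information compression as its goal, this thesis focuses on text compression techniques.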
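     Temporal multi-document summarization (TMDS) is a new direction in automatic summarization and a natural extension of traditional static multi-document summarization. Its input goes beyond a set of related documents from a single period: it handles related document sets that span multiple periods. Its main goal is to summarize automatically, at a given compression ratio and from a temporal perspective, how the content of a series of news reports evolves, helping people obtain information quickly. With the international evaluations DUC 2007 and TAC 2008, this research has drawn growing attention from government, industry, and academia. TMDS has broad application prospects, including news search engines, competitive business intelligence analysis, and trend prediction, and it can create greater social value by continually meeting people's needs.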
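     The objects studied in this thesis, series of news reports, have prominent temporal characteristics, and static multi-document summarization over a single period can be regarded as a special case of TMDS. The research focus of TMDS is therefore how to solve, in a temporal context, the two hard problems of traditional static multi-document summarization: content selection and linguistic quality control. Previous work gave little consideration to temporal information. This thesis concentrates on identifying temporal characteristics and applying them to extractive content selection for TMDS, aiming to keep the summary content important, novel, and broad in coverage. It focuses on the following problems: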
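     1. Time expression recognition and normalization. Understanding the semantics of text is the ultimate goal of natural language processing, and temporal semantics are indispensable to that understanding. Recognizing and normalizing time expressions is the basis of temporal semantic annotation. This work lays the foundation for content selection and linguistic quality control in TMDS and can also support other temporal information extraction applications.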
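     2. Content selection based on a macro-micro importance discriminative model. Following the principle of stepwise refinement, the thesis first assumes that the time slices of a series of news reports are mutually independent. It then analyzes the macro-level and micro-level temporal evolution of the reports and explores a content selection method for TMDS based on a macro-micro importance discriminative model.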
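     3. Topic-focused content selection based on evolutionary manifold ranking. Going a step further, a series of news reports evolves continuously along the time axis. Under the assumption that content evolution in the current time slice depends on the topic content of earlier time slices, the thesis studies how dynamically enhancing the topic description captures the changing information needs that arise as user interest is updated, and how this affects content selection. It proposes an evolutionary manifold ranking algorithm guided by an iterative feedback mechanism to model the dynamics of topic evolution in a series of news reports, providing a temporally adaptive importance ranking for content selection in TMDS.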
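     4. Topic-focused content selection optimized with spectral clustering. Building on evolutionary manifold ranking, the thesis uses normalized spectral clustering to improve the coverage of content selection and designs a temporal redundancy removal strategy to make the summary content more novel. Combining sub-topic ranking with this new redundancy removal strategy yields an optimized content selection method for TMDS. On the Update Summarization task of the international evaluation TAC 2008, the method ranked among the top systems in content selection performance, demonstrating its strength.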
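     This thesis is a preliminary exploration of TMDS and its content selection techniques. The proposed methods are language independent and achieved some results, laying a foundation for further research.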
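To make item 1 concrete, the following is a minimal rule-based sketch of time expression recognition and normalization. It is an illustration only and not the method developed in the thesis: the function name `normalize_time_expressions`, the tiny pattern set (ISO-style dates plus the words yesterday, today, and tomorrow), and the use of a document reference date are all assumptions made for this example.

```python
import re
from datetime import date, timedelta

# Hypothetical, deliberately tiny rule set: day offsets for three relative words.
RELATIVE_OFFSETS = {"yesterday": -1, "today": 0, "tomorrow": 1}

# Recognize either an explicit YYYY-MM-DD date or one of the relative words.
PATTERN = re.compile(
    r"\b(\d{4})-(\d{2})-(\d{2})\b|\b(yesterday|today|tomorrow)\b",
    re.IGNORECASE,
)


def normalize_time_expressions(text, reference_date):
    """Return (surface form, ISO date) pairs for each recognized expression.

    Relative expressions are resolved against reference_date, which stands in
    for the publication date of the news report (an assumption of this sketch).
    """
    results = []
    for match in PATTERN.finditer(text):
        if match.group(1):
            try:
                value = date(int(match.group(1)), int(match.group(2)), int(match.group(3)))
            except ValueError:
                # Looks like a date but is not a valid calendar day; skip it.
                continue
        else:
            offset = RELATIVE_OFFSETS[match.group(4).lower()]
            value = reference_date + timedelta(days=offset)
        results.append((match.group(0), value.isoformat()))
    return results
```

Calling `normalize_time_expressions("The storm hit yesterday; talks resume on 2008-03-05.", date(2008, 3, 1))` maps "yesterday" to 2008-02-29 and leaves the explicit date unchanged. Anchoring every expression to an absolute date in this way is what lets later steps place sentences on a common timeline.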
The development of the Internet has produced explosive growth in multimedia information such as text, images, audio, and video. In an era of abundant information and relatively scarce knowledge, people fall into a kind of information anxiety. As time goes on, the relevant multimedia information also gradually updates and evolves. How to acquire and organize information effectively has become a challenge in information processing. This thesis focuses on text compression technology, with information compression as its goal.
     Temporal multi-document summarization (TMDS) is a new direction in automatic summarization. It is a natural extension of multi-document summarization that captures the evolving information of a single topic over time. Its greatest difference from traditional static multi-document summarization is that it deals with a dynamic collection that goes beyond a single period, that is, a collection of relevant documents spanning several periods. It mainly aims to summarize series of news reports automatically so as to help people acquire the evolving content efficiently. Since the international evaluations DUC 2007 and TAC 2008, this research has received more and more attention from industry, academia, and government. TMDS has broad application prospects: it can be used in news search engines, commercial intelligence analysis, and trend prediction. It will bring great social value by satisfying people's needs.
     The research object of this thesis, series of news reports, has strong temporal characteristics. Static multi-document summarization within a single period can be considered a special case of TMDS. Therefore, the research focus of TMDS is how to resolve, in a temporal context, the two difficult problems of static multi-document summarization: content selection and linguistic quality control. Previous research rarely considered temporal information. This thesis focuses on how to recognize temporal characteristics and use them to improve extractive content selection for TMDS, while keeping the summary content important, novel, and broad in coverage. The main research problems are as follows:
     1. Time expression recognition and normalization. Understanding the semantics of text is the ultimate goal of natural language processing, and temporal semantics are necessary for understanding text. Time expression recognition and normalization are the basis of temporal semantic labeling. They build a foundation for content selection and linguistic quality control in TMDS and also support other temporal information extraction applications.
     2. Content selection based on a macro-micro importance discriminative model. Following the principle of stepwise refinement, we first assume that the time slices in a series of news reports are independent. We then explore a content selection method for TMDS with a macro-micro importance discriminative model by analyzing the evolving macro-level and micro-level temporal characteristics.
     3. Topic-oriented content selection based on evolutionary manifold ranking. A series of news reports evolves continuously along the timeline. As a further step, we assume that content evolution in the current time slice depends on the topic content of previous time slices. We study how to enhance the expressive power of the static query so that it reflects the dynamic evolution of the query, and how these changes influence content selection. We propose evolutionary manifold ranking based on an iterative feedback mechanism in order to model the dynamics of topic evolution in a series of news reports. It provides a temporally adaptive ranking algorithm for content selection in TMDS.
     4. Topic-oriented content selection optimization strengthened by spectral clustering. Building on evolutionary manifold ranking, we adopt normalized spectral clustering to improve content coverage and design a temporal redundancy removal strategy to keep the summary content novel. We explore an optimized content selection method that combines sub-topic ordering with the new redundancy removal strategy. In the update summarization task of TAC 2008, our system achieved competitive content selection scores, which supports the strength of our approach.
     This thesis explores TMDS and its content selection and makes some progress. The proposed methods are language independent, and the work builds a foundation for future research.
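The ranking step in item 3 builds on manifold ranking. As a rough illustration, the sketch below implements only the standard static manifold ranking iteration f ← αSf + (1 − α)y over a cosine-similarity graph. It does not include the iterative feedback or the evolutionary extension proposed in the thesis. The function name `manifold_rank`, the parameter defaults, and the assumption of non-negative feature vectors (for example, term weights) are hypothetical choices for this example.

```python
import math


def manifold_rank(vectors, query_index, alpha=0.6, iterations=200):
    """Score every item by its relevance to one query item.

    vectors: list of equal-length, non-negative feature vectors (assumed input,
    e.g. term weights for a topic description and candidate sentences).
    query_index: position of the query (topic) vector in the list.
    """
    n = len(vectors)

    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm_a = math.sqrt(sum(x * x for x in a))
        norm_b = math.sqrt(sum(y * y for y in b))
        return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

    # Affinity matrix W with a zero diagonal (no self-loops).
    w = [[cosine(vectors[i], vectors[j]) if i != j else 0.0 for j in range(n)]
         for i in range(n)]

    # Symmetric normalization S = D^{-1/2} W D^{-1/2}.
    degree = [sum(row) for row in w]
    s = [[w[i][j] / math.sqrt(degree[i] * degree[j]) if degree[i] and degree[j] else 0.0
          for j in range(n)]
         for i in range(n)]

    # Prior vector y: 1 for the query item, 0 for everything else.
    y = [1.0 if i == query_index else 0.0 for i in range(n)]

    # Fixed-point iteration; it converges because 0 < alpha < 1.
    f = y[:]
    for _ in range(iterations):
        f = [alpha * sum(s[i][j] * f[j] for j in range(n)) + (1 - alpha) * y[i]
             for i in range(n)]
    return f
```

With `manifold_rank([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]], query_index=0)`, the second vector, which is close to the query, receives a higher score than the third, which reaches the query only through the second. This propagation along the similarity graph is what separates manifold ranking from scoring each sentence against the query in isolation.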
