The development of the Internet has produced explosive growth in multimedia information such as text, pictures, audio, and video. In an era that is rich in information but relatively poor in knowledge, people suffer from a kind of information anxiety. Over time, the relevant multimedia information is also gradually updated and evolves. How to acquire and organize information effectively has become a challenge in information extraction. This thesis focuses on text compression technology, with information compression as the goal.
     Temporal multi-document summarization (TMDS) is a new direction in automatic summarization. It is a natural extension of multi-document summarization that captures the evolving information of a single topic over time. The greatest difference from traditional static multi-document summarization is that it deals with a dynamic collection that extends beyond a single period, that is, a collection of related documents spanning multiple periods. It mainly aims to automatically summarize series of news reports so as to help people efficiently acquire the evolving content. Since the international evaluations DUC 2007 and TAC 2008, the related research has received increasing attention from industry, academia, and government. TMDS has broad application prospects, including news search engines, commercial intelligence analysis, and trend prediction. It will bring great social value by satisfying people’s needs.
     The research object of this thesis, series of news reports, has strong temporal characteristics. Static multi-document summarization within a single period can be regarded as a special case of TMDS. Therefore, the key research question of TMDS is how to resolve the two difficult problems of static multi-document summarization in a temporal context. Previous research has rarely considered temporal information. This thesis focuses on how to recognize temporal characteristics and use them to investigate extractive content selection for TMDS in depth. We also aim to keep the summary content important, novel, and comprehensive in coverage. The main research problems are as follows:
     1. Time expression recognition and normalization. Understanding the semantics of text is the ultimate goal of natural language processing, and temporal semantics are necessary for understanding text. Time expression recognition and normalization are the basis of temporal semantic labeling; they build a foundation for content selection and language quality control in TMDS, and also support other temporal information extraction applications.
     2. Content selection based on a macro-micro importance discriminative model. Following the principle of stepwise refinement, we first assume that the time slices in a series of news reports are independent. We explore a content selection method for TMDS that uses a macro-micro importance discriminative model, built by analyzing the evolving macro and micro temporal characteristics.
     3. Topic-oriented content selection based on evolutionary manifold ranking. A series of news reports evolves continuously along the timeline. Going a step further, we assume that the content evolution in the current time slice depends on the topic content in the previous time slice. We study how to enhance the expressive capability of the static query and capture the dynamic evolution of the query, and how these changes influence content selection. We propose evolutionary manifold ranking, based on an iterative feedback mechanism, to model the dynamic characteristics of topic evolution in a series of news reports. It provides a temporally adaptive ranking algorithm for content selection in TMDS.
     4. Topic-oriented content selection optimization strengthened by spectral clustering. Building on evolutionary manifold ranking, we adopt normalized spectral clustering to improve content coverage and design a temporal redundancy removal strategy to keep the summary content novel. We explore an optimized content selection method that combines sub-topic ordering with the new redundancy removal strategy. In the update summarization task of TAC 2008, our approach achieved competitive evaluation results, which supports its effectiveness.
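     As an illustration of the ranking step that item 3 builds on, the following is a minimal sketch of standard (static) manifold ranking, not the evolutionary variant proposed in this thesis. The function name manifold_rank, the damping value alpha = 0.85, and the toy similarity matrix are assumptions made for the example. Node 0 plays the role of the topic query and the remaining nodes are candidate sentences.

```python
import math


def manifold_rank(weights, prior, alpha=0.85, iters=200, tol=1e-9):
    """Standard manifold ranking: iterate f <- alpha * S f + (1 - alpha) * y.

    weights: symmetric, non-negative similarity matrix (list of lists).
    prior:   vector y, non-zero for query nodes (assumed encoding).
    """
    n = len(weights)
    # Zero the diagonal so that no node reinforces itself.
    w = [[0.0 if i == j else float(weights[i][j]) for j in range(n)]
         for i in range(n)]
    degree = [sum(row) for row in w]

    # Symmetric normalisation: S = D^{-1/2} W D^{-1/2}.
    def norm(i, j):
        if degree[i] > 0 and degree[j] > 0:
            return w[i][j] / math.sqrt(degree[i] * degree[j])
        return 0.0

    s = [[norm(i, j) for j in range(n)] for i in range(n)]

    f = [float(v) for v in prior]
    for _ in range(iters):
        new_f = [alpha * sum(s[i][j] * f[j] for j in range(n))
                 + (1 - alpha) * prior[i]
                 for i in range(n)]
        # Stop once the scores no longer change noticeably.
        if max(abs(a - b) for a, b in zip(new_f, f)) < tol:
            return new_f
        f = new_f
    return f


# Toy data (hypothetical): node 0 is the query, nodes 1-3 are sentences.
similarity = [
    [0.0, 0.9, 0.1, 0.0],
    [0.9, 0.0, 0.5, 0.1],
    [0.1, 0.5, 0.0, 0.1],
    [0.0, 0.1, 0.1, 0.0],
]
scores = manifold_rank(similarity, prior=[1.0, 0.0, 0.0, 0.0])
# Sentences ordered from most to least relevant to the query.
ranking = sorted(range(1, 4), key=lambda i: scores[i], reverse=True)
```

     In a temporal setting, the prior vector could be rebuilt for each time slice, for example from the query together with sentences selected in the previous slice. That update rule is only a hypothetical reading of the iterative feedback mechanism and is not taken from the thesis.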
     This thesis explores TMDS and its content selection and makes some progress on both. The proposed methods are language independent, and the work lays a solid foundation for future research.
