Research on Representation Models and Link Detection and Tracking Techniques for News Topics
Abstract
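Topic detection and tracking (TDT) research analyzes large-scale news streams in order to discover, track, and organize the multiple topics they contain. A topic is defined as "a specific event (or activity) together with the set of events (or activities) directly related to it." Since it was established as a research direction in 1996, TDT has remained an active area of natural language processing. TDT techniques have since been widely applied, especially in public opinion monitoring and new knowledge discovery.
     This dissertation studies two problems in TDT, story link detection and topic tracking, and proposes the following improvements to representation models and to link detection and tracking methods:
     Story link detection, which decides whether two arbitrary stories are topically related, that is, whether they discuss the same topic, is a core technique of TDT. The main results on this task are as follows: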
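     Event model: We analyze feature selection and similarity computation in story representation models, together with the criteria for partitioning feature sets in multi-vector representation models, and propose a multi-vector event model based on an event framework. In applying the model, we combine it with an uneven-margin support vector machine (SVM) classifier to address the imbalance between positive and negative examples in the training data. We also make a preliminary study of fuzzy matching between models. Experiments show that the event model substantially improves the performance of the story link detection system.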
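A minimal sketch of how a multi-vector story representation can yield one similarity score per feature set, assuming a hypothetical four-slot partition (person, location, time, content) and cosine similarity. The abstract does not state the event framework's actual slots, partition criterion, or similarity function, so `SLOTS`, `to_event_model`, and `slot_similarities` are illustrative names only.

```python
import math
from collections import Counter

# Hypothetical event-framework slots; the dissertation's actual partition may differ.
SLOTS = ("person", "location", "time", "content")


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def to_event_model(tagged_terms):
    """Split (term, slot) pairs into one term-frequency vector per slot."""
    model = {slot: Counter() for slot in SLOTS}
    for term, slot in tagged_terms:
        # Terms with an unknown slot fall back to the generic content vector.
        model[slot if slot in model else "content"][term] += 1
    return model


def slot_similarities(m1, m2):
    """One cosine score per slot, usable as a feature vector for a link classifier."""
    return [cosine(m1[s], m2[s]) for s in SLOTS]


s1 = to_event_model([("Alice", "person"), ("Beijing", "location"), ("visit", "content")])
s2 = to_event_model([("Alice", "person"), ("Shanghai", "location"), ("visit", "content")])
print(slot_similarities(s1, s2))  # [1.0, 0.0, 0.0, 1.0]
```

Keeping the slots separate lets a classifier learn, for example, that a shared location matters more than shared generic words, instead of collapsing everything into a single cosine.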
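     Dynamic information extension: Because a single story contains relatively little content and the topic it carries may evolve and drift, we make full use of story pairs that have already been processed, breaking the independence between story pairs, and propose a dynamic extension method. We further analyze the extended information and refine three kinds of it, namely core information, named-entity information, and dependent nouns, to keep the representation model as effective as possible. Experiments show that both the dynamic extension method and the three feature-refinement strategies improve story link detection, making them two effective ways to further improve detection performance.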
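The extension step can be sketched as follows. This is a hypothetical design, not the dissertation's code: it assumes story pairs are processed in time order, that a story is extended with the vector of the latest earlier story already judged to share its topic, and that the borrowed terms are scaled by an assumed weight `alpha`. `DynamicExtender` and its methods are illustrative names.

```python
from collections import Counter


class DynamicExtender:
    """Hypothetical sketch of dynamic information extension for link detection."""

    def __init__(self, alpha: float = 0.5):
        self.alpha = alpha          # weight given to borrowed terms (assumed)
        self.vectors = {}           # story id -> original term vector
        self.latest_related = {}    # story id -> latest earlier related story id

    def add_story(self, sid, vector):
        self.vectors[sid] = Counter(vector)

    def record_decision(self, earlier_sid, later_sid, linked: bool):
        """Remember a processed pair so that later pairs can reuse the decision."""
        if linked:
            self.latest_related[later_sid] = earlier_sid

    def extended(self, sid) -> Counter:
        """Original vector plus alpha times the latest related story's vector."""
        vec = Counter(self.vectors[sid])
        other = self.latest_related.get(sid)
        if other is not None:
            for term, weight in self.vectors[other].items():
                vec[term] += self.alpha * weight
        return vec


ext = DynamicExtender(alpha=0.5)
ext.add_story("s1", {"earthquake": 2, "rescue": 1})
ext.add_story("s2", {"earthquake": 1, "aftershock": 1})
ext.record_decision("s1", "s2", linked=True)
print(ext.extended("s2"))  # earthquake -> 2.0, aftershock -> 1, rescue -> 0.5
```

Because each new decision overwrites the stored link, the representation follows the most recent state of the topic, which is one simple way to accommodate drift.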
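     Topic tracking, which follows the stories related to a topic in a story stream given known information about that topic, is one of the main research tasks in TDT and the only TDT task with prior knowledge. The main results on this task are as follows: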
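     Dynamic topic model: To handle drift in the topics being tracked, we propose a new dynamic topic model, which extends and deepens the information extension technique described above. The model uses a topic-based weighting method: the training data are clustered by topic, and the features of all tracked related stories are weighted from the perspective of topics. On this basis, features for extension are selected globally, so that the model learns relevant information while minimizing the noise introduced by pseudo-related stories. In addition, the latest topic-unrelated stories are used to locate and filter dynamic noise in the current topic model. Experiments show that this model adapts well to topics that have drifted: tracking performance not only does not degrade but improves further. The topic-based weighting method is also effective for weighting features in static models.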
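One update step of such a dynamic topic model might look like the sketch below. It substitutes simple assumptions for details the abstract does not give: "global" selection is approximated by keeping only terms that appear in at least a fraction `min_support` of all tracked stories, so a term from a single pseudo-related story is ignored, and noise filtering drops any new term that also occurs in the latest unrelated story. Seed terms are never removed. All names and parameter values are hypothetical.

```python
from collections import Counter


def update_topic_model(seed, tracked, latest_unrelated, min_support=0.5, k=10):
    """One hypothetical update step of a dynamic topic model.

    seed             : Counter, the original topic description
    tracked          : list of Counters, all stories tracked as related so far
    latest_unrelated : Counter, the most recent story judged unrelated
    """
    model = Counter(seed)
    n = len(tracked)
    if n == 0:
        return model
    # Global view: count in how many tracked stories each term appears.
    df = Counter()
    for story in tracked:
        df.update(set(story))
    # Keep well-supported terms, so terms from a lone pseudo-related story are ignored.
    support = {t: c / n for t, c in df.items() if c / n >= min_support}
    # Noise filtering: new terms that also occur in the latest unrelated story are dropped.
    candidates = {t: s for t, s in support.items() if t not in seed and t not in latest_unrelated}
    for term, score in sorted(candidates.items(), key=lambda kv: (-kv[1], kv[0]))[:k]:
        model[term] += score
    return model


seed = Counter({"flood": 1.0, "yangtze": 1.0})
tracked = [Counter({"flood": 2, "dam": 1, "weather": 1}),
           Counter({"flood": 1, "dam": 2, "stock": 1}),
           Counter({"dam": 1, "evacuate": 1, "weather": 1})]
latest_unrelated = Counter({"weather": 2, "forecast": 1})
print(update_topic_model(seed, tracked, latest_unrelated))
# flood, yangtze and dam at 1.0; "weather" is filtered as noise, "stock" lacks support
```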
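     Joint tracking method: Because little related information is known in topic tracking, tracking starts from a low performance baseline and cannot handle new knowledge encountered along the way. To address this, and to exploit the strength of story link detection in judging topical relatedness, we propose a joint tracking method. The method first builds, from training data that may contain any topic, a topic-independent tracking method based on linkage features, and then uses a linear combination so that this method assists the tracking method based on known information. Experiments show that joint tracking handles the problems above well; more importantly, it integrates most of the improvements proposed in this dissertation, and their performance gains accumulate.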
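The linear combination at the core of joint tracking can be sketched as below. The sketch assumes cosine similarity for the known-information tracker and uses `toy_link_prob` as a stand-in for the topic-independent link detector; in the dissertation that component is learned from data about other topics, whereas here it is just a logistic curve over cosine. The weight `lam` and threshold `theta` are hypothetical values, not the dissertation's settings.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def toy_link_prob(a, b):
    """Stand-in for a topic-independent link detector trained on other topics."""
    return 1.0 / (1.0 + math.exp(-10.0 * (cosine(a, b) - 0.3)))


def joint_track(story, topic_vec, on_topic_stories, link_prob=toy_link_prob, lam=0.6, theta=0.5):
    """Linearly combine a known-information score with a link-based score."""
    known_score = cosine(story, topic_vec)
    # Link-based evidence: strongest link to any story known to be on topic.
    link_score = max((link_prob(story, s) for s in on_topic_stories), default=0.0)
    score = lam * known_score + (1.0 - lam) * link_score
    return score >= theta, score


topic = Counter({"flood": 1})
known = [Counter({"flood": 1, "dam": 1})]
print(joint_track(Counter({"flood": 1, "dam": 1}), topic, known)[0])  # True
print(joint_track(Counter({"stock": 1}), topic, known)[0])            # False
```

Since the link detector does not depend on the particular topic, it can score stories that share little vocabulary with the original topic description, which is the gap the known-information tracker leaves.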
Topic detection and tracking (TDT) analyzes a stream of news stories to find, track, and organize the topics embedded in it. A topic is defined as a specific event or activity together with the events or activities directly related to it. Since its establishment as a research field in 1996, TDT has been an active area of natural language processing. TDT techniques have since achieved considerable success and have been widely used in many applications, especially public opinion monitoring and new knowledge mining.
     This dissertation concentrates on two TDT tasks, story link detection and topic tracking, and proposes techniques for the representation model, for link detection, and for tracking.
     Story link detection is the problem of deciding, for each pair in a stream of story pairs, whether the two stories discuss the same topic. It is the key technique of TDT. The results achieved on this task are as follows:
     Event model: Based on an analysis of feature selection, the similarity function, and the criterion for partitioning features in multi-vector models, an event model is proposed according to an event framework. When using the event model, we adopt an SVM with uneven margins to handle the imbalance between positive and negative examples in the training data. Fuzzy matching between two models has also been explored. Experiments show that the event model significantly improves the performance of the story link detection system.
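As a rough illustration of the uneven-margin idea, the sketch below trains a linear SVM by stochastic sub-gradient descent on the primal, with margin 1 for positive examples and a smaller margin `tau` for negative ones, so the decision boundary sits closer to the more numerous negatives. It follows the general uneven-margin formulation; the solver, learning rate, `tau`, and the toy data are assumptions, not the setup used in the dissertation.

```python
import random


def train_uneven_svm(X, y, tau=0.5, C=1.0, epochs=200, lr=0.01, seed=0):
    """Linear SVM with uneven margins, trained by stochastic sub-gradient descent.

    Minimizes 0.5*||w||^2 + C * sum_i loss_i with f(x) = w.x + b and
      loss_i = max(0, 1 - f(x_i))    for positive examples (margin 1)
      loss_i = max(0, tau + f(x_i))  for negative examples (margin tau)
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    rng = random.Random(seed)
    order = list(range(n))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            f = sum(wj * xj for wj, xj in zip(w, X[i])) + b
            if y[i] > 0:
                violated, sign = f < 1.0, -1.0   # d(1 - f)/df = -1
            else:
                violated, sign = f > -tau, 1.0   # d(tau + f)/df = +1
            # Regularizer gradient, spread evenly over the n examples.
            grad_w = [wj / n for wj in w]
            grad_b = 0.0
            if violated:
                grad_w = [g + C * sign * xj for g, xj in zip(grad_w, X[i])]
                grad_b = C * sign
            w = [wj - lr * g for wj, g in zip(w, grad_w)]
            b -= lr * grad_b
    return w, b


def predict(w, b, x):
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1


# Imbalanced toy data: 2 positives, 5 negatives.
X = [[2.0, 2.0], [2.5, 2.0], [-1.0, -1.0], [-2.0, -1.0], [-1.5, -2.0], [-2.0, -2.0], [-1.0, -2.0]]
y = [1, 1, -1, -1, -1, -1, -1]
w, b = train_uneven_svm(X, y, tau=0.5)
print([predict(w, b, x) for x in X])
```

Setting `tau=1.0` recovers the usual symmetric hinge loss, which makes the effect of the margin ratio easy to compare on the same data.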
     Dynamic information extension: To overcome the short length of stories, the resulting data sparseness, and possible topic drift within a story, we break the independence assumption between story pairs and propose a technique of dynamic information extension, which extends the current story with the latest previous story that is topically related to it. We also study how to refine the extended information: three kinds of information, namely kernel information, named entities (person, location, organization), and nouns that depend on those named entities, are selected to improve the effectiveness of the representation model. The experimental results indicate that both the dynamic extension and the refinement methods are effective and clearly improve the performance of story link detection systems.
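The refinement step can be sketched as a filter over the extended terms. The definitions are assumptions made for illustration: "kernel information" is taken to be the `kernel_k` heaviest terms, named entities are terms tagged `PER`, `LOC`, or `ORG` (a hypothetical tag set), and dependent nouns are common nouns (tag `N`) linked by a dependency arc to one of those entities.

```python
from collections import Counter

NE_TAGS = {"PER", "LOC", "ORG"}  # hypothetical named-entity tag set


def refine_extension(ext_vec, tags, deps, kernel_k=3):
    """Keep kernel terms, named entities, and nouns depending on those entities.

    ext_vec : Counter of term weights borrowed from a related story
    tags    : dict term -> tag ("PER", "LOC", "ORG", "N" for common noun, ...)
    deps    : list of (head, dependent) pairs from a dependency parse
    """
    ranked = sorted(ext_vec.items(), key=lambda kv: (-kv[1], kv[0]))
    kernel = {t for t, _ in ranked[:kernel_k]}
    entities = {t for t in ext_vec if tags.get(t) in NE_TAGS}
    dep_nouns = set()
    for head, dep in deps:
        if head in entities and tags.get(dep) == "N":
            dep_nouns.add(dep)
        if dep in entities and tags.get(head) == "N":
            dep_nouns.add(head)
    keep = kernel | entities | dep_nouns
    return Counter({t: w for t, w in ext_vec.items() if t in keep})


extension = Counter({"earthquake": 5, "rescue": 4, "said": 3, "Sichuan": 2, "team": 1, "yesterday": 1})
tags = {"earthquake": "N", "rescue": "N", "said": "V", "Sichuan": "LOC", "team": "N", "yesterday": "T"}
deps = [("team", "Sichuan")]
print(sorted(refine_extension(extension, tags, deps, kernel_k=2)))
# ['Sichuan', 'earthquake', 'rescue', 'team']
```

Frequent but uninformative borrowed words such as "said" are dropped, which is the point of refining the extension rather than copying the whole related story.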
     Topic tracking associates the incoming stories in a stream with a topic that is pre-identified by a few stories, and finds all the stories related to that topic. It is the only TDT task that has prior information. The results achieved on this task are as follows:
     Dynamic topic model: To overcome the topic drift problem, a dynamic model is designed to represent a tracked topic, continuing the research on dynamic extension described above. The model selects features globally from all incoming related stories to update the topic model, which keeps the noise from pseudo-related stories as small as possible. A topic-based weighting method is also proposed, which treats the training data as topic clusters and weights each feature from the perspective of topics. In addition, the latest unrelated story is used to filter noise in the topic model. The experimental results indicate that the dynamic topic model handles drifted topics well and improves tracking performance.
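One plausible reading of topic-based weighting is a tf-idf analogue computed over topic clusters instead of individual documents: a term's weight in topic T is its frequency in T's stories times log(N / tc), where N is the number of topics and tc is the number of topics whose stories contain the term. This formula is an assumption for illustration; the abstract does not give the dissertation's exact weighting.

```python
import math
from collections import Counter


def topic_based_weights(clusters):
    """clusters: topic id -> list of story Counters (training data grouped by topic).

    Returns topic id -> Counter of weights tf(t, T) * log(N / tc(t)), where
    N is the number of topics and tc(t) is the number of topics containing t.
    """
    n_topics = len(clusters)
    topic_tf = {tid: sum(stories, Counter()) for tid, stories in clusters.items()}
    tc = Counter()
    for tf in topic_tf.values():
        tc.update(tf.keys())  # each topic counts once per term
    return {tid: Counter({t: c * math.log(n_topics / tc[t]) for t, c in tf.items()})
            for tid, tf in topic_tf.items()}


clusters = {
    "quake": [Counter({"earthquake": 2, "said": 1}), Counter({"earthquake": 1, "rescue": 1, "said": 1})],
    "election": [Counter({"vote": 2, "said": 2})],
}
weights = topic_based_weights(clusters)
print(weights["quake"]["said"])        # 0.0, because "said" occurs in every topic
print(weights["quake"]["earthquake"])  # 3 * log(2)
```

A term that appears in every topic gets weight zero regardless of how many individual stories contain it, which is the intuition behind measuring features from the perspective of topics.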
     Joint tracking method: Since a topic description usually does not provide enough information, and new information in incoming stories cannot be handled, we propose a joint tracking method, which is also a new way of using story link detection techniques for topic tracking. The method first builds a tracking method from topic-independent, linkage-based features learned from data about other topics, and then linearly combines it with the tracking method based on the predefined related information. The experimental results indicate that the joint tracking method addresses the above problems. More importantly, it integrates most of the techniques proposed in this dissertation, and their improvements are cumulative.
     Future work will focus on further TDT research and on other topic-related applications such as network monitoring and topic-based summarization. In addition, although our work was tested and evaluated on the Chinese subset of TDT4, the methods should be independent of the language and the representation style.
