Exploring Temporal Text Mining for News Content Anatomy and Recommendation

Original title (Chinese): 基于时序文本挖掘的新闻内容理解与推荐技术研究
Author: Chen Wei (陈伟)
Degree level: Doctoral
Discipline: Computer Science and Technology
Keywords: temporal text mining ; news content anatomy ; news recommendation ; bursty feature ; bursty event ; affinity propagation clustering ; evolutionary clustering ; topic decomposition ; adaptive channel navigation ; multiple topic user modeling
Degree year: 2010
Advisors: Chen Chun (陈纯) ; Bu Jiajun (卜佳俊)
Discipline code: 081202
Degree-granting institution: Zhejiang University (浙江大学)
Submission date: 2010-04-20

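Abstract

The birth and growth of the Internet have greatly accelerated the spread of information. As an important channel for that spread, online news plays a major role on the Internet and has become one of the applications that Internet users use most often. Online news is "reporting on recently occurred facts" published on the Web. Compared with traditional news media, it has large advantages in timeliness, capacity, richness, ease of interaction, ease of retrieval, and multimedia presentation, and it has brought great convenience to people's lives. The sheer volume of online news, however, also causes information overload.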
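     To better serve the needs of different kinds of Web users and improve their experience of obtaining news, it is important to study techniques for automatically understanding and recommending online news content. News content understanding (news content anatomy) means extracting previously unknown, understandable, and ultimately usable knowledge from large volumes of news data, and using that knowledge to organize news so that users can obtain the information more easily. News recommendation analyzes users' news-reading behavior to learn their preferences and, combined with an understanding of the news content, recommends news that users are likely to find interesting. These problems mostly deal with temporal text and touch on many aspects of temporal text mining. Building on temporal text mining techniques, this thesis studies several problems in news content understanding and recommendation and proposes solutions. The specific work is as follows: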
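     First, for event detection in a temporal news corpus, this thesis proposes a bursty event detection method based on bursty feature analysis. Feature trails are introduced to represent the features that make up the corpus as time series. A wavelet-domain representation of feature trails is proposed, and a multi-scale burst analysis algorithm is used to detect bursty features and their bursty durations. A bursty event detection algorithm based on affinity propagation clustering is then proposed: the similarity of the features' burst patterns, the overlap of the news documents containing the features, and feature power (the burst strength of a feature) serve as inputs to affinity propagation, which clusters bursty features into events. Event power is introduced to measure the burst level of each event.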
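The idea of a feature trail and multi-scale burst detection can be illustrated with a minimal sketch. It is not the thesis's algorithm: it replaces the wavelet-domain representation with plain sliding-window sums at several widths, and the names `feature_trail`, `window_sums`, `detect_bursts`, the widths, and the threshold factor `k` are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def feature_trail(docs_by_day, term):
    """Fraction of each day's documents that contain `term`."""
    trail = []
    for docs in docs_by_day:
        hits = sum(1 for doc in docs if term in doc)
        trail.append(hits / len(docs) if docs else 0.0)
    return trail


def window_sums(trail, width):
    """Sums over every sliding window of the given width."""
    return [sum(trail[i:i + width]) for i in range(len(trail) - width + 1)]


def detect_bursts(trail, widths=(1, 2, 4), k=2.0):
    """Return (start, width) for windows whose sum exceeds mean + k * std
    of all window sums at the same scale (k is a hypothetical threshold)."""
    bursts = []
    for width in widths:
        sums = window_sums(trail, width)
        if len(sums) < 2:
            continue
        threshold = mean(sums) + k * pstdev(sums)
        for start, total in enumerate(sums):
            if total > threshold:
                bursts.append((start, width))
    return bursts
```

Each returned pair gives the start day and the duration of a window in which the feature is unusually frequent, so a single feature can be reported as bursty at several durations.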
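     For online bursty event detection in temporal news, an online method for detecting bursty news events and analyzing their evolution is proposed. A multi-scale sliding window monitors feature trails in real time, and an online multi-scale bursty feature detection method identifies bursty features with different bursty durations in the current time window. An exponential decay factor is applied to the feature trails, and the association between bursty features is computed on the decayed trails. Affinity propagation clustering again groups bursty features into bursty events, and power measures each event's burst level. Finally, an information retrieval method based on cosine similarity is proposed to discover how an event evolves along the timeline.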
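A short sketch may help make the decay and similarity steps concrete. It assumes that association between two features is the cosine of their exponentially decayed trails and that events are compared as term-weight vectors over a shared vocabulary; the abstract does not give the exact formulas, so `decay_trail`, `association`, `link_to_previous`, the decay `rate`, and `min_sim` are hypothetical.

```python
import math


def decay_trail(trail, rate=0.5):
    """Weight each value by exp(-rate * age), where age 0 is the newest slot."""
    newest = len(trail) - 1
    return [value * math.exp(-rate * (newest - t)) for t, value in enumerate(trail)]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def association(trail_a, trail_b, rate=0.5):
    """Association of two features: cosine of their decayed trails (an assumption)."""
    return cosine(decay_trail(trail_a, rate), decay_trail(trail_b, rate))


def link_to_previous(event, previous_events, min_sim=0.3):
    """Index of the most similar earlier event, or None if none reaches min_sim."""
    best_index, best_sim = None, min_sim
    for index, earlier in enumerate(previous_events):
        sim = cosine(event, earlier)
        if sim >= best_sim:
            best_index, best_sim = index, sim
    return best_index
```

With decay applied, two features that co-occur in the most recent slots score as more strongly associated than their raw trails would suggest, which is the behavior an online detector wants.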
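     To address the shortcomings of temporal news bursty event detection algorithms in timeliness and accuracy, an online bursty event detection method based on hypothesis testing is further proposed. A representation of feature data streams based on random processes is proposed, and a goodness-of-fit test together with a left-sided test is used to detect bursty features. The correlation among bursty features is analyzed, and an evolutionary spectral clustering algorithm groups highly correlated bursty features into events. The algorithm offers better real-time performance and detects certain bursty features and events more accurately.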
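As a rough illustration of testing a feature for a change in distribution, the sketch below applies a chi-square goodness-of-fit test to the counts of documents that do and do not contain a feature in the current window, against a historical baseline rate. The abstract does not state which statistic the thesis uses, so the chi-square choice, the two-category setup, the simple direction check standing in for the left-sided test, and the names `chi_square_statistic` and `is_bursty` are assumptions.

```python
# Chi-square critical value for 1 degree of freedom at alpha = 0.05.
CHI2_CRITICAL_DF1 = 3.841


def chi_square_statistic(observed, expected):
    """Pearson chi-square statistic for matching observed/expected counts."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def is_bursty(hits, total, baseline_rate):
    """True if the share of documents containing the feature in the current
    window departs significantly from baseline_rate and lies above it."""
    if total == 0 or not 0.0 < baseline_rate < 1.0:
        return False
    expected = [total * baseline_rate, total * (1.0 - baseline_rate)]
    observed = [hits, total - hits]
    changed = chi_square_statistic(observed, expected) > CHI2_CRITICAL_DF1
    rising = hits / total > baseline_rate  # direction check (an assumption)
    return changed and rising
```

The direction check matters because a feature whose frequency drops sharply also fails the goodness-of-fit test, yet it is not a burst.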
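     To help people better understand temporal news, a topic decomposition and summarization method for temporal news is proposed. Non-negative Matrix Factorization (NMF) is applied to the term-by-sentence association matrix of the temporal news to obtain sub-topics. By analyzing the encoding vectors produced by NMF, the events belonging to each sub-topic are found, and summaries are generated for the sub-topics and the events they contain. Sentences are ranked using the encoding matrix, and the top-ranked sentences of each sub-topic are selected as the summary of the temporal news.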
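The factorization step can be sketched in plain Python with the standard Lee and Seung multiplicative updates. This is a minimal sketch under stated assumptions, not the thesis's implementation: it treats the rows of `h` as the per-sub-topic encoding over sentences (following Lee and Seung's terminology), uses a seeded random initialization, and ranks sentences simply by their weight in each row. The function names and parameters are hypothetical.

```python
import random


def matmul(a, b):
    """Matrix product of two lists-of-lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def frobenius_error(v, w, h):
    """Squared Frobenius norm of v - w*h."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2 for i in range(len(v)) for j in range(len(v[0])))


def nmf(v, rank, iterations=200, seed=0, eps=1e-9):
    """Factor a non-negative terms-by-sentences matrix v into w (terms x rank)
    and h (rank x sentences) with multiplicative updates."""
    rng = random.Random(seed)
    n_terms, n_sents = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n_terms)]
    h = [[rng.random() + 0.1 for _ in range(n_sents)] for _ in range(rank)]
    for _ in range(iterations):
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[k][j] * num[k][j] / (den[k][j] + eps) for j in range(n_sents)]
             for k in range(rank)]
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)]
             for i in range(n_terms)]
    return w, h


def top_sentences(h, per_topic=1):
    """For each sub-topic row of h, the indices of its highest-weighted sentences."""
    return [sorted(range(len(row)), key=lambda j: row[j], reverse=True)[:per_topic]
            for row in h]
```

Calling `top_sentences(h, per_topic=2)` on the factor returned by `nmf(v, rank)` yields, for each sub-topic, the sentence indices that would be concatenated into that sub-topic's summary.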
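     For the online news needs of visually impaired and elderly users, a personalized spoken (phonic) Web news recommendation and integrated mining platform is proposed and implemented. An architecture for personalized spoken Web news recommendation is proposed that lets many kinds of terminals obtain personalized spoken news over HTTP. The architecture supports personalization at two levels: it provides adaptive navigation of news channels, and it automatically pushes relevant news according to a user's interest in multiple topics. Finally, the system, referred to as 网络搜音机服务系统 (EagleRadio), was designed and implemented. Beyond these functions, and building on the news content understanding work above, the system also integrates hot event detection, user interest discovery, and visualization of hot events and user interests, providing users with an effective information access service.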
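One way to picture a user model that follows several interest topics at once is sketched below. The abstract does not describe the actual model, so everything here is a hypothetical design: articles and topics are sparse term-weight dictionaries, a read article is folded into its closest topic when the cosine similarity reaches `new_topic_threshold`, and otherwise it starts a new topic.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse term-weight dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


class MultiTopicProfile:
    """Keeps one term-weight vector per interest thread (hypothetical design)."""

    def __init__(self, new_topic_threshold=0.2):
        self.topics = []
        self.new_topic_threshold = new_topic_threshold

    def observe(self, article):
        """Fold a read article into its closest topic, or start a new topic."""
        sims = [cosine(article, topic) for topic in self.topics]
        if sims and max(sims) >= self.new_topic_threshold:
            topic = self.topics[sims.index(max(sims))]
            for term, weight in article.items():
                topic[term] = topic.get(term, 0.0) + weight
        else:
            self.topics.append(dict(article))

    def score(self, article):
        """Interest score: similarity to the best-matching topic."""
        return max((cosine(article, topic) for topic in self.topics), default=0.0)
```

Scoring against the best-matching topic, instead of against one averaged vector, keeps a user's unrelated interests from diluting each other.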

English abstract

The rapid growth of the Internet has greatly accelerated information propagation. Web news plays a very important role on the Internet and has already become one of the most widely used Web applications. Web news is the reporting of recently occurred facts, published on the Web. Compared with traditional news media, Web news has many advantages, such as freshness, capacity, richness, interactivity, and searchability. It greatly facilitates users' access to information from the outside world. However, the massive amount of Web news also brings information overload.
     News content anatomy and recommendation can go a long way toward meeting users' needs for Web news. News content anatomy is the process of extracting previously unknown, understandable, and usable patterns from news content. Based on an analysis of users' usage patterns of Web news, a recommendation system automatically pushes preferred news to users. Both news content anatomy and recommendation deal with temporal text, and temporal text mining techniques are the key to both. By exploring temporal text mining, we study multiple problems of news content anatomy and recommendation, as follows:
     We first propose a bursty event detection method that analyzes bursty features in a temporal news corpus. The features in the corpus are represented as feature trails and then transformed to the wavelet domain. We introduce an elastic burst detection algorithm to identify multi-scale bursty features and model each of them as a vector. With the preference set to each feature's power (bursty level), the affinity propagation clustering algorithm groups together bursty features that have high document overlap and are identically distributed across bursty time windows. Events are then returned to users in order of their power.
     We then study a particular news stream monitoring task: the timely detection of bursty events that have happened recently and the discovery of their evolutionary patterns along the timeline. We use a multi-resolution sliding window to monitor the feature trails and apply an online multi-resolution burst detection method to identify bursty features with different bursty durations within the recent time window. We cluster bursty features to form bursty events and associate each event with a power value that reflects its bursty level. An information retrieval method based on cosine similarity is used to discover each event's evolution along the timeline.
     We further introduce an online event detection algorithm for news streams. First, we represent a feature stream as a random process and apply a goodness-of-fit test to find the features with obvious changes in the distribution of term frequency in a news document. A left-sided significance test is then used to validate bursty features. Finally, an evolutionary spectral clustering algorithm groups highly correlated bursty features into bursty events.
     To help users understand various aspects of a temporal news stream, we study topic decomposition and summarization for a temporally sequenced text corpus on a specific topic. We derive sub-topics by applying Non-negative Matrix Factorization (NMF) to the terms-by-sentences matrix of the temporal news stream. We then detect the incidents of each sub-topic and generate summaries for both the sub-topic and its incidents by examining the composition of its encoding vector generated by NMF. Finally, we rank the sentences based on the encoding matrix and select the top-ranked sentences of each sub-topic as the summary of the temporal news corpus.
     Finally, we present an architecture for providing personalized phonic Web news on Internet-connected consumer electronics. It provides two types of personalization. An adaptive channel navigation method helps users reach relevant channels quickly. In addition, a news recommendation strategy tracks multiple threads of users' interests and provides users with their preferred news. We implement this system under the name EagleRadio. EagleRadio not only provides personalized phonic news but also integrates several news content anatomy functions, such as bursty event detection, user interest modeling, and visualization.

