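Research on Internet Text Information Mining and Personalized Recommendation (互联网文本信息挖掘与个性化推荐的研究)

English title: Research on Internet Text Mining and Personalized Recommendation
Author: Wen Yuan (温源)
Degree level: Doctoral
Discipline: Communication and Information Systems
Chinese keywords: 话题发现 (topic discovery); 自动摘要 (automatic summarization); 聚类算法 (clustering algorithm); 协同过滤 (collaborative filtering); 个性化推荐 (personalized recommendation)
English keywords: Topic Discovery; Automatic Summarization; Clustering Algorithm; Collaborative Filtering; Personalized Recommendations
Degree year: 2014
Supervisor: Liu Yun (刘云)
Discipline code: 081001
Degree-granting institution: Beijing Jiaotong University
Submission date: 2014-06-01
Chair of the defense committee: Qing Sihan (卿斯汉)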

Abstract

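With the development of Internet technology, the spread of websites, and the emergence of large volumes of text data, the Internet has become an important channel through which people obtain information. However, online data are so vast that no one, however much time they spend, could explore the entire Internet. Simplifying the exploration of the network and improving the efficiency of information retrieval have therefore become key research directions of the current network era. Good information mining methods can improve retrieval efficiency, provide accurate, timely, and reliable aggregations of online information, and produce summaries suited to human reading. At the same time, as network technology has developed, more and more websites have adopted a new way of delivering information that requires no manual search: information recommendation. Providing relevant information or product recommendations to the right user at the right time can raise users' browsing interest, improve a website's service experience, and increase user stickiness. Recommendation is the next major means of information access after search engines and has broad application prospects: it helps not only with recommending Internet news and related texts, but also has important applications in e-commerce, corporate product promotion, and the expansion and diffusion of new products. In view of this, this thesis adopts interdisciplinary research methods. Targeting the characteristics of Internet text, it proposes an algorithm for discovering hot topics on the Web and a model for automatic summary generation; by studying interest links and preferences among network users, it proposes personalized recommendation algorithms. The thesis studies Internet text data mining and personalized recommendation from the perspectives of Internet text data collection and processing, text clustering algorithms, hot-information mining, news summary extraction, collaborative filtering recommendation, and community-based information recommendation.
     The main research contents of this thesis are as follows:
     1. Studied techniques for collecting and preprocessing Internet text, as well as Chinese word segmentation and clustering methods, and, based on the characteristics of Internet text, proposed an algorithm for discovering hot events on the Web. By introducing a burstiness measure for terms and accounting for the effect of term position on weight, the method improves the accuracy of term-weight computation. In addition, the thesis proposes a preset-density-based clustering algorithm that forms clusters around cores of similar texts to obtain a reasonable partition of text topics, so that hot events within a given time period are discovered automatically without specifying the number of events in advance. Experimental results show that the algorithm performs well in detecting hot events on the Internet.
     2. Studied methods for automatically generating summaries of online text. Summarization compresses text and represents it in summary form, allowing users to grasp a text's main content quickly. After analyzing the particular conditions of automatic summarization of Internet news, the thesis introduces the concept of summary topics for multi-document summarization. A local topic is the set of information produced by hierarchical clustering after Internet news articles are split into sentences. Second, the thesis exploits the fact that Internet news often carries human-written comments to further improve summary accuracy. Sentences from the news body and from the comments are mapped to network nodes, and the HITS algorithm, which analyzes node weights in a network, is applied to compute the influence of sentences in different positions. According to how strongly the comments affect each sentence of the news body, the sentence weights computed by the traditional algorithm are adjusted, which in turn changes which sentences are selected for the summary. Experiments show that the summarization algorithm that uses comments outperforms the one that does not. This work provides a foundation for information extraction and automatic summarization on the Internet and for further text compression in the future.
     3. Studied collaborative filtering recommendation. Building on traditional collaborative filtering, the thesis improves the user-similarity computation of the collaborative filtering algorithm, thereby improving recommendation accuracy. By considering the shared preferences of different users as well as the influence of each user's own preferences on similarity, it proposes a logarithm-based similarity formula. In practice, the improved algorithm was tested on microblog data: microblog posts were clustered into topic categories, a relationship network between users and these topic categories was built, and the improved collaborative filtering algorithm was then used to make recommendations. Experimental results show that the microblog-based recommendations effectively hit items in the validation set, indicating good recommendation performance. Compared with traditional collaborative filtering, the new algorithm substantially improves recommendation accuracy and yields better personalized recommendations.
     4. From the perspective of recommender systems, the thesis proposes two different community-formation models and studies which recommendation methods suit different community-formation conditions. To this end, it proposes two similarity models suited to computing similarity within communities, compares them with traditional similarity models, and tests how several similarity models perform in practice when communities serve as the basis for recommendation. Using the widely used MovieLens dataset for validation, it shows that the community-formation-based models outperform the traditional heat conduction model and probabilistic spreading model in both recommendation accuracy and recommendation diversity. Comparing the two community-formation models shows that the non-strictly partitioned community model achieves higher recommendation accuracy and diversity than the strictly partitioned model. It is therefore better suited to recommender systems, particularly to personalized recommendation.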
With the development of Internet technology, the popularity of websites, and the emergence of large amounts of text data, the Internet has become an important channel for people to obtain information. But with so much data online, no one can explore the entire Internet. Simplifying the process of exploring the network and improving the efficiency of retrieving information on the Internet have therefore become important research directions of the Internet age. Good information mining methods can improve the efficiency of information retrieval: they can provide accurate, timely, and reliable collections of online information and produce summaries that people can read quickly. Meanwhile, with the development of network technology, more and more websites let users obtain information without searching manually. These new approaches are information recommendation. Providing relevant information or related products to the right user at the right time can enhance users' browsing interest and increase their stickiness to a website. Recommendation is another major means of information access after search engines and has broad prospects: it is valuable not only for recommending Internet news and related texts, but also for e-commerce, company product promotion, and the dissemination of new products. In view of this, this thesis combines interdisciplinary research methods to propose an Internet hot-topic detection algorithm and an automatic summary generation model for online text, and proposes personalized recommendation algorithms based on research into user interests and preferences. The thesis focuses on Internet data acquisition, text clustering algorithms, hot information mining, news summarization methods, collaborative filtering recommendation algorithms, and community-based recommendation.
     The major work and innovations of this thesis include the following:
     1) This thesis studies Internet text collection and preprocessing techniques, Chinese word segmentation, and clustering methods, and then proposes a hot-event discovery algorithm based on the characteristics of Internet text. By introducing a term burstiness metric and accounting for the influence of term position, it improves the accuracy of term-weight computation. It presents a preset-density-based maximum-link clustering algorithm that treats similar texts as cluster cores to obtain a reasonable partition of text topics, so that the hot events of a given period are discovered automatically. Experimental results show that the algorithm performs well at finding hot events on the Internet.
     2) This thesis studies the automatic generation of summaries for Internet texts. Summarization compresses text information and represents a text in abstract form, giving users quick access to its main content. After analyzing the particular features of multi-document Internet news summarization, the thesis puts forward the concept of summary topics: news articles are divided into sentences, and hierarchical clustering of these sentences produces the information clusters that form the summary topics. Second, the human-written comments attached to Internet news are used to further improve summarization accuracy. Sentences from the news body and from the comments are mapped to network nodes, and the HITS algorithm for analyzing node weights is introduced to calculate the influence of sentences in different positions. Because comments influence the sentences of the news body, adjusting the weights of these sentences significantly improves the selection of summary sentences. Experimental results show that the algorithm that uses comments outperforms the one that does not. The study provides a basis for further Internet information extraction and automatic summarization.
     3) This thesis studies collaborative filtering recommendation. It improves recommendation accuracy by improving the user-similarity computation of the conventional collaborative filtering algorithm. Considering the shared preferences of different users and the influence of their individual preferences on similarity, it presents a logarithm-based similarity formula. In practical applications, real microblog data are used to test the improved algorithm: microblog posts are clustered into topic categories, the relationships between users and these topic categories are extracted, and the improved collaborative filtering algorithm is then used to make recommendations. Experimental results show that the recommendations effectively hit items in the microblog validation set. Compared with traditional collaborative filtering, the new algorithm substantially increases recommendation accuracy and gives better personalized recommendations.
     4) From the perspective of recommender systems, this thesis presents two different community-formation models and studies which recommendation methods are suitable under different community-formation conditions. It proposes two similarity models suited to computing similarity within communities, compares them with traditional similarity models, and tests several similarity models under different community-formation conditions. Experiments on the MovieLens dataset verify that the community-formation-based models outperform the traditional heat conduction and probabilistic spreading models in both recommendation accuracy and recommendation diversity. Finally, comparing the two community-formation models shows that the non-strictly divided community model achieves higher recommendation accuracy and diversity than the strictly divided one. The non-strictly divided community model is therefore better suited to recommender systems, especially to personalized recommendation.
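Research item 1 says the term weight combines a burstiness measure with the effect of term position, but the abstract gives neither formula. The Python sketch below shows one way such a weight could be assembled, under loudly stated assumptions: burstiness is taken here as a simple ratio of a term's document frequency in the current time window to its average over earlier windows, and a hypothetical `TITLE_BOOST` factor stands in for the position effect. Neither choice is the thesis's own definition; `doc_freq`, `burst`, and `term_weights` are illustrative names.

```python
import math
from collections import Counter

# Hypothetical position factor (assumption): terms that also occur in the title get this multiplier.
TITLE_BOOST = 2.0


def doc_freq(term, docs):
    """Fraction of documents whose title or body contains `term`."""
    if not docs:
        return 0.0
    hits = sum(1 for d in docs if term in d["title"] or term in d["body"])
    return hits / len(docs)


def burst(term, current, history, eps=0.01):
    """Illustrative burst score (assumption): current document frequency divided by
    the average document frequency over earlier windows; eps avoids division by zero.
    Values well above 1 mean the term is unusually frequent right now."""
    past = [doc_freq(term, window) for window in history]
    baseline = sum(past) / len(past) if past else 0.0
    return doc_freq(term, current) / (baseline + eps)


def term_weights(doc, current, history):
    """TF-IDF weight of each term in `doc`, scaled by burstiness and position."""
    counts = Counter(doc["title"] + doc["body"])
    n = len(current)
    weights = {}
    for term, tf in counts.items():
        df = sum(1 for d in current if term in d["title"] or term in d["body"])
        idf = math.log((n + 1) / (df + 1)) + 1.0  # smoothed IDF, >= 1 because df <= n
        position = TITLE_BOOST if term in doc["title"] else 1.0
        weights[term] = tf * idf * position * burst(term, current, history)
    return weights


# Toy data: "earthquake" is new in the current window, "weather" is a steady term.
current = [
    {"title": ["earthquake"], "body": ["earthquake", "rescue", "weather"]},
    {"title": ["rescue"], "body": ["earthquake", "weather"]},
]
history = [[
    {"title": ["weather"], "body": ["weather", "traffic"]},
    {"title": ["traffic"], "body": ["traffic"]},
]]
w = term_weights(current[0], current, history)
print(sorted(w.items(), key=lambda kv: -kv[1]))
```

The steady term keeps a low weight even though it appears in every current document, while the newly frequent term that also sits in the title ranks first; this is the qualitative behaviour item 1 describes, not a reproduction of its numbers.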
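Research item 1 also describes a clustering step that grows clusters around cores of similar texts, so the number of events need not be fixed in advance. The abstract does not specify the preset-density maximum-link algorithm itself, so the sketch below substitutes a generic DBSCAN-style procedure that shares that property. Texts are sparse term-weight dicts; the similarity threshold `sim_threshold` and the density parameter `min_neighbors` are hypothetical values chosen for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse term-weight vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def density_cluster(vectors, sim_threshold=0.5, min_neighbors=2):
    """DBSCAN-style clustering over texts (a stand-in, not the thesis's algorithm).

    A text is a core if at least `min_neighbors` other texts have cosine
    similarity >= `sim_threshold` with it. Clusters grow outward from cores,
    so the number of clusters is not fixed in advance. Returns one label per
    text; -1 marks texts that belong to no cluster."""
    n = len(vectors)
    neighbors = [
        [j for j in range(n) if j != i and cosine(vectors[i], vectors[j]) >= sim_threshold]
        for i in range(n)
    ]
    labels = [-1] * n
    cluster = 0
    for i in range(n):
        if labels[i] != -1 or len(neighbors[i]) < min_neighbors:
            continue  # already assigned, or not dense enough to seed a cluster
        labels[i] = cluster
        stack = list(neighbors[i])
        while stack:
            j = stack.pop()
            if labels[j] != -1:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_neighbors:  # only cores keep expanding
                stack.extend(neighbors[j])
        cluster += 1
    return labels


docs = [
    {"quake": 1, "rescue": 1},
    {"quake": 1, "rescue": 1},
    {"quake": 1, "rescue": 1, "aid": 1},
    {"football": 1, "goal": 1},
    {"football": 1, "goal": 1},
    {"football": 1, "goal": 1, "cup": 1},
    {"weather": 1},
]
print(density_cluster(docs))  # [0, 0, 0, 1, 1, 1, -1]
```

Two "events" emerge without telling the algorithm how many to look for, and the isolated text is left unassigned rather than forced into a cluster.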
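Research item 2 maps news sentences and comment sentences to network nodes and applies HITS to weigh them. The abstract does not say how edges are built or how HITS scores are merged with the traditional sentence weights, so the sketch below assumes: a comment sentence links to a news sentence with a weight equal to their precomputed similarity; the HITS authority score of a news sentence measures how strongly comments support it; and a hypothetical mixing factor `lam` blends authority into a baseline sentence score. The `hits` function is standard weighted HITS; `rerank` and the blending rule are illustrative assumptions.

```python
import math


def hits(edges, n_iter=50):
    """Weighted HITS on a directed graph given as {(src, dst): weight}.
    Returns (hub, authority) dicts, each normalized to unit L2 norm."""
    nodes = {u for u, _ in edges} | {v for _, v in edges}
    hub = {node: 1.0 for node in nodes}
    auth = {node: 0.0 for node in nodes}
    for _ in range(n_iter):
        # Authority of v: weighted sum of hub scores of nodes pointing to v.
        auth = {node: 0.0 for node in nodes}
        for (u, v), w in edges.items():
            auth[v] += w * hub[u]
        norm = math.sqrt(sum(x * x for x in auth.values())) or 1.0
        auth = {node: x / norm for node, x in auth.items()}
        # Hub of u: weighted sum of authority scores of nodes u points to.
        hub = {node: 0.0 for node in nodes}
        for (u, v), w in edges.items():
            hub[u] += w * auth[v]
        norm = math.sqrt(sum(x * x for x in hub.values())) or 1.0
        hub = {node: x / norm for node, x in hub.items()}
    return hub, auth


def rerank(base_scores, auth, lam=0.3):
    """Blend a baseline sentence score with its HITS authority (hypothetical rule)
    and return sentence ids from highest to lowest blended score."""
    blended = {s: (1 - lam) * b + lam * auth.get(s, 0.0) for s, b in base_scores.items()}
    return sorted(blended, key=lambda s: -blended[s])


# Comment sentences c1..c3 point to the news sentences they resemble;
# the weights stand in for precomputed sentence similarities.
edges = {("c1", "s1"): 0.9, ("c2", "s1"): 0.8, ("c3", "s1"): 0.2, ("c3", "s2"): 0.6}
hub, auth = hits(edges)
print(rerank({"s1": 0.5, "s2": 0.5, "s3": 0.5}, auth))  # ['s1', 's2', 's3']
```

With equal baseline scores, the sentence that most comments discuss moves to the top and a sentence no comment touches falls to the bottom, which is the direction of the effect item 2 reports.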
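Research item 3 changes the user-similarity step of user-based collaborative filtering to a logarithm-based formula that the abstract does not spell out. For reference, the sketch below implements the traditional baseline that such a formula would replace: each user is represented by the set of topic clusters they interact with (as in the microblog experiment), similarity is the cosine over these binary sets, and unseen topics are scored by summing the similarities of the `k` most similar users. The `sim` parameter is a hypothetical hook where a different similarity function could be plugged in; the data and parameter values are illustrative.

```python
import math


def cosine_sim(u, v):
    """Cosine similarity between two users' binary interaction sets."""
    if not u or not v:
        return 0.0
    return len(u & v) / math.sqrt(len(u) * len(v))


def recommend(user, interactions, k=2, top_n=3, sim=cosine_sim):
    """User-based CF: score each unseen item by the summed similarity of the
    k most similar users who interacted with it."""
    target = interactions[user]
    neighbors = sorted(
        ((sim(target, items), other) for other, items in interactions.items() if other != user),
        reverse=True,
    )[:k]
    scores = {}
    for s, other in neighbors:
        if s <= 0:
            continue  # ignore neighbors with no overlap
        for item in interactions[other] - target:
            scores[item] = scores.get(item, 0.0) + s
    return sorted(scores, key=lambda item: (-scores[item], item))[:top_n]


# Users mapped to hypothetical topic-cluster ids.
interactions = {
    "alice": {"t1", "t2", "t3"},
    "bob": {"t1", "t2", "t4"},
    "carol": {"t3", "t5"},
    "dave": {"t6"},
}
print(recommend("alice", interactions, k=2))  # ['t4', 't5']
```

Because the similarity function is a parameter, a modified formula such as the one in item 3 can be compared against cosine on the same neighbor-selection and scoring code.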
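Research item 4 uses the heat conduction model and the probabilistic spreading model as baselines. Both are standard two-step diffusion processes on the user-item bipartite network (probabilistic spreading is also known as mass diffusion). The sketch below implements their usual forms for one target user; the community-based similarity models proposed in the thesis are not reproduced here, and the toy edge list is illustrative.

```python
def build(edges):
    """Index a list of (user, item) edges in both directions."""
    user_items, item_users = {}, {}
    for u, i in edges:
        user_items.setdefault(u, set()).add(i)
        item_users.setdefault(i, set()).add(u)
    return user_items, item_users


def prob_spreading(target, user_items, item_users):
    """Mass diffusion: each node splits its resource equally among its neighbors."""
    user_res = {}
    for i in user_items[target]:  # every item the target collected starts with 1 unit
        share = 1.0 / len(item_users[i])
        for u in item_users[i]:
            user_res[u] = user_res.get(u, 0.0) + share
    scores = {}
    for u, r in user_res.items():
        share = r / len(user_items[u])
        for i in user_items[u]:
            scores[i] = scores.get(i, 0.0) + share
    return {i: s for i, s in scores.items() if i not in user_items[target]}


def heat_conduction(target, user_items, item_users):
    """Heat conduction: each node takes the average temperature of its neighbors."""
    temp_item = {i: (1.0 if i in user_items[target] else 0.0) for i in item_users}
    temp_user = {u: sum(temp_item[i] for i in items) / len(items) for u, items in user_items.items()}
    scores = {i: sum(temp_user[u] for u in users) / len(users) for i, users in item_users.items()}
    return {i: s for i, s in scores.items() if i not in user_items[target]}


edges = [("A", "i1"), ("A", "i2"), ("B", "i1"), ("B", "i3"), ("C", "i2"),
         ("C", "i3"), ("C", "i4"), ("D", "i4"), ("D", "i5"), ("E", "i3")]
user_items, item_users = build(edges)
print(prob_spreading("A", user_items, item_users))
print(heat_conduction("A", user_items, item_users))
```

On this toy network probabilistic spreading gives item `i3` a score of 5/12, while heat conduction lowers it to 5/18 because averaging over `i3`'s three users dilutes its temperature. In the literature, probabilistic spreading tends to favor popular items and heat conduction less popular ones, which is why recommenders are usually compared against both on accuracy and diversity, as item 4 does.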

