Research on User Influence Analysis Based on Microblog Data
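Abstract
In recent years, with the rapid development of the Internet, the web has become the main channel through which people obtain information in daily life. Microblogging, a new online medium that has grown quickly in recent years, has accumulated hundreds of millions of users. Microblog platforms carry a large volume of information that is updated rapidly, so users are often drowned in a sea of information; helping them find posts published by highly influential users is therefore of real value. The search function offered by microblog platforms is a good way to help users find posts. Traditional information retrieval rests on three key factors: relevance, authority, and timeliness. Because microblog content is updated quickly and is written in non-standard language, timeliness and authority tend to matter even more. The influence analysis in this thesis is likewise a study of authority.
     Using microblog data, this thesis analyzes user influence. The main results are as follows:
     1. Acquisition of microblog data. Early in this research, a large amount of user data was crawled from the microblog platform, including users' profile details, follow relationships, and reply and repost relationships. This data is the foundation of the research and can also serve as base data for other microblog studies.
     2. The study of microblog user influence in this thesis aims to identify a user's differing influence in different domains. Users are assigned to domains based on the content of their posts and the follow relationships between users, and each user's weight in each domain is computed. Validation on semi-automatically annotated samples shows that this assignment method is fairly accurate.
     3. Alongside the text analysis of users' posts, a parallel new-word recognition algorithm identifies new words in microblog content, and the related-search feature of a search engine is used to semantically expand important text features. Together these address a series of problems: microblog text is short, features are sparse, meaningless features are too numerous, and discriminative features are few.
     4. Using each user's classification weights in the different domains, and based on the reply and repost relationships between users, a domain-specific influence propagation model is built. Comparative validation shows that the method performs well.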
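The abstract does not say which criterion the new-word recognition in item 3 uses. As an illustration only, the sketch below assumes an accessor-variety score, a common criterion for Chinese word extraction: a character string is a plausible word if it appears next to many distinct left and right neighbours. The function names, the length limits, and the threshold are hypothetical, sentence boundaries are simplified to a single marker on each side, and the code runs in one process rather than in parallel.

```python
from collections import defaultdict


def accessor_variety(texts, min_len=2, max_len=4):
    """Score each candidate substring by min(#distinct left, #distinct right neighbours).

    Assumption: this is an illustrative criterion, not necessarily the thesis's own.
    """
    left = defaultdict(set)
    right = defaultdict(set)
    for text in texts:
        n = len(text)
        for i in range(n):
            for j in range(i + min_len, min(i + max_len, n) + 1):
                cand = text[i:j]
                # "^" and "$" stand in for the start and end of a post (a simplification).
                left[cand].add(text[i - 1] if i > 0 else "^")
                right[cand].add(text[j] if j < n else "$")
    return {c: min(len(left[c]), len(right[c])) for c in left}


def new_word_candidates(texts, known_words, threshold=3):
    """Return candidates that score at least `threshold` and are not in the lexicon."""
    scores = accessor_variety(texts)
    return sorted(c for c, s in scores.items() if s >= threshold and c not in known_words)


posts = ["我爱微博啊", "他刷微博了", "你看微博吗"]
candidates = new_word_candidates(posts, known_words=set())
```

On a real corpus the counting loop would be sharded across workers and the per-candidate neighbour sets merged afterwards, which is one way the "parallel" aspect could be realized.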
In recent years, with the rapid development of the Internet, new online media have gradually caught up with traditional media such as television, newspapers, and radio in influence. The Internet has become the main channel through which people send and receive information in daily life. As a fast-growing new medium on the Internet, microblogging has accumulated hundreds of millions of users. Microblogging platforms carry a large volume of information that is updated quickly, so users often cannot find the information they need. It is therefore important to help users find posts published by highly influential people. The content search offered by microblogging platforms is a good way to help users find posts among this volume of information. Traditional information retrieval rests on three key factors: relevance, authority, and timeliness. Because content on microblogging platforms is published and updated very quickly and is written in non-standard language, timeliness and authority tend to matter more. The influence analysis of microblogging users in this thesis is likewise a study of user authority.
     In this thesis, I study the influence of microblogging users using microblogging data. The major results are as follows.
     Crawling microblogging data. At the start of this research, I crawled a large amount of data from the microblogging platform, including users' profile details, follow relationships, and reply and repost relationships. These data form the basis of this research and can also serve as base data for other microblogging-related research.
     The research on microblogging users' influence in this thesis aims to identify the influence of users in different domains. I assigned users to domains using two features, the content of their posts and the follow relationships between users, and calculated each user's weight in each domain. Validation on semi-automatically annotated samples shows that the classification is fairly accurate.
     While mining the text of microblog posts, I used a parallel new-word recognition algorithm to identify new words in the posts and used the related-search feature of a search engine to semantically expand important text features. Together these address a series of problems specific to microblog content: the text is short, the feature vectors are sparse, there are too many meaningless features, and few features are discriminative.
     I build a domain-specific influence propagation model based on the reply and repost relationships between microblogging users, using each user's classification weights in the different domains. Comparison experiments show that the model performs well.
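The abstract does not give the equations of the propagation model. The following is a minimal sketch in the spirit of topic-sensitive PageRank, not the thesis's actual model: influence flows along repost and reply edges, and a user's weight in the domain sets the teleport distribution. The function name, the edge representation, the damping factor, and the iteration count are all assumptions made for illustration.

```python
from collections import defaultdict


def domain_influence(interactions, domain_weight, damping=0.85, iterations=50):
    """Rank users within one domain.

    interactions: {(src, dst): count}, meaning src reposted or replied to dst
                  `count` times, so influence flows from src to dst.
    domain_weight: {user: non-negative weight in this domain}, not all zero.
    Returns {user: score}; the scores sum to 1.
    """
    users = set(domain_weight)
    for src, dst in interactions:
        users.update((src, dst))
    users = sorted(users)

    total = sum(domain_weight.get(u, 0.0) for u in users)
    teleport = {u: domain_weight.get(u, 0.0) / total for u in users}

    out_total = defaultdict(float)
    for (src, _dst), count in interactions.items():
        if count > 0:
            out_total[src] += count

    score = dict(teleport)
    for _ in range(iterations):
        nxt = {u: (1.0 - damping) * teleport[u] for u in users}
        # Users with no outgoing interactions hand their score back via teleport.
        dangling = sum(score[u] for u in users if out_total[u] == 0)
        for (src, dst), count in interactions.items():
            if count > 0:
                nxt[dst] += damping * score[src] * count / out_total[src]
        for u in users:
            nxt[u] += damping * dangling * teleport[u]
        score = nxt
    return score


# Hypothetical toy data: b and c interact with a's posts, c also with b's.
edges = {("b", "a"): 3, ("c", "a"): 1, ("c", "b"): 1}
weights = {"a": 0.5, "b": 0.3, "c": 0.2}
scores = domain_influence(edges, weights)
```

Running the function once per domain, each time with that domain's weights, yields a separate ranking per domain, which matches the stated goal of identifying a user's differing influence across domains.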
