Research on Community Mining Based on Micro-blogging
Abstract
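As today's most popular Internet application, micro-blogging is spreading like wildfire and winning over large numbers of users. According to statistics from October 2010, Sina Weibo alone had more than 50 million users, and Twitter had passed 200 million users, making it one of the most widely used Internet applications in the world.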
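     This enormous user base poses a new problem for network administrators and ordinary users alike: how to find the relevant people to interact with. This is the traditional notion of a community. To address it, we set out to find an effective community mining algorithm grounded in data mining theory. Unlike traditional community mining, this algorithm targets micro-blogging users, whose data reflect real-world information and who exist in enormous numbers. The domain model must therefore differ considerably from earlier ones, and the time-complexity requirements are stricter. Based on the characteristics of micro-blogging, we built a clustering model. Starting from the naive Bayes model, we constructed a probabilistic scoring mechanism between users and communities, which gives one approach to partitioning users into communities. Community mining also requires finding central nodes. For this we studied a user-importance algorithm that abstracts each micro-blogging user into a model and defines a scoring standard over multiple variables. To validate these algorithms, we ran offline experiments. We used a web crawler to collect data on about 700,000 users from Sina Weibo, a well-known Chinese micro-blogging service. Both the user-importance algorithm and the community mining algorithm achieved good results on this data set.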
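The abstract describes the user-to-community scoring step only at a high level: a probabilistic score built on naive Bayes. The following is a minimal sketch of that idea and is not the thesis's actual model. It assumes each user is represented by a bag of discrete features (hypothetical keyword tags here) and that some users already carry community labels. The function names, the features, and the Laplace smoothing constant `alpha` are all illustrative assumptions. The sketch scores each community with a multinomial naive Bayes model and then normalizes the scores into membership probabilities.

```python
import math
from collections import Counter, defaultdict


def train(labeled_users, alpha=1.0):
    """Fit a multinomial naive Bayes model (illustrative sketch).

    labeled_users: iterable of (community, features) pairs; features is a list
    of hypothetical user features, e.g. keywords taken from the user's posts.
    alpha: Laplace smoothing constant (assumed value).
    """
    community_counts = Counter()           # number of labeled users per community
    feature_counts = defaultdict(Counter)  # feature frequencies per community
    vocab = set()
    for community, features in labeled_users:
        community_counts[community] += 1
        feature_counts[community].update(features)
        vocab.update(features)
    return {"communities": community_counts, "features": feature_counts,
            "vocab": vocab, "alpha": alpha}


def log_scores(model, features):
    """Return log P(c) + sum over f of log P(f | c), for every community c."""
    total_users = sum(model["communities"].values())
    v, alpha = len(model["vocab"]), model["alpha"]
    scores = {}
    for c, n_c in model["communities"].items():
        total_feats = sum(model["features"][c].values())
        score = math.log(n_c / total_users)  # prior P(c)
        for f in features:
            if f not in model["vocab"]:
                continue  # features never seen in training carry no evidence
            # Laplace-smoothed likelihood P(f | c)
            score += math.log((model["features"][c][f] + alpha)
                              / (total_feats + alpha * v))
        scores[c] = score
    return scores


def membership_probabilities(model, features):
    """Turn log scores into P(c | user) with a numerically stable softmax."""
    scores = log_scores(model, features)
    m = max(scores.values())
    exps = {c: math.exp(s - m) for c, s in scores.items()}
    z = sum(exps.values())
    return {c: e / z for c, e in exps.items()}


# Usage with toy, made-up data (not from the Sina Weibo data set)
training = [
    ("football", ["goal", "match", "league"]),
    ("football", ["match", "coach"]),
    ("tech", ["python", "cloud", "startup"]),
    ("tech", ["python", "ai"]),
]
model = train(training)
probs = membership_probabilities(model, ["python", "ai"])
print(max(probs, key=probs.get))  # tech
```

Returning probabilities rather than only the arg-max label allows a user to be reported as belonging to several communities above some threshold. Whether the thesis allows overlapping membership is not stated in the abstract.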
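     In addition, we will use community mining techniques to study how to find "opinion leaders" on micro-blogging platforms, which will make network analysis and management more targeted.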
As the most popular Internet application today, micro-blogging attracts a vast number of Internet users. According to a report from Sina in October 2010, over 50 million people use Sina Weibo, and Twitter has even more users, over 200 million.
     The rapid growth of micro-blogging raises a new question: among so many people, how can a user find companions, that is, a community? We seek the answer using data mining techniques. This differs from traditional community finding because micro-blogging has its own characteristics, which require a more efficient algorithm and a more complex model. We study community finding, build a model based on Bayesian model comparison, and propose a way to quantify the relationship between a user and a community. We also study user influence in micro-blogging: we build a model from user data and quantify each user's influence. To validate these algorithms, we ran experiments on offline data, collecting more than 700,000 user records from Sina Weibo with a web crawler. The experiments show that our algorithms are effective.
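The abstract specifies the user-influence score only as a scoring standard over multiple variables. The sketch below is one plausible form and does not come from the thesis. It uses hypothetical dimensions (follower count, reposts received, comments received, post count) and hand-picked weights. Each count is log-transformed to damp heavy tails, then min-max normalized across users, and the weighted normalized values are summed.

```python
import math

# Hypothetical dimensions and weights; the thesis abstract does not specify them.
WEIGHTS = {"followers": 0.4, "reposts": 0.3, "comments": 0.2, "posts": 0.1}


def influence_scores(users, weights=WEIGHTS):
    """Score users on several dimensions (illustrative sketch).

    users: dict user_id -> dict of raw counts keyed by the names in `weights`
           (a missing dimension is treated as 0).
    Returns dict user_id -> weighted score; with weights summing to 1,
    every score lies in [0, 1].
    """
    # log1p damps the heavy-tailed counts typical of follower numbers
    transformed = {u: {d: math.log1p(stats.get(d, 0)) for d in weights}
                   for u, stats in users.items()}
    scores = {u: 0.0 for u in users}
    for d, w in weights.items():
        values = [t[d] for t in transformed.values()]
        lo, hi = min(values), max(values)
        span = hi - lo
        for u, t in transformed.items():
            # min-max normalization; a constant dimension contributes nothing
            norm = (t[d] - lo) / span if span > 0 else 0.0
            scores[u] += w * norm
    return scores


# Usage with toy, made-up counts
users = {
    "a": {"followers": 1000, "reposts": 500, "comments": 200, "posts": 50},
    "b": {"followers": 10, "reposts": 0, "comments": 1, "posts": 300},
    "c": {"followers": 100, "reposts": 20, "comments": 10, "posts": 100},
}
scores = influence_scores(users)
print(sorted(scores, key=scores.get, reverse=True))  # ['a', 'c', 'b']
```

Min-max normalization keeps each dimension on a common scale, so the weights alone control each dimension's contribution. A real system would still need to justify the weights empirically.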
     In addition, we try to find opinion leaders. We propose a user-influence algorithm, and a demo based on it shows that it is effective at finding opinion leaders.
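One simple reading of "finding opinion leaders with community mining" is to take the highest-scoring members within each mined community. The sketch below makes that assumption. It takes community assignments and influence scores as precomputed plain dictionaries, obtained for example by methods like those the abstract describes. The function name `opinion_leaders`, the top-k rule, and tie-breaking by user ID are illustrative choices, not the thesis's stated procedure.

```python
from collections import defaultdict


def opinion_leaders(community_of, score_of, k=1):
    """Return, for each community, its k highest-scoring members.

    community_of: dict user_id -> community label (output of community mining)
    score_of:     dict user_id -> influence score (higher = more influential)
    Ties are broken by user_id so the result is deterministic.
    """
    members = defaultdict(list)
    for user, community in community_of.items():
        members[community].append(user)
    return {
        community: sorted(group, key=lambda u: (-score_of[u], u))[:k]
        for community, group in members.items()
    }


# Usage with toy, made-up assignments and scores
community_of = {"a": "tech", "b": "tech", "e": "tech", "c": "sports", "d": "sports"}
score_of = {"a": 0.9, "b": 0.4, "e": 0.9, "c": 0.2, "d": 0.7}
print(opinion_leaders(community_of, score_of, k=2))
# {'tech': ['a', 'e'], 'sports': ['d', 'c']}
```

Ranking within each community, rather than globally, surfaces leaders of smaller communities who would otherwise be crowded out by a few globally popular accounts.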
