Research on Local Interest Community Detection in Large-Scale Social Networks

English title: Research on Local Interested Community Detection in Large-Scale Social Network
Author: Yin Hongjun (尹红军)
Degree level: Doctoral
Discipline: Computer Software and Theory
Keywords: social network; user interest modeling; community detection; microblogging marketing; personalized PageRank
Degree year: 2014
Advisor: Li Jing (李京)
Discipline code: 081202
Degree-granting institution: University of Science and Technology of China
Thesis submission date: 2014-04-01

Abstract

With the arrival of the Web 2.0 era, more and more data appears on the Internet, and much of it reflects interaction among users. People both produce large amounts of network data and consume it, and work, daily life, study, and entertainment increasingly depend on the Internet. Social networks move real-world relationships between people onto the Internet, strengthening communication and interaction and speeding the flow of information around the world. Since Facebook went public, social networks have attracted growing attention. Facebook is a strong-tie social network that helps users build, improve, and maintain relationships with friends. Twitter, a microblogging service, is a weak-tie social network in which opinion leaders emerge easily and information spreads quickly, which favors advertising and marketing. LinkedIn is a professional social platform where business people expand their business, recruit, look for jobs, and communicate professionally. China also has many social networks, such as Tencent Weibo, Digu, 9911, Suixin Weibo, Sina Weibo, Sohu Weibo, Follow5, NetEase Weibo, Pinpinmi, MySpace China, Baidu i Tie, Tongxue Wang, and Fanfou; of these, the well-known Sina Weibo most closely resembles Twitter.
     As of December 2012, Sina Weibo, a well-known Chinese social network, had reached 500 million users; as of July 2012, Twitter had 517 million users; and Facebook, another world-famous social networking site, had more than 1 billion. According to data from the monitoring company Pingdom, there are billions of social network users worldwide, and social network links and web plugins are present on roughly a quarter of all websites. Analyzing social networks and discovering the communities within them is therefore of great significance for product recommendation, ad targeting, friend recommendation, and partitioning of the social network.
     Building on a survey of the development of and research on large-scale social networks, this thesis studies in depth how to mine interest communities in social networks effectively. It first addresses two sub-problems: modeling personalized user interests in social networks, and computing personalized PageRank efficiently. With interest modeling and efficient personalized PageRank computation in place, it then detects interest communities in large-scale social networks.
     First, we use users' friend relationships and the microblogs they post and forward as interest information. Because ordinary users and special users differ, we propose two representations: a three-level model for ordinary users that takes followed accounts as interests, and a two-level model for special users that takes posted microblogs as interests. To model interests from microblog content, we propose an improved LDA-based method for classifying microblog interests. To handle changes in user interest, we propose a Bayesian method that uses the user's microblog content as feedback, and we also propose a user interest preference model designed for interest community detection. Finally, we evaluate the model using user tags as the reference; when tags are sufficiently plentiful, the model achieves precision and recall above 80%.
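
The tag-based evaluation described above can be illustrated with a minimal sketch. The function name, the interest labels, and the tags are hypothetical and only show how precision and recall against user tags would be computed; they are not data or code from the thesis.

```python
def precision_recall(predicted, tags):
    """Precision and recall of predicted interest labels against user tags."""
    predicted, tags = set(predicted), set(tags)
    hits = len(predicted & tags)  # labels that appear in both sets
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(tags) if tags else 0.0
    return precision, recall


# Hypothetical user: five modeled interests, five self-assigned tags.
precision, recall = precision_recall(
    ["music", "travel", "tech", "food", "film"],
    ["music", "travel", "tech", "food", "sports"],
)
```

Here four of the five predicted interests match the tags, so both measures are 4/5.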
     Second, personalized PageRank is an important algorithm in information retrieval and data mining, and as data sizes keep growing it needs to be optimized and accelerated. Traditional iterative methods are costly in time and space, so this thesis uses a Monte Carlo random-walk method. MapReduce suits data-intensive computation but not large numbers of iterations, so this thesis proposes a distributed algorithm based on MPI. It also replaces the earlier two-way merging method with a Fibonacci-based one; the theoretical performance gain is about 30%, and experiments on large amounts of real data show a 10% to 40% improvement over the baseline method.
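
As a rough single-machine illustration of the Monte Carlo random-walk idea, the sketch below estimates personalized PageRank from the endpoints of many short walks. It is not the thesis's distributed MPI algorithm, and it does not cover the Fibonacci-based merging; the function name, the parameters `alpha` and `num_walks`, and the toy graph are assumptions made for the example.

```python
import random
from collections import Counter


def monte_carlo_ppr(graph, source, alpha=0.15, num_walks=10000, seed=0):
    """Estimate personalized PageRank for `source` by simulating random walks.

    graph: dict mapping each node to a list of out-neighbors.
    alpha: probability that a walk stops at its current node on each step.
    """
    rng = random.Random(seed)
    ends = Counter()
    for _ in range(num_walks):
        node = source
        # Keep walking until the stop event (probability alpha) occurs.
        while rng.random() >= alpha:
            neighbors = graph.get(node)
            if not neighbors:
                node = source  # dangling node: treat it as linking back to source
                continue
            node = rng.choice(neighbors)
        ends[node] += 1
    # The fraction of walks ending at a node estimates its PPR score.
    return {node: count / num_walks for node, count in ends.items()}


# Toy graph (hypothetical): "a" links to "b" and "c", which both link back.
graph = {"a": ["b", "c"], "b": ["a"], "c": ["a"]}
scores = monte_carlo_ppr(graph, "a")
```

Each walk is independent of the others, which is what makes this formulation easy to distribute across workers.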
     Finally, because community structure comprises both the links among members and the members' own personalized information, we propose a community detection method based on personalized PageRank that considers both structural information and node attributes. To cope with the ever-growing scale of social network data, we propose a local community analysis method and adapt the algorithm to the distributed computing framework MapReduce. Most community detection methods are unsuitable for networks with tens of millions of users or more, and Metis is one of the few tools able to analyze networks at that scale. We therefore compare the proposed personalized PageRank method with Metis; the comparison shows that the proposed method detects communities better and finds strongly clustered local communities. In addition, MapReduce scaling experiments demonstrate the method's scalability and efficiency.
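
One common way to turn personalized PageRank scores into a local community is a conductance sweep, in the style of Andersen, Chung, and Lang. The abstract does not say whether the thesis uses this exact procedure, and it does not describe how node attributes enter the scores, so the sketch below is an illustrative assumption. The graph and the score values are hypothetical.

```python
def sweep_cut(graph, scores):
    """Return the prefix of score-ranked nodes with the lowest conductance.

    graph: dict mapping each node to a set of neighbors (undirected, no self-loops).
    scores: dict mapping nodes to personalized PageRank values.
    """
    total_volume = sum(len(nbrs) for nbrs in graph.values())
    # Rank nodes by degree-normalized score, highest first.
    order = sorted(
        (n for n in scores if graph.get(n)),
        key=lambda n: scores[n] / len(graph[n]),
        reverse=True,
    )
    inside, volume, cut = set(), 0, 0
    best_set, best_phi = set(), float("inf")
    for node in order:
        degree = len(graph[node])
        internal = sum(1 for nbr in graph[node] if nbr in inside)
        # Edges to nodes already inside stop crossing the cut; the rest start to.
        cut += degree - 2 * internal
        volume += degree
        inside.add(node)
        denom = min(volume, total_volume - volume)
        if denom == 0:
            break  # the prefix now covers the whole graph
        phi = cut / denom  # conductance of the current prefix
        if phi < best_phi:
            best_phi, best_set = phi, set(inside)
    return best_set, best_phi


# Two triangles joined by the edge 2-3; scores concentrated near node 0.
graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
scores = {0: 0.40, 1: 0.25, 2: 0.25, 3: 0.06, 4: 0.02, 5: 0.02}
community, phi = sweep_cut(graph, scores)
```

The sweep only examines nodes that received a score, so its cost depends on the size of the neighborhood explored around the seed rather than on the whole network, which is what makes the approach local.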

    Watts D J, Dodds P S.2007. Influentials, networks, and public opinion formation[J]. Journal of consumer research,34(4):441-458.
    Webb, G. I., Pazzani, M. J., & Billsus, D.2001. Machine learning for user modeling. User modeling and user-adapted interaction,11(1-2),19-29.
    Wei, X., & Croft, W. B.2006. LDA-based document models for ad-hoc retrieval. In Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval (pp.178-185). ACM.
    Wellman, Barry.2008. "Review:The development of social network analysis:A study in the sociology of science." Contemporary Sociology,37:221-222.
    White, R. W., Bailey, P., & Chen, L.2009. Predicting user interests from contextual information. In Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval (pp.363-370). ACM.
    Widyantoro, D. H., Ioerger, T. R., & Yen, J.2001. Learning user interest dynamics with a three-descriptor representation. Journal of the American Society for Information Science and Technology,52(3),212-225.
    Wiener E, Pedersen J O, Weigend A S.1995. A neural network approach to topic spotting[C]//Proceedings of SDAIR-95,4th annual symposium on document analysis and information retrieval.1995:317-332.
    Williams, D.1991. Probability with martingales. Cambridge university press.
    Yang H, Dasdan A, Hsiao R L, et al.2007. Map-reduce-merge:simplified relational data processing on large clusters[C]//Proceedings of the 2007 ACM SIGMOD international conference on Management of data. ACM,1029-1040.
    Zacks, S., & Barzily, Z.1981. Bayes procedures for detecting a shift in the probability of success in a series of Bernoulli trials. Journal of Statistical Planning and Inference,5(2),107-119.
    Zhang, Y., Li, X., & Wang, T.2013. Identifying Influencers in Online Social Networks:The Role of Tie Strength. International Journal of Intelligent Information Technologies (IJIIT),9(1),1-20. doi:10.4018/jiit.2013010101
    Zheng H, Yoshinaga N, Kaji N, et al.2012. A study on microblog classification based on information publicness[C]//DEIM Forum.
    Zhihua C.2013. Modeling Research on Micro-blog Users[C]//Proceedings of the 2nd International Conference on Computer Science and Electronics Engineering ICCSEE.
    Zhong B, Hardin M, Sun T.2011. Less effortful thinking leads to more social networking? The associations between the use of social network sites and personality traits[J]. Computers in Human Behavior,27(3):1265-12
