Data-centric Research on Several Security Problems in Online Social Networks

Chinese title: 以数据为中心的在线社会网络若干安全问题研究
Author: Wang Yonggang (王永刚)
Thesis level: Doctoral
Discipline: Computer Software and Theory
Keywords: Online Social Network; Tag Spam; Diffusion Control; Intimacy Degree; Access Control
Degree year: 2013
Advisor: Chen Zhong (陈钟)
Discipline code: 081202
Degree-granting institution: Peking University
Thesis submission date: 2013-06-01

Abstract

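In recent years, online social networks (OSNs) have played an increasingly important role in people's work and daily life and have become a new means of maintaining social relationships. As OSNs have grown, they have provided users with real-time, convenient social services, but their security problems have also become increasingly prominent: tens of thousands of fake accounts, overwhelming volumes of spam, rampant online rumors, and so on. These problems threaten the services that OSNs provide, and they degrade the experience of legitimate users, which reduces user stickiness and indirectly causes substantial economic losses.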
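     Security problems in OSNs are now a hot research topic in academia. This thesis takes a data-centric view: starting from the data life cycle, it studies the security problems exposed in each of the three stages of data in an OSN, namely generation, transmission (diffusion), and reception. The problem addressed in each stage and the corresponding solution designed in this thesis are as follows: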
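     1. In the data generation stage, to counter tag spam deliberately produced by malicious users, this thesis proposes DSpam, a tag spam demotion algorithm based on a reputation model. DSpam computes a relative reputation value between two users according to whether they have interacted by judging tag quality. If no such interaction has occurred, the result of the cosine similarity algorithm is used as the relative reputation value of the two users. If such interaction has occurred, DSpam combines feedback reputation with recommendation reputation: the feedback reputation is derived from the accumulated results of tag judgments, and the recommendation reputation exploits the social nature of OSNs and is derived from a weighted recommendation of friends' feedback reputations. When a user searches for the resources associated with a tag, DSpam ranks the resources by the reputation, relative to the searcher, of the users who applied the tag, so that resources polluted by tag spam are pushed as far as possible toward the end of the result list. Experiments show that, among demotion-based defenses against tag spam, DSpam yields lower SpamFactor values than the existing Boolean-based, Occurrence-based, and Coincidence-based methods. In addition, whereas existing demotion methods cannot withstand targeted tag spam attacks, DSpam performs well against targeted attacks launched by large numbers of malicious users, and its mechanism of sharply reducing reputation after consecutive negative feedback allows it to resist camouflage attacks to a certain extent.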
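     2. In the data transmission stage, to address the spread of false information in OSNs, this thesis proposes Fidic, a false information diffusion control method based on PageRank. When controlling the spread of false information on a particular topic, Fidic treats users in the social network as web pages and the propagation of information between users as links between pages, uses PageRank to compute and rank the scores of the users involved in the spread, and then controls the propagation behavior of the highest-ranked users to shrink the coverage that the false information reaches. The thesis also proposes a method for evaluating the effect of information diffusion in social networks, which supports quantitative analysis of the impact of that diffusion. Experiments show that, compared with random control, out-degree-based control, and in-degree-based control, Fidic yields the smallest reach of false information for a given proportion of controlled users and requires the smallest proportion of controlled users for a given reach.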
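     3. In the data reception stage, to address the coarse granularity of existing access control in OSNs, this thesis proposes iSac, an access control method based on friend intimacy. iSac collects statistics on the interactions between friends in the social network and uses supervised machine learning to assign a weight to each type of online social behavior. From the interaction data between a user and all of that user's friends, it then produces a ranking of the friends by intimacy degree, which quantifies how close each friend is on the basis of interaction. Users can rely on this quantified intimacy to define personalized, intelligent access control policies. Experiments show that the intimacy results computed by this method have a relatively low miss-covering rate (误盖率), which in turn ensures the effective enforcement of fine-grained access control. The thesis also discusses in depth the application of friend intimacy to privacy protection, content recommendation, and Sybil defense.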
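
The Fidic idea in item 2 (rank the users involved in a diffusion with PageRank, then control the top-ranked ones) can be illustrated with a minimal sketch. It is not the thesis implementation. The function names `pagerank` and `users_to_control`, the damping factor 0.85, the fixed iteration count, and the edge direction are all assumptions made for illustration: an edge `(a, b)` is taken to mean that user `a` relayed the message from user `b`, so that widely relayed users accumulate rank the way widely linked pages do.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Power-iteration PageRank over a directed edge list.

    Assumption: an edge (a, b) means user a relayed the message from user b.
    """
    nodes = sorted({user for edge in edges for user in edge})
    if not nodes:
        return {}
    out_links = {user: [] for user in nodes}
    for src, dst in edges:
        out_links[src].append(dst)
    n = len(nodes)
    rank = {user: 1.0 / n for user in nodes}
    for _ in range(iterations):
        # Rank held by users without outgoing edges is spread uniformly.
        dangling = sum(rank[u] for u in nodes if not out_links[u])
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {u: base for u in nodes}
        for src, targets in out_links.items():
            for dst in targets:
                new_rank[dst] += damping * rank[src] / len(targets)
        rank = new_rank
    return rank


def users_to_control(edges, fraction):
    """Return the top `fraction` of users by PageRank score (at least one)."""
    rank = pagerank(edges)
    k = max(1, round(fraction * len(rank)))
    return sorted(rank, key=rank.get, reverse=True)[:k]
```

For example, `users_to_control([("b", "a"), ("c", "a"), ("d", "b")], 0.5)` selects the two users whose posts were relayed most, directly or indirectly. The baselines named in the abstract (random, out-degree, in-degree) would replace only the sort key.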

Abstract (English version as submitted)

In recent years, Online Social Networks (OSNs) have played an increasingly important role in people's work and life, and they have become a new tie through which people maintain social connections. As OSNs have developed, they provide real-time and convenient social services, but their security problems are becoming more serious day by day: thousands of Sybil accounts, large quantities of spam, and the wide spread of online rumors. On one side, these problems threaten the services that OSNs provide; on the other side, they have a negative impact on legitimate users' experience, which reduces users' stickiness to OSNs and in turn causes heavy economic losses.
     Currently, the security problems in OSNs have become a research focus in academia. Based on the lifetime of data, this thesis takes a data-centric approach and studies the security issues within the generation, diffusion and reception stages of data in OSNs. The security issues and the corresponding solutions within each stage are described as follows:
     1. In the generation stage of data, with respect to the tag spam generated by malicious users, this thesis proposes DSpam, a tag spam demotion algorithm based on a reputation model. DSpam calculates the relative reputation between two users based on their interaction in judging the quality of each other's tags. If there is no such interaction, a cosine-based similarity degree is calculated and used as the relative reputation; if there is such interaction, DSpam adopts both feedback reputation and recommendation reputation. The feedback reputation is based on the accumulated results of tag quality judgments. Considering the social properties of OSNs, the recommendation reputation is based on the friends' feedback reputations and their recommendation weights. When a client searches for the resources associated with a tag, DSpam ranks the search results by the relative reputations of the annotators of the corresponding tags with respect to the client. Therefore, a resource with tag spam will be ranked toward the end of the result list. Experiments show that DSpam obtains lower SpamFactor values than existing demotion algorithms such as the Boolean-based, Occurrence-based, and Coincidence-based methods. Besides, whereas existing demotion algorithms cannot defend against collusive tag spam attacks, DSpam performs well in defending against collusive attacks launched by large numbers of malicious users. The rapid decrease of reputation upon consecutive negative feedback enables DSpam to defend against trick attacks of tag spam to a certain extent.
     2. In the diffusion stage of data, with respect to the diffusion of fake information in OSNs, this thesis proposes Fidic, a fake information diffusion control method based on PageRank. When controlling the diffusion of information on a certain topic, Fidic regards users in an OSN as webpages, regards the diffusion behavior of users with respect to the information as hyperlinks between webpages, and adopts PageRank to rank the users by their importance within the diffusion. Users with higher rankings are controlled earlier so as to reduce the coverage that the fake information can reach. Besides, this thesis also proposes a method for evaluating the effect of information diffusion in OSNs so as to support quantitative analysis. Experiments show that, compared with random-based, outdegree-based and indegree-based control, Fidic obtains the smallest coverage of fake information diffusion when the percentage of controlled users is fixed, and the smallest percentage of controlled users when the coverage of fake information diffusion is fixed.
     3. In the reception stage of data, with respect to the current coarse-grained access control mechanisms in OSNs, this thesis proposes iSac, a social access control method based on the intimacy degrees of friends. iSac gathers statistics on the online social behaviors between users, calculates the weights of all types of atomic social behaviors by supervised machine learning, and then ranks all the friends of a client by intimacy degree based on the social behavior data between the client and his or her friends in the OSN. Users in the OSN can make personalized and intelligent access control policies based on the quantitative result of friend intimacy degrees. Experiments show that iSac has a relatively low miss-covering rate in the calculation of friend intimacy degrees, which ensures the effective implementation of fine-grained social access control. Besides, this thesis also discusses the application of friend intimacy degree in privacy protection, content recommendation and Sybil defense in OSNs.
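
The DSpam ranking rule in item 1 can be sketched roughly as follows. The abstract states only the structure: cosine similarity when two users have no tag-judging interaction, otherwise a combination of feedback reputation and friend-weighted recommendation reputation, with search results ordered by the annotator's reputation relative to the searcher. Everything else here is a hypothetical choice made for illustration, not the thesis's formulas: the tag-count profiles used for the cosine, the positive-fraction feedback score, the `alpha` mixing weight, and all function and parameter names. The sharp reputation drop after consecutive negative feedback is not modeled.

```python
import math


def cosine_similarity(vec_a, vec_b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(value * vec_b.get(key, 0) for key, value in vec_a.items())
    norm_a = math.sqrt(sum(v * v for v in vec_a.values()))
    norm_b = math.sqrt(sum(v * v for v in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def feedback_reputation(positive, negative):
    """Assumed form: fraction of tag judgments that were positive."""
    total = positive + negative
    return positive / total if total else 0.0


def relative_reputation(searcher, annotator, profiles, judgments,
                        friend_weights, alpha=0.5):
    """Reputation of `annotator` as seen by `searcher`.

    judgments maps (judge, judged) to (positive, negative) counts.
    friend_weights maps a user to {friend: recommendation weight}.
    """
    if (searcher, annotator) not in judgments:
        # No tag-judging interaction: fall back to cosine similarity.
        return cosine_similarity(profiles[searcher], profiles[annotator])
    direct = feedback_reputation(*judgments[(searcher, annotator)])
    weighted, total_weight = 0.0, 0.0
    for friend, weight in friend_weights.get(searcher, {}).items():
        if (friend, annotator) in judgments:
            score = feedback_reputation(*judgments[(friend, annotator)])
            weighted += weight * score
            total_weight += weight
    if total_weight == 0:
        return direct
    # Assumed combination: linear mix of feedback and recommendation.
    return alpha * direct + (1 - alpha) * weighted / total_weight


def rank_resources(searcher, tagged_resources, profiles, judgments,
                   friend_weights):
    """Order (resource, annotator) pairs by the annotator's reputation."""
    scored = [
        (relative_reputation(searcher, annotator, profiles, judgments,
                             friend_weights), resource)
        for resource, annotator in tagged_resources
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [resource for _, resource in scored]
```

With this ordering, a resource tagged by a user whom the searcher or the searcher's friends have repeatedly judged negatively falls toward the end of the list, which is the demotion behavior the abstract describes.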

