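Research on Social Network Mining in the Web Environment

English title: Mining Web Social Networks
Author: Lin Chen (林琛)
Degree level: Doctoral
Discipline: Computer Software and Theory
Keywords (Chinese): 社会网络分析 ; 文本挖掘 ; 专家检索 ; 垃圾内容识别 ; 线程讨论
Keywords (English): Social Network Analysis ; Text Mining ; Expert Search ; Junk Identification ; Threaded Discussions
Degree year: 2009
Advisor: Wang Wei (汪卫)
Discipline code: 081202
Degree-granting institution: Fudan University (复旦大学)
Submission date: 2009-10-01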

Abstract

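Social network analysis is an important tool for understanding social phenomena, predicting human behavior, and analyzing social structure. Since the arrival of Web 2.0, a vast population of Web users, their frequent interactions, and massive volumes of Web content have formed a huge Web social network, making social network mining in the Web environment a new focus of research in information technology. Mining social networks on the Web helps to understand the behavior patterns of Web users and to improve Web applications such as recommendation, information retrieval, and online public opinion monitoring, which in turn brings a better user experience and raises productivity.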
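     Social network mining on the Web faces several main problems. First, social networks on the Web are implicit and fuzzy. Second, Web data contains massive user-generated content with rich semantics. Third, Web data contains large amounts of junk content and junk links. Fourth, Web data is highly heterogeneous and varied in type, so social networks on the Web cannot be described with a single type of node and a single type of relation. Research on Web social network mining needs to address these problems.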
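     This dissertation focuses on Web text data. To address the implicit and fuzzy nature of Web social networks, their rich semantics, junk content, and multi-mode social networks with multiple relation and node types, it analyzes user behavior and adopts matrix-based, generative-model-based, and Markov-chain-based methods for modeling Web social networks. The goals are to extract implicit social networks, interpret their semantics, identify junk content, assess data quality, and mine multi-mode social networks, and to support Web applications such as expert search.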
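     The data studied include Web forums and data from enterprise and academic domains. Web forums organized as threaded discussions are a valuable and massive knowledge base on the Web, and enterprise and academic data contain a great deal of professional knowledge; both are important targets for data mining and knowledge discovery. Web forums contain large amounts of junk content, while enterprise and academic data contain many types of entities and relations. For these two data sources, the work and contributions of this dissertation are as follows: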
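     User behavior analysis. In Web forums, users post to join discussions and thereby interact closely with other users. To better understand the social and posting behavior of forum users, extensive statistical analysis is carried out. It finds that the quantity and quality of posts vary greatly across forum users, and it shows that reply, friend, and acquaintance ("knows") relations in the forum social network clearly affect the diffusion of interests and of expertise among forum users.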
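     Forum data modeling based on sparse coding. In threaded discussions, structure and semantics change together and influence each other. Because existing work generally models semantics and structure separately, a matrix-based model, SMSS, is proposed to model the structure and semantics of threaded discussions simultaneously. In addition, semantics and structure in threaded discussions are sparse: each post covers only a few topics and replies to only a few posts in its thread. L_1 regularization terms are therefore introduced to constrain both structure and semantics. The model extracts fairly accurate social networks, handles the rich semantics and data quality problems of Web social networks reasonably well, and achieves good results in applications such as junk identification and expert search.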
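     Forum data modeling based on generative models. Because the SMSS model's solutions for junk identification and expert search are rather direct and simple, generative-model-based methods for modeling forum data are also proposed. A regularization term reflecting the structural relations among posts is added to the PLSA objective to capture the way the structure and semantics of threaded discussions change together and influence each other. Because the LDA model cannot accurately characterize junk topics, junk topics are introduced and distinguished from meaningful topics. Because forum authors differ in post quality, authors' posting patterns are introduced to constrain the generation of posts. Because existing expert search models estimate the probabilities of unobserved words inaccurately, the topics learned by the above models are used to extend the process by which an expert generates a query. To deal with noisy authors who post a great deal at low quality, authors' posting patterns are incorporated into expert ranking. These models are applied successfully to semantic interpretation, junk identification, and expert search.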
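     Multi-mode social network modeling based on Markov chains. Enterprise and academic domains contain many types of entities, such as authors, papers, and personal homepages, and many types of relations, such as citation and collaboration. To make better use of type information and to adjust the influence of each type, a framework for extracting multi-mode networks from Web data is proposed for expert search on multi-mode networks. Transition probability matrices are generated automatically from text for a given query, and experts are ranked with a Markov chain. To compute the probability of reaching expert nodes, a Markov random walk on the multi-mode network is proposed and proved to be ergodic and irreducible. To meet practical expert search needs in settings such as enterprise and academia, the problem of expert search within communities is posed and a solution is provided. These models achieve good results in expert search and in expert search within communities.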
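     The idea of ranking experts by a random walk over a network with several node and relation types can be illustrated with a small sketch. It is not the dissertation's model: the per-relation weights in `type_weight`, the uniform teleportation controlled by `damping` (a common way to make a chain ergodic), the function name `rank_experts`, and the toy graph are all assumptions made for illustration. The dissertation instead derives transition probabilities from text for a given query.

```python
from collections import defaultdict


def rank_experts(edges, type_weight, node_type, damping=0.85, iters=200):
    """Rank "person" nodes by the stationary distribution of a random walk.

    edges:       (src, dst, relation) triples, treated as directed.
    type_weight: relation -> non-negative weight (hypothetical tuning knob
                 that strengthens or weakens a relation type).
    node_type:   node -> type label; only nodes labelled "person" are ranked.
    """
    nodes = sorted(node_type)
    n = len(nodes)

    # Accumulate weighted out-links; normalisation happens during the walk.
    out = defaultdict(dict)
    for src, dst, rel in edges:
        out[src][dst] = out[src].get(dst, 0.0) + type_weight[rel]

    score = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        # Teleportation mass (assumption): keeps the chain irreducible.
        nxt = {v: (1.0 - damping) / n for v in nodes}
        for v in nodes:
            total = sum(out[v].values())
            if total == 0.0:
                # Dangling node: spread its mass uniformly.
                for u in nodes:
                    nxt[u] += damping * score[v] / n
            else:
                for u, w in out[v].items():
                    nxt[u] += damping * score[v] * w / total
        score = nxt

    people = [v for v in nodes if node_type[v] == "person"]
    return sorted(people, key=lambda v: score[v], reverse=True), score


# Toy academic network (invented data): two authors, three papers.
node_type = {"alice": "person", "bob": "person",
             "p1": "paper", "p2": "paper", "p3": "paper"}
edges = [
    ("alice", "p1", "wrote"), ("alice", "p2", "wrote"), ("bob", "p3", "wrote"),
    ("p1", "alice", "written_by"), ("p2", "alice", "written_by"),
    ("p3", "bob", "written_by"),
    ("p2", "p1", "cites"), ("p3", "p1", "cites"),
]
type_weight = {"wrote": 1.0, "written_by": 1.0, "cites": 1.0}

ranking, score = rank_experts(edges, type_weight, node_type)
print(ranking)
```

     Raising or lowering an entry of `type_weight` changes how much probability mass flows along that relation type, which is one simple way to adjust the influence of a type.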
Social network analysis is widely recognized as an important tool for understanding human behavior and analyzing social structure. In the age of Web 2.0, more and more users join Web communities. Through users' content contributions and their frequent communication and collaboration, the Web has become a huge social network carrying large volumes of social content. As a result, recent years have seen a growing research trend in mining Web social networks. Such research has proved helpful in capturing the behavior patterns of Web users, improving the performance of Web applications (such as recommender systems, information retrieval systems, and public sentiment monitoring systems), providing a better user experience, and increasing working efficiency.
     However, mining Web social networks is challenging because virtual interactions among Web users differ from actual interactions among people, and Web content differs from traditional content. In general, the following factors prevent researchers from fully exploiting Web social networks. First, Web social networks are implicit, whereas traditional social networks are explicit. Second, the wide availability of social content created by Web users offers abundant semantics. Third, with various kinds of social interactions and heterogeneous social actors, Web social networks are multi-mode networks. Finally, since users are encouraged to contribute content, there is a large amount of junk content and junk links, and content quality varies widely.
     To address these challenges, this dissertation focuses on mining Web text data in order to extract implicit social networks, reveal semantics, identify junk content, measure content quality, and mine multi-mode social networks. Several techniques based on matrices, generative models, and Markov chains are proposed and applied to Web applications including expert search, junk identification, and text clustering.
     The first part of this dissertation addresses mining social networks in Web forums.
     Threaded discussions are a popular way for Web users to exchange information, and they are used in a wide range of Web applications, including Web forums, instant messaging, chat rooms, and Web logs (blogs). They are therefore valuable data sources for knowledge mining. This research addresses three aspects of mining large-scale social networks in Web forums.
     - User behavior analysis. In Web forums, users exchange ideas and opinions by posting comments and discussions. By analyzing the diverse posting and social behavior of forum users, this contribution reveals that reply, knows, and friend relations significantly affect the diffusion of interest and expertise in Web forums.
     - Matrix-based modeling of forum data. Semantics and structure are coupled with each other in threaded discussions: replies indicate shared topics, and vice versa. To capture this property, a matrix-based SMSS model is proposed to model the semantics and structure of threaded discussions simultaneously. The model imposes two sparsity constraints, forcing a sparse reconstruction of each post in the topic space and a sparse approximation of each post from previous posts. The SMSS model is successfully employed in three applications: social network extraction, junk identification, and expert search.
     - Modeling forum data with generative models. Inspired by the intuition behind the SMSS model, generative models are presented to model the semantics and structure of threaded discussions. In particular, a PLSA-style model with a regularizer is presented to extract reply relationships; LDA-style models are presented to distinguish junk topics from meaningful topics; and user posting patterns are learned to leverage both the quantity and the quality of related posts when ranking experts.
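     The sparsity constraint described for the matrix-based model can be illustrated with a generic L_1-regularized reconstruction. The sketch below is not the SMSS model itself: it only shows how an L_1 penalty drives most coefficients to exactly zero when a post vector is reconstructed from a few topic vectors. The solver (ISTA, iterative soft-thresholding), the names `sparse_code` and `soft_threshold`, the step-size choice, and the toy vectors are assumptions made for illustration.

```python
def soft_threshold(value, threshold):
    """Shrink a scalar toward zero; this is the proximal step for an L_1 penalty."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def sparse_code(x, topics, lam=0.1, iters=500):
    """ISTA for: minimise 0.5 * ||x - sum_k a[k] * topics[k]||^2 + lam * ||a||_1.

    x:      post vector (list of floats).
    topics: list of topic vectors with the same length as x.
    """
    k, d = len(topics), len(x)
    # Conservative step size: 1 / (squared Frobenius norm) never exceeds
    # 1 / (Lipschitz constant of the gradient), so the iteration is stable.
    step = 1.0 / sum(t * t for row in topics for t in row)
    a = [0.0] * k
    for _ in range(iters):
        recon = [sum(a[j] * topics[j][i] for j in range(k)) for i in range(d)]
        resid = [recon[i] - x[i] for i in range(d)]
        grad = [sum(resid[i] * topics[j][i] for i in range(d)) for j in range(k)]
        a = [soft_threshold(a[j] - step * grad[j], step * lam) for j in range(k)]
    return a


# Toy example (invented data): three orthogonal topics, one dominant in the post.
topics = [[1.0, 0.0, 0.0],
          [0.0, 1.0, 0.0],
          [0.0, 0.0, 1.0]]
x = [2.0, 0.05, 0.0]

weights = sparse_code(x, topics, lam=0.1)
print(weights)
```

     With these toy inputs the weak second component falls below the penalty and is set to zero, so the post is explained by a single topic. In principle the same routine could use earlier posts in a thread, rather than topics, as the vectors in `topics`, which mirrors the idea of approximating a post sparsely from previous posts.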
     The second part of this dissertation focuses on mining multi-mode social networks. To address the problem of finding experts in multi-mode networks, an ergodic Markov chain model for multi-mode networks is presented. Mining experts in communities is also studied, to satisfy personal information needs in enterprise and academic environments.

引文

[1]Lada A.Adamic,Jun Zhang,Eytan Bakshy,and Mark S.Ackerman.Knowledge sharing and yahoo answers:everyone knows something.In WWW '08:Proceeding of the 17th international conference on World Wide Web,pages 665-674,New York,NY,USA,2008.ACM.
    [2]Eugene Agichtein,Carlos Castillo,Debora Donato,Aristides Gionis,and Gilad Mishne.Finding high-quality content in social media.In WSDM '08:Proceedings of the international conference on Web search and web data mining,pages 183-194,New York,NY,USA,2008.ACM.
    [3]Boanerges Aleman-Meza,Meenakshi Nagarajan,Cartic Ramakrishnan,Li Ding,Pranam Kolari,Amit P.Sheth,I.Budak Arpinar,Anupam Joshi,and Tim Finin.Semantic analytics on social networks:experiences in addressing the problem of conflict of interest detection.In WWW '06:Proceedings of the 15th international conference on World Wide Web,pages 407-416,New York,NY,USA,2006.ACM.
    [4]Ricardo Baeza-Yates.User generated content:how good is it? In WICOW '09:Proceedings of the 3rd workshop on Information credibility on the web,pages 1-2,New York,NY,USA,2009.ACM.
    [5]P.Bailey,N.Craswell,AP De Vries,and I.Soboroff.Overview of the trec 2007 enterprise track(draft).In Proc.of TREC,2007.
    [6]Krisztian Balog.People search in the enterprise.In SIGIR '07:Proceedings of the 30th annual international A CM SIGIR conference on Research and development in information retrieval,pages 916-916,New York,NY,USA,2007.ACM.
    [7]Krisztian Balog.People search in the enterprise,volume 42,pages 103-103,New York,NY,USA,2008.ACM.
    [8]Krisztian Balog,Leif Azzopardi,and Maarten de Rijke.Formal models for expert finding in enterprise corpora.In SIGIR '06:Proceedings of the 29th annual international A CM SIGIR conference on Research and development in information retrieval,pages 43-50,New York,NY,USA,2006.ACM.
    [9]Krisztian Balog,Toine Bogers,Leif Azzopardi,Maarten de Rijke,and Antal van den Bosch.Broad expertise retrieval in sparse data environments.In SIGIR '07:Proceedings of the 30th annual international A CM SIGIR conference on Research and development in information retrieval,pages 551-558,New York,NY,USA,2007.ACM.
    [10]Krisztian Balog and Maarten de Rijke.Finding experts and their eetails in e-mail corpora.In WWW '06:Proceedings of the 15th international conference on World Wide Web,pages 1035-1036,New York,NY,USA,2006.ACM.
    [11]Krisztian Balog and Maarten de Rijke.Finding similar experts.In SIGIR '07:Proceedings of the 30th annual international A CM SIGIR conference on Research and development in information retrieval,pages 821-822,New York,NY,USA,2007.ACM.
    [12]Krisztian Balog and Maarten de Rijke.Non-local evidence for expert finding.In CIKM '08:Proceeding of the 17th ACM conference on Information and knowledge management,pages 489-498,New York,NY,USA,2008.ACM.
    [13]Krisztian Balog,Maarten de Rijke,and Wouter Weerkamp.Bloggers as experts:feed distillation using expert retrieval models.In SIGIR '08:Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval,pages 753-754,New York,NY,USA,2008.ACM.
    [14]Y.Benajiba and P.Rosso.Arabic named entity recognition using conditional random fields.In Proc.of Workshop on HLT(?) NLP within the Arabic World,LREC,volume 8,2008.
    [15]Krishna Bharat and Monika R.Henzinger.Improved algorithms for topic distillation in a hyperlinked environment.In SIGIR '98:Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval,pages 104-111,New York,NY,USA,1998.ACM.
    [16]Jiang Bian,Yandong Liu,Ding Zhou,Eugene Agichtein,and Hongyuan Zha.Learning to recognize reliable users and content in social media with coupled mutual reinforcement.In WWW '09:Proceedings of the 18th international conference on World wide web,pages 51-60,New York,NY,USA,2009.ACM.
    [17]Istvan Bfro,David Siklosi,Jacint Szabo,and Andras A.Benczur.Linked latent dirichlet allocation in web spam filtering.In AIRWeb '09:Proceedings of the 5th International Workshop on Adversarial Information Retrieval on the Web,pages 37-40,New York,NY,USA,2009.ACM.
    [18]David M.Blei and John D.Lafferty.Dynamic topic models.In ICML '06:Proceedings of the 23rd international conference on Machine learning,pages 113-120,New York,NY,USA,2006.ACM.
    [19]D.M.Blei and J.D.Lafferty.A correlated topic model of science.Annals of Applied Statis tics,1(1):17-35,2007.
    [20]D.M.Blei,A.Y.Ng,and M.I.Jordan.Latent dirichlet allocation.Journal of Machine Learning Research,3(6):993-1022,2003.
    [21]L.Bolelli,S.Ertekin,D.Zhou,and C.L.Giles.A clustering method for web data with multi-type interrelated components.In Proceedings of the 16th international conference on World Wide Web,pages 1121-1122.ACM New York,NY,USA,2007.
    [22]Levent Boleli,Seyda Ertekin,Ding Zhou,and C.Lee Giles.Finding topic trends in digital libraries.In JCDL '09:Proceedings of the 9th A CM/IEEECS joint conference on Digital libraries,pages 69-72,New York,NY,USA,2009.ACM.
    [23]S.Brin and L.Page.The anatomy of a large-scale hypertextual web search engine.Computer networks and ISDN systems,30(1-7):107-117,1998.
    [24]Christopher S.Campbell,Paul P.Maglio,Alex Cozzi,and Byron Dom.Expertise idcntification using cmail communications.In CIKM '03:Proceedings of the twelfth international conference on Information and knowleclge management,pages 528-531,New York,NY,USA,2003.ACM.
    [25]Xavier Carreras and Lluis Marquez.Boosting trees for anti-spam email filtering.CoRR,cs.CL/0109015,2001.
    [26]C.Chemudugunta,P.Smyth,and M.Steyvers.Modeling general and specific aspects of documents with a probabilistic topic model.In Advances in Neural Information Processing Systems 19:Proceedings of the 2006 Conference,pages 241-248,Vancouver,B.C.,Canada,2007.Neural Information Processing Systems(NIPS) Foundation,MIT Press.
    [27]Ikkyu Choi and Minkoo Kim.Topic distillation using hierarchy concept tree.In SIGIR '03:Proceedings of the 26th annual international ACM SIGIR conference on Research and development in informaion retrieval,pages 371-372,New York,NY,USA,2003.ACM.
    [28]F.R.K.Chung.Spectral graph theory.Amer Mathematical Society,1997.
    [29]A.Clauset,MEJ Newman,and C.Moore.Finding community structure in very large networks.Physical Review E,70(6):66111,2004.
    [30]D.Cohn and T.Hofmann.The missing link-a probabilistic model of document content and hypertext connectivity.Advances in neural information processin9 systems,pages 430-436,2001.
    [31]Gao Cong,Long Wang,Chin-Yew Lin,Young-In Song,and Yueheng Sun.Finding question-answer pairs from online forums.In SIGIR '08:Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval,pages 467-474,New York,NY,USA,2008.ACM.
    [32]N.Craswell,A.de Vries,and I.Soboroff.Overview of the trec-2005 enterprise track.In TREC 2005 Conference Notebook,pages 199-205,2005.
    [33]R.D'Amore.Expertise community detection.In Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval,pages 498-499.ACM New York,NY,USA,2004.
    [34]Laura Dietz,Steffen Bickel,and Tobias Scheffer.Unsupervised prediction of citation influences.In ICML '07:Proceedings of the 24th international conference on Machine learning,pages 233-240,New York,NY,USA,2007.ACM.
    [35]Chris Ding,Ding Zhou,Xiaofeng He,and Hongyuan Zha.R1-pca:rotational invariant 11-norm principal component analysis for robust subspace factorization.In ICML '06:Proceedings of the 23rd international conference on Machine learning,pages 281-288,New York,NY,USA,2006.ACM.
    [36]Chris H.Q.Ding,Xiaofeng He,Hongyuan Zha,Ming Gu,and Horst D.Simon.A min-max cut algorithm for graph partitioning and data clustering.In ICDM '01:Proceedings of the 2001 IEEE International Conference on Data Mining,pages 107-114,Washington,DC,USA,2001.IEEE Computer Society.
    [37]Shilin Ding,Gao Cong,Chin-Yew Lin,and Xiaoyan Zhu.Using conditional random fields to extract contexts and answers of questions from online forums.In Proceedings of ACL-08:HLT,pages 710-718,Columbus,Ohio,June 2008.Association for Computational Linguistics.
    [38]B.Dom,I.Eiron,A.Cozzi,and Y.Zhang.Graph-based ranking algorithms for e-mail expertise analysis.In Proceedings of the 8th A CM SIGMOD workshop on Research issues in data mining and knowledge discovery,pages 42-48.ACM New York,NY,USA,2003.
    [39]Yon Dourisboure,Filippo Geraci,and Marco Pellegrini.Extraction and classification of dense communities in the web.In WWW '07:Proceedings of the 16th international conference on World Wide Web,pages 461-470,New York,NY,USA,2007.ACM.
    [40]Yon Dourisboure,Filippo Geraci,and Marco Pellegrini.Extraction and classification of dense implicit communities in the web graph.ACM Trans.Web,3(2):1-36,2009.
    [41]s.Deerwester Dumais,S.T.Furnas,G.W.Landauer,T.K,and R.Harshman.Indexing by latent semantic analysis.Journal of the American society for information science,41(6):391-407,1990.
    [42]L.Egghe.Theory and practise of the g-index.Scientometrics,69(1):131-152,2006.
    [43]L.Egghe.Dynamic h-index:the hirsch index in function of time.Journal of the American Society for Information Science and Technology,58(3):452-454,2007.
    [44]Jonathan L.Elsas and Jaime G.Carbonell.It pays to be picky:an evaluation of thread retrieval in online forums.In SIGIR '09:Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval,pages 714-715,New York,NY,USA,2009.ACM.
    [45]H.Fang and C.Zhai.Probabilistic models for expert finding.Lecture Notes in Computer Science,4425:418,2007.
    [46]Donghui Feng,Erin Shaw,Jihie Kim,and Eduard Hovy.An intelligent discussion-bot for answering student queries in threaded discussions.In IUI '06:Proceedings of the 11th international conference on Intelligent user interfaces,pages 171-177,New York,NY,USA,2006.ACM.
    [47]L.C.Freeman.The development of social network analysis:a study in the sociology of science.Empirical Press Vancouver,BC,2004.
    [48]Liqiang Geng,Hao Wang,Xin Wang,and Larry Korba.Adapting lda model to discover author-topic relations for email analysis.In DaWaK '08:Proceedings of the 10th international conference on Data Warehousing and Knowledge Discovery,pages 337-346,Berlin,Heidelberg,2008.Springer-Verlag.
    [49]D.Gibson,R.Kumar,and A.Tomkins.Discovering large dense subgraphs in massive graphs.In Proceedings of the 31st international conference on Very large data bases,pages 721-732.VLDB Endowment,2005.
    [50]Mark Girolami and Ata Kaban.On an equivalence between plsi and lda.In SIGIR '03:Proceedings of the 26th annual international ACM SIGIR conference on Research and development in informaion retrieval,pages 433-434,New York,NY,USA,2003.ACM.
    [51]M.Girvan and MEJ Newman.Community structure in social and biological networks.Proceedings of the National Academy of Sciences,99(12):7821,2002.
    [52]Vicenc Gomez,Andreas Kaltenbrunner,and Vicente Lopez.Statistical analysis of the social network and discussion threads in slashdot.In WWW '08:Proceeding of the 17th international conference on World Wide Web,pages 645-654,New York,NY,USA,2008.ACM.
    [53]T.L.Griffiths and M.Steyvers.Finding scientific topics.Proceedings of the National Academy of Sciences,101(90001):5228-5235,2004.
    [54]M.Hamasaki,Y.Matsuo,K.Ishida,Y.Nakamura,T.Nishimura,and H.Takeda.Community focused social network extraction.Lecture Notes in Computer Science,4185:155,2006.
    [55]F.Maxwell Harper,Daniel Moy,and Joseph A.Konstan.Facts or friends?:distinguishing informational and conversational questions in social q&a sites.In CHI'09:Proceedings of the 27th international conference on Human factors in computing systems,pages 759-768,New York,NY,USA,20O9.ACM.
    [56]T.H.Haveliwala.Topic-sensitive pagerank.In Proc.8~(th) WWW,pages 322-331,Chiba,Japan,May 2005.
    [57]T.H.Haveliwala.Topic-sensitive pagerank:A context-sensitive ranking algorithm for web search.IEEE Transactions on Knowledge and Data Engineering,pages 784-796,2003.
    [58]B.He,C.Macdonald,I.Ounis,J.Peng,and R.L.T.Santos.University of glasgow at trec 2008:Experiments in blog,enterprise,and relevance feedback tracks with terrier.In Proceedings of the 17th Tezt REtrieval Conference (TREC 2008),volume 9,pages 10-2,2008.
    [59]JE Hirsch.An index to quantify an individual's scientific research output.Proceedings of the National Academy of Sciences,102(46):16569-16572,2005.
    [60]Thomas Hofmann.Probabilistic latent semantic indexing.In SIGIR '99:Proceedings of the 22nd annual international A CM SIGIR conference on Research and development in information retrieval,pages 50-57,New York,NY,USA,1999.ACM.
    [61]Liangjie Hong and Brian D.Davison.A classification-based approach to question answering in discussion boards.In SIGIR '09:Proceedings of the 32nd international A CM SIGIR conference on Research and development in information retrieval,pages 171-178,New York,NY,USA,2009.ACM.
    [62]Rajat Raina Andrew Y.Ng Honglak LEE,Alexix Battle.Efficient sparse coding algorithms.In Proc.19th NIPS,pages 801-808,Cambridge,MA,USA,2007.
    [63]Jizhou Huang,Ming Zhou,and Dan Yang.Extracting chatbot knowledge from online discussion forums.In Proc.11~(th) IJCAI,2006.
    [64]Noriko Imafuji and Masaru Kitsuregawa.Effects of maximum flow algorithm on identifying web community.In WIDM '02:Proceedings of the 4th international workshop on Web information and data management,pages 43-48,New York,NY,USA,2002.ACM.
    [65]Hidehiko Ino,Mineichi Kudo,and Atsuyoshi Nakamura.Partitioning of web graphs by community topology.In WWW '05:Proceedings of the 14th international conference on World Wide Web,pages 661-669,New York,NY,USA,2005.ACM.
    [66]H.Kautz,B.Selman,and M.Shah.Referralweb:Combining social networks and collaborative filtering.Communications of the ACM,1997.
    [67]Jong Wook Kim,K.Selcuk Candan,and Mehmet E.D(o|=)nderler.Topic segmentation of message hierarchies for indexing and navigation support.In WWW '05:Proceedings of the 14th international conference on World Wide Web,pages 322-331,New York,NY,USA,2005.ACM.
    [68]J.Kleinberg.Authoritative sources in a hyperlinked environment.Journal of the ACM,46(5):604-622,1999.
    [69]M.Kobayashi and R.Yung.Tracking topic evolution in on-line postings:2006 ibm innovation jam data.Lecture Notes in Computer Science,5012:616-625,2008.
    [70]P.Kolari,A.Java,T.Finin,T.Oates,and A.Joshi.Detecting spam blogs:A machine learning approach.In PROCEEDINGS OF THE NATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE,volume 21,pages 1351-1356.Menlo Park,CA;Cambridge,MA;London;AAAI Press;MIT Press;1999,2006.
    [71]R.Kumar,P.Raghavan,S.Rajagopalan,and A.Tomkins.Trawling the web for emerging cyber-communities.Computer Networks-the International Journal of Computer and Telecommunications Networkin,31(11):1481-1494,1999.
    [72]Jerome Kunegis,Andreas Lommatzsch,and Christian Bauckhage.The slashdot zoo:mining a social network with negative edges.In WWW '09:Proceedings of the 18th international conference on World wide web,pages 741-750,New York,NY,USA,2009.ACM.
    [73]John Lafferty and Chengxiang Zhai.Document language models,query models,and risk minimization for information retrieval.In SIGIR '01:Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval,pages 111-119,New York,NY,USA,2001.ACM.
    [74]Cliff Lampe and Erik Johnston.Follow the(slash) dot:effects of feedback on new members in an online community.In GROUP '05:Proceedings of the 2005 international ACM SIGGROUP conference on Supporting group work,pages 11-20,New York,NY,USA,2005.ACM.
    [75]Cliff Lampe and Paul Resnick.Slash(dot) and burn:distributed moderation in a large online conversation space.In CHI '04:Proceedings of the SIGCHI conference on Human factors in computing systems,pages 543-550,New York,NY,USA,2004.ACM.
    [76]Cliff A.C.Lampe,Erik Johnston,and Paul Resnick.Follow the reader:filtering comments on slashdot.In CHI '07:Proceedings of the SIGCHI conference on Human factors in computing systems,pages 1253-1262,New York,NY,USA,2007.ACM.
    [77]Huajing Li,Isaac G.Councill,Levent Bolelli,Ding Zhou,Yang Song,Wang-Chien Lee,Anand Sivasubramaniam,and C.Lee Giles.Citeseerx:a scalable autonomous scientific digital library.In InfoScale '06:Proceedings of the 1st international conference on Scalable information systems,page 18,New York,NY,USA,2006.ACM.
    [78]Huajing Li,Zaiqing Nie,Wang-Chien Lee,Lee Giles,and Ji-Rong Wen.Scalable community discovery on textual data with relations.In CIKM '08:Proceeding of the 17th ACM conference on Information and knowledge management,pages 1203-1212,New York,NY,USA,2008.ACM.
    [79]Longzhuang Li,Yi Shang,and Wei Zhang.Improvement of hits-based algorithms on web documents.In WWW '02:Proceedings of the 11th inter-national conference on World Wide Web,pages 527-535,New York,NY,USA,2002.ACM.
    [80]Chen Lin,Jiang-Ming Yang,Rui Cai,Xin-Jing Wang,and Wei Wang.Simultaneously modeling semantics and structure of threaded discussions:a sparse coding approach and its applications.In SIGIR '09:Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval,pages 131-138,New York,NY,USA,2009.ACM.
    [81]Chen Lin,Jiang-Ming Yang,Rui Cai,Xin-Jing Wang,Wei Wang,and Lei Zhang.Modeling semantics and structure of discussion threads.In WWW '09:Proceedings of the 18th international conference on World wide web,pages 1103-1104,New York,NY,USA,2009.ACM.
    [82]Yan Liu,Alexandru Niculescu-Mizil,and Wojciech Gryc.Topic-link lda:joint models of topic and author community.In ICML '09:Proceedings of the 26th Annual International Conference on Machine Learning,pages 665-672,New York,NY,USA,2009.ACM.
    [83]C.Macdonald,D.Hannah,and I.Ounis.High quality expertise evidence for expert search.Lecture Notes in Computer Science,4956:283,2008.
    [84]C.Macdonald and I.Ounis.A belief network model for expert search.In Proceedings of 1st conference on Theory of Information Retrieval(ICTIR),2007.
    [85]C.Macdonald and I.Ounis.Voting techniques for expert search.Knowledge and information systems,16(3):259-280,2008.
    [86]J.Mairal,F.Bach,J.Ponce,and G.Sapiro.Online learning for matrix factorization and sparse coding,stat,1050:1,2009.
    [87]Naohiro Matsumura,David E.Goldberg,and Xavier Llora.Mining directed social network from message board.In WWW '05:Special interest tracks and posters of the 14th international conference on World Wide Web,pages 1092-1093,New York,NY,USA,2005.ACM.
    [88]Y.Matsuo,J.Mori,M.Hamasaki,T.Nishimura,H.Takeda,K.Hasida,and M.Ishizuka.Polyphonet:an advanced social network extraction system from the web.Web Semantics:Science,Services and Agents on the World Wide Web,5(4):262-278,2007.
    [89]A.McCallum,X.Wang,and A.Corrada-Emmanuel.Topic and role discovery in social networks with experiments on Enron and academic email.Journal of Artificial Intelligence Research,30:249-272,2007.
    [90]D.W.McDonald and M.S.Ackerman.Expertise recommender:a flexible recommendation system and architecture.In Proceedings of the 2000 ACM conference on Computer supported cooperative work,pages 231-240.ACM New York,NY,USA,2000.
    [91]Duncan McDougall and Craig Macdonald.Expertise search in academia using facets.In SIGIR '09:Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval,pages 834-834,New York,NY,USA,2009.ACM.
    [92]Qiaozhu Mei,Deng Cai,Duo Zhang,and ChengXiang Zhai.Topic modeling with network regularization.In WWW '08:Proceeding of the 17th international conference on World Wide Web,pages 101-110,New York,NY,USA,2008.ACM.
    [93]Qiaozhu Mei,Duo Zhang,and ChengXiang Zhai.A general optimization framework for smoothing language models on graph structures.In SIGIR '08:Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval,pages 611-618,New York,NY,USA,2008.ACM.
    [94]K.A.Meyer.Face-to-face versus threaded discussions:The role of time and higher-order thinking.Journal of Asynchronous Learning Networks,7(3):55-65,2003.
    [95]Greg P.Milette,Michael K.Schneider,Kathy Ryall,and Robert Hyland.Exploiting social context for expertise propagation.In SIGIR '09:Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval,pages 835-835,New York,NY,USA,2009.ACM.
    [96]David Mimno and Andrew McCallum.Expertise modeling for matching papers with reviewers.In KDD '07:Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining,pages 500-509,New York,NY,USA,2007.ACM.
    [97]G.Mishne,D.Carmel,and R.Lempel.Blocking blog spam with language model disagreement.In Proceedings of the First International Workshop on Adversarial Information Retrieval on the Web(AIRWeb),Chiba,Japan,2005.
    [98]A.Mockus and J.D.Herbsleb.Expertise browser:a quantitative approach to identifying expertise.In Proceedings of the 24th International Conference on Software Engineering,pages 503-512.ACM New York,NY,USA,2002.
    [99]Ramesh M.Nallapati,Amr Ahmed,Eric P.Xing,and William W.Cohen.Joint latent topic models for text and citations.In KDD '08:Proceeding of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining,pages 542-550,New York,NY,USA,2008.ACM.
    [100]L.Page,S.Brin,R.Motwani,and T.Winograd.The pagerank citation ranking:Bringing order to the web.Technical report.Stanford University,Stanford,CA,1998.
    [101]SK Pal and BL Narayan.A web surfer model incorporating topic continuity.IEEE Transactions on Knowledge and Data Engineering,17(5):726-729,2005.
    [102]Shashank Pandit,Duen Horng Chau,Samuel Wang,and Christos Faloutsos.Netprobe:a fast and scalable system for fraud detection in online auction networks.In WWW '07:Proceedings of the 16th international conference on World Wide Web,pages 201-210,New York,NY,USA,2007.ACM.
    [103]D.Petkova and W.B.Croft.Hierarchical language models for expert finding in enterprise corpora.In 18th IEEE International Conference on Tools with Artificial Intelligence,2006.ICTAI'06,pages 599-608,2006.
    [104]Xuan-Hieu Phan,Le-Minh Nguyen,and Susumu Horiguchi.Learning to classify short and sparse text & web with hidden topics from large-scale data collections.In WWW '08:Proceeding of the 17th international conference on World Wide Web,pages 91-100,New York,NY,USA,2008.ACM.
    [105]Jay M.Ponte and W.Bruce Croft.A language modeling approach to information retrieval.In SIGIR '98:Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval,pages 275-281,New York,NY,USA,1998.ACM.
    [106]Ian Porteous,David Newman,Alexander Ihler,Arthur Asuncion,Padhraic Smyth,and Max Welling.Fast collapsed gibbs sampling for latent dirichlet allocation.In KDD '08:Proceeding of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining,pages 569-577,New York,NY,USA,2008.ACM.
    [107]Rajat Rains,Alexix Battle,Honglak Lee,Benjamin Packer,and Andrew Y.Ng.Self-taught learning:Transfer learning from unlabeled data.In Proc.24~(th) ICML,Corvallis,OR,2007.
    [108]M.Richardson and P.Domingos.The intelligent surfer:Probabilistic combination of link and content information in pagerank.Advances in Neural Information Processing Systems,2:1441-1448,2002.
    [109]J.Scott.Social network analysis:A handbook.Sage,2000.
    [110]P.Serdyukov and D.Hiemstra.Modeling documents as mixtures of persons for expert finding.Lecture Notes in Computer Science,4956:309,2008.
    [111]P.Serdyukov,D.Hiemstra,M.Fokkinga,and P.M.G.Apers.Generative modeling of persons and documents for expert search.In Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval,pages 827-828.ACM New York,NY,USA,2007.
    [112]Dou Shen,Qiang Yang,Jian-Tao Sun,and Zheng Chen.Thread detection in dynamic text message streams.In SIGIR '06:Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval,pages 35-42,New York,NY,USA,2006.ACM.
    [113]J.Shi and J.Malik.Normalized cuts and image segmentation.IEEE Trans-actions on pattern analysis and machine intelligence,22(8):888-905,2000.
    [114]Parag Singla and Matthew Richardson.Yes,there is a correlation:-from social networks to personal behavior on the web.In WWW '08:Proceeding of the 17th international conference on World Wide Web,pages 655-664,New York,NY,USA,2008.ACM.
    [115]I.Soboroff,A.P.de Vries,and N.Craswell.Overview of the trec 2006 enterprise track.TREC 2006 Working Notes,2006.
    [116]Xiaodan Song,Belle L.Tseng,Ching-Yung Lin,and Ming-Ting Sun.Personalized recommendation driven by information flow.In SIGIR '06:Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval,pages 509-516,New York,NY,USA,2006.ACM.
    [117]Y.Song,J.Huang,D.Zhou,H.Zha,and C.L.Giles.Iknn:Informative knearest neighbor pattern classification.Lecture Notes in Computer Science,4702:248,2007.
    [118]Y.Song,D.Zhou,J.Huang,IG Councill,H.Zha,and CL Giles.Boosting the feature space:Text classification for unstructured data on the web.In Data Mining,2006.ICDM'06.Sixth International Conference on,pages 1064-1069,2006.
    [119]Mark Steyvers,Padhraic Smyth,Michel Rosen-Zvi,and Thomas Griffiths.Probabilistic author-topic models for information discovery.In KDD '04:Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining,pages 306-315,New York,NY,USA,2004.ACM.
    [120]Bingjun Sun,Ding Zhou,Hongyuan Zha,and John Yen.Multi-task text segmentation and alignment based on weighted mutual information.In CIKM '06:Proceedings of the 15th ACM international conference on Information and knowledge management,pages 846-847,New York,NY,USA,2006.ACM.
    [121]Jie Tang,Duo Zhang,and Limin Yao.Social network extraction of academic researchers.In ICDM '07:Proceedings of the 2007 Seventh IEEE International Conference on Data Mining,pages 292-301,Washington,DC,USA,2007.IEEE Computer Society.
    [122]Jie Tang,Jing Zhang,Limin Yao,Juanzi Li,Li Zhang,and Zhong Su.Arnetminer:extraction and mining of academic social networks.In KDD '08: Proceeding of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining,pages 990-998,New York,NY,USA,2008.ACM.
    [123]Wil M.P.van der Aalst,Hajo A.Reijers,and Minseok Song.Discovering social networks from event logs.Computer Supported Cooperative Work (CSCW),14(6):549-593,October 2005.
    [124]U.von Luxburg.A tutorial on spectral clustering.Statistics and Computing,17(4):395-416,2007.
    [125]Nayer Wanas,Motaz El-Saban,Heba Ashour,and Waleed Ammar.Automatic scoring of online discussion posts.In WICOW '08:Proceeding of the 2nd ACM workshop on Information credibility on the web,pages 19-26,New York,NY,USA,2008.ACM.
    [126]C.Wang,D.Blei,and D.Heckerman.Continuous time dynamic topic models.In The 23rd Conference on Uncertainty in Artificial Intelligence,pages 1-8,2008.
    [127]Xuerui Wang and Andrew McCallum.Topics over time:a non-markov continuous-time model of topical trends.In KDD '06:Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining,pages 424-433,New York,NY,USA,2006.ACM.
    [128]Y.C.Wang,M.Joshi,W.Cohen,and C.Rose.Recovering implicit thread structure in newsgroup style conversations.In Proceedings of the 2nd International Conference on Weblogs and Social Media,pages 152-160,Seattle,Washington,U.S.A.,2008.AAAI Press.
    [129]Barry Wellman.The network is personal:Introduction to a special issue of Social Networks.Social Networks,29(3):349-356,2007.
    [130]G.Xu and W.Y.Ma.Building implicit links from content for forum search.In Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval,pages 300-307.ACM New York,NY,USA,2006.
    [131]Zhuoming XU,Xiao CAO,Yisheng DONG,and Yahong HAN.s-hitsc:an improved model and algorithm for topic distillation on the web.Soft Comput.,10(1):2-11,2006.
    [132]K.W.Yang and S.Y.Huh.Automatic expert identification using a text categorization technique in knowledge management systems.Expert Systems with Applications,34(2):1445-1455,2008.
    [133]Shinjae Yoo,Yiming Yang,Frank Lin,and Il-Chul Moon.Mining social networks for personalized email prioritization.In KDD '09:Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining,pages 967-976,New York,NY,USA,2009.ACM.
    [134]Chengxiang Zhai and John Lafferty.A study of smoothing methods for language models applied to information retrieval.ACM Trans.Inf.Syst.,22(2):179-214,2004.
    [135]J.Zhang and M.S.Ackerman.Searching for expertise in social networks:a simulation of potential strategies.In Proceedings of the 2005 international ACM SIGGROUP conference on Supporting group work,pages 71-80.ACM New York,NY,USA,2005.
    [136]J.Zhang,J.Tang,and J.Li.Expert finding in a social network.Lecture Notes in Computer Science,4443:1066-,2007.
    [137]Jun Zhang,Mark S.Ackerman,and Lada Adamic.Expertise networks in online communities:structure and algorithms.In WWW '07:Proceedings of the 16th international conference on World Wide Web,pages 221-230,New York,NY,USA,2007.ACM.
    [138]D.Zhou,S.A.Orshanskiy,H.Zha,and C.L.Giles.Co-ranking authors and documents in a heterogeneous network.In 7th IEEE International Conference on Data Mining(ICDM07),2007.
    [139]Ding Zhou,Jiang Bian,Shuyi Zheng,Hongyuan Zha,and C.Lee Giles.Exploring social annotations for information retrieval.In WWW '08:Proceeding of the 17th international conference on World Wide Web,pages 715-724,New York,NY,USA,2008.ACM.
    [140]Ding Zhou,Xiang Ji,Hongyuan Zha,and C.Lee Giles.Topic evolution and social interactions:how authors effect research.In CIKM '06:Proceedings of the 15th ACM international conference on Information and knowledge management,pages 248-257,New York,NY,USA,2006.ACM.
    [141]Ding Zhou,Eren Manavoglu,Jia Li,C.Lee Giles,and Hongyuan Zha.Probabilistic models for discovering e-communities.In WWW '06:Proceedings of the 15th international conference on World Wide Web,pages 173-182,New York,NY,USA,2006.ACM.
    [142]Ding Zhou,Shenghuo Zhu,Kai Yu,Xiaodan Song,Belle L.Tseng,Hongyuan Zha,and C.Lee Giles.Learning multiple graphs for document recommendations.In WWW '08:Proceeding of the 17th international conference on World Wide Web,pages 141-150,New York,NY,USA,2008.ACM.
    [143]Jianhan Zhu,Dawei Song,Stefan Rüger,and Xiangji Huang.Modeling document features for expert finding.In CIKM '08:Proceeding of the 17th ACM conference on Information and knowledge management,pages 1421-1422,New York,NY,USA,2008.ACM.
