Social Search Based on Complex Online Networks
Abstract
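There are two different perspectives in the field of information retrieval: computer-centered and human-centered. The former treats the information retrieval problem as consisting of building effective indexes, processing user queries with high performance, and developing ranking algorithms that improve the "quality" of the result set. The human-centered perspective holds instead that the problem consists of studying user behavior, understanding users' main needs, and determining how that understanding affects the composition and operation of a retrieval system. Although the view represented by search engines suffers from problems such as large amounts of information that cannot be retrieved, difficulty in expressing query needs, and coarse-grained results, it remains the most influential view in the field.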
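     The Web 2.0 wave and the rise of the "PeopleWeb" offer a new opportunity for the human-centered information retrieval paradigm. This opportunity comes from social search. The core representative of Web 2.0 is the social network service. This thesis takes Xiaonei (xiaonei.com), the largest real-identity social network service site for university students in China, as its object of study, with the aim of examining a possible human-centered information retrieval paradigm. The main work of the thesis includes: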
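     1. Collecting and building a dataset of Xiaonei users. The dataset contains the profile-page information of 34,085 registered users from Nanjing University of Aeronautics and Astronautics, in three categories: (1) basic personal details such as name, gender, hometown, and high school; (2) self-presentation information such as profile pictures, photo albums, and blog posts; (3) social interaction information, including friend lists, message exchanges, and gift exchanges. The dataset is intended to facilitate research in fields such as data mining and behavioral science.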
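     2. Analyzing users' behavioral roles and interaction patterns. Based on Xiaonei's message-board feature, a message-interaction social network is constructed, and a data mining algorithm identifies clearly distinct user categories: "outgoing", "reciprocal", and "incoming" users. Further clustering identifies the "popular stars" of Xiaonei, and the relationship between user category and prestige is examined. Building on this classification, a chi-square test confirms that significant interaction patterns exist between the different types of users.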
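     3. Analyzing users' need types and interest distribution, and building a simulation model for search experiments. Based on Xiaonei's group feature and the group lists in the user data, the number of users per group is found to follow a power law. All user interests are further divided into "mass need", "group need", and "individual need", and a simulation model built on this taxonomy is used to run expertise-location searches. The experiments show that the weak-tie and outgoing-type search strategies are effective under "individual need".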
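     4. Developing SNSAnalyzer, a social network site analysis system, in Java. The system comprises a web crawler module, a multi-agent simulation module, a social network analysis module, and a social network visualization module.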
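As a rough illustration of the user typing described in item 2, the sketch below labels users from a message-interaction edge list. The abstract does not name the data mining algorithm that was used, so a simple threshold on the share of outgoing messages is swapped in here. The function name `classify_users`, the 0.3 and 0.7 cut-offs, and the sample messages are all assumptions made for illustration, not values from the thesis.

```python
from collections import Counter


def classify_users(messages, low=0.3, high=0.7):
    """Label each user by the share of their message traffic that is outgoing.

    messages: iterable of (sender, receiver) pairs, one per message left
    on the receiver's message board.
    low / high: hypothetical cut-offs, not taken from the thesis.
    """
    sent = Counter()
    received = Counter()
    for sender, receiver in messages:
        sent[sender] += 1
        received[receiver] += 1

    labels = {}
    for user in set(sent) | set(received):
        total = sent[user] + received[user]  # at least 1 for every listed user
        out_share = sent[user] / total
        if out_share >= high:
            labels[user] = "outgoing"
        elif out_share <= low:
            labels[user] = "incoming"
        else:
            labels[user] = "reciprocal"
    return labels


# Made-up toy data: "a" mostly writes, "c" only receives, "b" does both.
toy_messages = [("a", "b"), ("a", "c"), ("a", "c"), ("b", "a"), ("b", "c")]
toy_labels = classify_users(toy_messages)
```

A threshold rule keeps the example short; a clustering method applied to the same in/out counts would be closer in spirit to what the abstract describes.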
There are two points of view in the field of information retrieval: one is computer-centered and the other focuses on human behavior. The former frames information retrieval as efficiently building document indexes, processing user queries, and designing ranking algorithms that improve the quality of the result set. The latter frames it as studying human behavior, understanding users' needs, and judging how that understanding affects the construction and operation of a retrieval system. The computer-centered view, represented by search engine technology, is the dominant one, although it suffers from problems such as the difficulty of indexing the entire web, the difficulty of expressing an information need in keywords, and the size of the result set.
     However, the "Web 2.0" revolution and the rise of the "PeopleWeb" give a new opportunity to the human-centered information retrieval paradigm. This new opportunity is called "social search". One of the core applications of the Web 2.0 revolution is the social network service. In this thesis, xiaonei.com, the largest social network service focused on campus users in China, is examined in order to study the potential for designing human-centered information retrieval systems. The main contributions are as follows:
     1. Collecting and building a dataset based on xiaonei.com. The dataset is collected from the profile pages of 34,085 users at Nanjing University of Aeronautics and Astronautics. The information in the dataset is of three types: (1) demographic characteristics of users such as name, gender, hometown, and high school; (2) self-representation information such as profile pictures, albums, and weblogs; (3) information produced by interpersonal communication such as making friends, leaving messages, and exchanging virtual gifts. This dataset can facilitate research in fields such as data mining and human behavior.
     2. Analyzing behavioral roles and communication patterns of users. A clustering algorithm clearly identifies three categories of users, "outgoing", "reciprocal", and "incoming", based on a social network constructed from message interactions. "Popular stars" are further identified, and the result is shown to correlate with user prestige. A chi-square test for independence demonstrates that clear communication patterns exist among the different types of users.
     3. Analyzing the distribution of users' needs and interests and constructing a simulation model. The interests of users are studied by analyzing the group feature of xiaonei.com, and the number of users per group is found to follow a power-law distribution. Furthermore, these interests are classified into three categories: "mass need", "group need", and "individual need". A simulation model is constructed based on this taxonomy. An expertise location experiment demonstrates the effectiveness of the weak-tie query propagation strategy and the outgoing strategy for "individual need".
     4. The SNSAnalyzer system is built in the Java programming language. The system comprises four major components: a web page crawler, a multi-agent simulator, a social network analysis component, and a social network visualizer.
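The chi-square test for independence mentioned in item 2 can be sketched as follows. This is a minimal, dependency-free version that computes the Pearson statistic for a contingency table of sender type by receiver type. The function name `chi_square_independence` and the counts in `toy_table` are invented for illustration and are not data or code from the thesis.

```python
def chi_square_independence(table):
    """Return (statistic, degrees_of_freedom) for a contingency table.

    table: list of rows of observed counts. In this illustration rows are
    sender types and columns are receiver types. Every row and column is
    assumed to have a non-zero total.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if sender type and receiver type were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


# Made-up message counts between the three user types (not thesis data).
toy_table = [
    [30, 10, 10],
    [10, 30, 10],
    [10, 10, 30],
]
stat, dof = chi_square_independence(toy_table)

# Critical value of the chi-square distribution for 4 degrees of freedom
# at the 0.05 significance level.
CRITICAL_VALUE_DOF4_ALPHA05 = 9.488
reject_independence = stat > CRITICAL_VALUE_DOF4_ALPHA05
```

When the statistic exceeds the critical value, the hypothesis that sender type and receiver type are independent is rejected at that significance level, which is the sense in which a test of this kind can support a claim about interaction patterns between user types.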
