Resampling-Based Gap Analysis for Detecting Nodes with High Centrality on Large Social Network

Authors: Kouzou Ohara (10)
Kazumi Saito (11)
Masahiro Kimura (12)
Hiroshi Motoda (13) (14)

10. Department of Integrated Information Technology ; Aoyama Gakuin University ; Kanagawa ; Japan
11. School of Administration and Informatics ; University of Shizuoka ; Shizuoka ; Japan
12. Department of Electronics and Informatics ; Ryukoku University ; Shiga ; Japan
13. Institute of Scientific and Industrial Research ; Osaka University ; Osaka ; Japan
14. School of Computing and Information Systems ; University of Tasmania ; Hobart ; Australia
Keywords: Gap analysis ; Error estimation ; Resampling ; Node centrality
Journal title: Lecture Notes in Computer Science
Publication year: 2015
Volume: 9077
Issue: 1
Pages: 135-147
Full-text size: 344 KB
References: 1. Bonacich, P (1987) Power and centrality: A family of measures. Amer. J. Sociol. 92: pp. 1170-1182
2. Brandes, U (2001) A faster algorithm for betweenness centrality. Journal of Mathematical Sociology 25: pp. 163-177
3. Brin, S, Page, L (1998) The anatomy of a large-scale hypertextual web search engine. Computer Networks and ISDN Systems 30: pp. 107-117
4. Chen, W, Lakshmanan, L, Castillo, C (2013) Information and influence propagation in social networks. Synthesis Lectures on Data Management 5: pp. 1-177
5. Freeman, L (1979) Centrality in social networks: Conceptual clarification. Social Networks 1: pp. 215-239
6. Henzinger, MR, Heydon, A, Mitzenmacher, M, Najork, M (2000) On near-uniform URL sampling. The International Journal of Computer and Telecommunications Networking 33: pp. 295-308
7. Katz, L (1953) A new status index derived from sociometric analysis. Sociometry 18: pp. 39-43
8. Kleinberg, J (2008) The convergence of social and technological networks. Communications of the ACM 51: pp. 66-72
9. Klimt, B, Yang, Y The Enron corpus: a new dataset for email classification research. In: Boulicaut, J-F, Esposito, F, Giannotti, F, Pedreschi, D eds. (2004) Machine Learning: ECML 2004. Springer, Heidelberg, pp. 217-226
10. Kurant, M, Markopoulou, A, Thiran, P (2011) Towards unbiased BFS sampling. IEEE Journal on Selected Areas in Communications 29: pp. 1799-1809
11. Leskovec, J, Faloutsos, C (2006) Sampling from large graphs. In: Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2006), pp. 631-636
12. Newman, MEJ (2001) Scientific collaboration networks. II. Shortest paths, weighted networks, and centrality. Physical Review E 64: 016132
13. Ohara, K, Saito, K, Kimura, M, Motoda, H Resampling-based framework for estimating node centrality of large social network. In: Džeroski, S, Panov, P, Kocev, D, Todorovski, L eds. (2014) Discovery Science. Springer, Heidelberg, pp. 228-239
14. Zhuge, H, Zhang, J (2010) Topological centrality and its e-science applications. Journal of the American Society for Information Science and Technology 61: pp. 1824-1841
Book title: Advances in Knowledge Discovery and Data Mining
ISBN: 978-3-319-18037-3
Publication category: Computer Science
Publication subjects: Artificial Intelligence and Robotics
Computer Communication Networks
Software Engineering
Data Encryption
Database Management
Computation by Abstract Devices
Algorithm Analysis and Problem Complexity
Publisher: Springer Berlin / Heidelberg
ISSN: 1611-3349

Abstract

We address the problem of identifying nodes with high centrality values in a large social network, using only approximations derived from nodes sampled from the network. More specifically, we detect gaps between nodes at a given confidence level. With nodes ordered in descending order of their approximate centrality values, we say that a gap exists between two adjacent nodes if it divides the ordered list into two groups such that, at the given confidence level, every node in one group has a higher true centrality value than every node in the other. To this end, we incorporate confidence intervals of the true centrality values and apply the resampling-based framework to estimate these intervals as accurately as possible. Furthermore, we devise an algorithm that efficiently detects gaps in only two passes through the nodes. Using three real-world social networks, we show empirically that, at the same node coverage ratio, the proposed method detects more gaps than a method adopting a standard error estimation framework, and that the resulting gaps enable us to correctly identify a set of nodes with high centrality values.
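
The gap criterion described in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm, which the abstract does not spell out. It assumes that each node already has an estimated centrality and a symmetric confidence-interval half-width (for example, from a resampling procedure), and it reads "gap" as a position in the ranked list where every lower bound above the cut exceeds every upper bound below it. The function name `detect_gaps` and its parameters are hypothetical. After an initial sort, the sketch makes two linear passes: one for running minima of lower bounds, one for running maxima of upper bounds.

```python
from typing import List, Sequence, Tuple


def detect_gaps(
    estimates: Sequence[float],
    half_widths: Sequence[float],
) -> Tuple[List[int], List[int]]:
    """Return (order, gaps) for interval-separated cuts in a ranking.

    order: node indices sorted by descending estimated centrality.
    gaps:  positions k (1 <= k < n) such that each of the first k nodes
           in `order` has a lower bound strictly above the upper bound
           of every remaining node.
    Assumption: intervals are estimate +/- half_width (hypothetical input
    format, not taken from the paper).
    """
    if len(estimates) != len(half_widths):
        raise ValueError("estimates and half_widths must have equal length")

    n = len(estimates)
    order = sorted(range(n), key=lambda i: estimates[i], reverse=True)
    lower = [estimates[i] - half_widths[i] for i in order]
    upper = [estimates[i] + half_widths[i] for i in order]

    # Pass 1 (top to bottom): smallest lower bound among the first k nodes.
    prefix_min: List[float] = []
    running = float("inf")
    for value in lower:
        running = min(running, value)
        prefix_min.append(running)

    # Pass 2 (bottom to top): largest upper bound among nodes k..n-1,
    # compared on the fly with the prefix minimum just above the cut.
    gaps: List[int] = []
    running = float("-inf")
    for k in range(n - 1, 0, -1):
        running = max(running, upper[k])
        if prefix_min[k - 1] > running:
            gaps.append(k)
    gaps.reverse()
    return order, gaps


if __name__ == "__main__":
    # Illustrative numbers only, not data from the paper.
    order, gaps = detect_gaps([0.9, 0.5, 0.85, 0.1], [0.05, 0.1, 0.05, 0.05])
    print(order, gaps)
```

In this toy input, a returned position k means that the top k nodes of `order` are separated from the rest by non-overlapping intervals. Narrower intervals, such as those a more accurate error estimate would give, can only produce more such positions.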
