Community Detection in High-Dimensional Big-Data Genetic Networks: The NC Method as an Example
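  • English title: Community Detection in Genetic Network Big Data: Taking NC Method as an Example
  • Authors: Sun Yifan; Wu Mengyun; Shi Xingjie (孙怡帆; 吴梦云; 史兴杰)
  • Keywords: Genetic network; Community detection; Metadata
  • Journal (Chinese abbreviation): TJYJ
  • Journal (English title): Statistical Research
  • Affiliations: School of Statistics, Renmin University of China; School of Statistics and Management, Shanghai University of Finance and Economics; Department of Statistics, School of Economics, Nanjing University of Finance and Economics
  • Publication date: 2019-03-25
  • Publisher: Statistical Research (统计研究)
  • Year: 2019
  • Issue: v.36; No.330
  • Funding: Interim result of the Renmin University of China Research Fund project (supported by the Fundamental Research Funds for the Central Universities) "Basic Research on Statistical Methods for Biomedical Big Data" (15XNI011)
  • Language: Chinese
  • Page: TJYJ201903011
  • Number of pages: 5
  • CN: 03
  • ISSN: 11-1302/C
  • Classification number: 126-130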
Abstract
Identifying disease genes among a large number of candidate genes is an important high-dimensional statistical problem in the big-data setting. Because genes are linked by a network structure, the task extends from identifying single genes to identifying gene modules. Detecting gene modules in a genetic network is the so-called community detection (or node clustering) problem. Most community detection methods use only the topological structure of the network and neglect the information carried by the nodes themselves (the metadata). In 2016, Newman and Clauset proposed a community detection method based on statistical inference (the NC method) that combines the two. Taking the NC method as a case study, this paper describes the application of statistical methods to real genetic networks and the results obtained, and proposes improvements from a statistical perspective. The analysis of the NC method shows that, for unstructured data such as genetic networks, statistical thinking and principles remain central to data analysis, while the corresponding statistical methods need to be adjusted and optimized for the characteristics of the data and the questions of interest.
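To make the idea of combining network structure with node metadata concrete, the following is a minimal Python sketch and not the authors' or Newman and Clauset's implementation. It assumes a simplified block model with Bernoulli edges and no degree correction, and a metadata-dependent prior on community labels. The names `log_joint`, `theta`, and `gamma`, and the toy four-node network, are hypothetical and chosen only for illustration.

```python
import math


def log_joint(adj, labels, metadata, theta, gamma):
    """Log-probability of the edges and community labels given node metadata.

    adj      : symmetric 0/1 adjacency matrix (list of lists), no self-loops
    labels   : community index for each node
    metadata : discrete metadata value for each node
    theta    : theta[r][s] is the assumed edge probability between
               communities r and s (Bernoulli simplification)
    gamma    : gamma[s][x] is the assumed prior probability of community s
               for a node whose metadata value is x
    """
    n = len(adj)
    # Metadata term: prior on each node's community given its metadata.
    logp = sum(math.log(gamma[labels[i]][metadata[i]]) for i in range(n))
    # Structure term: Bernoulli likelihood over all unordered node pairs.
    for i in range(n):
        for j in range(i + 1, n):
            p = theta[labels[i]][labels[j]]
            logp += math.log(p) if adj[i][j] else math.log(1.0 - p)
    return logp


# Toy network (hypothetical): edges 0-1 and 2-3, metadata 0,0,1,1.
adj = [
    [0, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 0],
]
metadata = [0, 0, 1, 1]
theta = [[0.8, 0.1], [0.1, 0.8]]  # dense within, sparse between communities
gamma = [[0.9, 0.1], [0.1, 0.9]]  # metadata value x favours community x

good = log_joint(adj, [0, 0, 1, 1], metadata, theta, gamma)
bad = log_joint(adj, [0, 1, 0, 1], metadata, theta, gamma)
print(good, bad)
```

In this toy setting, the labelling that agrees with both the edges and the metadata receives the higher score. A full inference procedure would estimate `theta` and `gamma` from the data instead of fixing them by hand.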
References
[1] World Health Organization. World Health Statistics 2017: Monitoring Health for the SDGs [R]. 2017.
[2] J M Stuart, et al. A Gene-coexpression Network for Global Discovery of Conserved Genetic Modules [J]. Science, 2003(302): 249-255.
[3] P Beltrao, G Cagney, N Krogan. Quantitative Genetic Interactions Reveal Biological Modularity [J]. Cell, 2010(141): 739-745.
[4] A L Barabási, N Gulbahce, J Loscalzo. Network Medicine: A Network-based Approach to Human Disease [J]. Nature Reviews Genetics, 2011(12): 56-68.
[5] K Rohe, S Chatterjee, B Yu. Spectral Clustering and the High-dimensional Stochastic Block Model [J]. Annals of Statistics, 2011(39): 1878-1915.
[6] S Fortunato, D Hric. Community Detection in Networks: A User Guide [J]. Physics Reports, 2016(659): 1-44.
[7] M E J Newman, A Clauset. Structure and Inference in Annotated Networks [J]. Nature Communications, 2016(7): 11863.
[8] P W Holland, K B Laskey, S Leinhardt. Stochastic Block Models: Some First Steps [J]. Social Networks, 1983(5): 109-137.
[9] S E Fienberg, S Wasserman. Categorical Data Analysis of Single Sociometric Relations [J]. Sociological Methodology, 1981(12): 156-192.
[10] T Snijders, K Nowicki. Estimation and Prediction for Stochastic Blockmodels for Graphs with Latent Block Structure [J]. Journal of Classification, 1997(14): 75-100.
[11] A Decelle, et al. Inference and Phase Transitions in the Detection of Modules in Sparse Networks [J]. Physical Review Letters, 2011(107): 065701.
[12] M Mezard, A Montanari. Information, Physics, and Computation [M]. Oxford University Press, 2009.
[13] L Peel, D B Larremore, A Clauset. The Ground Truth About Metadata and Community Detection in Networks [J]. Science Advances, 2017(3): e1602548.
[14] P J Mucha, et al. Community Structure in Time-dependent, Multiscale, and Multiplex Networks [J]. Science, 2010(328): 876-878.
[15] T P Peixoto. Inferring the Mesoscale Structure of Layered, Edge-valued, and Time-varying Networks [J]. Physical Review E, 2015(92): 042807.
[16] C De Bacco, et al. Community Detection, Link Prediction, and Layer Interdependence in Multilayer Networks [J]. Physical Review E, 2017(95): 042317.
