A theoretic study of a distance-based regression model
  • English title: A theoretic study of a distance-based regression model
  • Authors: Jialu Li; Wei Zhang; Sanguo Zhang; Qizhai Li
  • Keywords: distance-based regression; Euclidean; pseudo F test statistic; Mahalanobis
  • Chinese journal title: JAXG
  • English journal title: Science China: Mathematics (English edition)
  • Affiliations: School of Mathematics and Statistics, Beijing Institute of Technology; Biostatistics and Bioinformatics Branch, National Institute of Child Health and Human Development; School of Mathematical Sciences, University of Chinese Academy of Sciences; Key Laboratory of Big Data Mining and Knowledge Management, Chinese Academy of Sciences; LSC, NCMIS, Academy of Mathematics and Systems Science, Chinese Academy of Sciences
  • Publication date: 2019-02-25 14:07
  • Publisher: Science China (Mathematics)
  • Year: 2019
  • Issue: v.62
  • Funding: supported by National Natural Science Foundation of China (Grant No. 11722113)
  • Language: English
  • Page: JAXG201905009
  • Number of pages: 20
  • CN: 05
  • ISSN: 11-5837/O1
  • Classification number: 161-180
Abstract
The distance-based regression model has many applications in the analysis of multivariate response regression in various fields, such as ecology, genomics, genetics, human microbiomics, and neuroimaging. It yields a pseudo F test statistic that assesses the relation between the distance (dissimilarity) of the subjects and the predictors of interest. Despite its popularity in recent decades, the statistical properties of the pseudo F test statistic have not, to our knowledge, been established. This study derives the asymptotic properties of the pseudo F test statistic using spectral decomposition under the matrix normal assumption, when the dissimilarity measure is the Euclidean or Mahalanobis distance. The pseudo F test statistic with the Euclidean distance has the same distribution as the quotient of two Chi-squared-type mixtures. The denominator and numerator of the quotient are each approximated by a random variable of the form ξχ_d^2 + η, and a bound on the approximation error is given. The pseudo F test statistic with the Mahalanobis distance follows an F distribution. In simulation studies, the approximated distribution closely matched the "exact" distribution obtained by the permutation procedure. The obtained distribution was further validated on H1N1 influenza data, aging human brain data, and embryonic imprint data.
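The abstract does not spell out how the pseudo F test statistic is computed, so the following is only a minimal sketch under stated assumptions. It assumes the commonly used trace form built from the Gower-centered squared Euclidean distance matrix and a hat matrix that includes an intercept, and it obtains a reference distribution by permuting subjects. The function names `pseudo_f` and `permutation_pvalue`, the degrees-of-freedom scaling, and the default number of permutations are illustrative choices, not taken from the paper.

```python
import numpy as np


def pseudo_f(Y, X):
    """Pseudo F statistic from Euclidean distances between the rows of Y.

    Assumed form: F = [tr(HGH)/(m-1)] / [tr((I-H)G(I-H))/(n-m)],
    where G is the Gower-centered matrix of squared distances and H is
    the hat matrix of the design (intercept plus the columns of X).
    """
    n = Y.shape[0]
    # Squared Euclidean distances between all pairs of subjects.
    diff = Y[:, None, :] - Y[None, :, :]
    D2 = (diff ** 2).sum(axis=2)
    # Gower centering: G = -1/2 * C D2 C with C = I - 11'/n.
    C = np.eye(n) - np.ones((n, n)) / n
    G = -0.5 * C @ D2 @ C
    # Hat matrix of the design with an intercept column.
    X1 = np.column_stack([np.ones(n), X])
    H = X1 @ np.linalg.pinv(X1)
    R = np.eye(n) - H
    m = np.linalg.matrix_rank(X1)
    num = np.trace(H @ G @ H) / (m - 1)
    den = np.trace(R @ G @ R) / (n - m)
    return num / den


def permutation_pvalue(Y, X, n_perm=999, seed=0):
    """Permutation p-value: shuffle the subjects of Y, recompute the statistic."""
    rng = np.random.default_rng(seed)
    observed = pseudo_f(Y, X)
    count = 0
    for _ in range(n_perm):
        perm = rng.permutation(Y.shape[0])
        if pseudo_f(Y[perm], X) >= observed:
            count += 1
    # Add-one correction keeps the p-value strictly positive.
    return observed, (count + 1) / (n_perm + 1)
```

With a single response column and the Euclidean distance, this form reduces to the classical regression F statistic, which gives a quick sanity check on the implementation. The permutation loop plays the role of the "exact" reference distribution mentioned in the abstract; the asymptotic approximation derived in the paper is not reproduced here.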
