A new multivariate test formulation: theory, implementation, and applications to genome-scale sequencing and expression
  • Author: Lei Xu
  • Keywords: Multivariate test; Lattice taxonomy; Intrinsic factors; Property-oriented rejection; Best first path; Dependence decoupling; Directional test; Case–control study; Two-variate PT test
  • Journal: Applied Informatics
  • Publication year: 2016
  • Publication date: December 2016
  • Volume: 3
  • Issue: 1
  • Full-text size: 2,812 KB
  • Author affiliations: Lei Xu (1) (2)

    1. Department of Computer Science and Engineering, The Chinese University of Hong Kong, Shatin, NT, Hong Kong, China
    2. Department of Computer Science and Engineering, Centre for Brain-inspired Computing and Bio-Health Informatics, The School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, SEIEE Building 3, 800 Dongchuan Road, Minhang District, 200240, Shanghai, China
  • Journal category: Computing Methodologies; Bioinformatics; Health Informatics; Computer Imaging, Vision, Pattern Recog
  • Journal subjects: Computing Methodologies; Bioinformatics; Health Informatics; Computer Imaging, Vision, Pattern Recognition and Graphics; Computer Applications; Statistics for Life Sciences, Medicine, Health Sciences
  • Publisher: Springer Berlin Heidelberg
  • ISSN: 2196-0089
Abstract
A new formulation of the multivariate test is proposed. It consists of a hierarchy of numerous tests organised in a lattice taxonomy of properties, which come from different combinations of the multiple variates and represent different factors associated with rejection of the null hypothesis, together with a theory of property-oriented rejection. The bottom level of this taxonomy holds the conventional formulation of the multivariate test, featured by a property with the weakest collegiality and a rejection with the largest p value. From each level up to the next, the dimension of rejection increases, the collegiality of the properties strengthens, and the p values decrease, until the top level, which is featured by a property with the strongest collegiality and a rejection with the smallest p value. Instead of traversing all the combinations in the taxonomy, an easy implementation identifies distinctive properties along the best first path (BFP) in a lattice taxonomy of an appropriate number of intrinsic factors, which are obtained by decoupling second-order dependence across the multivariate statistics and discarding the non-distinctive components. If needed, a particular combination of intrinsic factors that lies off this BFP may also be conveniently tested in such a taxonomy. A further improvement takes into account some dependence of higher than second order, with the top-level p value refined into an upper bound obtained by a directional test. Detailed implementations are also provided for applications to genome-scale sequencing and expression, with particular emphasis on the multivariate phenotype-targeted test for expression profile analyses.
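
The idea of decoupling second-order dependence before ranking components can be illustrated with a small sketch. This is not the paper's implementation. It assumes two standardized test statistics that are jointly Gaussian under the null with a known correlation `rho`, removes their linear dependence with a Cholesky-style transform, and then orders the decoupled components from most to least extreme. The function names, the magnitude-based ordering rule, and the per-component normal p values are all illustrative assumptions. The ordering is only a stand-in for the best first path described in the abstract.

```python
import math


def decouple_two_variate(s1, s2, rho):
    """Remove second-order (linear) dependence between two statistics.

    Assumption: s1 and s2 have unit variance and correlation rho under the
    null. The result is a pair of uncorrelated, unit-variance components.
    """
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    z1 = s1
    z2 = (s2 - rho * s1) / math.sqrt(1.0 - rho * rho)
    return z1, z2


def two_sided_p(z):
    """Two-sided p value of a standard normal statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def best_first_order(components):
    """Indices of the components, largest absolute value first (hypothetical rule)."""
    return sorted(range(len(components)), key=lambda i: -abs(components[i]))


# Two correlated statistics: most of s2 is explained by its correlation with s1.
z = decouple_two_variate(2.5, 2.0, rho=0.6)
order = best_first_order(z)
p_values = [two_sided_p(z[i]) for i in order]
print(z, order, p_values)
```

In this example the second statistic looks large on its own, but it contributes little once its dependence on the first is removed. Such a component would be a natural candidate for discarding as non-distinctive.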
