Bayesian variable selection in multinomial probit model for classifying high-dimensional data
  • Authors: Aijun Yang; Yunxian Li; Niansheng Tang; Jinguan Lin
  • Keywords: Bayesian stochastic search variable selection; Generalized \(g\)-prior; Multi-class classification
  • Journal: Computational Statistics
  • Publication year: 2015
  • Publication date: June 2015
  • Volume: 30
  • Issue: 2
  • Pages: 399-418
  • Full-text size: 842 KB
  • Author affiliations: Aijun Yang (1) (2)
    Yunxian Li (3) (4)
    Niansheng Tang (4)
    Jinguan Lin (5)

    1. College of Economics and Management, Nanjing Forestry University, Nanjing, Jiangsu, China
    2. School of Economics and Management, Southeast University, Nanjing, Jiangsu, China
    3. School of Finance, Yunnan University of Economics and Finance, Kunming, Yunnan, China
    4. Department of Statistics, Yunnan University, Kunming, Yunnan, China
    5. Department of Mathematics, Southeast University, Nanjing, Jiangsu, China
  • Journal category: Mathematics and Statistics
  • Journal subjects: Mathematics
    Statistics
    Probability and Statistics in Computer Science
    Probability Theory and Stochastic Processes
    Economic Theory
  • Publisher: Physica Verlag, An Imprint of Springer-Verlag GmbH
  • ISSN: 1613-9658
Abstract
Selecting a small number of relevant genes for classification has received a great deal of attention in microarray data analysis. While methods for microarray data with only two classes are relevant, more efficient algorithms for classification with any number of classes are also needed. In this paper, we propose a Bayesian stochastic search variable selection approach for multi-class classification, which identifies relevant genes by assessing sets of genes jointly. We consider a multinomial probit model with a generalized \(g\)-prior for the regression coefficients. An efficient MCMC algorithm is developed for simulating parameters from the posterior distribution. This algorithm is robust to the choice of initial value and produces posterior probabilities of relevant genes for biological interpretation. We demonstrate the performance of the approach on four well-known gene expression profiling data sets: the leukemia, lymphoma, SRBCTs and NCI60 data. Compared with other classification approaches, our approach selects fewer relevant genes and achieves competitive classification accuracy.
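As a rough illustration of the stochastic search idea in the abstract, the sketch below runs a Gibbs-style search over gene inclusion indicators on a continuous response. It is not the authors' algorithm. It assumes the latent probit variable z is already available (the truncated-normal sampling step of a multinomial probit model is omitted), it uses a ridge-stabilised g-prior as one possible form of a generalized g-prior, and the names `log_marginal`, `ssvs`, `g`, `lam` and `prior_incl` are all hypothetical.

```python
import numpy as np


def log_marginal(z, X, gamma, g=10.0, lam=1e-3):
    """Log density of z with the selected coefficients integrated out.

    Assumed model (illustrative only):
      z | beta ~ N(X_g beta, I),  beta ~ N(0, g * (X_g' X_g + lam * I)^{-1}),
    so marginally z ~ N(0, I + g * X_g (X_g' X_g + lam * I)^{-1} X_g').
    """
    n = len(z)
    cov = np.eye(n)
    if gamma.any():
        Xg = X[:, gamma]
        A = Xg.T @ Xg + lam * np.eye(Xg.shape[1])  # ridge term keeps A invertible
        cov += g * Xg @ np.linalg.solve(A, Xg.T)
    _, logdet = np.linalg.slogdet(cov)
    quad = z @ np.linalg.solve(cov, z)
    return -0.5 * (n * np.log(2 * np.pi) + logdet + quad)


def ssvs(z, X, n_sweeps=100, prior_incl=0.2, seed=0):
    """Return the fraction of sweeps in which each variable was included."""
    rng = np.random.default_rng(seed)
    p = X.shape[1]
    gamma = np.zeros(p, dtype=bool)
    counts = np.zeros(p)
    for _ in range(n_sweeps):
        for j in range(p):
            # Compare the model with and without variable j, others held fixed.
            gamma[j] = True
            log_on = log_marginal(z, X, gamma) + np.log(prior_incl)
            gamma[j] = False
            log_off = log_marginal(z, X, gamma) + np.log1p(-prior_incl)
            diff = np.clip(log_off - log_on, -700.0, 700.0)  # avoid overflow
            prob_on = 1.0 / (1.0 + np.exp(diff))
            gamma[j] = rng.random() < prob_on
        counts += gamma
    return counts / n_sweeps


# Synthetic example: 40 samples, 15 candidate variables, only the first two matter.
data_rng = np.random.default_rng(1)
X = data_rng.standard_normal((40, 15))
z = 3.0 * X[:, 0] - 3.0 * X[:, 1] + data_rng.standard_normal(40)
freq = ssvs(z, X)
print(np.round(freq, 2))
```

The inclusion frequencies in `freq` play the role of the posterior probabilities of relevant genes mentioned in the abstract. Because each update compares whole models that differ in one variable, a gene is judged in the context of the other selected genes rather than in isolation.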
