TSG: a new algorithm for binary and multi-class cancer classification and informative genes selection
  • Authors: Haiyan Wang (1) (6)
    Hongyan Zhang (2) (3) (4)
    Zhijun Dai (2) (4)
    Ming-shun Chen (5)
    Zheming Yuan (2) (4)
  • Journal: BMC Medical Genomics
  • Year: 2013
  • Publication date: January 2013
  • Volume: 6
  • Issue: 1-supp
  • Full-text size: 289KB
  • Author affiliations: Haiyan Wang (1) (6)
    Hongyan Zhang (2) (3) (4)
    Zhijun Dai (2) (4)
    Ming-shun Chen (5)
    Zheming Yuan (2) (4)

    1. Department of Statistics, Kansas State University, Manhattan, KS, 66506, USA
    6. This work was done while Haiyan Wang was on sabbatical leave at Hunan Provincial Key Laboratory of Crop Germplasm Innovation and Utilization, Changsha, 410128, China
    2. Hunan Provincial Key Laboratory of Crop Germplasm Innovation and Utilization, Changsha, 410128, China
    3. College of Information Science and Technology, Hunan Agricultural University, Changsha, 410128, China
    4. College of Bio-safety Science and Technology, Hunan Agricultural University, Changsha, 410128, China
    5. USDA-ARS and Department of Entomology, Kansas State University, Manhattan, KS, 66506, USA
  • ISSN:1755-8794
Abstract
Background One of the challenges in classifying cancer tissue samples from gene expression data is to establish an effective method that selects a parsimonious set of informative genes. The Top Scoring Pair (TSP), k-Top Scoring Pairs (k-TSP), Support Vector Machines (SVM), and prediction analysis of microarrays (PAM) are four popular classifiers with comparable performance on multiple cancer datasets. SVM and PAM tend to use a large number of genes, while TSP and k-TSP always use an even number of genes. In addition, k-TSP selects distinct gene pairs by simply combining the pairs of top-ranking genes, without considering that the gene set with the best discrimination power may not be a combination of those pairs. The k-TSP algorithm also requires the user to specify an upper bound on the number of gene pairs. Here we introduce a computational algorithm to address these problems. The algorithm is named the Chisquare-statistic-based Top Scoring Genes (Chi-TSG) classifier, abbreviated as TSG.
Results The TSG classifier starts with the top two genes and sequentially adds genes to the candidate gene set to perform informative gene selection. The algorithm automatically reports the total number of informative genes selected with cross validation. We provide the algorithm for both binary and multi-class cancer classification. The algorithm was applied to 9 binary and 10 multi-class gene expression datasets involving human cancers. The TSG classifier outperforms the TSP family classifiers by a large margin on most of the 19 datasets. In addition to improved accuracy, our classifier shares all the advantages of the TSP family classifiers: easy interpretation, invariance to monotone transformations, selection of a small number of informative genes that allows follow-up studies, and resistance to sampling variation because it relies on within-sample operations.
Conclusions Redefining the gene-set scores and the classification rules of the TSP family classifiers to incorporate sample size information can improve both the selection of informative genes and classification accuracy. The resulting TSG classifier offers a useful tool for cancer classification based on numerical molecular data.
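
The abstract credits the TSP family with invariance to monotone transformations because the classifiers rely only on within-sample comparisons. The following is a minimal Python sketch of the pair score as it is commonly defined for the original TSP classifier, not of the TSG chi-square score, which this record does not specify. The function name `tsp_score` and the toy expression values are hypothetical and chosen only for illustration.

```python
import math


def tsp_score(expr_i, expr_j, labels):
    """Pair score for genes i and j in a binary problem.

    expr_i, expr_j: expression values of the two genes, one per sample.
    labels: class label (0 or 1) for each sample.
    Returns |P(X_i < X_j | class 0) - P(X_i < X_j | class 1)|,
    with each probability estimated as a within-class frequency.
    """
    freqs = []
    for cls in (0, 1):
        idx = [k for k, y in enumerate(labels) if y == cls]
        if not idx:
            raise ValueError(f"no samples in class {cls}")
        # Within-sample comparison: only the ordering of the two genes matters.
        below = sum(1 for k in idx if expr_i[k] < expr_j[k])
        freqs.append(below / len(idx))
    return abs(freqs[0] - freqs[1])


# Hypothetical toy data: three samples per class.
gene_a = [1.0, 2.0, 1.5, 5.0, 6.0, 4.0]
gene_b = [3.0, 4.0, 1.0, 2.0, 3.0, 5.0]
labels = [0, 0, 0, 1, 1, 1]

score_raw = tsp_score(gene_a, gene_b, labels)

# A strictly increasing transform (here the logarithm) applied to every value
# leaves each within-sample ordering, and therefore the score, unchanged.
score_log = tsp_score(
    [math.log(v) for v in gene_a],
    [math.log(v) for v in gene_b],
    labels,
)

print(score_raw, score_log)
```

Because the score depends only on whether one gene is expressed below the other within each sample, any normalization that preserves ordering gives the same result.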
