Gene selection for cancer identification: a decision tree model empowered by particle swarm optimization algorithm
  • Authors: Kun-Huang Chen (1)
    Kung-Jeng Wang (1)
    Min-Lung Tsai (2)
    Kung-Min Wang (3)
    Angelia Melani Adrian (1)
    Wei-Chung Cheng (4) (5)
    Tzu-Sen Yang (6) (7)
    Nai-Chia Teng (8)
    Kuo-Pin Tan (9)
    Ku-Shang Chang (2)
  • Keywords: Gene expression; Cancer; Particle swarm optimization; Decision tree classifier
  • Journal: BMC Bioinformatics
  • Publication date: December 2014
  • Year: 2014
  • Volume: 15
  • Issue: 1
  • Full-text size: 409 KB
  • Affiliations:

    1. Department of Industrial Management, National Taiwan University of Science and Technology, Taipei, 106, Taiwan, R.O.C
    2. Department of Food Science, Yuanpei University, No. 306, Yuanpei Street, Hsinchu, 300, Taiwan, R.O.C
    3. Department of Surgery, Shin-Kong Wu Ho-Su Memorial Hospital, Taipei, Taiwan, R.O.C
    4. Pediatric Neurosurgery, Department of Surgery, Cheng Hsin General Hospital, Taipei, 11220, Taiwan, R.O.C
    5. Genomic Research Center, National Yang-Ming University, Taipei, 11221, Taiwan, R.O.C
    6. School of Dental Technology, Taipei Medical University, Taipei, 110, Taiwan, R.O.C
    7. Taiwan Research Center for Biomedical Implants and Microsurgery Devices, Taipei Medical University, Taipei, 110, Taiwan, R.O.C
    8. School of Dentistry, College of Oral Medicine, Taipei Medical University, Taipei, Taiwan, R.O.C
    9. MBA, School of Management, National Taiwan University of Science and Technology, Taipei, 106, Taiwan, R.O.C
  • ISSN: 1471-2105
Abstract
Background: In microarray data analysis, selecting a small number of informative genes from the thousands that may contribute to the occurrence of cancer is an important problem. Many researchers use computational intelligence methods to analyze gene expression data.
Results: To select genes efficiently from thousands of candidates that can contribute to identifying cancers, this study develops a novel method that combines particle swarm optimization (PSO) with a decision tree as the classifier. The study also compares the proposed method with other well-known benchmark classification methods (support vector machine, self-organizing map, back-propagation neural network, C4.5 decision tree, Naive Bayes, CART decision tree, and artificial immune recognition system) in experiments on 11 gene expression cancer datasets.
Conclusion: Based on statistical analysis, the proposed method outperforms the other popular classifiers on all test datasets and is comparable to SVM on certain datasets. In addition, housekeeping genes with various expression patterns and tissue-specific genes are identified. These genes provide high discriminative power for cancer classification.
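The general idea of pairing PSO with a decision tree for gene selection can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not give the PSO variant, the fitness function, or any parameter values. The sketch assumes a binary PSO (each particle is a 0/1 mask over genes), binary class labels, training accuracy of a small exhaustively searched threshold tree as the fitness, and a per-gene size penalty. All function names (tree_correct, fitness, select_genes) and default parameters are hypothetical.

```python
import math
import random


def tree_correct(X, y, idx, features, depth):
    """Count samples in idx that the best tree with at most `depth` split levels gets right."""
    labels = [y[i] for i in idx]
    best = max(labels.count(0), labels.count(1))  # majority-vote leaf
    if depth == 0 or best == len(labels):
        return best
    for f in features:
        values = sorted({X[i][f] for i in idx})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2.0  # candidate threshold between neighbouring values
            left = [i for i in idx if X[i][f] <= t]
            right = [i for i in idx if X[i][f] > t]
            score = (tree_correct(X, y, left, features, depth - 1)
                     + tree_correct(X, y, right, features, depth - 1))
            best = max(best, score)
    return best


def fitness(mask, X, y, depth=2, penalty=0.01):
    """Assumed fitness: training accuracy on the selected genes minus a size penalty."""
    features = [j for j, bit in enumerate(mask) if bit]
    if not features:
        return 0.0
    accuracy = tree_correct(X, y, list(range(len(y))), features, depth) / len(y)
    return accuracy - penalty * len(features)


def select_genes(X, y, n_particles=8, n_iter=15, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Binary PSO over gene masks; returns (best_mask, best_fitness)."""
    rng = random.Random(seed)
    n_genes = len(X[0])
    pos = [[rng.randint(0, 1) for _ in range(n_genes)] for _ in range(n_particles)]
    vel = [[0.0] * n_genes for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p, X, y) for p in pos]
    g = max(range(n_particles), key=pbest_fit.__getitem__)
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    for _ in range(n_iter):
        for i in range(n_particles):
            for j in range(n_genes):
                v = (w * vel[i][j]
                     + c1 * rng.random() * (pbest[i][j] - pos[i][j])
                     + c2 * rng.random() * (gbest[j] - pos[i][j]))
                vel[i][j] = max(-4.0, min(4.0, v))  # clamp so the sigmoid never saturates
                prob = 1.0 / (1.0 + math.exp(-vel[i][j]))
                pos[i][j] = 1 if rng.random() < prob else 0  # resample the bit
            fit = fitness(pos[i], X, y)
            if fit > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], fit
                if fit > gbest_fit:
                    gbest, gbest_fit = pos[i][:], fit
    return gbest, gbest_fit
```

Here X is a list of samples, each a list of expression values, and y holds 0/1 class labels; calling select_genes(X, y) returns a gene mask and its fitness. A practical study would more likely score each mask by cross-validated accuracy rather than training accuracy, which tends to reward overfitting on small sample sizes.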
