Research on Cancer Classification Techniques Based on DNA Microarray Data
Abstract
With the Human Genome Project essentially complete, the life sciences have entered the post-genome era. Research has shifted from individual genes to the function and dynamics of the genome as a whole, and this shift creates a demand for processing very large volumes of biological data. DNA microarray (gene chip) technology is a hallmark of the post-genome era and one of the main research areas in bioinformatics. It measures the expression levels of tens of thousands of genes simultaneously. This greatly facilitates molecular-level diagnosis of disease, especially cancer, and also supports subtype identification, research into disease mechanisms, and rapid drug development. Microarray experiments are expensive, however, so a typical dataset contains only a small number of samples, and the data are therefore high-dimensional with a small sample size. Mining useful biological information from such data and using it to guide cancer detection and subtype classification has become a pressing problem in machine learning and pattern recognition. This work focuses on the classification of cancer microarray data. Its main contributions are as follows:
     (1) Wrapper feature gene selection methods generally suffer from two drawbacks: slow convergence and a tendency to become trapped in local optima. Two swarm-intelligence-based feature gene selection methods are therefore proposed. The first is based on ant colony optimization and the second on an improved discrete particle swarm optimization. The former is simple to implement and quickly reaches a good solution, which effectively addresses the slow convergence of existing methods. The latter adds a simple rule that lets the search escape local optima, giving it stronger optimization capability.
     (2) Existing selective ensemble classification methods generally have high time complexity. An ensemble classification method based on correlation analysis is therefore proposed. It reduces computational complexity by moving the selection of diverse members from the classifier level to the training-subset level. It also maintains classification accuracy and reduces storage cost, which makes it highly practical.
     (3) A multiclass microarray data classification method based on confidence analysis is developed. First, a one-versus-rest support vector machine classifies the test samples. Next, the confidence of each classification result is evaluated, and samples with low confidence are extracted. Finally, these samples are reassessed with a new strategy called centroid-distance-based class priority estimation. The method improves classification accuracy without a significant increase in computational complexity.
     (4) Given the small sample size of microarray datasets, an incremental cancer diagnosis method based on unlabeled samples is proposed. First, an initial diagnostic system is trained on the few existing labeled samples. In clinical use, it diagnoses test samples and quantitatively estimates the confidence of each diagnosis. Next, based on that confidence, the system decides whether a sample should be referred to human medical experts for diagnosis by other methods. Finally, the newly labeled samples are added to the labeled set and the system is updated. The method maintains diagnostic accuracy while making good use of the system, and its performance improves incrementally. Compared with traditional approaches, it is more practical for clinical diagnosis.
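Contribution (1) searches for gene subsets with a wrapper. In a wrapper, each candidate subset is scored by running a classifier on it. The abstract does not give the details of the improved discrete PSO, so the following is only a minimal sketch of the standard binary PSO of Kennedy and Eberhart (1997). It adds an inertia weight `w`, as many later PSO variants do. It omits the thesis's extra rule for escaping local optima, which is not described here. The function `binary_pso`, its parameter values, and `toy_fitness` are hypothetical. `toy_fitness` stands in for a real wrapper score, such as the cross-validated accuracy of an SVM trained on the selected genes.

```python
import math
import random
from typing import Callable, List, Tuple

Mask = List[int]  # one 0/1 flag per gene: 1 means the gene is selected


def binary_pso(
    n_genes: int,
    fitness: Callable[[Mask], float],
    n_particles: int = 20,
    n_iters: int = 50,
    w: float = 0.7,      # inertia weight (assumed value)
    c1: float = 1.5,     # pull toward the particle's own best mask
    c2: float = 1.5,     # pull toward the swarm's best mask
    v_max: float = 4.0,  # velocity clamp
    seed: int = 0,
) -> Tuple[Mask, float]:
    """Maximize `fitness` over 0/1 gene masks with standard binary PSO."""
    rng = random.Random(seed)
    pos = [[rng.randint(0, 1) for _ in range(n_genes)] for _ in range(n_particles)]
    vel = [[rng.uniform(-1.0, 1.0) for _ in range(n_genes)] for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(n_genes):
                v = (w * vel[i][d]
                     + c1 * rng.random() * (pbest[i][d] - pos[i][d])
                     + c2 * rng.random() * (gbest[d] - pos[i][d]))
                # Clamp so every bit keeps a nonzero chance of flipping.
                vel[i][d] = max(-v_max, min(v_max, v))
                # Sigmoid of the velocity = probability that the bit is 1.
                p_one = 1.0 / (1.0 + math.exp(-vel[i][d]))
                pos[i][d] = 1 if rng.random() < p_one else 0
            f = fitness(pos[i])
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f > gbest_fit:  # gbest is always the best of all pbests
                    gbest, gbest_fit = pos[i][:], f
    return gbest, gbest_fit


# Hypothetical stand-in for a wrapper score: genes 0, 3 and 7 are
# "informative", and each selected gene costs a small penalty.
INFORMATIVE = {0, 3, 7}


def toy_fitness(mask: Mask) -> float:
    hits = sum(mask[j] for j in INFORMATIVE)
    return hits - 0.1 * sum(mask)


if __name__ == "__main__":
    best, score = binary_pso(12, toy_fitness)
    print("selected genes:", [j for j, bit in enumerate(best) if bit], "score:", score)
```

In a real wrapper, `fitness` would train and validate a classifier on the selected genes for every particle at every iteration. That is why wrapper search is expensive and why convergence speed matters. The size penalty in `toy_fitness` is one simple way to favor compact gene subsets.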
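Contribution (2) measures diversity among training subsets before any classifier is trained. This avoids training many classifiers and then pruning them. The abstract does not say which correlation measure is used, so the sketch below makes an explicit assumption. Each candidate training subset is a bootstrap sample, represented by how many times each training sample appears in it. A candidate is kept only if its Pearson correlation with every already-kept subset is at most `max_corr`. Only the kept subsets would then be used to train base classifiers, and their predictions would be combined by majority vote. All names (`select_diverse_subsets`, `max_corr`, and so on) are hypothetical.

```python
import math
import random
from collections import Counter
from typing import Hashable, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; a constant vector is treated as fully redundant (1.0)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 1.0
    return sxy / math.sqrt(sxx * syy)


def bootstrap_counts(n_samples: int, rng: random.Random) -> List[int]:
    """Number of times each training sample is drawn in one bootstrap subset."""
    counts = [0] * n_samples
    for _ in range(n_samples):
        counts[rng.randrange(n_samples)] += 1
    return counts


def select_diverse_subsets(n_samples: int, n_candidates: int,
                           max_corr: float, seed: int = 0) -> List[List[int]]:
    """Greedily keep bootstrap subsets weakly correlated with every kept subset.

    Selection only compares subsets, so no classifier has to be trained
    or evaluated to measure diversity.
    """
    rng = random.Random(seed)
    candidates = [bootstrap_counts(n_samples, rng) for _ in range(n_candidates)]
    kept = [candidates[0]]
    for cand in candidates[1:]:
        if all(pearson(cand, k) <= max_corr for k in kept):
            kept.append(cand)
    return kept


def majority_vote(predictions: Sequence[Hashable]) -> Hashable:
    """Combine the base classifiers' predictions for one test sample."""
    return Counter(predictions).most_common(1)[0][0]


if __name__ == "__main__":
    subsets = select_diverse_subsets(n_samples=38, n_candidates=50, max_corr=0.05)
    print(len(subsets), "of 50 candidate subsets kept")
    print(majority_vote(["ALL", "AML", "ALL"]))
```

A lower `max_corr` keeps fewer subsets that differ more from one another. Only the kept subsets need a trained classifier, and fewer trained classifiers also means less to store.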
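Contribution (3) re-examines only the low-confidence outputs of a one-versus-rest SVM. The abstract does not define the confidence measure or the class priority rule, so the sketch below assumes both:

- Confidence is the gap between the two largest one-versus-rest decision scores.
- A low-confidence sample is assigned to whichever of the top `top_k` classes has the nearest training centroid (Euclidean distance).

The decision scores are taken as given, for example from any trained one-versus-rest SVM. `class_centroids`, `classify_with_confidence`, `threshold`, and `top_k` are hypothetical names.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def class_centroids(X: Sequence[Vector], y: Sequence[str]) -> Dict[str, List[float]]:
    """Mean expression profile of each class in the training data."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = {}
    for xi, yi in zip(X, y):
        if yi not in sums:
            sums[yi], counts[yi] = [0.0] * len(xi), 0
        sums[yi] = [s + v for s, v in zip(sums[yi], xi)]
        counts[yi] += 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}


def classify_with_confidence(
    scores: Dict[str, float],  # one-vs-rest decision value per class
    x: Vector,
    centroids: Dict[str, List[float]],
    threshold: float = 0.5,
    top_k: int = 2,
) -> Tuple[str, float, bool]:
    """Return (label, confidence, reassessed?) for one test sample."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    confidence = scores[ranked[0]] - scores[ranked[1]]
    if confidence >= threshold:
        return ranked[0], confidence, False  # trust the SVM's top class
    # Low confidence: rank the top candidate classes by centroid distance.
    best = min(ranked[:top_k], key=lambda c: math.dist(x, centroids[c]))
    return best, confidence, True


if __name__ == "__main__":
    X = [[0, 0], [0, 1], [5, 5], [5, 6], [10, 0]]
    y = ["A", "A", "B", "B", "C"]
    cents = class_centroids(X, y)
    # Clear-cut scores: the SVM's top class is kept.
    print(classify_with_confidence({"A": 1.2, "B": -0.3, "C": -1.0}, [0, 1], cents))
    # Close scores for A and B: the centroid distance decides.
    print(classify_with_confidence({"A": 0.1, "B": 0.05, "C": -1.0}, [4.5, 5.0], cents))
```

In the second call, the SVM slightly prefers class A, but the sample lies much closer to the centroid of B, so the reassessment overrides the argmax. Only low-confidence samples incur the extra distance computations, which is consistent with the abstract's point that the computational cost does not rise noticeably.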
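Contribution (4) describes a loop: predict, estimate confidence, refer uncertain cases to an expert, and fold the newly labeled samples back into the system. The sketch below shows that loop under several assumptions:

- A nearest-centroid classifier stands in for the actual diagnostic model, which the abstract does not specify.
- The confidence `1 - d1/d2` (nearest over second-nearest centroid distance) is a hypothetical measure.
- `ask_expert` is a placeholder callback for the clinician.
- Both expert-labeled and confidently self-labeled samples are added back, which is one reading of "the newly labeled samples".

Centroids are kept as running sums, so adding a sample updates the model without retraining from scratch.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]


class IncrementalCentroidDiagnoser:
    """Hypothetical nearest-centroid diagnostic system that can be updated in place."""

    def __init__(self, X: Sequence[Vector], y: Sequence[str], threshold: float = 0.3):
        self.threshold = threshold  # below this confidence, refer to an expert
        self.sums: Dict[str, List[float]] = {}
        self.counts: Dict[str, int] = {}
        for xi, yi in zip(X, y):
            self.add(xi, yi)

    def add(self, x: Vector, label: str) -> None:
        """Fold one labeled sample into the running class sums (incremental update)."""
        if label not in self.sums:
            self.sums[label], self.counts[label] = [0.0] * len(x), 0
        self.sums[label] = [s + v for s, v in zip(self.sums[label], x)]
        self.counts[label] += 1

    def predict(self, x: Vector) -> Tuple[str, float]:
        """Nearest-centroid label plus a confidence in [0, 1]; needs >= 2 classes."""
        dists = sorted(
            (math.dist(x, [s / self.counts[c] for s in self.sums[c]]), c)
            for c in self.sums
        )
        (d1, label), (d2, _) = dists[0], dists[1]
        confidence = 1.0 - d1 / d2 if d2 > 0 else 0.0
        return label, confidence


def diagnose_stream(
    system: IncrementalCentroidDiagnoser,
    samples: Sequence[Vector],
    ask_expert: Callable[[Vector], str],
) -> Tuple[List[str], int]:
    """Diagnose samples one by one; return the labels and how many were referred."""
    labels, referred = [], 0
    for x in samples:
        label, confidence = system.predict(x)
        if confidence < system.threshold:
            label = ask_expert(x)  # low confidence: the expert decides
            referred += 1
        system.add(x, label)  # the newly labeled sample updates the system
        labels.append(label)
    return labels, referred


if __name__ == "__main__":
    system = IncrementalCentroidDiagnoser(
        [[0, 0], [0, 1], [10, 10], [10, 11]], ["normal", "normal", "tumor", "tumor"])
    labels, referred = diagnose_stream(
        system, [[0.2, 0.3], [5, 5.5], [9.8, 10.2]], ask_expert=lambda x: "tumor")
    print(labels, "referred:", referred)
```

In this toy run, only the sample lying midway between the two centroids falls below the threshold and is referred. The other two are labeled automatically. The `threshold` parameter sets the trade-off the abstract mentions between diagnostic accuracy (more referrals) and system utilization (fewer referrals).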