Classification of Gene Expression Profile Data Based on the Laplacian Spectrum
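Abstract
Classification of gene expression profile data analyzes the expression profiles obtained from DNA microarray experiments in order to uncover differences in gene expression between samples and to find the underlying links between genes and tissue pathology. Although pattern recognition algorithms have advanced considerably in recent years, many problems remain open in the classification of gene expression profile data. Because of the way they are acquired, such data are high-dimensional and have few samples. Traditional machine learning methods do not classify this kind of data well, and their very high computational complexity greatly reduces the efficiency of data analysis.
     This thesis studies the classification of gene expression profile data on the basis of spectral graph theory. It introduces feature representations that reflect graph structure into the classification task, studies feature extraction for gene expression profile data together with classification methods based on spectral graph theory, and analyzes the performance of the resulting algorithms. The main contributions are:
     1. Gene expression profile data carry a large amount of biological information, and how effectively the feature genes are selected from them strongly affects both the accuracy and the running time of an algorithm. This thesis proposes a feature extraction method for cancer gene expression data that uses an entropy measure as the selection criterion. The gene expression data are first screened and the entropy of each gene is computed; the genes with the highest entropy are then taken as feature genes and classified with a support vector machine. Leave-one-out and group-split experiments on prostate cancer gene expression data both demonstrate the effectiveness of the method.
     2. An algorithm based on the Laplacian spectrum is applied to the classification of cancer gene expression profile data. For each class, the method first selects the samples with the smallest Euclidean distance to the class center and builds a complete graph Laplacian from them using Gaussian weights; this graph is taken as the standard graph representing the class. The test sample then replaces each vertex of the standard graph in turn, each resulting graph is matched against the standard graph by feature point matching, and the total number of matched points is computed. Finally, the test sample is assigned to the class with the largest total number of matched points.
     3. A clustering algorithm for cancer gene expression profile data based on the Fiedler vector of a graph is proposed. The method builds a complete graph Laplacian over all samples from the different classes using Gaussian weights, obtains the Fiedler vector through singular value decomposition (SVD), and finally classifies the gene expression profile data by the signs of the Fiedler vector components that correspond to the individual samples.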
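The Fiedler-vector split in item 3 can be sketched as follows. This is a minimal illustration assuming NumPy, not the thesis implementation: the function names `gaussian_laplacian` and `fiedler_labels`, the bandwidth parameter `sigma`, and the synthetic two-cluster data are all assumptions made for the example. Because the Laplacian is symmetric and positive semidefinite, its singular values equal its eigenvalues, so the Fiedler vector is the left singular vector belonging to the second-smallest singular value.

```python
import numpy as np

def gaussian_laplacian(X, sigma):
    """Laplacian L = D - W of the complete graph on the rows of X,
    with Gaussian edge weights (sigma is an assumed bandwidth)."""
    sq_dist = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq_dist / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)          # no self-loops
    return np.diag(W.sum(axis=1)) - W

def fiedler_labels(X, sigma=1.0):
    """Split samples into two groups by the sign of the Fiedler vector."""
    L = gaussian_laplacian(X, sigma)
    # np.linalg.svd returns singular values in descending order, so the
    # smallest (about 0) is last and the Fiedler vector is second to last.
    U, s, _ = np.linalg.svd(L)
    fiedler = U[:, -2]
    return (fiedler > 0).astype(int)

# Synthetic stand-in for expression data: 10 samples, 20 "genes", two groups.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(0.0, 0.3, size=(5, 20)),
    rng.normal(3.0, 0.3, size=(5, 20)),
])
labels = fiedler_labels(X, sigma=5.0)
print(labels)
```

The sign of a singular vector is arbitrary, so which group receives label 0 or 1 carries no meaning; only the partition itself does. The choice of `sigma` matters in practice, since a bandwidth far smaller or larger than the typical inter-sample distance makes all edge weights nearly equal.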
Classification of gene expression data is an important way to find the relationships between different genes. Although pattern recognition algorithms have developed significantly in recent years, many problems remain to be solved in the classification of gene expression data. Because gene expression data are high-dimensional and have few samples, traditional machine learning methods cannot achieve the desired results, and their high computational complexity greatly reduces the efficiency of data analysis.
     The theory of graph spectra is introduced into the classification of gene expression data. We use this theory to extract features from gene expression data and propose several algorithms for classifying such data. The main research contents and achievements of this dissertation are as follows:
     1. DNA microarray technology has had a far-reaching impact on the biomedical field, and classification methods are an important tool for analyzing tumor gene expression data. This dissertation proposes an algorithm that selects informative genes from tumor gene expression data using entropy as the indicator. The tumor gene expression data are first screened and the entropy of each gene is calculated. The genes with the highest entropy are then selected and classified using an SVM. Leave-one-out and group-split experiments demonstrate the effectiveness of this algorithm.
     2. We introduce a classification algorithm for gene expression data based on the Laplacian spectra of graphs. First, the class center is obtained by averaging the training samples of each class, and the Laplacian matrix of a complete graph, called the standard graph, is constructed on the samples with the smallest Euclidean distance to the class center. Then, the vertices of the standard graph are replaced in turn by the test sample and the total number of matched points is calculated. Finally, the test sample is assigned to the class with the largest total number of matched points.
     3. This dissertation proposes an algorithm for classifying gene expression data based on the Fiedler vector. First, the Laplacian matrix of a complete graph is constructed on all samples of the different classes. Then, the Fiedler vector is obtained from the singular value decomposition of this Laplacian matrix. Finally, the samples are divided into two classes according to the signs of the Fiedler vector components.
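The entropy-based gene selection in item 1 can be sketched as follows. The abstract does not say how the entropy of a gene is computed, so this example assumes the Shannon entropy of a histogram of each gene's expression values across samples; the function names `gene_entropy` and `select_top_entropy_genes`, the bin count `n_bins`, and the toy matrix are hypothetical and serve only as illustration.

```python
import numpy as np

def gene_entropy(values, n_bins=8):
    """Shannon entropy (in bits) of one gene's expression values,
    estimated from an equal-width histogram (an assumed estimator)."""
    counts, _ = np.histogram(values, bins=n_bins)
    p = counts[counts > 0] / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def select_top_entropy_genes(X, k, n_bins=8):
    """Return the column indices of the k highest-entropy genes
    (highest first) together with all per-gene entropies.
    X has one row per sample and one column per gene."""
    entropies = np.array(
        [gene_entropy(X[:, j], n_bins) for j in range(X.shape[1])]
    )
    top = np.argsort(entropies)[::-1][:k]
    return top, entropies

# Toy matrix: 16 samples, 3 genes with very different value distributions.
X = np.column_stack([
    np.linspace(0.0, 1.0, 16),        # evenly spread values
    np.r_[np.zeros(15), 1.0],         # almost constant
    np.r_[np.zeros(8), np.ones(8)],   # two levels
])
top, entropies = select_top_entropy_genes(X, k=2)
X_selected = X[:, top]                # reduced matrix for the classifier
print(top, entropies)
```

The reduced matrix `X_selected` would then be passed to a support vector machine, for example scikit-learn's `SVC`, and evaluated with leave-one-out cross-validation; that step is left out here to keep the sketch dependent on NumPy only.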
