Research on Several Data Mining Methods for Bioinformatics and Their Applications
Abstract

The genome sequencing of many organisms, including humans, is complete or nearly complete. The effort to reveal the information contained in this enormous volume of data has given rise to a new interdisciplinary field, bioinformatics, which acquires, processes, stores, retrieves, and analyzes biological experimental data in order to uncover its biological meaning. Data mining technology, which discovers potentially useful knowledge in databases, is playing an increasingly important role in bioinformatics research and has already produced substantial results. This thesis investigates several data mining methods for bioinformatics and their applications. The main work is as follows:

     1. Support vector machines (SVM) and the FDOD measure are applied to the classification of homo-oligomeric proteins. Garian R. used a decision tree to discriminate homodimers from non-homodimers on the basis of primary structure alone, confirming that the amino acid sequence contains quaternary structure information. Here, SVM and FDOD classifiers are trained on the same data set, using the subsequence distributions of the primary sequences as feature vectors; both methods raise prediction accuracy substantially over the decision tree. The two methods are also applied to discriminating among homodimers, homotrimers, homotetramers, and homohexamers, again with good results.

     2. A ν-SVM classifier based on linear programming is constructed. Compared with the standard SVM, the quadratic programming ν-SVM proposed by Schölkopf B. et al. has the advantage of controlling the number of support vectors and the number of errors, but its greater model complexity limits its application. The linear programming ν-SVM constructed here is simpler, its parameter ν retains the same clear meaning and the same control over support vectors and errors, and it can be solved directly with mature linear programming algorithms. Numerical experiments show that it trains much faster than the quadratic programming ν-SVM while classifying comparably well.

     3. A Newton algorithm for the parameterless robust linear programming SVM is proposed. The parameterless robust linear programming SVM recently proposed by Mangasarian O. L. overcomes several drawbacks of the standard SVM, such as the need to select a regularization parameter, and its model is a linear program. This thesis derives the exact least 2-norm solution of that linear program and, on this basis, proposes a fast Newton algorithm that requires only a linear equation solver. Theory, numerical experiments, and an application to the classification of cancer gene expression data all show that the model is sound and simple, and that the algorithm is fast and easy to implement.

     4. The FDOD measure is applied to the similarity analysis of DNA sequences. Sequence comparison, whose fundamental task is to uncover the similarity and dissimilarity between sequences, is one of the most common research tools in bioinformatics. Sequence alignment is the principal method of comparison, but it has shortcomings, so many alignment-free alternatives have been sought. Here FDOD is used to measure similarity directly from the primary structure, taking the order of residues along the sequence into account to some extent. For different subsequence lengths, the approach is illustrated on the coding sequences of the first exon of the β-globin gene of 11 species, and the results demonstrate that the FDOD method is effective.

     5. A novel 2-D graphical representation of DNA sequences and a corresponding numerical characterization are proposed and applied to examining the similarities of DNA sequences. Graphical representations of DNA sequences allow visual inspection of the data and facilitate the analysis, comparison, and identification of such sequences. The representation considered here is constructed according to a homomorphism in algebra and the chemical structure classification of the bases.
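Points 1 and 4 both rest on comparing subsequence (k-mer) distributions with an information discrepancy measure. As a rough illustration only, the sketch below assumes the common two-distribution form B(p, q) = Σᵢ pᵢ log(2pᵢ/(pᵢ+qᵢ)) + Σᵢ qᵢ log(2qᵢ/(pᵢ+qᵢ)) with the convention 0·log 0 = 0; the exact FDOD functional used in the thesis (Fang's multi-sequence measure) may differ in detail.

```python
import math
from collections import Counter

def kmer_distribution(seq, k):
    """Relative frequencies of all overlapping length-k subsequences."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def fdod(p, q):
    """Illustrative two-distribution information discrepancy.

    Terms with a zero probability contribute nothing (0*log 0 := 0);
    identical distributions give exactly 0.
    """
    b = 0.0
    for w in set(p) | set(q):
        pi, qi = p.get(w, 0.0), q.get(w, 0.0)
        m = (pi + qi) / 2.0
        if pi > 0:
            b += pi * math.log(pi / m)
        if qi > 0:
            b += qi * math.log(qi / m)
    return b
```

The measure is symmetric and nonnegative, and, unlike alignment, it depends only on subsequence frequencies, which is what allows it to compare sequences of different lengths directly.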
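Points 2 and 3 both exploit the fact that a 1-norm SVM can be written as a linear program and handed to an off-the-shelf LP solver. The thesis's ν-SVM and parameterless formulations are not reproduced here; the following sketch only shows the generic pattern, a plain 1-norm soft-margin SVM on hypothetical toy data (the penalty C = 1 is an arbitrary choice), solved with SciPy's `linprog`:

```python
import numpy as np
from scipy.optimize import linprog

def linprog_svm(X, y, C=1.0):
    """1-norm soft-margin SVM as a linear program:
        min  sum(u + v) + C * sum(xi)
        s.t. y_i * ((u - v) @ x_i + b) >= 1 - xi_i,  u, v, xi >= 0, b free,
    with w = u - v and b = b_pos - b_neg so every variable is nonnegative.
    Variable vector z = [u (d), v (d), b_pos, b_neg, xi (n)].
    """
    n, d = X.shape
    c = np.concatenate([np.ones(2 * d), [0.0, 0.0], C * np.ones(n)])
    # Margin constraints rewritten as A_ub @ z <= -1.
    A = np.hstack([-y[:, None] * X, y[:, None] * X,
                   -y[:, None], y[:, None], -np.eye(n)])
    res = linprog(c, A_ub=A, b_ub=-np.ones(n), bounds=(0, None),
                  method="highs")
    z = res.x
    w = z[:d] - z[d:2 * d]
    b = z[2 * d] - z[2 * d + 1]
    return w, b

# Hypothetical toy data: two linearly separable clusters.
X = np.array([[2., 2.], [3., 3.], [2., 3.], [0., 0.], [1., 0.], [0., 1.]])
y = np.array([1., 1., 1., -1., -1., -1.])
w, b = linprog_svm(X, y)
pred = np.sign(X @ w + b)
```

This is the sense in which "mature linear programming algorithms can be used directly": the entire training step is one call to an LP solver, with no quadratic program in sight.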
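Point 5 concerns graphical representations of DNA. As a simple illustration of the general idea (not the novel representation proposed in the thesis), a Nandy-style 2-D walk in the spirit of reference [120] assigns each base a unit step and plots the cumulative path; the axis assignments used here (A→(-1,0), G→(1,0), C→(0,-1), T→(0,1)) vary between authors and are only one convention:

```python
# One conventional base-to-step assignment; other authors permute the axes.
STEP = {"A": (-1, 0), "G": (1, 0), "C": (0, -1), "T": (0, 1)}

def dna_walk(seq):
    """Cumulative 2-D path of a DNA sequence, starting at the origin."""
    x, y = 0, 0
    path = [(x, y)]
    for base in seq.upper():
        dx, dy = STEP[base]
        x, y = x + dx, y + dy
        path.append((x, y))
    return path
```

Similar sequences trace similar curves, and numerical descriptors computed from the path (for example, moments or matrix invariants) can then quantify similarity; that is the role of the "numerical characterization" attached to the graphical representation in point 5.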
References
[1] Fayyad U, Piatetsky-Shapiro G, Smyth P. The KDD process for extracting useful knowledge from volumes of data. Communications of the ACM, 1996, 39(11): 27-34.
    [2] Han J, Kamber M. Data mining: concepts and techniques. San Francisco: Morgan Kaufmann, 2001.
    [3] Shi Zhongzhi. Knowledge Discovery. Beijing: Tsinghua University Press, 2002.
    [4] Baxevanis A D, Ouellette B F F. Bioinformatics: A Practical Guide to the Analysis of Genes and Proteins. Li Yanda, Sun Zhirong et al, trans. Beijing: Tsinghua University Press, 2000.
    [5] Zhao Guoping et al. Bioinformatics. Beijing: Science Press, 2002.
    [6] Hao Bailin, Zhang Shuyu. Handbook of Bioinformatics. Shanghai: Shanghai Scientific and Technical Publishers, 2000.
    [7] Chen Runsheng. Bioinformatics. Acta Biophysica Sinica, 1999, 15(1): 6-12.
    [8] Zhang Chunting. Bioinformatics: a new discipline of great scientific significance and economic benefit. Bulletin of National Natural Science Foundation of China, 1999(2): 65-68.
    [9] Baldi P, Brunak S. Bioinformatics: The Machine Learning Approach. Zhang Donghui et al, trans. Beijing: CITIC Press, 2003.
    [10] Vapnik V. Estimation of dependencies based on empirical data. Berlin: Springer-Verlag, 1982.
    [11] Cherkassky V, Mulier F. Learning from Data: concepts, theory and methods. New York: John Wiley & Sons, 1997.
    [12] Vapnik V. The Nature of Statistical Learning Theory. Zhang Xuegong, trans. Beijing: Tsinghua University Press, 2000.
    [13] Zhang Xuegong. On statistical learning theory and support vector machines. Acta Automatica Sinica, 2000, 26(1): 32-42.
    [14] Deng Naiyang, Tian Yingjie. A New Method in Data Mining: Support Vector Machines. Beijing: Science Press, 2004.
    [15] Cristianini N, Shawe-Taylor J. An introduction to support vector machines and other kernel-based learning methods. Cambridge: Cambridge University Press, 2000.
    [16] Cortes C, Vapnik V. Support-vector networks. Machine Learning, 1995, 20: 273-297.
    [17] Schölkopf B, Burges C, Vapnik V. Extracting support data for a given task. In: Fayyad U M, Uthurusamy R, eds. Proc. of First Intl. Conf. on Knowledge Discovery & Data Mining. Menlo Park, CA: AAAI Press, 1995. 262-267.
    [18] Schölkopf B, Burges C, Smola A J, eds. Advances in Kernel Methods-Support Vector Learning. Cambridge, MA: MIT Press, 1998.
    
    [19] Burges C J C. A tutorial on support vector machines for pattern recognition. Data Mining and Knowledge Discovery, 1998, 2(2): 1-47.
    [20] Smola A J, Scholkopf B. A tutorial on support vector regression. Statistics and Computing, 2004, 14: 199-222.
    [21] Fang W. The disagreement degree of multi-person judgments in additive structure, Mathematical Social Sciences, 1994, 28 (2): 85-111.
    [22] Fang W. On a global optimization problem in the study of information discrepancy. Journal of Global Optimization, 1997, 11: 387-408.
    [23] Fang W. The characterization of a measure of information discrepancy. Information Science, 2000, 125: 207-232.
    [24] Fang W, Roberts F, Ma Z. A measure of discrepancy of multiple sequences. Information Science, 2001, 137: 75-102.
    [25] Wang J., Fang W., Ling L., Chen R. Gene's functional arrangement as a measure of the phylogenetic relationships of microorganisms. Journal of Biological Physics, 2002, 28: 55-62.
    [26] Jin L, Fang W, Tang H. Prediction of protein structural classes by a new measure of information discrepancy. Computational Biology and Chemistry, 2003, 27: 373-380.
    [27] Garian R. Prediction of quaternary structure from primary structure. Bioinformatics, 2001, 17: 551-556.
    [28] Yuan Yaxiang, Sun Wenyu. Optimization Theory and Methods. Beijing: Science Press, 1997.
    [29] Campbell C. Algorithmic approaches to training support vector machines: A survey. In: Proceedings of ESANN 2000. Belgium: D-Facto Publications, 2000. 27-36.
    [30] Liu Jianghua, Cheng Junshi, Cheng Jiapin. A survey of training algorithms for support vector machines. Information and Control, 2002, 31(1): 45-50.
    [31] Song Xiaofeng, Chen Dechuang, Yu Gaijun et al. Optimization algorithms in support vector machines. Computer Science, 2003, 30(3): 12-15.
    [32] Osuna E, Freund R, Girosi F. An improved training algorithm for support vector machines. In: Principe J, Gile L, Morgan N et al, eds. Neural Networks for Signal Processing Ⅶ-Proceeding of the 1997 IEEE workshop. NewYork: IEEE, 1997. 276-285.
    [33] Joachims T. Making large-scale support vector machine learning practical. In: Schölkopf B, Burges C, Smola A J, eds. Advances in Kernel Methods-Support Vector Learning. Cambridge, MA: MIT Press, 1999. 169-184.
    [34] Platt J. Fast training of support vector machines using sequential minimal optimization. In: Schölkopf B, Burges C, Smola A J, eds. Advances in Kernel Methods-Support Vector Learning. Cambridge, MA: MIT Press, 1999. 185-208.
    
    [35] Keerthi S S, Shevade S K, Bhattacharyya C et al. Improvements to Platt SMO algorithm for SVM classifier design. Neural Computation, 2001,13:637-649.
    [36] Hsu CW, Lin CJ. A simple decomposition method for support vector machines. Machine Learning, 2002,46: 291-314.
    [37] Chang C, Hsu C, Lin C. The analysis of decomposition methods for support vector machines. IEEE Transaction on Neural Networks, 2000,11(4): 1003-1008.
    [38] Keerthi S S, Gilbert E. Convergence of a generalized SMO algorithm for SVM classifier design. Machine Learning, 2002, 46(1-3): 351-360.
    [39] Chang C C, Lin C J. LIBSVM: a Library for Support Vector Machines (Version 2. 3). http://www.csie.ntu.edu.tw/~cjlin/papers/libsvm.pdf, 2001.
    [40] Mangasarian O L. Generalized support vector machines. In: Smola AJ, Bartlett PL, Scholkopf B et al, eds. Advances in Large Margin Classifiers, Cambridge, MA: MIT Press,2000.135-146.
    [41] Mangasarian O L, Musicant D R. Successive overrelaxation for support vector machines. IEEE Transactions on Neural Networks, 1999,10:1032-1037.
    [42] Mangasarian O L, Musicant D R. Lagrangian support vector machines. Journal of Machine Learning Research, 2001,1:161-177.
    [43] Mangasarian O L, Musicant D R. Active support vector machine classification. In: Leen T, Dietterich T, Tresp V, eds. Advances in Neural Information Processing Systems 13. Cambridge, MA: MIT Press, 2001. 577-583.
    [44] Lee Yuh-Jye, Mangasarian O L. SSVM: A smooth support vector machine. Computational Optimization and Applications, 2001,20(1): 5-22.
    [45] Fung G, Mangasarian O L. Proximal support vector machine classifiers. In: Provost F, Srikant R, eds. Proceedings KDD-2001: Knowledge Discovery and Data Mining. Menlo Park,CA.: AAAI Press, 2001. 77-86.
    [46] Lee Y, Mangasarian O L. RSVM: Reduced Support Vector Machines, Technical Report 00-07, Data Mining Institute, Computer Sciences Department, University of Wisconsin,Madison, Wisconsin, July 2000. Proceedings of the First SIAM International Conference on Data Mining, Chicago, April 5-7,2001, CD-ROM Proceedings.
    [47] Scholkopf B, Smola AJ, Williamson RC et al. New support vector algorithms. Neural Computation, 2000,12:1207-1245.
    [48] Suykens JAK, Vandewalle J. Least squares support vector machine classifiers. Neural Processing Letters, 1999,9: 293-300.
    [49] Suykens JAK, Vandewalle J, De Moor B. Optimal control by least squares support vector machines, Neural Networks, 2001,14(1): 23-25.
    
    [50] Suykens JAK, Lukas L, Vandewalle J. Sparse least squares support vector machine classifiers. In: Verleysen M, ed. Proceedings of the European Symposium on Artificial Neural Networks (ESANN-2000) Bruges, Belgium, April 2000.37 - 42.
    [51] Cawley G C, Talbot N L C. Improved sparse least-squares support vector machines. Neurocomputing, 2002, 48: 1025-1031.
    [52] Keerthi SS, Shevade SK, Bhattacharyya C et al. A fast iterative nearest point algorithm for support vector machine classifier design. IEEE Transactions on Neural Network, 2000,11 (1): 124-136.
    [53] Friess TT, Cristianini N, Campbell C. The kernel adatron algorithm: A fast and simple learning procedure for support vector machines. In: Shavlik J, ed. Machine Learning Proceedings of the Fifteenth International Conference (ICML'98). San Francisco: Morgan Kaufmann, 1998.188-196.
    [54] Vijayakumar S, Wu S. Sequential support vector classifiers and regression. In: Parenti R,Masulli F. Proceedings of the International Conference on Soft Computing (SOCO' 99),Genova, Italy: ICSC Academic Press, 1999.610-619.
    [55] Cauwenberghs G, Poggio T. Incremental and decremental support vector machine learning. In: Leen T, Dietterich T, Tresp V, eds. Advances in Neural Information Processing Systems 13. Cambridge, MA: MIT Press, 2001. 409-415.
    [56] Ralaivola L, d'Alche-Buc F. Incremental support vector machine learning: A local approach. In: Dorffner G, Bischof H, Hornik H et al, eds. Proceedings of ICANN '01. Berlin:Springer-Verlag, 2001. 322-330.
    [57] Lau K W, Wu Q H. Online training of support vector classifier. Pattern Recognition 2003,36:1913-1920.
    [58] Rychetsky M, Ortmann S, Ullmann M et al. Accelerated training of support vector machines. In: Proceedings of International Joint Conference on Neural Networks (IJCNN 99).IEEE, 1999.998-1003.
    [59] Yang MH, Ahuja N. A geometric approach to train support vector machines. In:Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 2000.430-437.
    [60] Knerr S, Personnaz L, Dreyfus G. Single-layer learning revisited: A stepwise procedure for building and training a neural network. In: Fogelman-Soulie F, Herault J, eds. Neurocomputing: Algorithms, Architectures and Applications. NATO ASI. Springer, 1990. 41-50.
    
    [61] Kreßel U. Pairwise classification and support vector machines. In: Schölkopf B, Burges C, Smola A J, eds. Advances in Kernel Methods-Support Vector Learning. Cambridge, MA: MIT Press, 1999. 255-268.
    [62] Platt J, Cristianini N, Shawe-Taylor J. Large margin DAG's for multiclass classification, in: Solla S, Leen T, Muller K, eds. Advances in Neural Information Processing Systems 12. Cambridge, MA: MIT Press, 2000. 547-553.
    [63] Weston J, Watkins, C. Support vector machines for multi-class pattern recognition. In: Verleysen M, ed. Proc. ESANN 99. Brussels, Belgium, 1999. 219-224.
    [64] Sch(o|¨)lkopf B, Sung K, Burges C et al. Comparing support vector machines with Gaussian kernels to radial basis function classifiers. IEEE Transactions On Signal Processing, 1997, 45: 2758-2765.
    [65] Lu Zengxiang, Li Yanda. An interactive SVM learning algorithm and its application in text information filtering. Journal of Tsinghua University (Science and Technology), 1999, 39(7): 93-97.
    [66] Joachims T. Text categorization with support vector machines: Learning with many relevant features. In: Nedellec C, Rouveirol C, eds. Proceedings of the Tenth European Conference on Machine Learning (ECML'98). Berlin: Lecture Notes in Computer Science, 1998, Vol. 1398. 137-142,
    [67] Osuna E, Freund R, Girosi F. Training support vector machines: An application to face detection. In Proceedings of the 1997 Computer Vision and Pattern Recognition (CVPR'97), Puerto Rico, June 1997. 130-136.
    [68] Jonsson K, Matas J, Kittler J et al. Learning Support Vectors for Face Verification and Recognition. In: Crowley J, ed. Proceedings of Fourth IEEE International Conference on Automatic Face and Gesture Recognition 2000, Grenoble, France, March 2000. 208-213
    [69] Kumar V, Poggio T. Learning-based approach to real time tracking and analysis of faces. In: Crowley J, ed. Proceedings of Fourth IEEE International Conference on Automatic Face and Gesture Recognition 2000, Grenoble, France, March 2000. 96-101.
    [70] Tian Shengfeng, Huang Houkuan. A database learning algorithm based on support vector machines. Journal of Computer Research and Development, 2000, 37(1): 17-22.
    [71] Vapnik V, Golowich S., Smola A. Support vector method for function approximation, regression estimation, and signal processing. In: Mozer M, Jordan M, Petsche T, eds, Neural Information Processing Systems 9. Cambridge, MA: MIT Press, 1997. 281-287.
    [72] Muller K R, Smola A J, Ratsch G, et al. Predicting time series with support vector machines. In: Gerstner W, Germond A, Hasler M et al, eds. Artificial Neural Networks-ICANN'97. Berlin: Springer Lecture Notes in Computer Science, Vol. 1327, 1997. 999-1004.
    
    [73] Mukherjee S, Osuna E., Girosi F. Nonlinear prediction of chaotic time series using a support vector machine. In: Principe J, Gile L, Morgan N et al, eds. Neural Networks for Signal Processing Ⅶ-Proceeding of the 1997 IEEE workshop. NewYork: IEEE, 1997. 511-520.
    [74] Furey TS, Cristianini N, Duffy N et al. Support vector machine classification and validation of cancer tissue samples using microarray expression data. Bioinformatics, 2000,16: 906-914.
    [75] Dubchak I, Muchnik I, Mayor C et al. Recognition of a protein fold in the context of the SCOP classification. Proteins: Structure, Function, and Genetics, 1999, 35: 401-407.
    [76] Bock J R, Gough, D A. Predicting protein-protein interactions from primary structure. Bioinformatics, 2001, 17: 455-460.
    [77] Cai YD, Liu X J, Xu XB et al. Support vector machines for prediction of protein subcellular location by incorporating quasi-sequence-order effect. Journal of Cellular Biochemistry, 2002, 84: 343-348.
    [78] Hua S, Sun Z. A novel method of protein secondary structure prediction with high segment overlap measure: support vector machine approach. Journal of Molecular Biology, 2001, 308: 397-407.
    [79] Jin L, Tang H, Fang W. Prediction of protein subcellular locations using a new measure of information discrepancy. Journal of Bioinformatics and Computational Biology. To appear.
    [80] Wang Bing, Tang Huanwen, Xiu Zhilong, Fang Weiwu. A comparative study of Escherichia coli whole genomes based on a measure of information discrepancy. China Biotechnology, 2003, 23(11): 57-62.
    [81] Zhang Wen, Tang Huanwen, Fang Weiwu, Xiu Zhilong. Application of the information discrepancy measure to the study of the SARS virus. Computers and Applied Chemistry, 2003, 20(6): 719-723.
    [82] Anfinsen C B, Haber E, Sela M, White F H. The kinetics of formation of native ribonuclease during oxidation of the reduced polypeptide chain. Proceedings of the National Academy of Sciences of the United States of America, 1961, 47: 1309-1314.
    [83] Anfinsen C B. Principles that govern the folding of protein chains. Science, 1973,181: 223-230.
    [84] Klotz I M, Darnall D M, Langerman N R. Quaternary structure of proteins. In: Neurath H, Hill RL, Eds. The Proteins 1. New York: Academic Press, 1975. 293-411.
    [85] Marcotte E M, Pellegrini M, Ng H L, et al. Detecting protein function and proteinprotein interactions from genome sequences. Science, 1999, 285:751-753.
    [86] Glaser F, Steinberg D M, Vakser I A, et al. Residue frequencies and pairing preferences at protein-protein interfaces. Proteins: Structure, Function, and Genetics, 2001, 43: 89-102.
    [87] Bock J R, Gough D A. Predicting protein-protein interactions from primary structure. Bioinformatics, 2001, 17: 455-460.
    
    [88] Nooren IMA, Thornton JM. Structural characterization and functional significance of transient protein-protein interactions. Journal of Molecular Biology, 2003,325: 991-1018.
    [89] Ofran Y, Rost B. Analysing six types of protein-protein interfaces. Journal of Molecular Biology, 2003,325: 377-387.
    [90] Zhang S, Pan Q, Zhang H, et al. Support Vector Machines for Predicting Protein Homo-Oligomers by Incorporating Pseudo-Amino Acid Composition. Internet Electronic Journal of Molecular Design, 2003,2: 392-402.
    [91] Duan K, et al. Evaluation of simple performance measures for tuning SVM hyperparameters. Neurocomputing, 2003,51:41-59.
    [92] Crisp D J, Burges C J C. A geometric interpretation of ν-SVM classifiers. In: Solla S, Leen T, Muller K, eds. Advances in Neural Information Processing Systems 12. Cambridge, MA: MIT Press, 2000. 244-251.
    [93] Chang C C, Lin C J. Training ν-support vector classifiers: theory and algorithms. Neural Computation, 2001, 13: 2119-2147.
    [94] Mangasarian O L. Arbitrary-norm separating plane. Operations Research Letters,1999,24:15-23.
    [95] Zhang Y. Solving Large-Scale Linear Programs by Interior-Point Methods Under the MATLAB Environment. Technical Report TR96-01, Department of Mathematics and Statistics,University of Maryland, Baltimore County, Baltimore, MD, July 1995.
    [96] Mehrotra S. On the Implementation of a Primal-Dual Interior Point Method, SIAM Journal on Optimization, 1992,2: 575-601.
    [97] Gill P E, Murray W, Wright M H. Practical Optimization, London:Academic Press,1981.
    [98] Murphy P M, Aha D W. UCI machine learning repository. http://www.ics.uci.edu/~mlearn/MLRepository.html.
    [99] Mangasarian O L. Support vector machine classification via parameterless robust linear programming. Optimization Methods and Software. 2005,20 (1): 115-125.
    [100] Fung G, Mangasarian O L. A feature selection Newton method for Support Vector Machine classification. Computational Optimization and Applications. 2004,28 (2): 185-202.
    [101] Fung G, Mangasarian O L. Finite Newton method for Lagrangian support vector machine classification. Neurocomputing, 2003,55(1-2): 39-55.
    [102] Bradley P S, Mangasarian O L. Feature selection via concave minimization and support vector machines. In: Shavlik J, eds. Machine Learning Proceedings of the Fifteenth International Conference (ICML'98), San Francisco, California: Morgan Kaufmann. 82-90.ftp://ftp.cs.wisc.edu/math-prog/tech-reports/98-03.ps.
    
    [103] Bertsekas DP. Nonlinear Programming. Belmont, MA: Athena Scientific, second edition, 1999.
    [104] Fiacco AV, McCormick GP. Nonlinear Programming: Sequential Unconstrained Minimization Techniques. New York: John Willey &Sons,1968.
    [105] Mangasarian O L. Nonlinear programming. Philadelphia, PA: SIAM, 1994.
    [106] Mangasarian O L. Normal solutions of linear programs. Mathematical Programming Study, 1984,22:206-216.
    [107] Mangasarian O L. Nonlinear perturbation of linear programs. SIAM Journal on Control and Optimization, 1979, 17(6): 745-752.
    [108] Mangasarian O L. Parallel gradient distribution in unconstrained optimization. SIAM Journal on Control and Optimization, 1995,33 (6): 1916-1925.
    [109] Golub G H, Van Loan C F. Matrix computations, 3rd edition. Baltimore, Maryland: The Johns Hopkins University Press, 1996.
    [110] DeRisi J L, et al. Exploring the metabolic and genetic control of gene expression on a genomic scale. Science, 1997, 278: 680-685.
    [111] Golub T R, Slonim D K, Tamayo P, et al. Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science, 1999,286: 531-537.
    [112] Alon U, Barkai N, Notterman D A, et al. Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays. Proceedings of the National Academy of Sciences of the United States of America, 1999, 96:6745-6750.
    [113] Brown M P S, Grundy W N, Lin D, et al. Knowledge-based analysis of microarray gene expression data by using support vector machines. Proceedings of the National Academy of Sciences of the United States of America, 2000,97(1): 262-267.
    [114] Dudoit S, Fridyand J, Speed T P. Comparison of discrimination methods for the classification of tumor using gene expression data. Journal of American Statistical Association,2002,97(457): 77-87.
    [115] Guyon I, Weston J, Barnhill S, et al. Gene selection for cancer classification using support vector machine. Machine Learning, 2002,46(1/3): 389-422.
    [116] Waterman M S. Introduction to Computational Biology: Maps, sequences and genomes. Boca Raton, FL: CRC Press, 1995.
    [117] Randic M. Condensed representation of DNA primary sequences. Journal of Chemical Information and Computer Sciences, 2000,40: 50-56.
    [118] Randic M. On characterization of DNA primary sequences by a condensed matrix, Chemical Physics Letters, 2000,317:29-34.
    
    [119] Hamori E. Graphical representation of long DNA sequences by methods of H-curves, current results and future aspects. BioTechniques 1989,7: 710-720.
    [120] Nandy A. A new graphical representation and analysis of DNA sequence structure: I. Methodology and Application to Globin Genes. Current Science, 1994,66: 309-314.
    [121] Roy A, Raychaudhury C, Nandy A. A novel technique of graphical representation and analysis of DNA sequences - A review. Journal of Biosciences, 1998,23(1): 55-71.
    [122] Randic M, Vracko M, Nandy A, et al. On 3-D graphical representation of DNA primary sequences and their numerical characterization. Journal of Chemical Information and Computer Sciences, 2000,40:1235-1244.
    [123] Randic M, Vracko M, Lers N, et al. Novel 2-D graphical representation of DNA sequences and their numerical characterization. Chemical Physics Letters, 2003, 368: 1-6.
    [124] Randic M, Vracko M, Lers N, et al. Analysis of similarity/dissimilarity of DNA sequences based on novel 2-D graphical representation. Chemical Physics Letters, 2003, 371:202-207.
    [125] Yuan C, Liao B, Wang T. New 3-D graphical representation of DNA sequences and their numerical characterization. Chemical Physics Letters, 2003,379:412-417.
    [126] Liao B, Wang T. Analysis of similarity/dissimilarity of DNA sequences based on 3-D graphical representation. Chemical Physics Letters, 2004,388:195-200.
    [127] Randic M, Guo X, Basak SC. On the characterization of DNA primary sequence by triplet of nucleic acid bases. Journal of Chemical Information and Computer Sciences, 2001,41:619-626.
    [128] He P, Wang J. Characteristic sequences for DNA primary sequence. Journal of Chemical Information and Computer Sciences, 2002,42:1080-1085.
    [129] Randic M, Balaban AT. On a four-dimensional representation of DNA primary sequences. Journal of Chemical Information and Computer Sciences, 2003,43: 532-539.
    [130] Guo XF, Randic M, Basak SC. A novel 2-D graphical representation of DNA sequences of low degeneracy. Chemical Physics Letters, 2001,350:106-112.
    [131] Liu Y, Guo X, Xu J et al. Some notes on 2-D graphical representation of DNA sequence. Journal of Chemical Information and Computer Sciences, 2002,42: 529-533.
