Research on the Occurrence Frequency Distribution of Biological Subsequences and Tumor Subtype Classification Models
Abstract
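The explosive growth of biological data has drawn many researchers into bioinformatics, which has quickly become a focus of worldwide attention and research. We study two basic problems in bioinformatics: (1) the distribution of the occurrence frequencies of k-length DNA subsequences (k-mers) in whole genome sequences, and (2) molecular diagnosis of tumors based on gene expression profiles. Good experimental results were obtained on both problems; they have theoretical and practical significance and contribute to the development of bioinformatics. For the first problem, the work covers three aspects: the visual representation of DNA sequences, the occurrence frequency distribution of k-mers, and algorithms for counting k-mers. For the second problem, tumor subtype classification models are studied from two aspects: tumor feature extraction and informative gene selection.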
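     The visual representation of DNA sequences is important for studying their structure and function: it aids the identification of repeated subsequences, the distinction between introns and exons, and research on the evolution of DNA sequences. We first survey several visual representation methods for DNA sequences, compare Hao's method for generating fractal images of DNA sequences with the classical chaos game representation (CGR), discuss palindromic subsequences among forbidden subsequences, explain the mathematical mechanism by which an iterated function system (IFS) generates a fractal attractor, and describe in detail the chaos automaton defined from a Moore automaton and an IFS. We then study how a DNA sequence can drive a chaos automaton to generate a fractal image, propose a fractal image representation of the codon (triplet) sequence of DNA, carry out a preliminary analysis of it, and identify problems that remain to be solved.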
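The chaos game representation compared above can be sketched in a few lines. This is a minimal illustration, not the dissertation's implementation; the corner layout (A at (0, 0), C at (0, 1), G at (1, 1), T at (1, 0)) is a common convention and an assumption here, and `cgr_points` is a hypothetical helper name.

```python
# Minimal chaos game representation (CGR) sketch.
# Assumed corner layout; other papers place the four bases differently.
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr_points(seq):
    """Return the CGR point reached after each valid nucleotide of seq."""
    x, y = 0.5, 0.5  # start at the centre of the unit square
    points = []
    for base in seq.upper():
        if base not in CORNERS:
            continue  # skip N and other ambiguity codes
        cx, cy = CORNERS[base]
        # move halfway from the current point towards the corner of this base
        x, y = (x + cx) / 2.0, (y + cy) / 2.0
        points.append((x, y))
    return points


if __name__ == "__main__":
    for p in cgr_points("ACGT"):
        print(p)
```

Because each step halves the distance to a corner, the point reached after reading a base lies in the sub-square of side 1/2^k labelled by the last k bases; counting points per sub-square therefore yields k-mer frequencies, which is the link to Hao's method.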
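     Building on Hao's method for generating fractal images of DNA sequences, we propose a method for generating a three-dimensional frequency distribution map that displays differences in the occurrence frequencies of k-mers; its advantage is that the k-mer frequency distribution can be observed more directly. The 3D map is then converted into a one-dimensional logarithmic frequency spectrum, which we propose to highlight local features of the frequency distribution. Based on this spectrum, we propose a criterion for dividing k-mer frequencies into segments, study in detail the n-order zero interval phenomenon in the ultra-high-frequency segment, find and argue that the distribution of n-order zero intervals is strong evidence of traces left by genome evolution, and give a biological interpretation of the features of the 1D logarithmic spectrum. Experiments show that the frequency probability distribution of many DNA sequences approximately follows a non-central F distribution, a finding with a degree of generality; for DNA sequences whose distribution is multimodal, a superposition of several non-central F distributions can be used for fitting. After comparing the non-central F distribution with the Gamma distribution, we propose a new distribution that combines their complementary fitting strengths, and experiments show that it fits the frequency distributions of real DNA sequences better. We then study how two distinctive frequency statistics, the maximum occurrence frequency and the number of k-mers that occur exactly once, vary with k, and find that these relationships are highly consistent across species; for example, the relationship between the maximum occurrence frequency of k-mers and k differs from an exponential probability distribution function only by a constant factor. Finally, evolutionary models of DNA sequences are discussed.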
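The frequency distribution and the 1D logarithmic view described above can be illustrated with a short sketch. It counts every possible k-mer (so forbidden k-mers appear with frequency 0), builds H[f], the number of k-mers occurring exactly f times, takes log10 H[f], and lists runs of empty frequency values between observed ones. Reading the length of such a run as the order n of a zero interval is an assumption, not the dissertation's definition, and all function names are hypothetical.

```python
import math
from collections import Counter
from itertools import product


def kmer_counts(seq, k):
    """Occurrence count of every possible k-mer; absent (forbidden) k-mers get 0."""
    seq = seq.upper()
    counts = {"".join(p): 0 for p in product("ACGT", repeat=k)}
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:  # windows containing N etc. are skipped
            counts[kmer] += 1
    return counts


def frequency_histogram(counts):
    """H[f] = number of distinct k-mers that occur exactly f times."""
    return Counter(counts.values())


def log_spectrum(hist):
    """1D logarithmic view: log10 H[f] for every frequency f with H[f] > 0."""
    return {f: math.log10(n) for f, n in sorted(hist.items())}


def zero_intervals(hist):
    """(start, length) of each run of empty frequencies between observed ones.
    Treating `length` as the order n of a zero interval is an assumption."""
    observed = sorted(hist)
    return [(lo + 1, hi - lo - 1)
            for lo, hi in zip(observed, observed[1:]) if hi - lo > 1]
```

Enumerating all 4^k k-mers is practical only for moderate k; for larger k, only observed k-mers would be stored and H[0] obtained as 4^k minus their number.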
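     Because real genomes are very large, counting the occurrence frequencies of k-mers is not easy. We study the problem of counting the occurrences of k-mers in a whole DNA sequence and design and implement an internal (in-memory) and an external k-mer counting algorithm. The algorithms use a hash function to map each k-mer to an integer key, which turns k-mer frequency counting into counting repeated integer keys so that the classical B-tree algorithm can be applied; three improvements tailored to the characteristics of the problem further increase performance.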
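The hash step can be sketched with a 2-bit encoding (A=0, C=1, G=2, T=3, an assumed code), which maps each k-mer to a unique integer in [0, 4^k) and can be updated in constant time as the window slides. A Python `Counter` stands in for the B-tree here; it is a swapped-in data structure for illustration, not the dissertation's algorithm or its three improvements.

```python
from collections import Counter

CODE = {"A": 0, "C": 1, "G": 2, "T": 3}  # assumed 2-bit code
BASES = "ACGT"


def count_kmer_keys(seq, k):
    """Count k-mers as integer keys using a rolling 2-bit encoding.

    A Counter (hash table) stands in for the B-tree of the dissertation.
    """
    mask = (1 << (2 * k)) - 1  # keep only the last 2k bits, i.e. the last k bases
    counts = Counter()
    key, run = 0, 0            # run = number of consecutive valid bases read
    for base in seq.upper():
        code = CODE.get(base)
        if code is None:       # N or other ambiguity code: restart the window
            key, run = 0, 0
            continue
        key = ((key << 2) | code) & mask
        run += 1
        if run >= k:           # the window now holds k valid bases
            counts[key] += 1
    return counts


def decode(key, k):
    """Invert the 2-bit encoding of a k-mer key."""
    bases = []
    for _ in range(k):
        bases.append(BASES[key & 3])
        key >>= 2
    return "".join(reversed(bases))


if __name__ == "__main__":
    for key, n in sorted(count_kmer_keys("ACGTNACG", 3).items()):
        print(decode(key, 3), key, n)
```

Run as a script, this prints `ACG 6 2` and then `CGT 27 1`: the `N` restarts the window, so `ACG` is counted once before it and once after it.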
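     Tumor subtype classification based on gene expression profiles promises to become a fast and effective method of molecular tumor diagnosis in clinical medicine. However, current tumor gene expression datasets have very high dimensionality, very small sample sizes, and heavy noise, which makes selecting informative tumor genes, or extracting tumor classification features from expression profiles, a challenging task. Researchers in China and abroad have studied tumor classification extensively. Summarizing their results, we abstract a process model of tumor classification based on gene expression profiles, describe its key stages and the methods commonly used in each, propose a way of categorizing classification methods according to this process model, use the model to compare previous work, and point out open problems in current tumor classification research.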
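     For tumor feature extraction, six methods were designed to obtain tumor classification features: 1) principal component analysis (PCA), 2) factor analysis (FA), 3) independent component analysis (ICA), 4) wavelet packet decomposition (WPD), 5) PCA based on the discrete cosine transform (DCT), and 6) PCA based on the discrete Fourier transform (DFT). Their effectiveness was verified on two tumor datasets (colon cancer and acute leukemia). The results show that all six methods classify well, each has its own characteristics, and all greatly reduce the dimensionality of the gene expression data while maintaining high classification accuracy. In terms of classification performance, DCT-based PCA is a comparatively good dimensionality reduction method: its cross-validation accuracy reaches 96.77% on the colon cancer tissue samples and 100% on the acute leukemia tissue samples. Factor analysis and independent component analysis help reveal the structure of the datasets; experiments show that only a few factors or independent components suffice for high classification performance. From this we hypothesize that only 3 to 4 informative tumor genes are needed for high classification performance, which provides prior knowledge for designing good informative gene selection algorithms.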
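DCT-based PCA can be sketched with NumPy: transform each expression profile with an orthonormal DCT-II along the gene axis, keep the first `n_coef` (low-frequency) coefficients, then project onto the leading `n_comp` principal components. The truncation rule and both parameter names are assumptions, since the dissertation's exact preprocessing is not given here, and the SVM stage that follows is omitted.

```python
import numpy as np


def dct_matrix(n):
    """Orthonormal DCT-II matrix C, so that C @ x is the DCT of the vector x."""
    k = np.arange(n)[:, None]  # frequency index (rows)
    i = np.arange(n)[None, :]  # sample index (columns)
    c = np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] *= 1 / np.sqrt(2)  # scale the DC row so that C is orthonormal
    return c * np.sqrt(2 / n)


def dct_pca(X, n_coef, n_comp):
    """X: samples x genes. Keep the first n_coef DCT coefficients of each
    profile, then return scores on the top n_comp principal components."""
    C = dct_matrix(X.shape[1])
    Z = X @ C.T[:, :n_coef]  # DCT along the gene axis, low frequencies only
    Z = Z - Z.mean(axis=0)   # centre the coefficients before PCA
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    return Z @ Vt[:n_comp].T  # principal component scores
```

The DCT treats the genes as an ordered signal, so the retained coefficients depend on the gene order in the data matrix.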
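     Although feature extraction gave good results, informative gene selection is still indispensable. Selecting, from the thousands of genes in an expression profile, as many gene subsets as possible that are highly discriminative yet contain as few genes as possible is a challenging task, and without prior knowledge an exhaustive search of such a large gene space is infeasible. We therefore propose two classes of approximate algorithms for informative gene selection. The first class selects informative genes with attribute reduction algorithms based on the classical rough set model and the neighborhood rough set model. Attribute reduction under the classical rough set model requires discretizing the data, which loses information, so the selected genes have limited classification performance. To avoid this problem, we designed eleven informative gene selection algorithms for tumor subtype classification based on FARNeM (forward attribute reduction based on neighborhood model), the attribute reduction algorithm of the neighborhood rough set model. Experiments show that this approach quickly finds informative gene subsets with higher classification accuracy. To improve the performance of the NEC (neighborhood classifier) on imbalanced samples, we modify it into a weighted neighborhood classifier suited to imbalanced datasets. We also introduce Simba (iterative search margin based algorithm), a feature selection algorithm suited to multi-class problems, into tumor classification to broaden the range of informative gene selection methods, and, to increase the reliability of the classification model, we propose a probabilistic neural network ensemble based on the neighborhood rough set model for classifying tumor samples. This work lays a foundation for developing practical software for molecular tumor diagnosis.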
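A forward greedy reduction in the style of the neighborhood rough set model can be sketched as follows: the dependency of the class on a gene subset B is the fraction of samples whose δ-neighborhood under B contains only samples of their own class, and genes are added one at a time while that dependency increases. The Euclidean metric, the value of δ, the stopping rule, and the function names are assumptions; none of the eleven FARNeM-based variants from the dissertation is reproduced here.

```python
import math


def dependency(X, y, genes, delta):
    """Fraction of samples whose delta-neighbourhood on `genes` is class-pure."""
    if not genes:
        return 0.0
    pure = 0
    for i, xi in enumerate(X):
        # a sample is in the positive region if every neighbour shares its class
        pure += all(
            y[j] == y[i]
            for j, xj in enumerate(X)
            if math.dist([xi[g] for g in genes], [xj[g] for g in genes]) <= delta
        )
    return pure / len(X)


def forward_reduction(X, y, delta=0.1, eps=1e-9):
    """Greedily add the gene that most increases the dependency; stop when
    no remaining gene raises it by more than eps."""
    selected, best = [], 0.0
    remaining = list(range(len(X[0])))
    while remaining:
        # ties are broken in favour of the lower gene index
        score, gene = max(
            ((dependency(X, y, selected + [g], delta), g) for g in remaining),
            key=lambda t: (t[0], -t[1]),
        )
        if score - best <= eps:
            break
        selected.append(gene)
        remaining.remove(gene)
        best = score
    return selected, best
```

Each dependency evaluation costs O(n²·|B|), which is affordable for microarray datasets with tens of samples; expression values should be normalized (for example to [0, 1]) so that a single δ suits every gene.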
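     The second class is a heuristic breadth-first search algorithm for informative tumor genes, designed from the structural characteristics of the tumor expression datasets and using a support vector machine (SVM) classifier as the evaluation criterion. Its advantage is that it can simultaneously find multiple informative gene subsets that contain as few genes as possible while classifying as accurately as possible. Its feasibility and effectiveness were verified on three tumor datasets. For the acute leukemia dataset, only 2 informative genes are needed to reach 100% 4-fold cross-validation accuracy (14 such two-gene subsets were found); for the hard-to-classify colon cancer dataset, 4 informative genes suffice for 100% 4-fold cross-validation accuracy (7 such four-gene subsets were found); and for the small round blue cell tumor (SRBCT) dataset, 4 informative genes also suffice for 100% 4-fold cross-validation accuracy (504 such four-gene subsets were found). These results agree closely with our earlier hypothesis. Compared with other leading tumor classification algorithms, our results surpass all classification algorithms known to us in overall classification performance. To evaluate classification models more objectively, we propose a method called full-fold cross-validation, which removes the effect of different partitions of the tumor dataset on the measured performance; experiments show that it reflects classification performance more objectively. In addition, for datasets with multiple tumor subtypes, we propose a method for inferring the informative genes associated with each subtype.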
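The search can be sketched as a beam-limited breadth-first search: evaluate all single genes, keep the best `beam` subsets, extend each by one gene, and stop at the first level where some subset reaches the target accuracy, returning every subset tied for the best score. Two substitutions are deliberate and should be read as assumptions: a nearest-centroid classifier replaces the SVM evaluator, and folds are assigned as sample index modulo k. The pruning heuristic and all names are hypothetical.

```python
def centroid_cv_accuracy(X, y, genes, k=4):
    """k-fold CV accuracy of a nearest-centroid classifier on `genes`
    (stand-in for the SVM evaluator; sample i goes to fold i % k)."""
    n = len(X)
    correct = 0
    for fold in range(k):
        train = [i for i in range(n) if i % k != fold]
        test = [i for i in range(n) if i % k == fold]
        centroids = {}
        for label in sorted(set(y[i] for i in train)):
            members = [i for i in train if y[i] == label]
            centroids[label] = [sum(X[i][g] for i in members) / len(members)
                                for g in genes]
        for i in test:
            point = [X[i][g] for g in genes]
            pred = min(centroids, key=lambda c: sum(
                (a - b) ** 2 for a, b in zip(point, centroids[c])))
            correct += pred == y[i]
    return correct / n


def heuristic_bfs(X, y, beam=5, max_size=4, target=1.0):
    """Level-by-level search over gene subsets, keeping only `beam` subsets
    per level; returns (best accuracy, all subsets reaching it)."""
    n_genes = len(X[0])
    level = [frozenset([g]) for g in range(n_genes)]
    for size in range(1, max_size + 1):
        scored = sorted(((centroid_cv_accuracy(X, y, sorted(s)), s) for s in level),
                        key=lambda t: (-t[0], sorted(t[1])))
        best = scored[0][0]
        if best >= target or size == max_size:
            return best, [sorted(s) for acc, s in scored if acc == best]
        frontier = [s for _, s in scored[:beam]]
        # extend each kept subset by one new gene; the set removes duplicates
        level = list({s | {g} for s in frontier for g in range(n_genes) if g not in s})
    return 0.0, []
```

Keeping several subsets per level, rather than a single greedy path, is what allows the search to report many equally good small subsets, as in the counts of two-gene and four-gene subsets quoted above.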
The exponential growth of accumulated biological data has drawn many scientists into bioinformatics, which has become a focus of worldwide attention. This dissertation investigates two basic problems in bioinformatics. One is the occurrence frequency distribution of k-mers in whole chromosomes; the other is the molecular diagnosis of tumors based on gene expression profiles. Satisfactory experimental results have been achieved on both problems; they have theoretical and practical significance and contribute to the development of bioinformatics. For the first problem, the visual and fractal representation of DNA sequences, the occurrence frequency distribution of k-mers in DNA sequences, and counting algorithms for k-mers are investigated in depth. For the second problem, feature extraction and informative gene selection for tumor classification are studied.
     The visual representation of DNA sequences plays an important role in studying DNA structure and function; it is especially helpful for recognizing repetitive subsequences, distinguishing introns from exons, and studying the evolution of DNA sequences. First, two methods that generate fractal images of DNA sequences, Hao's method and the classical chaos game representation, are introduced and compared in detail. On this basis, palindromes among forbidden k-mers are discussed. Second, after the mathematical principle by which an iterated function system generates a fractal attractor is introduced, a chaos automaton is defined by combining a Moore finite-state machine with an iterated function system, and the generation of fractal images by a chaos automaton driven by DNA sequences is investigated in depth. Finally, a method for generating fractal images driven by the codon sequences of DNA is proposed and investigated, and several unsolved problems are presented.
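A chaos automaton of the kind described above can be sketched as a Moore machine whose output in each state is one contractive affine map of an iterated function system: each input base moves the machine to a new state, and the map of that state is applied to the current point. The two-state machine, its transitions, and its maps below are hypothetical, chosen only to show the mechanics.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]
# Affine map (a, b, c, d, e, f): (x, y) -> (a*x + b*y + e, c*x + d*y + f)
Affine = Tuple[float, float, float, float, float, float]


def apply_map(m: Affine, p: Point) -> Point:
    a, b, c, d, e, f = m
    x, y = p
    return (a * x + b * y + e, c * x + d * y + f)


class ChaosAutomaton:
    """Moore machine whose per-state output is one contractive map of an IFS."""

    def __init__(self, transitions: Dict[Tuple[int, str], int],
                 maps: Dict[int, Affine], start: int) -> None:
        self.transitions = transitions  # (state, symbol) -> next state
        self.maps = maps                # state -> affine map emitted in that state
        self.start = start

    def run(self, seq: str, p: Point = (0.5, 0.5)) -> List[Point]:
        """Drive the automaton with seq and return the orbit of the point."""
        state, orbit = self.start, []
        for sym in seq.upper():
            if (state, sym) not in self.transitions:
                continue  # ignore symbols outside the input alphabet
            state = self.transitions[(state, sym)]
            p = apply_map(self.maps[state], p)
            orbit.append(p)
        return orbit


# Hypothetical example: the state flips on purines (A, G) and is kept on
# pyrimidines (C, T); state 0 contracts towards (0, 0), state 1 towards (1, 1).
TRANSITIONS = {(s, b): (1 - s if b in "AG" else s) for s in (0, 1) for b in "ACGT"}
MAPS = {0: (0.5, 0.0, 0.0, 0.5, 0.0, 0.0), 1: (0.5, 0.0, 0.0, 0.5, 0.5, 0.5)}
automaton = ChaosAutomaton(TRANSITIONS, MAPS, start=0)
```

The classical chaos game representation is the special case with four states, one per base, where reading a base always moves to that base's state and the state's map contracts the point halfway towards that base's corner.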
     Furthermore, building on Hao's method, which depicts the occurrence frequencies of k-mers as a fractal image, a novel method for generating a 3D occurrence frequency distribution map (OFDM) of k-mers in a genome is proposed; its advantage is that differences in k-mer occurrence frequency are displayed clearly for biologists. The 3D OFDM is then transformed into a 1D logarithmic histogram, from which a partition criterion for occurrence frequency segments is proposed. The 1D histogram shows local features of the occurrence frequency distribution (OFD) of k-mers; for example, in the ultrahigh frequency segment the OFD is discontinuous over the integers. Palindromes among forbidden k-mers are further studied in the forbidden segment. The phenomenon of n-order zero intervals (n-OZI) in the ultrahigh frequency segment is investigated in depth. Moreover, it is proposed that the distribution of n-OZI is a mark left by the process of genome evolution, and many features of the logarithmic histogram of occurrence frequency are explained from a biological point of view. On the basis of many experiments, it is found and validated that the OFD of k-mers approximately follows a non-central F distribution. When the density distribution of k-mer occurrence frequencies in a genome has several peaks, a mixture of the same number of non-central F distributions can fit it. The non-central F distribution is then compared experimentally with the Gamma distribution, which Hsieh and Luo proposed for fitting genome distributions. Because the two distributions are complementary in fitting genome density distributions, a new probability distribution combining the non-central F and Gamma distributions is presented, and experiments show that it fits genome density distributions better than either distribution alone.
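The grid underlying Hao's images and the 3D OFDM can be sketched as a 2^k × 2^k frequency matrix: each base of a k-mer selects one quadrant of the current cell, so every k-mer gets its own cell, and the cell value is that k-mer's count. Plotting these counts as heights over the grid gives a 3D map. The quadrant layout (C upper-left, G upper-right, A lower-left, T lower-right) is an assumption and may differ from the one used by Hao et al.; the function names are hypothetical.

```python
# Assumed quadrant layout as (row_bit, col_bit); row 0 is the top half.
QUADRANT = {"C": (0, 0), "G": (0, 1), "A": (1, 0), "T": (1, 1)}


def kmer_cell(kmer):
    """Row and column of a k-mer's cell in the 2^k x 2^k grid."""
    row = col = 0
    for base in kmer:
        r, c = QUADRANT[base]
        row = (row << 1) | r  # each base refines the cell by one level
        col = (col << 1) | c
    return row, col


def frequency_matrix(seq, k):
    """2^k x 2^k matrix of k-mer counts; its values are the heights of a 3D map."""
    size = 1 << k
    m = [[0] * size for _ in range(size)]
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if all(b in QUADRANT for b in kmer):  # skip windows with N etc.
            r, c = kmer_cell(kmer)
            m[r][c] += 1
    return m
```

Because the first base picks the largest quadrant, k-mers sharing a prefix occupy neighbouring cells, which is why repeated motifs show up as visible structures in these images.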
The relationship between the maximal occurrence frequency of k-mers in a genome and k, and the relationship between the number of distinct k-mers that occur only once and k, are then investigated in depth. The two relationships are found to be consistent across many species, which is interpreted as evidence for the neutral theory of genome evolution.
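Both statistics can be tabulated for a range of k with a short sketch; comparing the resulting curves across species, or fitting them with an exponential-type function, would be the next step and is not shown. Windows containing symbols other than A, C, G, T are skipped, which is an assumption about how ambiguous bases are handled, and the function names are hypothetical.

```python
from collections import Counter


def kmer_extremes(seq, k):
    """Return (maximum occurrence frequency, number of k-mers seen exactly once)."""
    seq = seq.upper()
    counts = Counter(
        seq[i:i + k] for i in range(len(seq) - k + 1)
        if set(seq[i:i + k]) <= set("ACGT")  # skip windows with ambiguous bases
    )
    if not counts:
        return 0, 0
    max_freq = max(counts.values())
    singletons = sum(1 for c in counts.values() if c == 1)
    return max_freq, singletons


def extremes_by_k(seq, ks):
    """Map each k in ks to (max frequency, singleton count)."""
    return {k: kmer_extremes(seq, k) for k in ks}
```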
     Counting all k-mers in a whole genome simultaneously is not an easy task, because natural genomes are very large. Internal and external algorithms that count all k-mers in a genome are therefore designed and implemented. These algorithms first use a hash function that maps each k-mer to an integer, turning the problem of counting all k-mers into the problem of counting integer keys, so that the classical B-tree algorithm can be applied. Finally, three measures that exploit the features of DNA sequences are proposed to further improve the efficiency of the algorithm.
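An external variant can be sketched by scattering integer keys into temporary partition files (here by key mod `parts`) and then counting one partition at a time, so that only one partition's keys need to be in memory. This partition-then-count scheme is a swapped-in illustration rather than the dissertation's B-tree-based external algorithm; the 2-bit code, the file layout, and all names are hypothetical.

```python
import os
import tempfile
from collections import Counter

CODE = {"A": 0, "C": 1, "G": 2, "T": 3}  # assumed 2-bit code


def iter_keys(seq, k):
    """Yield the 2-bit integer key of every k-mer made of A/C/G/T only."""
    mask = (1 << (2 * k)) - 1
    key, run = 0, 0
    for base in seq.upper():
        code = CODE.get(base)
        if code is None:  # ambiguous base: restart the window
            key, run = 0, 0
            continue
        key = ((key << 2) | code) & mask
        run += 1
        if run >= k:
            yield key


def external_count(seq, k, parts=4):
    """Two-pass sketch: scatter keys into `parts` temporary files by
    key % parts, then count one partition at a time in memory."""
    with tempfile.TemporaryDirectory() as tmp:
        paths = [os.path.join(tmp, f"part{i}.txt") for i in range(parts)]
        files = [open(p, "w") for p in paths]
        try:
            for key in iter_keys(seq, k):
                files[key % parts].write(f"{key}\n")
        finally:
            for f in files:
                f.close()
        result = {}
        for p in paths:
            with open(p) as f:
                # partitions hold disjoint key sets, so updates never overwrite
                result.update(Counter(int(line) for line in f))
        return result
```

In a genome-scale run the sequence would be streamed from disk rather than held in `seq`, and `parts` would be chosen so that each partition fits in memory.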
     Tumor diagnosis based on gene expression profiles is expected to become a fast and effective clinical method in the near future. Although DNA microarray experiments provide huge amounts of gene expression data, only a few of the genes in an expression profile are related to the tumor. Moreover, extracting features or selecting tumor-related informative genes from gene expression profiles is challenging because of their high dimensionality, small sample sizes, and heavy noise and redundancy. Consequently, the molecular diagnosis of tumors has been investigated broadly and in depth, and many relevant papers have been published. We first introduce in detail the techniques and methods in the tumor classification process model, and then propose a way of categorizing classification methods according to this process model. Finally, research results in tumor classification from the past several years are summarized, and open problems are presented, such as how to assess tumor classification results rationally and what further research is needed.
     After analyzing existing tumor classification methods, we designed six feature extraction approaches to extract tumor classification features from gene expression profiles: principal component analysis (PCA), factor analysis (FA), independent component analysis (ICA), wavelet packet decomposition (WPD), PCA based on the discrete cosine transform (DCT), and PCA based on the discrete Fourier transform (DFT). They are assessed experimentally on two well-known datasets, the colon dataset and the leukemia dataset. Experiments with support vector machines (SVM) show that all six approaches achieve good classification performance and each has its own characteristics. Each approach extracts a small number of components that serve as valid tumor-related classification features while retaining a high recognition rate, and the classification results can be visualized as 2D or 3D scatter plots. Among the six approaches, DCT-based PCA is the best at reducing dimensionality and achieves the highest classification accuracy: with an SVM classifier, its cross-validation accuracy is 96.77% on the colon dataset and 100% on the leukemia dataset. For FA and ICA, the results indicate that only a few components are needed for high classification performance. We therefore hypothesize that only a few genes (i.e., 3 or 4) in a gene expression profile can achieve high classification accuracy, which is the basis for our subsequent design of informative gene selection algorithms.
     Informative gene selection is still needed even though the proposed feature extraction methods are effective for tumor classification. However, accurately classifying tumors by selecting tumor-related genes from thousands of genes is difficult because of the large number of redundant genes, and it is usually impossible to apply an exhaustive algorithm to search for informative gene subsets in such a large gene space. Therefore, two kinds of approximate algorithms for searching informative gene subsets are proposed and implemented.
     The first applies attribute reduction methods based on the classical rough set model and the neighborhood rough set model to search for informative gene subsets. The informative gene subsets obtained with attribute reduction based on the classical rough set model do not achieve high classification accuracy, because discretizing the gene expression profiles loses information. To avoid this problem, we designed eleven gene selection methods based on the FARNeM (forward attribute reduction based on neighborhood model) algorithm to classify tumor datasets; they prove effective and fast at finding informative gene subsets. To improve classification accuracy on imbalanced tumor datasets, a weighted neighborhood classifier based on the neighborhood classifier is proposed, which proves more effective on tumor datasets with complex boundaries between subtypes. We also introduce the feature selection algorithm Simba (iterative search margin based algorithm), which is also suited to multi-class tumor datasets, into gene selection for tumor classification. Finally, a probabilistic neural network ensemble based on the neighborhood rough set model is proposed to classify tumor datasets and obtain more reliable results.
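A neighborhood classifier assigns a sample the majority class among training samples within distance δ. The sketch below weights each vote by the inverse size of the voter's class, so that a minority subtype is not outvoted merely by being small. This weighting, the Euclidean metric, and the nearest-sample fallback for empty neighborhoods are hypothetical; the dissertation's weighting scheme is not specified here.

```python
import math
from collections import Counter, defaultdict


def weighted_nec_predict(X_train, y_train, x, delta):
    """Vote among training samples within delta of x; each vote is weighted
    by 1 / (size of the voter's class), a hypothetical imbalance correction."""
    class_size = Counter(y_train)
    dists = [math.dist(x, xi) for xi in X_train]
    votes = defaultdict(float)
    for d, label in zip(dists, y_train):
        if d <= delta:
            votes[label] += 1.0 / class_size[label]
    if not votes:  # empty neighbourhood: fall back to the nearest sample
        return y_train[min(range(len(dists)), key=dists.__getitem__)]
    return max(votes, key=votes.get)
```

In the test data below, an unweighted vote at `x = [0.22]` would be a 1-to-1 tie between the majority class `a` and the single `b` sample; the inverse-size weights resolve it towards the minority class.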
     The second is a novel heuristic breadth-first search algorithm (HBSA) based on an SVM classifier, which can simultaneously find many informative gene subsets that contain close to the fewest genes while achieving close to the highest classification accuracy, although it is time-consuming. Experiments on three tumor datasets show that the approach is feasible and effective. A 4-fold cross-validation accuracy of 100% is achieved with only two genes on the leukemia dataset (14 such 2-gene subsets were found), with only four genes on the colon dataset (7 such 4-gene subsets were found), and with only four genes on the SRBCT (small round blue cell tumor) dataset (504 such 4-gene subsets were found). These results are consistent with our prediction, and compared with other tumor classification methods they are superior in overall classification performance. To reflect the true classification accuracy of a classifier, we propose a full-fold cross-validation method that eliminates the effect of different partitions of the tumor dataset and evaluates classification models more objectively.
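The partition dependence that full-fold cross-validation targets can be illustrated by repeating k-fold cross-validation over many random partitions and reporting the mean and spread of the accuracy. This repeated k-fold scheme is a standard technique used here as a stand-in; it is not claimed to be the dissertation's full-fold definition, and the 1-nearest-neighbour rule is only a placeholder classifier.

```python
import random
from statistics import mean, pstdev


def kfold_accuracy(X, y, k, order, predict):
    """Accuracy of one k-fold CV run; `order` is a permutation of sample indices."""
    correct = 0
    for fold in range(k):
        test = order[fold::k]
        test_set = set(test)
        train = [i for i in order if i not in test_set]
        for i in test:
            pred = predict([X[j] for j in train], [y[j] for j in train], X[i])
            correct += pred == y[i]
    return correct / len(X)


def repeated_kfold(X, y, k, predict, repeats=100, seed=0):
    """Mean and population standard deviation of accuracy over random partitions."""
    rng = random.Random(seed)
    scores = []
    for _ in range(repeats):
        order = list(range(len(X)))
        rng.shuffle(order)
        scores.append(kfold_accuracy(X, y, k, order, predict))
    return mean(scores), pstdev(scores)


def one_nn(X_train, y_train, x):
    """1-nearest-neighbour rule, used only as a placeholder classifier."""
    best = min(range(len(X_train)),
               key=lambda j: sum((a - b) ** 2 for a, b in zip(X_train[j], x)))
    return y_train[best]
```

A large standard deviation indicates that an accuracy reported from a single partition could be optimistic or pessimistic by roughly that amount.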
引文
[1]贺林.解码生命-人类基因组计划和后基因组计划.北京:科学出版社,2000年4月
    [2]International Human Genome Sequencing Consortium.Initial sequencing and analysis of the human genome.Nature,2001,409:860-941
    [3]Lander E S.The new genomics:global views of biology.Science,1996,274(5287):536-539
    [4]Michael Y Galperin.The molecular biology database collection:2007 update.Nucleic Acids Research,2007,Volume 35(Database issue):D3-D4
    [5]Watson J D and Crick F H C.Genetic implications of the structure of deoxyribonucleic acid.Nature,1953,171:964-967
    [6]易翔,殷勤伟.RNA干扰技术和2006年诺贝尔医学奖.中国科学基金,2006,20(6):330-332.
    [7]Tusterman M,Ketting R F,Plasterk R H.The genetics of RNA silencing[J].Annu Rev Genet,2002,36:489-519.
    [8]Baldi P,Brunak S著,张东晖等译.生物信息学-机器学习方法.北京:中信出版社,2003
    [9]Zhang M Q.Large-scale gene expression data analysis:a new challenge to computational biologists.Genome Res.,1999,9:681-688
    [10]陈铭.后基因组时代的生物信息学.生物信息学,2004,2(2):29-34
    [11]Ouzounis C A and Valencia A.Early bioinformatics:the birth of a discipline-a personal view.Bioinformatics,2003,19(17):2176-2190
    [12]Gamow G,Rich A,and Ycas M.The problem of information transfer from nucleic acids to proteins.Adv.Biol.Med.Phys.,1956,4:23-68
    [13]Anfinsen C B.Principles that govern the folding of protein chains.Science,1973,181:223-230
    [14]Anfinsen C B and Scheraga H A.Experimental and theoreticai aspects of protein folding.Adv.Protein Chem.,1975,29:205-300
    [15]Horowitz N H.On the evolution of biochemical syntheses.Proc.Natl Acad.Sci.USA,1945,31:153-157
    [16]Britten R J and Davidson E H.Gene regulation for higher cells:a theory.Science,1969,165:347-357
    [17]Turing A M.The chemical basis for morphogenesis.Phil.Trans.R.Soc.London B,1952,237:37-72
    [18]Chaitin G J.On the length of programs for computing finite binary sequences.J. ACM, 1966, 13: 547-569
    [19] Shannon C E and Weaver W. The Mathematical Theory of Communication. University of Illinois Press, Urbana, IL., 1962
    [20] Chomsky N. On certain formal properties of grammar. Inform. Control, 1959, 2:137-167
    [21] Martin-L(?)f P. The definition of random sequences. Inform. Control, 1966, 9:602-619
    [22] Neumann J V. and Morgenstern, O.. Theory of Games and Economic Behavior.Princeton University Press, Princeton, USA, 1953
    [23] Neumann J V. Theory of Self-Reproducing Automata. University of Illinois Press,Urbana,IL, 1966
    
    [24] Ingram V M. Gene evolution and the haemoglobins. Nature, 1961, 189: 704-708
    [25] Margoliash, E.. Primary structure and evolution of cytochrome. Proc. Natl Acad.Sci. USA, 1963, 50:672-679
    [26] Zuckerkandl E and Pauling L. Molecules as documents of evolutionary history. J.Theor. Biol., 1965, 8: 357-366
    [27] Ramachandran G N, Ramakrishnan C, and Sasisekharan V. Stereochemistry of polypeptide chain configurations. J. Mol.Biol., 1963, 7: 95-99
    [28] Gatlin L L. The information content of DNA. J. Theor. Biol., 1966, 10: 281-300
    [29] Nolan C and Margoliash E. Comparative aspects of primary structures of proteins.Ann. Rev. Biochem., 1968, 37: 727-791
    
    [30] Crick F H C. The origin of the genetic code. J. Mol. Biol., 1968, 38:367-379
    [31] Woese C R. The problem of evolving a genetic code.BioScience, 1970, 20:471-485
    [32] Alff-Steinberger C. The genetic code and error transmission. Proc. Natl Acad. Sci.USA, 1969,64:584-591
    [33] Crick F H C. Codon-anticodon pairing: the wobble hypothesis. J. Mol. Biol., 1966,19:548-555
    [34] Fitch W M and Margoliash E. Construction of phylogenetic trees. Science, 1967,155:279-284
    [35] Cantor C R. The occurrence of gaps in protein sequences. Biochem. Biophys. Res.Comm., 1968,31:410-416
    
    [36] Kimura M. Evolutionary rate at the molecular level. Nature, 1968, 217: 624-626
    [37] Nei M. Gene duplication and nucleotide substitution in evolution. Nature, 1969,221:40-42
    [38] Gibbs A J and McIntyre G A. The diagram, a method for comparing sequences.Eur. J. Biochem., 1970, 16: 1-11
    [39] Needleman S B and Wunsch C D. A general method applicable to the search for similarities in the amino acid sequence of two proteins. J. Mol. Biol., 1970, 48:443-453
    
    [40] King J L and Jukes T H. Non-Darwinian evolution. Science, 1969, 164: 788-798
    [41] Clarke B. Selective constraints on amino-acid substitution during the evolution of proteins. Nature, 1970,228: 159-160
    [42] Epstein C J. Non-randomness of amino-acid changes in the evolution of homologous proteins. Nature, 1967, 215: 355-359
    [43] Krzywicki A and Slonimski P P. Formal analysis of protein sequences: I. Specific long-range constraints in pair associations of amino acids. J. Theor. Biol., 1967, 17:136-158
    [44] Pain R H and Robson B. Analysis of the code relating sequence to secondary structure in proteins. Nature, 1970, 227, 62-63
    [45] Ptitsyn O B and Finkelstein A V. Similarities of protein topologies: evolutionary divergence, functional convergence or principles of folding? Qu. Rev. Biophys.,1969, 13: 339-386
    [46] Dunnill P. The use of helical net-diagrams to represent protein structures. Biophys.J., 1968, 8: 865-875
    [47] Schiffer M and Edmundson A B. Use of helical wheels to represent the structures and to identify segments with helical potential. Biophys. J., 1967, 7: 121-135
    [48] Fitch W M and Margoliash E. Usefulness of amino acid and nucleotide sequences in evolutionary studies. Evol. Biol., 1970, 4: 67-109
    [49] Jukes T H. Recent advances in studies of evolutionary relationships between proteins and nucleic acids. Space Life Sci., 1969, 1: 469-490
    [50] West M W and Ponnamperuma C. Chemical evolution and the origin of life. Space Life Sci., 1970,2:225-295
    
    [51] Ohno S. Evolution by Gene Duplication. Springer-Verlag, New York, 1970.
    [52] Crick F H C. Central dogma of molecular biology. Nature, 1970, 227: 561-563
    [53] Koch R E. The influence of neighboring base pairs upon base-pair substitution mutation rates. Protein Natl Acad. Sci. USA, 1971, 68: 773-776
    [54] Lee B and Richards F M. The interpretation of protein structures: estimation of static accessibility. J. Mol. Biol., 1971, 55: 379-400
    [55] Fitch W M. Toward defining the course of evolution: minimum change for a specific tree topology. Syst. Zool., 1971, 20: 406-416
    [56] Tinoco I, Uhlenbeck O C, and Levine M D. Estimation of secondary structure in ribonucleic acids. Nature, 1971, 230: 362-367
    [57] Beyer W A, Stein M L, Smith T F, and Ulam S M. A molecular sequence metric and evolutionary trees. Math. Biosci., 1974, 19: 9-25
    [58] Gibbs A J, Dale M B, Kinns H R, and MacKenzie H G. The transition matrix method for comparing sequences; its use in describing and classifying proteins by their amino acid sequences. Syst. Zool., 1971, 20: 417-425
    [59] Grantham R. Amino acid difference formula to help explain protein evolution.Science, 1974, 185: 862-864
    [60] Kimura M. The rate of molecular evolution considered from the standpoint of population genetics. Proc. Natl Acad. Sci. USA, 1969, 63: 1181-1188
    [61] Ohta T and Kimura M. Functional organization of genetic material as a product of molecular evolution. Nature, 1971, 233: 118-119
    [62] Kimura M. The Neutral Theory of Molecular Evolution. Cambridge University Press, Cambridge, 1983.
    [63] Jukes T H and Holmquist R. Evolutionary clock: nonconstancy of rate in different species. Science, 1972, 177: 530-532
    [64] Kimura M and Ohta T. On some principles governing molecular evolution. Proc.Natl Acad. Sci. USA, 1974, 71: 2848-2852
    [65] Levin L A. On the notion of a random sequence. Soviet Math. Dokl., 1973, 14:1413-1416
    [66] Sankoff D and Sellers P H. Shortcuts, diversions and maximal chains in partially ordered sets. Discr. Math., 1973, 4: 287-293
    [67] Wagner R A and Fischer M J. The string to string correction problem. J. ACM,1974,21: 168-173
    [68] Beyer W A, Stein M L, Smith T F, and Ulam S M. A molecular sequence metric and evolutionary trees. Math. Biosci., 1974, 19: 9-25
    [69] Gordon A D. A sequence-comparison statistic and algorithm. Biometrika, 1973, 60:197-200
    [70] Kimura M and Ohta T. On the stochastic model for estimation of mutational distance between homologous proteins. J. Mol. Evol., 1972, 2: 87-90
    [71] Wu T T, Fitch W M, and Margoliash E. The information content of protein amino acid sequences. Ann. Rev. Biochem., 1974, 43: 539-566
    [72] Novotny J. Genealogy of immunoglobulin polypeptide chains: a consequence of amino acid interactions, conserved in their tertiary structure. J. Theor. Biol., 1973,41: 171-180
    [73] Holmquist R, Jukes T H, and Pangburn S. Evolution of transfer RNA. J. Mol. Biol.,1973,78:91-116
    [74] Aho V A, Hirschberg D S, and Ullman J D. Bounds on the complexity of the longest common subsequences problem. J. ACM, 1976, 23: 1-12
    [75] Chvatal V and Sankoff D. Longest common subsequences of two random sequences. J. Appl. Prob., 1975, 12: 306-315
    [76] Delcoigne A and Hansen P. Sequence comparison by dynamic programming. Biometrika, 1975, 62: 661-664
    [77] Hirschberg D S. A linear space algorithm for computing maximal common subsequences. Commun. ACM, 1975, 18: 341-343
    [78] Lowrance R and Wagner R A. An extension of the string-to-string correction problem. J. ACM, 1975, 22: 177-183
    [79] Okuda T, Tanaka E, and Kasai T. A method for correction of garbled words based on the Levenshtein metric. IEEE Trans. Comput. C, 1976, 25: 172-177
    [80] Waterman M S, Smith T F, and Beyer W A. Some biological sequence metrics.Adv. Math., 1976, 20: 367-387
    
    [81] Felsenstein J. The number of evolutionary trees. Syst. Zool., 1978, 27: 27-33
    [82] Klotz L C, Komar N, Blanken R L, and Mitchell R M. Calculation of evolutionary trees from sequence data. Proc. Natl. Acad. Sci. USA, 1979, 76: 4516-4520
    [83] Sattath S and Tvertsky A. Additive similarity trees. Psychometrika, 1977, 42:319-345
    [84] Waterman M S and Smith T F. On the similarity of dendrograms. J. Theor. Biol.,1978, 73: 789-800
    [85] Waterman M S, Smith T F, Singh M, and Beyer W A. Additive evolutionary trees.J. Theor. Biol., 1977, 64: 199-213
    
    [86] Chothia C. Structural invariants in protein folding. Nature, 1975, 254: 304-308
    [87] Chothia C, Levitt M, and Richardson D. Structures of proteins: packing of alpha-helices and pleated sheets. Proc. Natl Acad. Sci. USA, 1977, 74: 4130-4134
    [88] Chou P Y and Fasman G D. Prediction of protein conformation. Biochemistry,1974,13:222-244/225
    [89] Chou P Y and Fasman G D. Prediction of the secondary structure of proteins from their amino acid sequence. Adv. Enzymol., 1978, 47: 45-148
    [90] Crippen G M. The tree structural organization of domains in globular proteins. J.Mol. Biol., 1978, 126: 315-332.
    [91] Gamier J, Osguthorpe D J, and Robson B. Analysis of the accuracy and implications of simple methods for predicting the secondary structure of globular proteins. J. Mol. Biol., 1978, 120: 97-120
    [92] Hagler A T and Honig B. On the formation of protein tertiary structure on a computer. Proc. Natl Acad. Sci. USA, 1978, 75: 554-558
    [93] Lim V I. Algorithms for prediction of a -helical and β -structural regions in globular proteins. J. Mol. Biol., 1974, 88: 873-894
    [94] Crippen G M. A novel approach to the calculation of conformation: distance geometry. J. Comput. Phys., 1977, 26: 449-452
    [95] Feldmann R J. The design of computing systems for molecular modeling. Ann. Rev. Biophys. Bioeng., 1976, 5: 477-510
    [96] Goodman M, Moore G W, and Masuda G. Darwinian evolution in the genealogy of haemoglobin. Nature, 1975, 253: 603-608
    [97] Jukes T H. and King J L. Evolutionary loss of ascorbic acid synthesizing ability. J.Hum. Evol., 1975,4:85-88
    [98] Albery W J and Knowles J R. Evolution of enzyme function and the development of catalytic efficiency. Biochemistry, 1976, 15: 5631-5640
    [99] Dickerson R E, Timkovich R, and Almassy R J. The cytochrome fold and the evolution of bacterial energy metabolism. J. Mol. Biol., 1976, 100: 473-491
    [100] Heinrich R and Rapoport T A. Metabolic regulation and mathematical models.Prog. Biophys. Mol. Biol., 1977, 32: 1-82
    
    [101] Gilbert W. Why genes in pieces? Nature, 1978,271: 501-501
    [102] Riley M and Anilionis A. Evolution of the bacterial genome. Ann. Rev.Microbiol., 1978, 32: 519-560
    [103] Waterman M S and Smith T F. RNA secondary structure: a complete mathematical analysis. Math. Biosci., 1978, 42: 257-266
    [104] Schwartz R M and Dayhoff M O. Origins of prokaryotes, eukaryotes,mitochondria, and chloroplasts. Science, 1978, 199: 395-403
    
    [105] Savageau M A. Allometric morphogenesis of complex systems: derivation of the basic equations from first principles. Proc. Natl Acad. Sci. USA, 1979, 76:6023-6025
    [106] Savageau M A. Growth of complex systems can be related to the properties of their underlying determinants. Proc. Natl Acad. Sci. USA, 1979, 76: 5413-5417
    [107] Dayhoff M O. Atlas of Protein Sequence and Structure, 1978, Vol. 4, Suppl. 3.National Biomedical Research Foundation, Washington, D.C., U.S.A
    [108] Bernstein F C, Koetzle T F, Williams G J B, Meyer E F, Brice M D et al. The protein data bank: a computer based archival file for macromolecular structures. J.Mol. Biol., 1977, 112, 535-542
    [109] Devereux J, Haeberli P, and Smithies O. A comprehensive set of sequence analysis programs for the VAX. Nucleic Acids Res., 1984, 12: 387-395
    [110] Gingeras T R and Roberts R J. Steps toward computer analysis of nucleotide sequences. Science, 1980, 209: 1322-1328
    [111] Hall P A V and Dowling G R. Approximate string matching. Comput. Surv., 1980,12:381-402
    [112] Maizel J V J and Lenk R P. Enhanced graphic matrix analysis of nucleic acid and protein sequences. Proc. Natl Acad. Sci. USA, 1981, 78: 7665-7669
    [113] Grantham R, Gautier C, Gouy M, Mercier R, and Pave A. Codon catalog usage and the genome hypothesis. Nucleic Acids Res., 1980, 8: 49-62
    [114] Trifonov E N and Sussman J L. The pitch of chromatin DNA is reflected in its nucleotide sequence. Proc. Natl Acad. Sci. USA, 1980, 77: 3816-3820
    [115] Nussinov R and Jacobson A B. Fast algorithm for predicting the secondary structure of single-stranded RNA. Proc. Natl Acad. Sci. USA, 1980, 77:6309-6313
    [116] Fox G E, Stackenbrandt E, Hespell R B, Gibson J, Maniloff J, Dyer T A, Wolfe R S, Balch W E, Tanner R S, Magrum L J et al. The phylogeny of prokaryotes.Science, 1980, 209: 457-463
    [117] Doolittle W F and Sapienza C. Selfish genes, the phenotype paradigm and genome evolution. Nature, 1980, 284: 601-603
    [118] Dover G and Doolittle W F. Modes of genome evolution. Nature, 1980, 288:646-647
    [119] Hopfield J J. Neural networks and physical systems with emergent collective computational abilities. Proc. Natl Acad. Sci. USA, 1982, 79: 2554-2558
    [120] Conrad M. On design principles for a molecular computer. Comm. ACM, 1985,28: 464-480
    [121] Drexler K E. Molecular engineering: an approach to the development of general capabilities for molecular manipulation. Proc. Natl Acad. Sci. USA, 1981, 78:5275-5278
    [122] Burks C and Farmer D. Towards modeling DNA sequences as automata. Physica D,1984, 10: 157-167
    [123] Reggia J A, Armentrout S L, Chou H-H, and Peng Y. Simple systems that exhibit self-directed replication. Science, 1993, 259: 1282-1287
    [124] Wolfram S. Cellular automata as models of complexity. Nature, 1984, 311:419-424
    [125] Shepard RN. Multidimensional scaling, tree-fitting and clustering. Science, 1980,210: 390-398
    [126] Smith T F and Waterman M S. Comparison of biosequences. Adv. Appl. Math.,1981,2:482-489
    [127] Smith T F and Waterman M S. Identification of common molecular subsequences.J. Mol. Biol., 1981, 147: 195-197
    [128] Lipman D J and Pearson W R. Rapid and snseitive protein similarity searches.Science, 1985, 227: 1435-1441
    [129] Sellers P H. The theory and computation of evolutionary distances: pattern recognition. J. Algorithms, 1980, 1: 359-373
    [130] Ukkonen E. Algorithms for approximate string matching. Inform. Control, 1985,64: 100-118
    [131] Guibas L J and Odlyzko A M. Long repetitive patterns in random sequences. Z. Wahrschr. verw. Gebiete, 1980, 53: 241-262
    [132] Steele J M. Long common subsequences and the proximity of two random strings.SIAM J. Appl. Math., 1982, 42: 731-737
    [133] DeWachter R. The number of repeats expected in random nucleic acid sequences and found in genes. J. Theor. Biol., 1981, 91: 71-98
    [134] Martinez H M. An efficient method for finding repeats in molecular sequences.Nucleic Acids Res., 1983, 11: 4629-4634
    [135] Fristensky B. Improving the efficiency of dot-matrix similarity searches through use of an oligomer table. Nucleic Acids Res., 1986, 14: 597-610
    [136] Brutlag D L, Clayton J, Friedland P, and Kedes L H. SEQ: a nucleotide sequence analysis and recombination system. Nucleic Acids Res., 1982, 10: 279-294
    [137] Lyall A, Hammond P, Brough D, and Glover D. BIOLOG—a DNA sequence analysis system in Prolog. Nucleic Acids Res.,1984, 12: 633-642
    [138] Carrillo H and Lipman D J. The multiple sequence alignment problem in biology.SIAM J. Appl. Math., 1988, 48: 1073-1082
    [139] Feng D-F, Johnson M S and Doolittle R F. Aligning amino acid sequences: commonly used methods. J. Mol. Evol., 1985, 21: 112-125
    [140] Hogeweg P and Hesper B. The alignment of sets of sequences and the construction of phylogenetic trees. An integrated method. J. Mol. Evol., 20,175-186
    [141] Sankoff D and Cedergren R J. Simultaneous comparison of three or more sequences related by a tree. In: Sankoff, D. and Kruskal, J.B. (eds) Time Warps,String Edits, and Macromolecules: The Theory and Practice of Sequence Comparison. Addison-Wesley, Reading, MA, 1983, 253-263
    [142] Feng D F and Doolittle R F. Progressive sequence alignment as a prerequisite to correct phylogenetic trees. J. Mol. Evol., 1987, 25: 351-360
    [143] Higgins D G and Sharp P M. CLUSTAL: a package for performing multiple sequence alignment on a microcomputer. Gene, 1988, 73: 237-244
    [144] Gribskov M, Homyak M, Edenfield J, and Eisenberg D. Profile scanning for three-dimensional structural patterns in protein sequences. Comput. Appl. Biosci., 1988, 4: 61-66
    [145] Gribskov M, McLachlan A D, and Eisenberg D. Profile analysis: detection of distantly related proteins. Proc. Natl Acad. Sci. USA, 1987, 84: 4355-4358
    [146] Altschul S F and Erickson B W. Optimal sequence alignment using affine gap costs. Bull. Math. Biol., 1986, 48: 603-616
    [147] Arratia R, Gordon L, and Waterman M. An extreme value theory for sequence matching. Ann. Stat., 1986, 14: 971-993
    [148] Arratia R and Waterman M S. Critical phenomena in sequence matching. Ann. Prob., 1985, 13: 1236-1249
    [149] Arratia R and Waterman M S. An Erdős-Rényi law with shifts. Adv. Math., 1985, 55: 13-23
    [150] Karlin S, Ghandour G, Ost F, Tavaré S, and Korn L J. New approaches for computer analysis of nucleic acid sequences. Proc. Natl Acad. Sci. USA, 1983, 80: 5660-5664
    [151] Tavaré S. Some probabilistic and statistical problems in the analysis of DNA sequences. In: Miura R M (ed.) Some Mathematical Questions in Biology: DNA Sequence Analysis, Vol. 17. American Mathematical Society, Providence, RI, 1986, 57-86
    [152] Wilbur W J and Lipman D J. The context dependent comparison of biological sequences. SIAM J. Appl. Math., 1984, 44: 557-567
    [153] Abarbanel R M, Wieneke P R, Mansfield E, Jaffe D A, and Brutlag D L. Rapid searches for complex patterns in biological molecules. Nucleic Acids Res., 1984, 12: 263-280
    [154] Sellers P H. Pattern recognition in genetic sequences by mismatch density. Bull. Math. Biol., 1984, 46: 501-514
    [155] Fitch W M and Smith T F. Optimal sequence alignments. Proc. Natl Acad. Sci. USA, 1983, 80: 1382-1386
    [156] Schneider T D, Stormo G D, Gold L, and Ehrenfeucht A. Information content of binding sites on nucleotide sequences. J. Mol. Biol., 1986, 188: 415-431
    [157] Ebeling W and Jimenez-Montano M A. On grammars, complexity, and information measures of biological macromolecules. Math. Biosci., 1980, 52: 53-71
    [158] Jimenez-Montano M A. On the syntactic structure of protein sequences and the concept of grammar complexity. Bull. Math. Biol., 1984, 46: 641-659
    [159] Hopp T P and Woods K R. Prediction of protein antigenic determinants from amino acid sequences. Proc. Natl Acad. Sci. USA, 1981, 78: 3824-3828
    [160] Fickett J W. Recognition of protein coding regions in DNA sequences. Nucleic Acids Res., 1982, 10: 5303-5318
    [161] Shepherd J C W. Method to determine the reading frame of a protein from the purine/pyrimidine genome sequence and its possible evolutionary justification. Proc. Natl Acad. Sci. USA, 1981, 78: 1596-1600
    [162] Staden R and McLachlan A D. Codon preference and its use in identifying protein coding regions in long DNA sequences. Nucleic Acids Res., 1982, 10: 141-156
    [163] Stormo G D, Schneider T D, Gold L, and Ehrenfeucht A. Use of the 'perceptron' algorithm to distinguish translational initiation sites in E. coli. Nucleic Acids Res., 1982, 10: 2997-3011
    [164] Dumas J-P and Ninio J. Efficient algorithms for folding and comparing nucleic acid sequences. Nucleic Acids Res., 1982, 10: 197-206
    [165] Turner D H, Sugimoto N, and Freier S M. RNA structure prediction. Ann. Rev. Biophys. Biophys. Chem., 1988, 17: 167-192
    [166] Felsenstein J. Numerical methods for inferring evolutionary trees. Qu. Rev. Biol., 1982, 57: 379-404
    [167] Doolittle R F. Of URFs and ORFs: A primer on how to analyze derived amino acid sequences. University Science Books, Mill Valley, CA, 1986
    [168] Heijne G V. Sequence analysis in molecular biology: treasure trove or trivial pursuit. Academic Press, San Diego, CA, 1987
    [169] Philipson L. The DNA data libraries. Nature, 1988, 332: 676-686
    [170] Bilofsky H S, Burks C, Fickett J W, Goad W B, Lewitter F I, Rindone W P, Swindell C D, and Tung C S. The GenBank genetic sequence data bank. Nucleic Acids Res., 1986, 14: 1-4
    [171] Hamm G H and Cameron G N. The EMBL data library. Nucleic Acids Res., 1986, 14: 5-9
    [172] Kelly J M and Meyer E F J. Storage and retrieval of nucleic acid sequence data. Comput. Chem., 1983, 4: 107-111
    [173] Lesk A M. The EMBL data library. In: Lesk, A.M. (ed.) Computational Molecular Biology. Sources and Methods for Sequence Analysis. Oxford University Press, Oxford, 1988, 55-65
    [174] Kristofferson D. The BIONET electronic network. Nature, 1987, 325: 555-556
    [175] Smith D H, Brutlag D L, Friedland P, and Kedes L H. BIONET: a national computer resource for molecular biology. Nucleic Acids Res., 1986, 14: 17-20
    [176] Burks C, Lawton J R, and Bell G I. The LiMB database. Science, 1988, 241: 888-898
    [177] Lawton J R, Martinez F A, and Burks C. Overview of the LiMB database. Nucleic Acids Res., 1989, 17: 5885-5899
    [178] Cannon G C. Sequence analysis on microcomputers. Science, 1987, 238: 97-103
    [179] Davison D. Sequence similarity ('homology') searching for molecular biologists. Bull. Math. Biol., 1985, 47: 437-474
    [180] Henikoff S and Wallace J C. Detection of protein similarities using nucleotide sequence databases. Nucleic Acids Res., 1988, 16: 6191-6204
    [181] Lawrence C B, Goldman D A, and Hood R T. Optimized homology searches of the gene and protein sequence data banks. Bull. Math. Biol., 1986, 48: 569-583
    [182] Orcutt B C and Barker W C. Searching the protein sequence database. Bull. Math. Biol., 1984, 46: 545-552
    [183] Thornton J M and Gardner S P. Protein motifs and data-base searching. Trends Biochem. Sci., 1989, 14: 300-304
    [184] Heijne G V. Getting sense out of sequence data. Nature, 1988, 333: 605-607
    [185] Islam S A and Sternberg M J E. A relational database of protein structures designed for flexible enquiries about conformation. Protein Eng., 1989, 2: 431-442
    [186] Rawlings C J. Designing databases for molecular biology. Nature, 1988, 334: 477
    [187] Collins J F and Coulson A F W. Applications of parallel processing algorithms for DNA sequence analysis. Nucleic Acids Res., 1984, 12: 181-192
    [188] Core N G, Edmiston E W, Saltz J H, and Smith R M. Supercomputers and biological sequence comparison algorithms. Comput. Biomed. Res., 1989, 22: 497-515
    [189] Edmiston E W, Gore N G, Saltz J H, and Smith R M. Parallel processing of biological sequence comparison algorithms. Int. J. Parallel Program., 1988, 17: 259-275
    [190] Gotoh O and Tagashira Y. Sequence search on a supercomputer. Nucleic Acids Res., 1986, 14: 57-64
    [191] Huang X. A space-efficient parallel sequence comparison algorithm for a message passing multiprocessor. Int. J. Parallel Program., 1989, 18: 223-239
    [192] Lopresti D. P-NAC: a systolic array for comparing nucleic acid sequences. Computer, 1987, 20: 98-99
    [193] DeLisi C. Computers in molecular biology: current applications and emerging trends. Science, 1988, 240: 47-52
    [194] Rossmann M G and Argos P. Exploring structural homology of proteins. J. Mol. Biol., 1976, 105: 75-95
    [195] Rashin A A. Locations of domains in globular proteins. Nature, 1981, 291: 85-86
    [196] Kyte J and Doolittle R F. A simple method for displaying the hydropathic character of a protein. J. Mol. Biol., 1982, 157: 105-132
    [197] Sweet R M and Eisenberg D. Correlation of sequence hydrophobicities measures similarity in three-dimensional protein structure. J. Mol. Biol., 1983, 171: 479-488
    [198] Lesk A M and Hardman K D. Computer-generated schematic diagrams of protein structures. Science, 1982, 216: 539-540
    [199] Brooks B and Karplus M. Fractal surfaces of proteins. Proc. Natl Acad. Sci. USA, 1983, 80: 6571-6575
    [200] Braun W. Representation of short- and long-range handedness in protein structures by signed distance maps. J. Mol. Biol., 1983, 163: 613-621
    [201] Connolly M L. Solvent-accessible surfaces of proteins and nucleic acids. Science, 1983, 221: 709-713
    [202] Swanson R. A vector representation for amino acid sequences. Bull. Math. Biol., 1984, 46: 623-639
    [203] Yamamoto K and Yoshikura H. A new representation of protein structure: vector diagram. Comput. Appl. Biosci., 1986, 2: 83-88
    [204] Jones T A and Thirup S. Using known substructures in protein model building and crystallography. EMBO J., 1986, 5: 819-822
    [205] Taylor W R. The classification of amino acid conservation. J. Theor. Biol., 1986, 119: 205-218
    [206] Rackovsky S and Goldstein D A. Protein comparison and classification: a differential geometric approach. Proc. Natl Acad. Sci. USA, 1988, 85: 777-781
    [207] Rooman M and Wodak S J. Identification of predictive sequence motifs limited by protein structure data base size. Nature, 1988, 335: 45-49
    [208] Unger R, Harel D, Wherland S and Sussman J L. A 3D building blocks approach to analyzing and predicting structure in proteins. Proteins, 1989, 5: 355-373
    [209] Jones T A. Interactive computer graphics: FRODO. Meth. Enzymol., 1985, 115: 157-171
    [210] Priestle J P. RIBBON: a stereo cartoon drawing program for proteins. J. Appl. Crystallogr., 1988, 21: 572-576
    [211] Cohen F E and Sternberg M J E. On the prediction of protein structure: the significance of the root-mean-square deviation. J. Mol. Biol., 1980, 138: 321-333
    [212] McLachlan A D. Rapid comparison of protein structures. Acta Cryst. A, 1982, 38: 871-873
    [213] Taylor W R and Orengo C A. Protein structure alignment. J. Mol. Biol., 1989, 208: 1-22
    [214] Klein P. Prediction of protein structural class by discriminant analysis. Biochim. Biophys. Acta, 1986, 874: 205-215
    [215] Klein P and DeLisi C. Prediction of protein structural class from the amino acid sequence. Biopolymers, 1986, 25: 1659-1672
    [216] Nishikawa K, Kubota Y and Ooi T. Classification of proteins into groups based on amino acid composition and other characters. I. J. Biochem., 1983, 94: 981-995
    [217] Nishikawa K, Kubota Y, and Ooi T. Classification of proteins into groups based on amino acid composition and other characters. II. J. Biochem., 1983, 94: 997-1007
    [218] Greer J. Comparative model-building of the mammalian serine proteases. J. Mol. Biol., 1981, 153: 1027-1042
    [219] Kabsch W and Sander C. On the use of sequence homologies to predict protein structure: identical pentapeptides can have completely different conformations. Proc. Natl Acad. Sci. USA, 1984, 81: 1075-1078
    [220] Holm L and Sander C. Fast and simple Monte Carlo algorithm for side chain optimization in proteins: application to model building by homology. Proteins, 1992, 14: 213-223
    [221] Ponder J W and Richards F M. Tertiary templates for proteins: use of packing criteria in the enumeration of allowed sequences for different structural classes. J. Mol. Biol., 1987, 193: 775-791
    [222] Chothia C. Principles that determine the structures of proteins. Ann. Rev. Biochem., 1984, 53: 537-572
    [223] Richardson J. Anatomy and taxonomy of protein structure. Adv. Protein Chem., 1981, 34: 168-339
    [224] Branden C-I. Relation between structure and function of α/β proteins. Qu. Rev. Biophys., 1980, 13: 317-338
    [225] Cohen F E, Sternberg M J E, and Taylor W R. Analysis of the tertiary structure of protein β-sheet sandwiches. J. Mol. Biol., 1981, 148: 253-272
    [226] Chothia C, Levitt M and Richardson D. Helix to helix packing in proteins. J. Mol. Biol., 1981, 145: 215-250
    [227] Chothia C and Janin J. Relative orientations of close-packed β-pleated sheets in proteins. Proc. Natl Acad. Sci. USA, 1981, 78: 4146-4150
    [228] Sibanda B L and Thornton J M. β-Hairpin families in globular proteins. Nature, 1985, 316: 170-174
    [229] Lasters I, Wodak S J, Alard P, and Cutsem E V. Structural principles of parallel β-barrels in proteins. Proc. Natl Acad. Sci. USA, 1988, 85: 3338-3342
    [230] Leszczynski J F and Rose G D. Loops in globular proteins: a novel category of secondary structure. Science, 1986, 234: 849-855
    [231] Cohen C and Parry D A D. α-Helical coiled coils: a widespread motif in proteins. Trends Biochem. Sci., 1986, 11: 245-248
    [232] Craik C S, Sprang S, Fletterick R, and Rutter W J. Intron-exon splice junctions map at protein surfaces. Nature, 1982, 299: 180-182
    [233] Craik C S, Rutter W J, and Fletterick R. Splice junctions: association with variation in protein structure. Science, 1983, 220: 1125-1129
    [234] Wuthrich K. Protein structure determination in solution by nuclear magnetic resonance spectroscopy. Science, 1989, 243: 45-50
    [235] Braun W. Distance geometry and related methods for protein structure determination from NMR data. Qu. Rev. Biophys., 1987, 19: 115-157
    [236] Gower J C. Euclidean distance geometry. Math. Sci., 1982, 7: 1-14
    [237] Gower J. Properties of Euclidean and non-Euclidean distance matrices. Linear Algebra Appl., 1985, 67: 81-97
    [238] Brunger A T, Clore M G, Gronenborn A M, and Karplus M. Three-dimensional structure of proteins determined by molecular dynamics with interproton distance restraints: application to crambin. Proc. Natl Acad. Sci. USA, 1986, 83: 3801-3805
    [239] Bajaj M and Blundell T. Evolution and the tertiary structure of proteins. Ann. Rev. Biophys. Bioeng., 1984, 13: 453-492
    [240] Altschuh D, Vernet T, Berti P, Moras D, and Nagai K. Coordinated amino acid changes in homologous protein families. Protein Eng., 1988, 2: 193-199
    [241] Chothia C and Lesk A M. The relation between the divergence of sequence and structure in proteins. EMBO J., 1986, 5: 823-826
    [242] Wilbur W J. On the PAM matrix model of protein evolution. Mol. Biol. Evol., 1985, 2: 434-447
    [243] Graur D. Amino acid composition and the evolutionary rates of protein-coding genes. J. Mol. Evol., 1985, 22: 53-62
    [244] Reeck G R, Haen C D, Teller D C, Doolittle R F, Fitch W M, Dickerson R E, Chambon P, McLachlan A D, Margoliash E, Jukes T H et al. "Homology" in proteins and nucleic acids: a terminology muddle and a way out of it. Cell, 1987, 50: 667
    [245] Bashford D, Chothia C, and Lesk A M. Determinants of a protein fold: unique features of the globin amino acid sequences. J. Mol. Biol., 1987, 196: 199-216
    [246] Lesk A M and Chothia C. How different amino acid sequences determine similar protein structures: the structure and evolutionary dynamics of the globins. J. Mol. Biol., 1980, 136: 225-270
    [247] Lesk A M and Chothia C. Evolution of proteins formed by β-sheets. II. The core of the immunoglobulin domains. J. Mol. Biol., 1982, 160: 325-342
    [248] Neurath H. Evolution of proteolytic enzymes. Science, 1984, 224: 350-357
    [249] Mathews F S. The structure, function and evolution of cytochromes. Prog. Biophys. Mol. Biol., 1985, 45: 1-56
    [250] George D G, Hunt T L, Yeh L-S L, and Barker W C. New perspectives on bacterial ferredoxin evolution. J. Mol. Evol., 1985, 22: 20-31
    [251] Rothschild L J, Ragan M A, Coleman A W, Heywood P, and Gerbi S A. Are rRNA sequence comparisons the Rosetta Stone of phylogenetics? Cell, 1986, 47: 640
    [252] Gilbert W. Genes-in-pieces revisited. Science, 1985, 228.
    [253] Brutlag D L. Molecular arrangement and evolution of heterochromatic DNA. Ann. Rev. Genet., 1980, 14: 121-144
    [254] Cedergren R, Gray M W, Abel Y, and Sankoff D. The evolutionary relationships among known life forms. J. Mol. Evol., 1988, 28: 98-112
    [255] Breslauer K J, Frank R, Blocker H, and Marky L A. Predicting DNA duplex stability from the base sequence. Proc. Natl Acad. Sci. USA, 1986, 83: 3746-3750
    [256] Blake R D and Earley S. Distribution and evolution of sequence characteristics in the E. coli genome. J. Biomol. Struct. Dyn., 1986, 4: 291-307
    [257] Sankoff D and Goldstein M. Probabilistic models of genome shuffling. Bull. Math. Biol., 1989, 51: 117-124
    [258] Ohta T. Simulating evolution by gene duplication. Genetics, 1987, 115: 207-213
    [259] Sharp P. On the origin of RNA splicing and introns. Cell, 1985, 42: 397-400
    [260] Bulmer M. A statistical analysis of nucleotide sequence of introns and exons in human genes. Mol. Biol. Evol., 1987, 4: 395-405
    [261] Gilbert W, Marchionni M, and McKnight G. On the antiquity of introns. Cell, 1986, 46: 143-147
    [262] Naora H, Miyahara K, and Curnow R N. Origin of noncoding DNA sequences: molecular fossils of genome evolution. Proc. Natl Acad. Sci. USA, 1987, 84: 6195-6199
    [263] Doolittle R F. Of URFs and ORFs: A primer on how to analyze derived amino acid sequences. University Science Books, Mill Valley, CA, 1986
    [264] Britten R. Rates of DNA sequence evolution differ between taxonomic groups. Science, 1986, 231: 1393-1398
    [265] Grantham R, Gautier C, Gouy M, Jacobzone M, and Mercier R. Codon catalog usage is a genome strategy modulated for gene expressivity. Nucleic Acids Res., 1981, 9: 43-74
    [266] Felsenstein J. Evolutionary trees from DNA sequences: a maximum likelihood approach. J. Mol. Evol., 1981, 17: 368-376
    [267] Felsenstein J. Confidence limits on phylogenies: an approach using the bootstrap. Evolution, 1985, 39: 783-791
    [268] Felsenstein J. Phylogenies from molecular sequences: inference and reliability. Ann. Rev. Genet., 1988, 22: 521-565
    [269] Felsenstein J. PHYLIP: phylogeny inference package. Cladistics, 1988, 5: 355-356
    [270] Altschul S F, Gish W, Miller W, Myers E W, and Lipman D J. Basic local alignment search tool. J. Mol. Biol., 1990, 215: 403-410
    [271] Sayle R A and Milner-White E J. RASMOL: biomolecular graphics for all. Trends Biochem. Sci., 1995, 20: 374
    [272] Richardson D C and Richardson J S. The kinemage: a tool for scientific communication. Protein Sci., 1992, 1: 3-9
    [273] Brunak S, Engelbrecht J, and Knudsen S. Neural network detects errors in the assignment of mRNA splice sites. Nucleic Acids Res., 1990, 18: 4797-4801
    [274] Fickett J W and Tung C-S. Assessment of protein coding measures. Nucleic Acids Res., 1992, 20: 6441-6450
    [275] Guigo R, Knudsen S, Drake N, and Smith T F. Prediction of gene structure. J. Mol. Biol., 1992, 226: 141-157
    [276] Mural R J and Uberbacher E C. Locating protein-coding regions in human DNA sequences by a multiple sensor-neural network approach. Proc. Natl Acad. Sci. USA, 1991, 88: 11261-11265
    [277] States D J and Botstein D. Molecular sequence accuracy and the analysis of protein coding regions. Proc. Natl Acad. Sci. USA, 1991, 88: 5518-5522
    [278] Rost B and Sander C. Improved prediction of protein secondary structure by use of sequence profiles and neural networks. Proc. Natl Acad. Sci. USA, 1993, 90: 7558-7562
    [279] Walls P H and Sternberg M J. New algorithm to model protein-protein recognition based on surface complementarity. Applications to antibody-antigen docking. J. Mol. Biol., 1992, 228: 277-297
    [280] Bowie J U, Luethy R, and Eisenberg D. A method to identify protein sequences that fold into a known three-dimensional structure. Science, 1991, 253: 164-170
    [281] Jones D T, Taylor W R and Thornton J M. A new approach to protein fold recognition. Nature, 1992, 358: 86-89
    [282] Ouzounis C, Sander C, Scharf M and Schneider R. Prediction of protein structure by evaluation of sequence-structure fitness: aligning sequences to contact profiles derived from three-dimensional structures. J. Mol. Biol., 1993, 232: 805-825
    [283] Thornton J M, Flores T P, Jones D T, and Swindells M B. Prediction of progress at last. Nature, 1991, 354: 105-106
    [284] Gonnet G H, Cohen M A, and Benner S A. Exhaustive matching of the entire protein sequence database. Science, 1992, 256: 1443-1445
    [285] Kanehisa M. Post-genome informatics. Oxford University Press, 2001
    [286] Phillips A, Cardelli L, and Castagna G. A graphical representation for biological processes in the stochastic Pi-calculus. Transactions in Computational Systems Biology (TCSB), November 2006, 4230: 123-152.
    [287] Pinney J W, Westhead D R, and McConkey G A. Petri Net representations in systems biology. J. Biochemical Society, 2003, 1513-1515
    [288] Peleg M, Yeh I, and Altman R B. Modelling biological processes using workflow and Petri net models. Bioinformatics, 2002, 18(6): 825-837
    [289] Biermann S, Uhrmacher A M, and Schumann H. Supporting multi-level models in systems biology by visual methods. In: Proceedings of the European Simulation Multiconference, Magdeburg, June 13-16
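    [290] 赵明生, 尚彤, 孙冬泳, 蒋景宏, 汤健, 吴佑寿. Research status and prospects of the electronic cell. 2001, 29(12): 1740-1743 (in Chinese)
    [291] 黄培堂, 沈倍奋. Bioterrorism Defense. Beijing: Science Press, 2005 (in Chinese)
    [292] 刘振华, 陈晓红. The current state of cancer prevention and treatment and a readjustment of strategy. Chinese Hospitals, 2003, 7(2): 41-43 (in Chinese)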
    [293] Mitra S and Hayashi Y. Bioinformatics with soft computing. IEEE Transactions on Systems, Man, and Cybernetics, 2006, 36(5): 616-636
    [294] Valverde S and Solé R V. Logarithmic growth dynamics in software networks. Europhysics Letters, 2005, 72(5): 858-864
    [295] Valverde S, Ferrer C R, and Solé R V. Scale-free networks from optimal design. Europhysics Letters, 2002, 60(4): 512-517
    [296] Bornholdt S and Schuster H G. Handbook of graphs and networks: from the genome to the Internet. Berlin: Wiley-VCH, 2003
    [297] Dorogovtsev S N and Mendes J F F. Evolution of networks: from biological nets to the Internet and WWW. New York: Oxford University Press, 2003
    [298] Adleman L M. Molecular computation of solutions to combinatorial problems. Science, 1994, 266: 1021-1024
    [299] Wells S. The journey of Man: a genetic odyssey. Penguin Books, 2003
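    [300] Watson J D, Baker T A, Bell S P, Gann A, Levine M, and Losick R M; Chinese translation by 杨焕明 et al. Molecular Biology of the Gene. Beijing: Science Press, 2005
    [301] 孙啸, 陆祖宏, 谢建明 (eds.). Fundamentals of Bioinformatics. Beijing: Tsinghua University Press, 2005 (in Chinese)
    [302] 沈世镒. Structural Analysis of Mutation and Alignment of Biological Sequences. Beijing: Science Press, 2004 (in Chinese)
    [303] 王翼飞, 史定华 (eds.). Bioinformatics: Intelligent Algorithms and Their Applications. Beijing: Chemical Industry Press, 2006 (in Chinese)
    [304] 陈浩明, 薛京伦 (eds.). Medical Molecular Genetics. Beijing: Science Press, 2005 (in Chinese)
    [305] Brodeur G M and Hogarty M D. Gene amplification in human cancers: biological and clinical significance. In: Vogelstein B and Kinzler K W (eds.) The Genetic Basis of Human Cancer. New York: McGraw-Hill, 1998, 181
    [306] 薛开先. Regulation of gene expression. In: An Introduction to Human Genetics. Shanghai: Fudan University Press, 1996, 132 (in Chinese)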
    [307] Putt K S, Chen G W, Pearson J M, Sandhorst J S, Hoagland M S, Kwon J-T, Hwang S-K, Jin H, Churchwell M I, Cho M-H, Doerge D R, Helferich W G, and Hergenrother P J. Small molecule activation of procaspase-3 to caspase-3 as a personalized anti-cancer strategy. Nature Chem. Biol., 2006, 2: 543-550
    [308] Putt K S, Nesterenko V, Dothager R S, and Hergenrother P J. The compound 13-D selectively induces apoptosis in white blood cancers versus other cancer cell types. ChemBioChem, 2006, 7: 1916-1922
    [309] Nesterenko V, Putt K S, and Hergenrother P J. Identification from a combinatorial library of a small molecule that selectively induces apoptosis in cancer cells. J. Am. Chem. Soc., 2003, 125: 14672-14673
    [310] Dothager R S, Putt K S, Allen B J, Leslie B J, Nesterenko V, and Hergenrother P J. Synthesis and identification of small molecules that potently induce apoptosis in melanoma cells through G1 cell cycle arrest. J. Am. Chem. Soc., 2005, 127: 8686-8696
    [311] Goode D R, Sharma A K, and Hergenrother P J. Using peptidic inhibitors to systematically probe the S1' site of caspases-3 and -7. Org. Lett., 2005, 7: 3529-3532
    [312] Hood L, Heath JR, Phelps ME, et al. Systems biology and new technologies enable predictive and preventative medicine. Science, 2004, 306 (5696): 640-643.
    [313] Halvorsen OJ, Oyan AM, Bo TH, et al. Gene expression profiles in prostate cancer: association with patient subgroups and tumor differentiation. Int J Oncol, 2005, 26(2): 329-336.
    [314] Celis JE, Moreira JM, Gromova I, et al. Towards discovery-driven translational research in breast cancer. FEBS J, 2005, 272(1): 2-15.
    [315] Berger JA, Mitra SK, Carli M, and Neri A. Visualization and analysis of DNA sequences using DNA walks. Journal of the Franklin Institute, 2004, 341: 37-53.
    [316] Jeffrey H J. Chaos game representation of gene structure. Nucleic Acids Research, 1990, 18(8): 2163-2170
    [317] Hao B L. Fractals from genomes: exact solutions of a biology-inspired problem. Physica A, 2000, 282: 225-246
    [318] Hao B L, Lee H C, Zhang S Y. Fractals related to long DNA sequences and complete genomes. Chaos, Solitons and Fractals, 2000, 11: 825-836
    [319] Ashlock D, Golden J B. Iterated function system fractals for the detection and display of DNA reading frame. To appear in the Proceedings of the 2000 Congress on Evolutionary Computation. 2000
    [320] Ashlock D and Golden J. Chaos automata: iterated function systems with memory. Physica D, 2003, 181: 274-285
    [321] National Center for Biotechnology Information (NCBI), National Library of Medicine, National Institutes of Health, http://www.ncbi.nlm.gov/; NCBI GenBank, http://www.ncbi.nlm.nih.gov/Genbank/; NCBI Genomes, http://www.ncbi.nlm.nih.gov/Genomes/.
    [322] Peng C K, Buldyrev S, Goldberger A, Havlin S, Sciortino F, Simons M, and Stanley H E. Long-range correlations in nucleotide sequences. Nature, 1992, 356: 168-171
    [323] 陈晓燕, 鲍伦军, 莫金垣, 蔡沛祥. Study of the fractal features of DNA sequences using the Mexican hat wavelet. Acta Chimica Sinica, 2003, 61(2): 273-278 (in Chinese)
    [324] Mandelbrot B. SIAM Rev., 1968, 10: 422
    [325] Gates M A. J. Theor. Biol., 1986, 119: 319
    [326] Larionov S A, Loskutov A Y, and Ryadchenko E V. Genome as a two-dimensional walk. Doklady Physics, 2005, 50(12): 634-638
    [327] 罗辽复, 蔡禄合. Correlation between the fractal dimension of nucleic acid sequences and evolution. Journal of Inner Mongolia University (Natural Science Edition), 1987, 18(4): 717-722 (in Chinese)
    [328] Li W and Kaneko K. Europhys. Lett., 1992, 17: 655
    [329] Zhang C T and Zhang R. Analysis of distribution of bases in the coding sequences by a diagrammatic technique. Nucleic Acids Res., 1991, 19: 6313-6317
    [330] Zhang R and Zhang C T. Z curves, an intuitive tool for visualizing and analyzing DNA sequences. J. Biomol. Struct. Dyn., 1994, 11: 767-782
    [331] 张春霆. Bioinformatics studies of several important problems concerning the genomes of humans and other organisms. Progress in Natural Science, 14(12): 1367-1374 (in Chinese)
    [332] Zhang C T and Zhang R. An isochore map of the human genome based on the Z curve method. Gene, 2003, 317: 127-135
    [333] Hao B L, Xie H M, and Yu Z G. Factorizable language: from dynamics to bacterial complete genomes. Physica A, 2000, 288: 10-20
    [334] Hao B L and Zheng W M. Applied Symbolic Dynamics and Chaos. Singapore: World Scientific, 1998
    [335] Gorban A N, Popova T G, and Sadovsky M G. Classification of symbol sequences over their frequency dictionaries: towards the connection between structure and natural taxonomy. Open Systems & Information Dynamics, 2000, 7(1)
    [336] Tino P. Multifractal properties of Hao's geometric representation of DNA sequences. Physica A, 2002, 304: 480-494
    [337] Hao B L, Xie H M, Yu Z G, and Chen G Y. Avoided strings in bacterial complete genomes and a related combinatorial problem. Annals of Combinatorics, 2000, 4: 247-255
    [338] Barnsley M F and Demko S. Rational approximations of fractals. Lecture Notes in Math., 1984, 1105: 73-88
    [339] Barnsley M F and Demko S. Iterated function systems and the global construction of fractals. Proc. R. Soc. Lond. A, 1985, 399: 243-275
    [340] 章立亮. Research on the generation of IFS attractors based on probability distributions. Journal of Donghua University (Natural Science), 2006, 32(5): 58-61 (in Chinese)
    [341] Fleischmann R D, Adams M D, and White O. Whole-genome random sequencing and assembly of Haemophilus influenzae Rd. Science, 1995, 269: 496-512
    [342] The C. elegans Sequencing Consortium. Sequence and analysis of the genome of C. elegans. Science, 1998, 282: 2012-2018
    [343] Xie H M and Hao B L. Visualization of K-tuple distribution in prokaryote complete genomes and their randomized counterparts. IEEE Proc. Comput. Syst. Bioinf., 2003, 31-42
    [344] Shen J J, Zhang S Y, Lee H C, and Hao B L. SeeDNA: visualization of k-string content of long DNA sequences and their randomized counterparts. Genomics, Proteomics & Bioinformatics, 2004, 2(3): 192-196
    [345] 陈惟昌, 陈志华, 陈志义, 王自强, 邱红霞. Digital coding of the genetic code and DNA sequences in a high-dimensional space. Acta Biophysica Sinica, 2000, 16(4): 760-768 (in Chinese)
    [346] Hsieh L C, Luo L F, and Lee H C. Short segmental duplication: parsimony in growth of microbial genomes. Genome Biology, 2003, 4(9): 7-20
    [347] Percus O E and Whitlock P A. Theory and application of Marsaglia's monkey test for pseudorandom number generators. ACM Transactions on Modeling and Computer Simulation, 1995, 5: 87-100
    [348] 张尧庭, 方开泰. An Introduction to Multivariate Statistical Analysis. Science Press, 1997, 468-473 (in Chinese)
    [349] Smith H O and Wilcox K W. A restriction enzyme from Hemophilus influenzae. J. Mol. Biol., 1970, 51: 379-391
    [350] Hao B L and Qi J. Prokaryote phylogeny without sequence alignment: from avoidance signature to composition distance. J. Bioinformatics and Computational Biology, 2004, 2: 1-19
    [351] Zhou Y and Mishra B. Models of genome evolution. Modeling in Molecular Biology, Lecture Notes in Computer Science, Natural Computing Series, Springer, 2004, 287-304
    [352] Hsieh L C, Luo L F, and Lee H C. Evidence for growth of microbial genomes by short segmental duplications. Proceedings of the Computational Systems Bioinformatics Conference (CSB'03). IEEE Computer Society, 2003, 474-475
    [353] Hsieh L C, Luo L F, Ji F M, and Lee H C. Minimal model for genome evolution and growth. Phys. Rev. Lett., 2003, 90: 018101-104
    [354] Bridges C B. The Bar 'gene' a duplication. Science, 1936, 83: 210-211
    [355] Stephens S G. Possible significance of duplication in evolution. Adv. Genet., 1951, 4: 247-265
    [356] Nei M. Gene duplication and nucleotide substitution in evolution. Nature, 1969, 221: 40-42
    [357] Zhang J. Evolution by gene duplication: an update. Trends in Ecology and Evolution, 2003, 18(6): 292-298
    [358] Himmelreich R et al. Complete sequence analysis of the genome of the bacterium Mycoplasma pneumoniae. Nucleic Acids Res., 1996, 24: 4420-4449
    [359] Tomb JF et al. The complete genome sequence of the gastric pathogen Helicobacter pylori. Nature, 1997, 388: 539-547
    [360] Rubin GM et al. Comparative genomics of the eukaryotes. Science, 2000, 287:2204-2215
    [361] Klenk HP et al. The complete genome sequence of the hyperthermophilic, sulphate-reducing archaeon Archaeoglobus fulgidus. Nature, 1997, 390: 364-370
    [362] The Arabidopsis Genome Initiative. Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature, 2000, 408: 796-815
    [363] Li WH et al. Evolutionary analyses of the human genome. Nature, 2001, 409: 847-849
    [364] Woese C. The universal ancestor. Proceedings of National Academy of Sciences of the USA,1998, 95: 6854-6859
    [365] Li WH and Graur D. Fundamentals of Molecular Evolution. Sunderland (Massachusetts): Sinauer Associates, 1991
    [366] Green ED and Chakravarti A. The human genome sequence expedition: views from the "base camp". Genome Res., 2001, 11: 645-651
    [367] Dong S and Searls DB. Gene structure prediction by linguistic methods. Genomics, 1994, 23(3): 540-551
    [368] Ferragina P and Grossi R. The string B-tree: A new data structure for string search in external memory and its application. Journal of the ACM, 1999, 46(2): 236-280
    [369] Choi J H and Cho H G. Analysis of common k-mers for whole genome sequences using SSB-tree. Genome Informatics, 2002, 13: 30-41
    [370] Sadakane K, Shibuya T. Indexing Huge Genome Sequences for Solving Various Problems. Genome informatics, 2001, 12: 175-183
    [371] Knuth D E. The Art of Computer Programming, Vol. 3, 2nd ed. Addison Wesley Longman, 1998.
    [372] Alizadeh A et al. Towards a novel classification of human malignancies based on gene expression, J. Pathol., 2001, 195: 41-52.
    [373] Duggan D J, Bittner M, Chen Y, Meltzer P, and Trent J. Expression profiling using cDNA microarrays. Nature Genetics, 1999, 21: 10-14
    [374] Schena M et al. Quantitative monitoring of gene expression patterns with a cDNA microarray. Science, 1995, 270: 467-470
    [375] Shalon D, Smith S J, Brown P O. A DNA micro-array system for analyzing complex DNA samples using two-color fluorescent probe hybridization. Genome Research, 1996, 6: 639-645
    [376] Lipshutz R J, Fodor S P A, Gingeras T R, and Lockhart D J. High density synthetic oligonucleotide arrays. Nature Genetics, 1999, 21: 20-24
    [377] Lockhart D J, Dong H, Byrne M C, Follettie M T, Gallo M V, Chee M S, Mittmann M, Wang C, Kobayashi M, Horton H, and Brown E L. Expression monitoring by hybridization to high-density oligonucleotide arrays. Nature Biotechnology, 1996, 14: 1675-1680
    [378] Schadt E E, Cheng L, Cheng S, and Wong W H. Analyzing high-density oligonucleotide gene expression array data. Journal of Cellular Biochemistry, 1999, 80: 192-202
    [379]邹健,冉志华,萧树东.分类表达谱基因芯片在医学研究中的作用.肿瘤防治研究,2006,33(3):209-212.
    [380]Chen W,Salojin K V,Mi QS,et al.Insulin-like growth factor (IGF)-1/IGF-binding protein-3 complex:therapeutic efficacy and mechanism of protection against type 1 diabetes[J].Endocrinology,2004,145(2):627-638
    [381]Bostic P,Dodd G L,Villinger F,et al.Dysregulation of the Polo-Like Kinase Pathway in CD4+ T Cells Is Characteristic of Pathogenic Simian Immunodeficiency Vires Infection[J].J Virol,2004,78(3):1464-1472
    [382]Vacca A,Ria R,Semeraro F,et al.Endothelial cells in the bone marrow of patients with multiple myeloma[J].Blood,2003,102(9):3340-3348
    [383]Golub T R,Slonim D K,Yamayo P,Huard C,Gaasenbeek M,Mesirov J P,Coller H,Lob M L,Downing J R,Caligiuri M A,Bloomfield C D,and Lander E S.Molecular classification of cancer:class discovery and class prediction by gene expression monitoring.Science,1999,286:531-537
    [384]Wooster R.Cancer classification with DNA microarrays:is less more? Trends in Genetics,2000,16(8):327-329
    [385]Draghici Set al.Onto-Tools,the toolkit of the modem biologist:Onto-Express,Onto-Compare,Onto-Design and Onto-Translate.Nucl.Acids.Res.,2003.31(13):3775-3781
    [386]Branca M.GENETICS AND MEDICINE:Putting gene arrays to the test.Science,2003,300(5617):238
    [387] Cho S B and Won H H. Machine learning in DNA microarray analysis for cancer classification. Proceedings of the First Asia-Pacific Bioinformatics Conference on Bioinformatics, 2003, 189-198
    [388] Berrar D P, Dubitzky W, Granzow M (Eds.). A practical approach to microarray data analysis. Kluwer Academic Press, London, 2003.
    [389] Deb K and Reddy A R. Reliable classification of two-class cancer data using evolutionary algorithms. BioSystems, 2003, 72: 111-129
    [390] Deutsch J M. Evolutionary algorithms for finding optimal gene sets in microarray prediction. Bioinformatics, 2003, 19: 45-52
    [391] Li Y X, Li J G, Ruan X G. Study of informative gene selection for tissue classification based on tumor gene expression profiles. Chinese Journal of Computers, 2006, 29(2): 324-330
    [392] Kira K, Rendell L A. The feature selection problem: traditional methods and a new algorithm. In: Swartout W, ed. Proceedings of the 10th National Conference on Artificial Intelligence. Cambridge, MA: AAAI Press/The MIT Press, 1992, 129-134
    [393] Kononenko I. Estimating attributes: analysis and extensions of Relief. In: Bergadano F, De Raedt L, eds. Proceedings of European Conference on Machine Learning. Berlin: Springer-Verlag, 1994, 171-182
    [394] Wang S L, Wang J, Chen H W, and Zhang B Y. SVM-based tumor classification with gene expression data. Advanced Data Mining and Applications, Springer-Verlag Berlin Heidelberg, 2006, 864-870.
    [395] Wang S L, Wang J, Chen H W, Li S T. Feature extraction and classification of tumor based on wavelet package and support vector machines. PAKDD 2007, LNAI 4426, 2007, 871-878.
    [396] Furlanello C, Serafini M, Merler S, Jurman G. An accelerated procedure for recursive feature ranking on microarray data. Neural Networks, 2003, 16: 641-648
    [397] Tang C, Zhang A D, Pei J. Mining phenotypes and informative genes from gene expression data. Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Aug 2003, 655-660
    [398] Varma S, Simon R. Iterative class discovery and feature selection using minimal spanning trees. BMC Bioinformatics, 2004, 5: 126
    [399] Liu H, Li J, and Wong L. A comparative study on feature selection and classification methods using gene expression profiles and proteomic patterns. Genome Inform., 2002, 13: 51-60
    [400] Ben-Dor A, Bruhn L, Friedman N, Nachman I, Schummer M, and Yakhini N. Tissue classification with gene expression profiles. Journal of Computational Biology, 2000, 7: 559-584
    [401] Kira K and Rendell L A. The feature selection problem: traditional methods and a new algorithm. National Conference on Artificial Intelligence, 1992, 129-134
    [402] Kononenko I. Estimating attributes: analysis and extensions of RELIEF. European Conference on Machine Learning, Catania, Italy, 1994, 171-182
    [403] Wang Y H, Makedon F S, Ford J C and Pearlman J. HykGene: a hybrid approach for selecting marker genes for phenotype classification using microarray gene expression data. Bioinformatics, 2005, 21(8): 1530-1537
    [404] Li Y X, Ruan X G. Feature gene selection for tumor classification based on support vector machines. Journal of Computer Research and Development, 2005, 42(10): 1796-1801 (in Chinese)
    [405] Lu X G, Lin Y P, Yang X L, and Cai L J. Using most similarity tree based clustering to select the top most discriminating genes for cancer detection. ICAISC 2006, LNAI 4029, 2006, pp. 931-940
    [406] Huerta E B, Duval B, and Hao J K. A hybrid GA/SVM approach for gene selection and classification of microarray data. EvoWorkshops, 2006, 34-44
    [407] Pavlidis P, Weston J, Cai J S, and Grundy W N. Gene functional classification from heterogeneous data. In: Proceedings of the Fifth Annual International Conference on Computational Biology, ACM Press, 2001, pp. 249-255
    [408] Wilcoxon F. Individual comparisons by ranking methods. Biometrics, 1945, 1: 80-83
    [409] Huang D S, Liu H Y, Shi Y Y, Chen G L. Research on Intelligent Computing Theory and Methods in Bioinformatics. Hefei: University of Science and Technology of China Press, 2006, 56-64 (in Chinese)
    [410] Zhang H P, Yu C Y, Singer B, and Xiong M M. Recursive partitioning for tumor classification with gene expression microarray data. PNAS, May 2001, 98(12): 6730-6735
    [411] Li Z, Bao L, Huang Y W, Sun Z R. Tumor classification and feature gene selection based on gene expression profiles. Acta Biophysica Sinica, 2002, 18(4): 413-417 (in Chinese)
    [412] Simek K, Fujarewicz K, Swierniak A, Kimmel M, Jarzab B, Wiench M, and Rzeszowska J. Using SVD and SVM methods for selection, classification, clustering and modeling of DNA microarray data. Engineering Applications of Artificial Intelligence, 2004, 17: 417-427
    [413] Törönen P, Kolehmainen M, Wong G, and Castrén E. Analysis of gene expression data using self-organizing maps. FEBS Lett., 1999, 451: 142-146
    [414] Li X, Sun G M, Deng C. Clustering analysis of tumor gene expression profile data using a DSOM neural network. Biomedical Engineering and Clinical Medicine, 2006, 10(1): 43-46 (in Chinese)
    [415] Li L, Weinberg C R, Darden T A, and Pedersen L G. Gene selection for sample classification based on gene expression data: study of sensitivity to choice of parameters of the GA/KNN method. Bioinformatics, 2001, 17(12): 1131-1142
    [416] Dudoit S, Fridlyand J, Speed T P. Comparison of discrimination methods for the classification of tumors using gene expression data. J Am Stat Assoc, 2002, 77-87
    [417] Lee J W, Lee J B, Park M, and Song S H. An extensive comparison of recent classification tools applied to microarray data. Comput Stat Data Anal, 2005, 48: 869-885
    [418] Guyon I, Weston J, Barnhill S, and Vapnik V. Gene selection for cancer classification using support vector machines. Machine Learning, 2002, 46(1-3): 389-422.
    [419] Brown M P S, Grundy W N, Lin D, Cristianini N, Sugnet C W, Furey T S, Ares M Jr, and Haussler D. Knowledge-based analysis of microarray gene expression data by using support vector machines. Proceedings of the National Academy of Sciences, 2000, 97(1): 262-267
    [420] Midelfart H, Komorowski J, Nørsett K, Yadetie F, Sandvik A K, Lægreid A. Learning rough set classifiers from gene expressions and clinical data. Fundamenta Informaticae, 2002, 53: 155-183
    [421] Khan J, Wei J S, Ringnér M et al. Classification and diagnostic prediction of cancers using gene expression profiling and artificial neural networks. Nature Medicine, 2001, 7: 673-679
    [422] Asyali M H, Colak D, Demirkaya O, and Inan M S. Gene expression profile classification: a review. Current Bioinformatics, 2006, 1: 55-73
    [423] Vapnik V N. Statistical learning theory. New York: Wiley-Interscience, 1998
    [424] Statnikov A, Aliferis C F, Tsamardinos I, Hardin D, and Levy S. A comprehensive evaluation of multicategory classification methods for microarray gene expression cancer diagnosis. Bioinformatics, 2005, 21: 631-643
    [425] Cristianini N and Shawe-Taylor J. An introduction to support vector machines. Cambridge University Press, Cambridge, UK, 2000
    [426] Tibshirani R, Hastie T, Narasimhan B, and Chu G. Diagnosis of multiple cancer types by shrunken centroids of gene expression. Proc Natl Acad Sci USA, 2002, 99: 6567-6572
    [427] Jain A K, Duin R P W, Mao J. Statistical pattern recognition: a review. IEEE Transactions on PAMI, 2000, 22: 4-37
    [428] Deng N Y, Tian Y J. New Methods in Data Mining: Support Vector Machines. Beijing: Science Press, 2004. (in Chinese)
    [429] Wang S L, Chen H W, Li F R, Zhang D X. Gene selection with rough sets for the molecular diagnosing of tumor based on support vector machines. ICS 2006, Taiwan, pp. 1368-1373.
    [430] Li Y X, Ruan X G. Tumor subtype recognition and classification feature gene selection based on gene expression profiles. Acta Electronica Sinica, 2005, 33(4): 651-655 (in Chinese)
    [431] Ruan X G, Li Y X, Li J G, Gong D X, Wang J L. Tumor-specific gene expression patterns based on gene expression profiles. Science in China Series C: Life Sciences, 2006, 36(1): 86-96 (in Chinese)
    [432] Wang J B, Bø T H, Jonassen I, Myklebost O, and Hovig E. Tumor classification and marker gene prediction by feature selection and fuzzy c-means clustering using microarray data. BMC Bioinformatics, 2003, 4: 61-72
    [433] Komura D, Nakamura H, and Tsutsumi S. Multidimensional support vector machines for visualization of gene expression data. Bioinformatics, 2005, 21(4): 439-444
    [434] Paul T K and Iba H. Extraction of informative genes from microarray data. Proceedings of the 2005 Conference on Genetic and Evolutionary Computation, Washington DC, USA, 2005, 453-406
    [435] Chu W, Ghahramani Z, Falciani F, and Wild D L. Biomarker discovery in microarray gene expression data with Gaussian processes. Bioinformatics, 2005, 21(16): 3385-3393
    [436] Wang M Y, Wu P, Xia S R. Microarray data classification based on artificial neural network ensembles. Journal of Zhejiang University (Engineering Science), 2005, 39(7): 971-975 (in Chinese)
    [437] Wang H Y, Li X, Guo Z, Zhang R J. A comparative study of four pattern classification methods applied to gene expression profile analysis. Journal of Biomedical Engineering, 2005, 22(3): 505-509 (in Chinese)
    [438] Deng L, Pei J, Ma J W, and Lee D L. A rank sum test method for informative gene discovery. KDD'04, August 22-25, 2004, Seattle, Washington, USA.
    [439] Liu C C, Chen W E, Lin C C, Liu H C, Chen H Y, Yang P C, Chang P C, and Chen J W. Topology-based cancer classification and related pathway mining using microarray data. Nucleic Acids Research, 2006, 34(14): 4069-4080
    [440] Yeoh E J, Ross M E, Shurtleff S A, et al. Classification, subtype discovery, and prediction of outcome in pediatric acute lymphoblastic leukemia by gene expression profiling. Cancer Cell, 2002, 1(2): 133-143
    [441] Alon U, Barkai N, Notterman D A, Gish K, Ybarra S, Mack D, and Levine A J. Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays. Proc. Natl. Acad. Sci. USA, 1999, 96: 6745-6750
    [442] Lossos I S, Alizadeh A A, Eisen M B, Chan W C, Brown P O, Botstein D, Staudt L M, and Levy R. Ongoing immunoglobulin somatic mutation in germinal center B cell-like but not in activated B cell-like diffuse large cell lymphomas. Proc. Natl. Acad. Sci. USA, 2000, 97(18): 10209-10213
    [443] Welsh J B, Sapinoso L M, Su A I et al. Analysis of gene expression identifies candidate markers and pharmacological targets in prostate cancer. Cancer Res, 2001, 61: 5974-5978
    [444] Gordon G J, Jensen R V, Hsiao L L, Gullans S R, Blumenstock J E, Ramaswamy S, Richards W G, Sugarbaker D J, Bueno R. Translation of microarray data into clinically relevant cancer diagnostic tests using gene expression ratios in lung cancer and mesothelioma. Cancer Res., 2002, 62: 4963-4967
    [445] Petricoin E F, Ardekani A M, Hitt B A, Levine P J, Fusaro V A, Steinberg S M, Mills G B, Simone C, Fishman D A, Kohn E C, Liotta L A. Use of proteomic patterns in serum to identify ovarian cancer. The Lancet, February 2002, 359: 572-577
    [446] van 't Veer L J, Dai H Y, van de Vijver M J, He Y D, Hart A A M, Mao M, Peterse H L, van der Kooy K, Marton M J, Witteveen A T, Schreiber G J, Kerkhoven R M, Roberts C, Linsley P S, Bernards R, and Friend S H. Gene expression profiling predicts clinical outcome of breast cancer. Nature, 2002, 415: 530-536
    [447] Furey T S, Cristianini N, Duffy N, Bednarski D W, Schummer M, and Haussler D.Support vector machine classification and validation of cancer tissue samples using microarray expression data. Bioinformatics, 2000, 16(10): 906-914.
    [448] Nguyen D V and Rocke D M. Tumor classification by partial least squares using microarray gene expression data. Bioinformatics, 2002, 18(1): 39-45
    [449] Duda R, Hart P, Stork D. Pattern Classification, 2nd ed. Wiley, NY, 2001.
    [450] Breiman L. Bagging predictors, Machine Learning, 1996, 26(2): 123-140
    [451] Freund Y and Schapire R. Experiments with a new boosting algorithm. Machine Learning: Proceedings of the Thirteenth International Conference, 1996.
    [452] Breiman L. Arcing classifiers. The Annals of Statistics, 1998, 26(3): 801-849
    [453] Dettling M. BagBoosting for tumor classification with gene expression data,Bioinformatics, 2004, 20(18): 3583-3593
    [454] Tan A C and Gilbert D. Ensemble machine learning on gene expression data for cancer classification, Applied Bioinformatics, 2003
    [455] Wang C W. New ensemble machine learning method for classification and prediction on gene expression data. Proceedings of the 28th IEEE EMBS Annual International Conference, New York City, USA, Aug 30-Sept 3, 2006.
    [456] Bertoni A, Folgieri R, and Valentini G. Bio-molecular cancer prediction with random subspace ensembles of support vector machines. Neurocomputing, 2005,63: 535-539
    [457] Valentini G, Muselli M, and Ruffino F. Cancer recognition with bagged ensembles of support vector machines. Neurocomputing, 2004, 56: 461-466
    [458] Won H H and Cho S B. Neural network ensemble with negatively correlated features for cancer classification. ICANN/ICONIP 2003, LNCS 2714, 1143-1150
    [459] Bryll R, Gutierrez-Osuna R, Quek F. Attribute bagging: improving accuracy of classifier ensemble by using random feature subsets. Pattern Recognition, 2003, 36(6): 1291-1302
    [460] Dietterich T G and Bakiri G. Solving multiclass learning problems via error-correcting output codes. Journal of Artificial Intelligence Research (JAIR), 1995, 2: 263-286
    [461] Hotelling H. Analysis of a complex of statistical variables into principal components. Journal of Educational Psychology, 1933, 24:417-441.
    [462] Roweis S T and Saul L K. Nonlinear dimensionality reduction by locally linear embedding. Science, 2000, 290: 2323-2326
    [463] Fisher R A. The use of multiple measurements in taxonomic problems. Annals of Eugenics, 1936, 7, Part 2: 179-188
    [464] Wold H. Soft modeling with latent variables: the nonlinear iterative partial least squares approach. In: Gani J, ed. Perspectives in Probability and Statistics: Papers in Honour of M. S. Bartlett. London: Academic Press, 1975, 114-142.
    [465] Fukunaga K. Introduction to statistical pattern recognition, second edition.Academic Press, Boston, 1990.
    [466] Duda R O, Hart P E, and Stork D G. Pattern classification. Wiley-interscience,New York, 2001
    [467] Howland P, Jeon M, and Park H. Structure preserving dimension reduction for clustered text data based on the generalized singular value decomposition. SIAM J. Matrix Anal. Appl., 2003, 25(1): 165-179
    [468] Turk M A and Pentland A P. Face recognition using eigenfaces. In: Proceedings of the Computer Vision and Pattern Recognition 1991, 1991, pp. 586-591
    [469] Huber R, Ramoser H, Mayer K, Penz H, and Rubik M. Classification of coins using an eigenspace approach. Pattern Recognition Letters, 2005, 26(1): 61-75
    [470] Nishimura K, Abe K, Ishikawa S, Tsutsumi S, Hirota K, and Aburatani H. A PCA based method of gene expression visual analysis. Genome Informatics, 2003, 14: 346-347
    [471] Peterson L E. Partitioning large-sample microarray-based gene expression profiles using principal components analysis. Computer Methods and Programs in Biomedicine, 2003, 70: 107-119
    [472] Liu Z Q, Chen D C, and Bensmail H. Gene expression data classification with kernel principal component analysis. Journal of Biomedicine and Biotechnology,2005,2: 155-159
    [473] Jolliffe I T. Principal component analysis. Springer-Verlag, New York, 1989.
    [474] Chang C C and Lin C J. LIBSVM: A library for support vector machines, 2001, software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.
    [475] Jutten C, Hérault J. Blind separation of sources, Part I: an adaptive algorithm based on neuromimetic architecture. Signal Processing, 1991, 24(1): 1-10
    [476] Zhang X W, Yap Y L, Wei D, Chen F, and Danchin A. Molecular diagnosis of human cancer type by gene expression profiles and independent component analysis. European Journal of Human Genetics, 2005, 05(9): 1018-4813
    [477] Pen Y H. Wavelet transform and engineering applications. Beijing: Science Press, 2000
    [478] Hu C H, Li G H, Liu T, Zhou Z J. System Analysis and Design Based on Matlab 6.x: Wavelet Analysis (2nd ed.). Xi'an: Xidian University Press, 2004. (in Chinese)
    [479] Wu M Z, Qi D, Gu Q, and Shao H W. Wavelet package-neural network based on rough set diesel engine vibration signal identification model. IEEE Int. Conf. Neural Networks & Signal Processing, Nanjing, China, December 14-17, 2003.
    [480] Ahmed N, Natarajan T, and Rao K R. Discrete cosine transform. IEEE Trans. Computers, Vol. C-23, 1974, pp. 90-94.
    [481] Theodoridis S and Koutroumbas K. Pattern recognition. Academic Press, 1999, pp. 341-342.
    [482] Dasarathy B. Nearest Neighbor Norms: NN Pattern Classification Techniques. IEEE Computer Society Press, Los Alamitos, CA, USA, 1991.
    [483] Kohavi R and John G H. Wrappers for feature subset selection. Artificial Intelligence Journal, 1997, 97(1-2): 273-324
    [484] Liu H and Motoda H. Feature selection for knowledge discovery and data mining. Kluwer Academic Press, 1998.
    [485] Blum A L and Rivest R L. Training a 3-node neural network is NP-complete. Neural Networks, 1992, 5: 117-127
    [486] Hyafil L and Rivest R L. Constructing optimal binary decision trees is NP-complete. Information Processing Letters, 1976, 5(1): 15-17
    [487] Amaldi E, Kann V. On the approximation of minimizing non-zero variables or unsatisfied relations in linear systems. Theor. Comput. Sci., 1998, 209: 237-260
    [488] Kohavi R. Feature subset selection as search with probabilistic estimates. AAAI Fall Symposium on Relevance, 1994.
    [489] Jain A K and Zongker D. Feature selection: evaluation, application and small sample performance. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1997, 19(2): 153-158
    [490] Cover T M. The best two independent measurements are not the two best. IEEE Transactions on Systems, Man and Cybernetics, 1974, 4: 116-117
    [491] Vafaie H and DeJong K. Robust feature selection algorithms. In: Proc. 5th Intl. Conf. on Tools with Artificial Intelligence, Rockville, MD, 1993, 356-363
    [492] Yang J and Honavar V. Feature subset selection using a genetic algorithm. IEEE Intelligent Systems, 1998, 13(2): 44-49
    [493] Kuncheva L. Genetic algorithms for feature selection for parallel classifiers. Information Processing Letters, 1993, 163-168
    [494] Etxeberria R, Inza I, Larrañaga P, and Sierra B. Feature subset selection by Bayesian network-based optimization. Artificial Intelligence, 2000, 123: 157-184
    [495] Doak J. An evaluation of feature selection methods and their application to computer security. Technical Report CSE-92-18, Univ. of California at Davis, 1992.
    [496] Skalak D B. Prototype and feature selection by sampling and random mutation hill-climbing algorithms. In: Proc. 11th Intl. Conf. on Machine Learning, New Brunswick, NJ, 1993, pp. 293-301.
    [497] Dash M and Liu H. Feature selection for classification. Intelligent Data Analysis, 1997, 1: 131-156
    [498] Wang J, Ci L L, Yao K Z. A survey of feature selection methods. Computer Engineering & Science, 2005, 27(12): 68-71 (in Chinese)
    [499] Liu H, Setiono R. A probabilistic approach to feature selection: a filter solution. Proc. of Int'l Conf. on Machine Learning, 1996, 319-327
    [500] Chakraborty B. Genetic algorithm with fuzzy fitness function for feature selection. Proc. of the 2002 IEEE International Symp. on Industrial Electronics, 2002, 1: 315-319
    [501] Pawlak Z. Rough sets: theoretical aspects of reasoning about data. Kluwer Academic Publishers, 1991: 9-30.
    [502] Golan R, Ziarko W. Methodology for stock market analysis utilizing rough set theory. In: Proceedings of IEEE/IAFE Conference on Computational Intelligence for Financial Engineering, New Jersey, 1995, 32-40
    [503] Cai Z M, Guan X H, Shao P, et al. A new intrusion detection method based on rough set theory. Chinese Journal of Computers, 2003, 3: 361-366 (in Chinese)
    [504] Zhang B Y, Yin J P, Hao J B. Malicious code detection based on n-gram analysis and rough set theory. In: Proc. IEEE CIS 2006, Lecture Notes in Artificial Intelligence, Springer-Verlag, Guangzhou, Nov 2006.
    [505] Bulashevska S, Dubitzky W, and Eils R. Mining gene expression data using rough set theory. In: Proceedings of Critical Assessment of Techniques for Microarray Data Analysis (CAMDA'00 Conference), Duke University, NC, US, 2000, 4-5.
    [506] Fang J W and Grzymala-Busse J W. Leukemia prediction from gene expression data: a rough set approach. 2006 Annual Kansas City Area Life Sciences Research Day, Kansas City, MO, 2006
    [507] Shah N, Hamilton H, Cercone N. GRG: Knowledge discovery using information generalization, information reduction, and rule generation. In: Proceedings of the 7th IEEE International Journal of Computational Intelligence, 1995, 11(2): 323-338.
    [508] Hu X, Cercone N. Learning in relational database: a rough set approach. International Journal of Computational Intelligence, 1995, 11(2): 323-338
    [509] Jelonek J, et al. Rough set reduction of attributes and their domains for neural networks. International Journal of Computational Intelligence, 1995, 11(2): 339-347
    [510] Miao D, Wang J. Information-based algorithm for reduction of knowledge. In: Proceedings of IEEE ICIPS'97, 1997
    [511] Wroblewski J. Finding minimal reducts using genetic algorithms. In: Proceedings of the 2nd Annual Joint Conference on Information Science, 1995, 186-189
    [512] Wroblewski J. Theoretical foundations of order-based genetic algorithms. Fundamenta Informaticae, 1996, 28(3-4): 423-430
    [513] Skowron A, Rauszer C. The discernibility matrices and functions in information systems. In: Slowinski R, ed. Intelligent Decision Support: Handbook of Applications and Advances of the Rough Sets Theory, 1991, 331-362
    [514] Wang J, Miao D. Analysis on attribute reduction strategies of rough set. Journal of Computer Science and Technology, 1998, 13(2): 189-193
    [515] Wadman I, Li J X, Bash R O, Forster A, Osada H, Rabbitts T H, and Baer R. Specific in vivo association between the bHLH and LIM proteins implicated in human T-cell leukemia. EMBO Journal, 1994, 13: 4831-4839
    [516] Simmons D and Seed B. Isolation of a cDNA encoding CD33, a differentiation antigen of myeloid progenitor cells. Journal of Immunology, 1988, 141(8): 2797-2800
    [517] Chuang H Y, Tsai H K, and Tsai Y F. Ranking genes for discriminability on microarray data. Journal of Information Science and Engineering, 2003, 19: 953-966
    [518] Li J Z, Yang K, Gao H, Luo J Z, Guo Z. A model-free gene selection method considering sample imbalance. Journal of Software, 2006, 17(7): 1485-1493 (in Chinese)
    [519] Xiong M M, Fang X Z, and Zhao J Y. Biomarker identification by feature wrappers. Genome Research, 2001, 11: 1878-1887
    [520] Bian Z Q, Zhang X G. Pattern Recognition (2nd ed.). Beijing: Tsinghua University Press, 2000, 180-183. (in Chinese)
    [521] Duda R O, Hart P E, Stork D G. Pattern classification, 2nd ed. John Wiley & Sons, 2001: 46-48
    [522] Vapnik V N. The nature of statistical learning theory. New York: Springer-Verlag, 1995.
    [523] Zhu Y H, Li Y X, Ruan X G. Subtype recognition of small round blue cell tumors based on gene expression profiles. Journal of Computer Applications, 2004, 24(11): 131-134. (in Chinese)
    [524] Wang S L, Wang J, Chen H W, and Tang W S. The classification of tumor using gene expression profile based on support vector machines and factor analysis. Intelligent Systems Design and Applications, Jinan, China, IEEE Computer Society Press, 2006, 2: 471-476.
    [525] Wang S L, Chen H W, Wang J, and Zhang D X. Molecular diagnosis of tumor based on independent component analysis and support vector machines. Proceedings of the 2006 International Conference on Computational Intelligence and Security, 2006, 1: 362-367.
    [526] Simmons D and Seed B. Isolation of a cDNA encoding CD33, a differentiation antigen of myeloid progenitor cells. Journal of Immunology, 1988, 141(8): 2797-2800
    [527] Yu D, Seitz P K, Selvanayagam P et al. Effects of vasoactive intestinal peptide on adenosine 3',5'-monophosphate, ornithine decarboxylase, and cell growth in human colon cell line. Endocrinology, 1992, 131(3): 1188-1194
    [528] Tatsuta M, Iishi H, Baba M, et al. Attenuation of vasoactive intestinal peptide enhancement of colon carcinogenesis by ornithine decarboxylase inhibitor. Cancer Lett, 1995, 93(2): 219-225.
    [529] Kwiatkowski D J. Functions of gelsolin: motility, signaling, apoptosis, cancer. Curr. Opin. Cell Biol., 11: 103-108
    [530] Deng L, Ma J W, Pei J. Rank sum gene selection method and its application to tumor diagnosis. Chinese Science Bulletin, 2004, 49(13): 1311-1316 (in Chinese)
    [531] Lu W, Xin J, Wang Y F. Mining tumor gene expression data based on nonparametric methods. Journal of Shanghai University (Natural Science Edition), 2003, 9(6): 545-548 (in Chinese)
    [532] Kruskal W H and Wallis W A. Use of ranks in one-criterion variance analysis. Journal of the American Statistical Association, December 1952, 47(260): 583-621
    [533] Deng L, Pei J, Ma J W, and Lee D L. A rank sum test method for informative gene discovery. KDD'04, August 22-25, 2004, Seattle, Washington, USA
    [534] Hastie T, Tibshirani R, Friedman J. The elements of statistical learning: data mining, inference, and prediction. Springer-Verlag, 2001
    [535] Breiman L and Spector P. Submodel selection and evaluation in regression: the X-random case. Intern. Statist. Rev., 60: 291-319
    [536] Kohavi R. A study of cross-validation and bootstrap for accuracy estimation and model selection. International Joint Conference on Artificial Intelligence (IJCAI), 1995, pp.
    [537] Li X, Zhang T W, Guo Z. An ensemble feature gene selection method based on recursive classification trees. Chinese Journal of Computers, 2004, 27(5): 675-681 (in Chinese)
    [538] Peng S, Xu Q, Ling X B, Peng X, Du W, and Chen L. Molecular classification of cancer types from microarray data using the combination of genetic algorithms and support vector machines. FEBS Letters, 2003, 555(2): 358-362
    [539] Chao S and Lihui C. Feature dimension reduction for microarray data analysis using locally linear embedding. In: APBC, 2005, 211-217.
    [540] Deb K and Reddy A R. Classification of two-class cancer data reliably using evolutionary algorithms. Technical Report, KanGAL, 2003.
    [541] Hu Q H, Yu D R, and Xie Z X. Neighborhood classifiers. Expert Systems with Applications, 2006, doi:10.1016/j.eswa.2006.10.043.
    [542] Dasarathy B. Nearest Neighbor Norms: NN Pattern Classification Techniques. IEEE Computer Society Press, Los Alamitos, CA, USA, 1991.
    [543] Deutsch J. Evolutionary algorithms for finding optimal gene sets in microarray prediction. Bioinformatics, 2003, 19: 45-52
    [544] Wang L P, Chu F, and Xie W. Accurate cancer classification using expressions of very few genes. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2007, 4(1): 40-53
    [545] Tan S. Neighbor-weighted K-nearest neighbor for unbalanced text corpus. Expert Systems with Applications, 2005, 28: 667-671.
    [546] Tan S. Neighbor-weighted K-nearest neighbor for unbalanced text corpus. Expert Systems with Applications, 2005, 28: 667-671
    [547] Hansen L K, Salamon P. Neural network ensembles. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1990, 12(10): 993-1001.
    [548] Antal P, Fannes G, Timmerman D, Moreau Y, De Moor B. Bayesian applications of belief networks and multilayer perceptrons for ovarian tumor classification with rejection. Artificial Intelligence in Medicine, 2003, 29(1): 39-60
    [549] King H C, Sinha A A. Gene expression profile analysis by DNA microarrays. JAMA, 2001, 286(18): 2280-2288
    [550] Shah-Hosseini H, Safabakhsh R. TASOM: a new time adaptive self-organizing map. IEEE Trans. Systems, Man, and Cybernetics, Part B, 2003, 33(2): 271-282
    [551] Mateos A, Herrero J, et al. Supervised neural networks for clustering conditions in DNA array data after reducing noise by clustering gene expression profiles. Second Critical Assessment of Microarray Data Analysis (CAMDA'01), 2001, pp. 91-103
    [552] Specht D F. Probabilistic neural networks for classification, mapping, or associative memory. In: Proceedings of the IEEE International Conference on Neural Networks, 1988, pp.525-532
    [553] Huang C J, Liao W C. A comparative study of feature selection methods for probabilistic neural networks in cancer classification. 15th IEEE International Conference on Tools with Artificial Intelligence (ICTAI'03), 2003, pp. 451-457
    [554] Sun G M, Dong X Y, and Xu G D. Tumor tissue identification based on gene expression data using DWT feature extraction and PNN classifier. Neurocomputing, 2006, 69: 387-402
    [555] Ryu J and Cho S B. Gene expression classification using optimal feature/classifier ensemble with negative correlation. Proceedings of the 2002 International Joint Conference on Neural Networks, 2002, 1: 198-203
    [556] Krogh A, Vedelsby J. Neural network ensembles, cross validation, and active learning. In: Tesauro G, Touretzky D S, Leen T K, eds. Advances in Neural Information Processing systems 7, Cambridge, MA: The MIT Press, 1995,231-238
    [557] Kuncheva L I, Whitaker C J. Measures of diversity in classifier ensembles.Machine Learning, 2003, 51(2): 181-207
    [558] Gilad-Bachrach R, Navot A, and Tishby N. Margin based feature selection: theory and algorithms. In: Proceedings of the 21st International Conference on Machine Learning, Banff, Canada, 2004
    [559] Goh A T C. Probabilistic neural network for evaluating seismic liquefaction potential. Can. Geotech. J. 2002, 39: 19-232
    [560] Zhou Z H, Wu J X, Tang W, and Chen Z Q. Selectively ensembling neural classifiers. In: Proceedings of the International Joint Conference on Neural Networks, Honolulu, HI, 2002, 2: 1411-1415
    [561] Krishnapuram B, Carin L, and Hartemink A. Gene expression analysis: joint feature selection and classifier design. In: Schölkopf B, Tsuda K, and Vert J P, eds. Kernel Methods in Computational Biology, MIT Press, 2004, pp. 299-318.
    [562] Hood L. A personal view of molecular technology and how it has changed biology. J Proteome Res., 2002, 1(5): 399-409.
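    [563] Luo L F. Universal relations in the information content of DNA sequences. Journal of Hefei University (Natural Sciences), 2005, 15(1): 1-6. (in Chinese)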
    [564] Cohen J. Computer science and bioinformatics. Communications of the ACM,2005, 48(3): 72-78.
    [565] Knuth D. Computer literacy interview (Dec. 7, 1993), http://www.literate-programming.com/clb93.pdf.
    [566] Setubal J, Meidanis J. Introduction to Computational Molecular Biology, 1st ed. Cole Publishing Company, a division of Thomson Learning, 1997.
    [567] Xu J, Zhang L. DNA computer principle, advances and difficulties (I): biological computing system and its applications in graph theory. Chinese Journal of Computers, 2003, 26(1): 1-11 (in Chinese)
    [568] Xu J, Huang B Y. DNA computer: principle, advances and difficulties (II): formation of the computer "database", the synthesis of DNA molecules. Chinese Journal of Computers, 2005, 28(10): 1583-1591 (in Chinese)
    [569] Xu J, Zhang S M, Fan Y K, Guo Y A. DNA computer principle, advances and difficulties (III): data structures and properties in molecular biocomputing. Chinese Journal of Computers, 2007, 30(6): 869-880 (in Chinese)
    [570] Xu J, Tan G J, Fan Y K, Guo Y A. DNA computer principle, advances and difficulties (IV): on DNA computer models. Chinese Journal of Computers, 2007, 30(6): 881-893 (in Chinese)
    [571] Cohen J. Bioinformatics: an introduction for computer scientists. ACM Computing Surveys (CSUR), 2004, 36(2): 122-158
    [572] Lawrence D. Incorporating bioinformatics in an algorithms course. Proceedings of the 8th Annual Conference on Innovation and Technology in Computer Science Education, 2003, 35(3): 211-214
    [573] Sakakibara Y. Grammatical inference in bioinformatics. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2005, 27(7): 1051-1062
    [574] van Helden J, André B, and Collado-Vides J. Extracting regulatory sites from the upstream region of yeast genes by computational analysis of oligonucleotide frequencies. J. Mol. Biol., 1998, 281: 827-842
    [575] van Helden J, del Olmo M, and Pérez-Ortín J E. Statistical analysis of yeast genomic downstream sequences reveals putative polyadenylation signals. Nucl. Acids Res., 2000, 28: 1000-1010.
    [576] Hampson S, Kibler D, and Baldi P. Distribution patterns of locally overrepresented k-mers in non-coding yeast DNA. 2001.
    [577] Radman M, Wagner R. Carcinogenesis: missing mismatch repair. Nature, 1993, 366: 722.
    [578] Bi L J, Zhou Y F, Zhang X E. DNA mismatch repair and the development and treatment of cancer. Acta Biophysica Sinica, 2006, 22(1): 1-6. (in Chinese)