基于DNA序列的进化树构建算法的研究

英文题名：Study of Methods of Constructing Evolutionary Trees with DNA Sequences
作者：李建伏
论文级别：博士
学科专业名称：人工智能与信息处理
中文关键词：进化树重构 ; 最大似然法 ; 邻接法 ; 分支交换操作 ; quartet方法 ; 结构期望最大化
英文关键词：Evolutionary tree reconstruction ; Maximum likelihood ; Neighbor joining ; Topological rearrangement ; Quartet methods ; Structural expectation maximization
学位年度：2008
导师：郭茂祖
学科代码：081001
学位授予单位：哈尔滨工业大学
论文提交日期：2008-12-01

摘要

进化树又称为系统发生树,是一种用来描述各种生命实体之间进化关系的树型结构。一个可靠的进化树推断不仅有助于了解生物的进化历史和进化机制,而且对生物医药学、计算分子生物学其他分支的研究具有重要意义。
     目前常见的进化树构建方法有距离法、最大简约法和最大似然法三种。其中,最大似然法被认为比其他两种方法更为准确,但是其计算复杂度非常高。为了提高最大似然法,本文从以下4个方面展开了深入研究:
     (1)提出了一个快速高效的分支交换操作
     由于计算复杂度非常高,实际应用中的最大似然法都是基于启发式的。虽然不同的启发式算法具有不同的搜索策略,但是其基本思路都是通过一系列的分支交换操作来提高起始进化树。并且,启发式算法的性能在一定程度上取决于其所采用的分支交换操作的搜索能力。目前常见的分支交换操作有NNI(Nearest Neighbor Interchange) , SPR(Subtree Prune and Regraft)和TBR(Tree Bisection and Reconnection),其中TBR的搜索空间最广。但研究表明,TBR的搜索空间还不够广,容易陷入局部极优。另一种分支交换操作p-ECR (p-Edge Contraction and Refinement)具有更广的搜索空间,能够在一定程度上避免局部极优。但是由于计算复杂度非常高,p-ECR很少被使用。本文结合邻接法和p-ECR提出了一个分支交换操作p-ECRNJ。p-ECRNJ的基本思想是利用邻接法分解p-ECR中产生的未分解节点。这样,每一次p-ECRNJ操作只需要O(p3)去分解未分解节点,而无须花费大量的时间去尝试所有的可能(最多(2p+1)!!),从而大大降低了时间复杂度。对12组真实数据的测试结果表明基于p-ECRNJ的进化树构建算法能够在合理的时间内找到比其他流行的算法更好的进化树。并且,p-ECRNJ还可以有效地提高其他分支交换操作,如NNI。从而,证明了p-ECRNJ的有效性。
     (2)提出了一种PSO的进化树构建算法
     爬山算法在进化树重构中得到了广泛应用并取得了一定的成功。但是由于进化树的似然函数通常含有多个局部极优解,而爬山算法没有逃离局部极优的能力,因此基于爬山算法的进化树构建算法很容易陷入局部极优。从本质上讲,粒子群优化算法(Particle Swarm Optimization,简称PSO)是一种并行的、动态的爬山算法,具有一定的逃离局部极优的能力。本文提出了一个基于PSO的进化树构建算法PSOML。该算法采用PSO的基本框架,利用p-ECRNJ完成粒子状态的更新,其中p值是根据粒子的速度确定的。对真实数据的测试结果表明虽然要花费较长的时间,PSOML的准确性要明显优于基于爬山算法的PHYML和RAxML。并且,PSOML可以在合理的时间内找到比基于遗传算法GARLI更好的进化树。
     (3)结合Quartet puzzling和邻接法提出了一种进化树构建方法
     由于最大似然法的时间复杂度非常高,基于分治思想的最大似然法——quartet方法得到了人们的关注。最流行的quartet方法是Quartet Puzzling(简称QP)。它首先利用最大似然法估计quartet拓扑结构集合Q,然后利用一个贪心算法将Q进行重组构成一个包含所有序列的进化树。研究表明,QP的准确性不够高,甚至比邻接法还要低。如何快速有效地将Q进行重组是QP所面临的一个难题。另一方面,邻接法具有很好的理论特性,但其准确性取决于作为输入的距离矩阵的质量。长分支一直是困扰邻接法的一个问题。本文结合邻接法和QP提出了一个进化树构建算法QPNJ。QPNJ的基本思想是首先用最大似然法估计quartet拓扑结构集合Q,然后根据Q估计序列之间的进化距离,构成距离矩阵M,最后利用邻接法根据M构建进化树。QP和邻接法的这种结合会达到取长补短的效果。一方面利用更有理论依据的邻接法完成Q的重组可以提高QP的准确性,另一方面利用quartet估计序列之间的进化距离可以在一定程度上避免邻接法所面临的长分支问题。理论上,QPNJ与QP具有相同的时间复杂度。需要指出的是,邻接法和QP的结合不是唯一的,任意类似于邻接法的聚类过程如Weightor都可以按照QPNJ的思想代替邻接法与QP相结合。模拟实验表明,QPNJ和QPWNJ(结合了Weightor与QP)比QP更加准确,甚至比邻接法和Weightor还准确。并且,QPNJ和QPWNJ的准确性不像QP一样依赖于模型树的结构。从而证明了QP与聚类算法如邻接法和Weighbor的结合是有效的。
     (4)结合同伦方法和SEM提出了一种进化树构建算法
     根据当前物种构建进化树是一个典型的从非完全数据学习的问题。结构期望最大化(Structural Expectation Maximization,简称SEM)算法是一个根据不完全数据求解模型结构的有效方法,它通过迭代交替搜索的简单方式能够简化最大似然估计问题。但是,由于SEM算法直接采用贝叶斯公式计算隐变量的条件概率,使得每次迭代的结果都是上次迭代中期望似然值的最优解,因此算法对于初始点的选择具有很强的依赖性。尤其是进化树的似然函数往往具有多个局部极优解,所以直接利用SEM构建进化树很容易陷入局部极优。同伦方法是一个全局方法,其基本思想是构造一个同伦函数将一个已知解的问题与待解问题联系起来,然后从已知解的问题开始,利用同伦参数的变化,最终求得待解问题的最优解。本文结合同伦方法和SEM提出了一种进化树构建算法HSEMPHY。HSEMPHY首先利用最大熵原理计算隐变量的条件概率,引入同伦参数β,然后借鉴同伦方法的过程优化似然函数。对真实数据的测试结果表明HSEMPHY与目前两个流行的进化树构建算法PHYML和RAxML性能相当,并且对初始点具有较低的敏感性。
An evolutionary tree, also known as phylogenetic tree, is a tree structure showing the evolutionary relationship among various organisms. A reliable evolutionary tree inference not only helps to understand the evolutionary history and evolution mechanisms in biology, it is also of great significance to many branches research in biomedical science and computational molecular biology.
     Currently, a rich variety of evolutionary tree reconstruction methods have been developed, which fall into three categories, (a) distance based methods, (b) maximum parsimony methods and (c) approaches applying the maximum likelihood principle. It is known that maximum likelihood can recover correct trees more frequently than other methods. But high computational complexity is the main disadvantage. In order to improve maximum likelihood, the paper launched an in-depth study from the following four aspects:
     (1) Proposed a fast and effective topological rearrangement operator
     Due to the very high computational complexity, maximum likelihood in application is based on heuristics. Although using different search strategies, the heuristics are all to try to improve a starting tree / starting trees by a series of elementary topological rearrangement operators, until local optima is found. It is obvious that the performance of the heuristics depends on the degree of exhaustiveness of the topological rearrangement operators on some extent. The topological rearrangement operators often used in heuristics are Nearest Neighbor Interchange (NNI), Subtree Prune and Regraft (SPR) and Tree Bisection and Reconnection (TBR), and TBR is the most exhaustive. Even TBR searches, however, can often get trapped in local optima, since there are not many trees accessible in one step from any given tree. Another more exhaustive topological rearrangement operator is p-Edge Contraction and Refinement (p-ECR). However, due to its high computation complexity, p-ECR has rarely been used in practice. This paper presented a fast and efficient topological rearrangement operation combining p-ECR and neighbor joining. The main idea of p-ECRNJ is to to use neighbor joining to refine the unresolved nodes produced in p-ECR. Thus, in very iteration, p-ECRNJ only spends O(p3) to resolve unresolved nodes, and without having to spend a lot of time to evaluate all possible trees (at most (2p+1)!!), which significantly reduced the time complexity of p-ECR. Experiments with 12 real datasets show that p-ECRNJ based heuristics can find better trees than other two popular evolutionary tree reconstruction methods (PHYML and RAxML) in considerable time. Moreover, p-ECRNJ can improve efficiently other local topological rearrangement operators, such as NNI. All the results prove that p-ECRNJ is efficient.
     (2) Proposed a PSO based evolutionary tree reconstruction method
     Hill climbing has been widely used in evolutionary tree reconstruction and has achieved some success. However, for hill-climbing strategies to be guaranteed to find the global optimum the optimality landscape must be unimodal. Moreover, research has demonstrated that multiple maxima exist under the maximum likelihood principle. Therefore, hill climbing based evolutionary tree reconstruction methods often return local optima. In principle, PSO (Particle Swarm Optimization) is the parallel and dynamic version of hill climbing and has a higher probability to break away from the local optima effectively. This paper presented a PSO based evolutionary tree reconstruction method named PSOML(Particle Swarm Optimization based Maximum Likelihood). PSOML uses the framework of the basic PSO principle, every particle in the swarm corresponds to a possible evolutionary tree and the update of particles’state is accomplished by p-ECRNJ, while p is dependent on particles’velocity. Experiments with real datasets show that PSOML clearly outperforms PHYML and RAxML in terms of solution optimality, although it cannot compete in terms of runtimes. Moreover, PSOML can find better trees than GARLI across a range of datasets in considerable time.
     (3) Proposed an evolutionary tree reconstruction combing Quartet Puzzling and neighbor joining
     Due to the high time complexity of maximum likelihood, divide-and-conqure based quartet methods of phylogeny reconstruction are currently receiving considerable attention. The most popular is Quartet Puzzling (QP). It first use maximum likelihood to estimate the topology quartet set Q, and then use a greedy algorithm to merge Q into an evolutionary tree containing all sequences. But research shows that QP is not as accurate as expected. How to effectively recombine the information contained in Q is still a hard problem facing QP. Neighbor joining is a clustering technique, which has many good characteristics. However, the performance of neighbor joining depends on the quality of the estimated evolutionary distances of different sequences. Long branch is still a problem facing neighbor joining. Therefore, this paper proposed a reconstruction method QPNJ (Quartet puzzling and Neighbor Joning) combing neighbor joining and QP. The main idea of QPNJ is to use maximum likelihood to estimate the set Q of quartet topologies, estimate the evolutionary distances between every two sequences according to Q, and use neighbor joining to reconstruct an evolutionary tree according to the evolutionary distances. One hand, using neighbor joining with better theoretical guarantees as the recombining technique can improve QP; On the other hand, QPNJ can avoid the long branch problem faced by neighbor joining. In theory, QPNJ has the same time complexity with QP. It is worth noting that the combination of QP and neighbor joining is not only, any clustering method like neighbor joining can replace neighbor joining to combine with QP. Finally, simulation experiments show that QPNJ and QPWNJ(the combination of QP and Weighbor) are more accurate than the QP, and even than neighor joining and Weighbor. Moreover, not like QP, QPNJ and QPWNJ do not rely on the structures of model trees. These experiments results prove the combination of QP and clustering methods such as neighbor joining and Weighbor is efficient.
     (4) Proposed an evolutionary tree reconstruction combing homotopy with SEM
     Reconstructing evolutionary trees according to current sequences is a typical machine learning problem from incomplete data. SEM(Structural Expectation Maximization) is very efficient to estimate model structures using maximum likelihood with incomplete data. Starting from a structure, SEM iteratively probabilistically complete the data according to the distribution induced by current model and use the completed data to evaluate different candidate structures. However, in SEM, the condition probability of the hidden variables is directly computed by Bayes’rule and the structure obtained in every iteration is optimized with respect to the expected likelihood value of the optima in last iteration. As a result, at later iterations of the procedure, the trees that maximize this expected likelihood value will tend to be similar to the tree found in the previous iteration. Therefore, the performance of SEM depends on the starting points. Moreover, research has demonstrated that multiple maxima exist under the maximum likelihood principle. Thus, the reconstruction algorithm using SEM can often be strapped in local optima. The homotopy method belongs to the field of global optimization techniques. The main idea is that a smoothed version of the objective function is first optimized. With enough smoothing, the optimization will be convex and the global optimum can be found. Smoothing then increases and the new optimal are computed, where the solution found in the previous step serves as a starting point. The algorithm iterates until there is no smoothing. On the basis of SEM and homotopy, this paper proposed an evolutionary tree reconstruction, which is named HSEMPHY. HSEMPHY computes the condition probability of hidden variables through maximum entropy principle. It can reduce the influence of the initial value on the final solution by introducing the homotopy parameterβand simulating the process of homotopy continuation principle. Experimental results with real datasets show that HSEMPHY is at least as good as the two most popular methods PHYML and RAxML while it is very robust to poor starting trees.

引文

1.黄丽红,荀鹏程,赵杨,陈峰.系统树的构建及其在SARS病毒分类中的应用.中国卫生统计. 2006, 23(4): 315~318
    2.聂婷婷,肖强,殷坤山,邢丽萍,张传溪.茶毛虫NPV的P24、Rr1、Lef1基因及其分子进化分析.茶叶科学. 2005, 25(4):242~248
    3. C. Notredame, D. G. Higgins, J. Heringa. T-Coffee: A Novel Method for Fast and Accurate Multiple Sequence Alignment. Journal of Molecular Biology. 2000, 302(1):205~217
    4. Irmtraud M. Meyer. A Pratical Guide to the Art of RNA Gene Prediction. Briefings in Bioinformatics. 2007, 8(6): 396~414
    5. S. Gribaldo, D. Casane, P. Lopez, H. Philippe. Functional Divergence Prediction from Evolutionary Analysis: A Case Study of Vertebrate Hemoglobin. Molecular Biology and Evolution. 2003, 20(11):1754~1759
    6. M. Blanchette. Algorithms for Phylogenetic Footprinting. Proceedings of the
    5th Conference on Research in Computational Molecular Biology. 2001: 49~58
    7. B. Knudsen, J. Hein. RNA Secondary Structure Prediction Using Stochastic Context-free Grammars and Evolutionary History. Bioinformatics. 1999, 15(6): 446~454
    8. M. Rehmsmeier, M. Vingron. Phylogeny Meets Sequence Search. Proceedings of the 14th German Conference on Bioinformatics. Hamburg, Germany, 1999: 66~72
    9. J. Felsenstein. The Number of Evolutionary Trees. Systematic Zoology. 1978, 27(1): 27~33
    10. Z. Yang. Computational Molecular Evolution. Oxford University Press, Oxford, England. 2006: 1~38
    11. Pietro Lio, Nick Goldman. Models of Molecular Evolution and Phylogeny. Genome Research. 1998, 8(12): 1233~1244
    12. M.A.Steel. Recovering a Tree from the Leaf Colourations it Generates under a Markov Model. Applied Mathematics Letters. 1994, 7(2): 19~24
    13. V. Ranwez. Effective Methods to Reconstruct Large Phylogenies Followingthe Principle of Maximum Likelihood. Dissertation of the University of Texas. 2002
    14. T. Simon. Some Probabilistic and Statistical Problems in the Analysis of DNA Sequences. Lectures on Mathematics in the Life Sciences. 1986, 17: 57~86
    15. T. H. Jukes, C. R. Cantor. Evolution of Protein Molecules. Mammalian Protein Metabolism. Academic Press, New York. 1969, 3: 21~123
    16. T. Gojobori, W.H. Li, D. Graur. Patterns of Nucleotide Substitution in Pseudogenes and Functional Genes. Journal of Molecular Evolution. 1982, 18(5): 360~369
    17. T. D. Kocher, A. C. Wilson. Sequence Evolution of Mitochondrial DNA in Humans and Chimpanzees: Control Region and a Protein-coding Region. Evolution of life. 1991, 391~413
    18. M. Kimura. A Simple Method for Estimating Evolutionary Rate of Base Substitution through Comparative Studies of Nucleotide Sequences. Journal of Molecular Evolution. 1980, 16(2):111~120
    19. M. Hasegawa, H. Kishino, T. Yano. Dating the Human-ape Splitting by a Molecular Clock of Mitochondrial DNA. Journal of Molecular Evolution. 1985, 22(2):160~174
    20. K. Tamura, M. Nei. Estimation of the Number of Nucleotide Substitutions in the Control Region of Mitochondrial DNA in Humans and Chimpanzees. Molecular Biology and Evolution. 1993, 10(3): 512~526
    21. Zharkikh, W. H. Li. Estimation of Confidence in Phylogeny: the Complete and Partial Bootstrap Technique. Molecular Phylogenetics and Evolution. 1995, 4(1): 44~63
    22. M. Kimura. Estimation of Evolutionary Distances between Homologous Nucleotide Sequences. Proceedings of the National Academy of Sciences of the United States of America. 1981, 78(1): 454~458
    23. Z. Yang. Maximum-likelihood Phylogenetic Estimation from DNA Sequences with Variable Rates over Sites: Approximate Methods. Journal of Molecular Evolution. 1994, 39(3): 306~314
    24. Z. Yang. Among-Site Rate Variation and Its Impact on Phylogenetic Analyses. Trends in Ecology and Evolution, 1996, 11(9): 367~372
    25.根井正利,库马著(吕宝忠,钟扬,高莉萍译,赵寿元,张建之校).分子进化与系统发育.高等教育出版社,2004: 76~100
    26. Dave Thomas. Example Calculation of Phylogenies: the UPGMA Method. http://www.nmsr.org/upgma.htm. 2005
    27. Chris. Fitch-Margoliash Algorithm for Calculating the Branch Lengths. http://www.bioinfo.rpi.edu/~bystrc/courses/biol4540/lecture12/sld002.htm. 2005
    28. N. Saitou, M. Nei. The Neighbor-Joining Method: a New Method for Reconstructing Phylogenetic Trees. Molecular Biology and Evolution. 1987, 4(4): 406~425
    29. Gascuel. Concerning the NJ Algorithm and Its Unweighted Version, UNJ. Mathematical Hierarchies and Biology, DIMACS series in Discrete Mathematics and Theoretical Compute Science. 1997: 149~170
    30. K. Atteson. The Performance of the Neighbor-Joining Methods of Phylogenetic Reconstruction. Algorithmica. 1999, 25: 251~278
    31. D. Bryant. On the Uniqueness of the Selection Criterion in Neighbor-joining. Journal of Classification. 2005, 22(1): 3~15
    32. Gascuel, M. Steel. Neighbor-Joining Revealed. Molecular Biology and Evolution. 2006, 23(11):1997~2000
    33. R. Mihaescu, D. Levy, L. Pachter. Why Neighbor-joining Works. CoRR abs/cs/0602041. 2006
    34. K. St. John, T. Warnow, B. Moret, L. Vawter. Performance Study of Phylogenetic Methods: (Unweighted) Quartet Methods and Neighbor Joining. Journal of Algorithms. 2003, 48(1): 174~193
    35. K. Tamura, M. Nei, S. Kumar. Prospects for Inferring Very Large Phylogenies by Using the Neighbor Joining Method. Proceedings of the National Academy of Sciences of the United States of America. 2004, 101(30): 11030~11035
    36. W. Day, D. Johnson, D. Sankoff. The Computational Complexity of Inferring Rooted Phylogenies by Parsimony. Mathematical Biosciences. 1986, 81(1):33~42
    37. J. Felsenstein. Maximum likelihood and Minimum-steps Methods for Estimating Evolutionary Trees from Data on Discrete Characters. SystematicZoology. 1973, 22(3): 240~249
    38. S. Roch. A Short Proof that Phylogenetic Tree Reconstruction by Maximum Likelihood is Hard. IEEE/ACM Transactions on Computational Biology and Bioinformatics. 2006, 3(1): 92~94
    39. J. Felsenstein. Cases in Which Parsimony or Compatibility Methods will be Positively Misleading. Systematic Zoology. 1978, 27(4):401~410
    40. http://evolution.genetics.washington.edu/phylip.html. 2006
    41. G. Olsen, H. Matsuda, R. Hagstrom, R. Overbeek. Fastdnaml: A Tool for Construction of Phylogenetic Trees of DNA Sequences Using Maximum Likelihood. Computer Application in Biosciences. 1994, 10(1):41~48
    42. M. Wolf, S. Easteal, M. Kahn, B. McKay, L. Jermiin. TrExML: A Maximum Likelihood Program for Extensive Tree-space Exploration. Bioinformatics. 2000, 16(4):383~394
    43. Y. H. Kim, S. K. Lee, B. R. Moon. Optimizing the Order of Taxon Addition in Phylogenetic Tree Construction Using Genetic Algorithm. Genetic and Evolutionary Computation Conference. Chicago, USA. 2003, 2168~2178
    44. C. Babel. Design and Implementation of Distance-based Heuristics in a Program for Genome Sequence Analysis. System Development Project, Technical University of Munich, 2003
    45. S. Guindon and O. Gascuel. A Simple, Fast and Accurate Algorithm to Estimate Large Phylogenies by Maximum Likelihood. Systematic Biology. 2003, 52(5): 696~704
    46. Stamatakis, T. Ludwig, H. Meier. RAxML-III: a Fast Program for Maximum Likelihood-based Inference of Large Phylogenetic Trees. Bioinformatics. 2005, 21(4): 456~463
    47. Z. Yang. Computational Molecular Evolution. Oxford University Press, Oxford, England. 2006: 104~105
    48. B. Chor, M. D. Hendy, B. R. Holland, D. Penny. Multiple Maxima of Likelihood in Phylogenetic Trees: an Analytic Approach. Molecular Biology and Evolution. 2000, 17(10): 1529~1541
    49. P. Lewis. A Genetic Algorithm for Maximum-likelihood Phylogeny Inference Using Nucleotide Sequence Data. Molecular Biology and Evolution. 1998, 15(3):277~283
    50. Lemmon, M. Milinkovitch. The Metapopulation Genetic Algorithm: an Efficient Solution for the Problem of Large Phylogeny Estimation. Proceedings of the National Academy of Sciences of the United States of America. 2002, 99 (16):10516~10521
    51. D. J. Zwickl. Genetic Algorithm Approaches for the Phylogenetic Analysis of Large Biological Sequence Datasets under the Maximum Likelihood Criterion. Dissertation of the University of Texas. 2006
    52. L. A. Salter, D. K. Pearl. Stochastic Search Strategy for Estimation of Maximum Likelihood Phylogenetic Trees. Systematic Biology. 2001, 50(1):7~17
    53. Stamatakis. An Efficient Program for Phylogenetic Inference Using Simulated Annealing. Proceedings of 19th IEEE/ACM International Parallel and Distributed Processing Symposium: High Performance Computational Biology Workshop. Denver, Colorado. 2005, 8:198~205
    54. R Fleissner, D Metzler, A von Haeseler. Simultaneous Statistical Multiple Alignment and Phylogeny Reconstruction. Systematic Biology. 2005, 54(4):548~561
    55. H. A. Schmidt, K. Strimmer, M. Vingron, A. von Haeseler. TREE-PUZZLE: Maximum Likelihood Phylogenetic Analysis Using Quartets and Parallel Computing. Bioinformatics. 2002, 18(3):502~504
    56. V. Ranwez, O. Gascuel. Quartet Based Phylogenetic Inference: Improvements and Limits. Molecular Biology and Evolution. 2001, 18(6): 1103~1116
    57. P. L. Erdos, K. Rice, M. A. Steel, L.A. Szekely, T. J. Warnow. Constructing Big Trees from Short Sequences. Proceedings of the 24th International Colloquium on Automata, Languages and Programming. Bologna, Italy, 1997: 827~837
    58. V. Berry, D. Bryant, T. Jiang, P. Kearney, M. Li, T. Wareham, H. Zhang. A Practical Algorithm for Recovering the Best Supported Edges of an Evolutionary Tree. Proceedings of the 11th ACM-SIAM Symposium on Discrete Algorithms. San Francisco, CA, USA , 2000: 287~296
    59. M. Hu. A Collapsing Method for Efficient Recovery of Optimal Edges in Phylogenetic Trees. The Third IEEE Symposium on Bioinformatics andBioengineering. 2003: 95~104
    60. N. Saitou, T. Imanishi. Relative Efficiencies of the Fitch-Margoliash, Maximum-Parsimony, Maximum-Likelihood, Minimum-Evolution, and Neighbor Joining Methods of Phylogenetic Tree Construction in Obtaining the Corrent Tree. Molecular Biology and Evolution. 1989, 6(5): 514~525
    61. D. Penny. Towards a Basis for Classification: the Incompleteness of Distance Measures Incompatibility Analysis, and Phenetic Classification. Journal of Theoretical Biology, 1982, 96 (2) :129～142
    62. M. Hoder, P. O. Lewis. Phylogeny Estimation: Traditional and Bayesian Approaches. Nature. 2003, 4: 275~284
    63. David Bryant, Nicolas Galtier, Marie Anne Poursat. Mathematics of Evolution and Phylogeny: Likelihood Calculation in Molecular Phylogeny. Oxford University Press, USA, 2005
    64. Sudhindra R. Gadagkar, Sudhir Kumar. Maximum Likelihood Outforms Maximum Parsiomny Even When Evolutioanry Rates Are Heterotachous. Molecular Biology and Evolution. 2005, 22(11): 2139~2141
    65. M. Spencer, E.Susko, A.J.Roger. Likelihood, Parsimony and Heterogeneous Evolution. Molecualr Biology and Evolution. 2005, 22(5): 1161~1164
    66. J. P. Huelsenbeck. Performance of Phylogenetic Methods in Simulation. Systematic Biology. 1995, 44(1):17~48
    67. M. Rosenberg, S. Kumar. Traditional Phylogenetic Reconstruction Methods Reconstruct Shallow and Deep Evolutionary Relationship Equally Well. Molecular Biology and Evolution. 2001, 18(9):1823~1827
    68. B. Allen, M. Steel. Subtree Transfer Operations and Their Induced Metics on Evolutionary Trees. Annals of Combinatorics, 2001, 5(1): 1~15
    69. G. Ganapathy, V. Ramachandran, T. Warnow. On Contract-and-Refine Transformations between Phylogenetic trees. Proceedings of the Fifteenth ACM-SIAM Symposium on Discrete Algorithms. 2004: 893~902
    70. P. A. Goloboff. Analysing Large Datasets in Reasonable Times: Solutions for Composite Optima. Cladistics. 1999, 15(4): 415~428
    71. Z. Yang. A Heuristic Rate Smoothing Procedure for Maximum Likelihood Estimation of Species Divergence Times. Acta Zoologica Sinica. 2004, 50(4): 645~656
    72. R. C. Winkworeth, D. Bryant, P. J. Lockhart, D. Havell, V. Moulton. Biogeographic Interpretation of Splits Graphs: Least Squares Optimisation of Branch lengths. Systematic Biology. 2005, 54(1): 56~65
    73. K. G. McCracken, M. D. Sorenson. Is Homoplasy or Lineage Sorting the Source of Incongruent mtDNA and Nuclear Gene Trees in the Stiff-Tailed Ducks (Nomonyxoxyura). Systematic Biology. 2005, 54(1): 35~55
    74. Y. M. Yuan, S. Wolhauser, M. Mller, J. Klackenberg, M.W. Callmaander, P. Kupfer. Phylogeny and Biogeography of Exacum (Gentianaceae): a Disjunctive Distribution in the Indian Ocean Basin Resulting from Long Distance Dispersal and Extensive Radiation. Systematic Biology. 2005, 54(1): 21~34
    75. C. A. Stewart, D. Hart, D. K Berry, G. J. Olsen, E. A Wernert, William Fischer. Parallel Implementation and Performance of fastDNAml - a Program for Maximum Likelihood Phylogenetic Inference. Proceedings of Supercomputing, 2001:20~30
    76. P. Vogler, A. Cardoso, T. G. Barraclough. Exploring Rate Variation among and within Sites in a Densely Sampled Species Tree: Species Level Phylogenetics of North American Tiger Beetles (Genus Cicindela). Systematic Biology. 2005, 54(1): 4~20
    77. J. Kennedy, R. Eberhart. Particle Swarm Optimization. Proceedings of the IEEE International Conference on Neural Networks. 1995:1942~1948
    78. R. Ebrhart, J. Kennedy. A New Optimizer Using Particle Swarm Theory. Proceeding of the Sixth International Symposium on Micro and Human Science. Nagoya, Japan, 1995: 39~43
    79. P. J. Angeline. Evolutionary Optimization Versus Particle Swarm Optimization: Philosophy and Performance Difference. Annual Conference Center on Evolutionary Programming, San. 1998: 601~610
    80. Yuhui Shi, Russell C. Eberhart. Empirical Study of Particle Swarm Optimization. Proceedings of SCI Conference. Orlando, FL, 2000: 1945~1950
    81. R. C. Eberhart, Y. H. Shi. Evolving Artificial Neural Networks. Proceedings of International Conference on Neural Networks and Brain. Beijing, 1998: 1423~1447
    82. R. C. Eberhart, X. Hu. Human Tremor Analysis Using Particle Swarm Optimization. Proceedings of Congress on Evolutionary Computation, Washingon D.C, 1999: 1927~1930
    83. Y. Fukuyama, H. A. Yoshida. A Particle Swarm Optimization for Reactive Power and Voltage Control in Electric Power Systems. Proceedings of IEEE International Congress on Evolutionary Computation. Seoul, Korea, 2001: 496~502
    84. R. C. Eberhart, Y. H. Shi. Particle Swarm Optimization: Developments, Applications and Resources. Proceedings of Congress on Evolutionary Computation 2001. Piscataway, NJ: IEEE Press, 2001: 81~86
    85. H. Lv, W. Zhou, C. Zhou. A Discrete Particle Swarm Optimization Algorithm for Phylogenetic Tree Reconstruction. Proceedings of the Third International Conference on Machine Learning and Cybernetics. 2004:2650~2654
    86. D. R. Robinson, L. R. Foulds. Comparison of Phylogenetic Trees. Mathematical Biosciences, 1981, 53: 131~147
    87. R. DM Page. TBMap: a Taxonomic Perspective on the Phylogenetic Database TreeBASE. BMC Bioinformatics. 2007, 8(1): 158
    88. Ben Dor, B. Chor, D.Guar, R.Ophir, D.Pelleg. From Four-taxon Trees to Phylogenies: The Case of Mammalian Evolution. Proceedings of the Second Annal International Conference on Computational Molecular Biology. 1998: 9~19
    89. V. Berry, O. Gascuel. Inferring Evolutionary Trees with Strong Combinatorial Evidence. Proceedings of the Third Annal International Computing and Combinatorics Conference. 1997: 271~298
    90. P. Buneman. The Recovery of Trees from Measures of Dissimilarity. Mathematics in the Archaeological and Historical Sciences, 1971, 4(3): 387~395
    91. M. Steel. The Complexity of Reconstructing Trees from Qualitative Charactersand Subtrees. Journal of Classification. 1992, 9: 91~116
    92. D. Bryant. A Classification of Consensus Methods for Phylogenetics. BioConsensus, DIMACS Series in Discrete Mathematics and Theoretical Computer Science. 2003: 163~184
    93. K. St. John, T. Warnow, B. Moret, L. Vawter. Performance Study ofPhylogenetic Methods: (Unweighted) Quartet Methods and Neighbor Joining. Journal of Algorithms. 2003, 48(1): 174~193
    94. D. C. Hoyle, P. G. Higgs. Factors Affecting the Errors in the Estimation of Evolutionary Distances between Sequences. Molecular Biology and Evolution. 2003, 20(1):1~9
    95. Gascuel. BIONJ: An Improved Version of the NJ algorithm Based on a Simple Model of Sequence Data. Molecular Biology and Evolution. 1997, 14(7): 685~695
    96. W. J. Bruno, N. D. Socci, A. L. Halpern. Weighted Neighbor-Joining: a Likelihood-based Approach to Distance-based Phylogeny Reconstruction. Molecular Biology and Evolution. 2000, 17(1), 189~197
    97. S. Ota, W. H Li. NJML: A Hybrid Algorithm for the Neighbor-Joining and Maximum Likelihood Methods. Molecular Biology and Evolution. 2000, 17(9): 1401~1409
    98. D. D. Pollock, D.J.Zwick, J. A. Mcguire, D. M. Hills. Increased Taxon Sampling Is Advantageous for Phylogenetic Inference. Systematic Biology. 2002, 51(4): 664~671
    99. D. J. Zwick, D. M. Hills. Increased Taxon Sampling Greatly Reduces Phylogenetic Error. Systematic Biology. 2002, 51(4):588~598
    100. V. Ranwez, O. Gascuel. Improvement of Distance-based Phylogenetic Methods by a Local Maximum Likelihood Approach Using Triplets. Molecular Biology and Evolution. 2002, 19(11):1952~1956
    101. Rzhetsky, M. Nei. Theoretical Foundation of the Minimum-Evolution Method of Phylogenetic Inference. Molecular Biology and Evolution, 1993, 10: 1073~1095
    102. Rambaut, N.C.Grassly. Seq-Gen: An Application for the Monte Carlo Simulation of DNA Sequence Evolution along Phylogentic Trees. Computer Applications in Bioscience. 1997, 13: 235~238
    103. P.L. Erdos, M. Steel, L. Székély, and T. Warnow. A few logs suffice to build almost all trees– I. Random Structures and Algorithms. 1999, 14(2): 153~184
    104. Sergei L. Kosakovsky Pond, Spencer V. Muse. Column Sorting: Rapid Calculation of the Phylogenetic Likelihood Function. Systematic Biology.2004, 53(5): 685~692
    105. Stamatakis, T. Ludwig, H. Meier, M. J. Wolf. AxML: a Fast Program for Sequential and Parallel Phylogenetic Tree Calculations Based on the Maximum Likelihood Method. Proceedings of 1st IEEE Computer Society Bioinformatics Conference. 2002: 21~29
    106. Yo Yamamoto, Hidemoto Nakada, Hidetoshi Shimodaira, Satoshi Matsuoka. Parallelization of Phylogenetic Tree Inference using Grid Technologies. Lecture Notes in Computer Science. 2005, 3370: 103~116
    107. Filip Biagojevic, Alexandros Stamatakis, Chiristos D. Antonopoulos, Dimitrios S. Nikolopoulos. RAxML-Cell: Parallel Phylogenetic Tree Inference on the Cell Broadband Engine. IEEE International Parallel and Distributed Processing Symposium, 2007: 1~10
    108. Ding Biwei, Tian Yingjie, Naiyang Deng, Su Zhen, Cai Chun. Gene Expression Analysis with Support Vector Machines in Arabidopsis. Or Transactions. 2006, 10(2): 51~58
    109. N. Friedman. The Bayesian Structural EM Algorithm. In Uncertainty in Artificial Intelligence: Proceedings of the Fourteenth Conference. 1998, 129~138
    110. Gal Elidan. Learning Hidden Variables in Probabilistic Graphical Models. Dissertation of the University of Hebrew. 2004
    111. Yu Zhang, Zhidong Deng, Hongshan Jiang, Peifa Jia. Dynamic Bayesian Network(DBN) with Structure Expecation Maximization(SEM) for Modeling of Gene Network from Time Series Gene Expression Data. Lecture Notes in Computer Science. 2006, 4062: 402~407
    112. N. Friedman, M. Ninio, I. Pe'er, T. Pupko. A Structural EM Algorithm for Phylogenetic Inference. Journal of Computational Biology, 2002, 9(2):331~353
    113. Z. Wu. The Effective Energy Transformation Scheme as a Special Continuation Approach to Global Optimization with Application to Molecular Conformation. SIAM Journal on Optimization. 1996, 6(3): 748~768
    114. Rambant. Estimating the Rate of Molecular Evolution: Incoporating Non-contemporaneous Sequenses into Maximum Likelihood Phylogenies.Bioinformatics. 2000, 16(4): 395~399
    115. W.Messier and C.B.Stewart. Episodic Adaptive Evolution of Primate Lysozymes. Nature. 1997, 385:151~154
    116. R. Masuda , J. V. Lopez , J. P. Slattery , N. Yuhki , S. J. O’Brien. Molecular Phylogeny of Mitochondrial Cytochrome b and 12S rRNA Sequences in the Felidae: Ocelot and Domestic Cat Lineages. Molecular Phylogenetics and Evolution. 1996, 6 (3 ): 351~ 365
    117.金煜,景松岩,张长社,刘英萍.中国猫科动物毛的结构与属间划分.野生动物. 1995, 4 : 29 ~ 35
    118. MY Matter, DA McLennan. Phylogeny and Speciation of Felids. Cladistics. 2000, 16 : 232~253
    119. OR. Emonds, J.L. Gittleman, A. Purvis. Building Large Trees by Combining Phylogenetic Information : a Complete Phylogeny of the Extant Carnivora (Mammalia) . Biological Reviews. 1999, 74(2): 143~175
    120.胡清宇,李水福,陈玲,刘云,吴俊娇.豹骨头骨及其混伪品头骨的聚类分析.基层中药杂志. 1996, 10 (2) : 9 ~ 10
    121. Bininda Emonds. The Utility of Chemical Signals as Phylogenetic Characters: an Example from the Felidae. Biological Journal of the Linnean Society. 2001, 72 : 1 ~15