A Mathematical Analysis of the Evolution of the AIDS Virus

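Author: Chen Zekai (陈泽凯)
Keywords: AIDS virus; natural vector; Python; Mega; phylogenetic tree
Journal title (Chinese): TXWL
Journal title (English): China New Telecommunications
Institution: The High School Affiliated to Shaanxi Normal University
Publication date: 2019-04-20
Publisher: 中国新通信 (China New Telecommunications)
Year: 2019
Issue: v.21
Language: Chinese
Pages: TXWL201908174
Page count: 3
CN: 08
ISSN: 11-5402/TN
Classification number: 218-220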

Abstract

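This paper applies the natural vector method of Yau (2011) to an evolutionary analysis of three groups of viruses: HIV-1, HIV-2, and PLV. Python is used to compute the natural vector of each sequence and the pairwise distances between sequences; Mega is then used to draw a phylogenetic tree from the resulting distance matrix. To test the feasibility of the method, 20 whole-genome sequences were selected from the HIV Sequence Database, and phylogenetic trees were built both with our method and with traditional MSA (multiple sequence alignment). The results show that our method clearly outperforms MSA and also takes less time. Our method can therefore serve as a useful tool for research on the evolution of the AIDS virus.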
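The abstract does not reproduce the formulas, so the following is only a minimal sketch of the pipeline it describes, assuming the 12-dimensional natural vector as it is commonly defined in the natural vector literature (see reference [19]): for each nucleotide k in A, C, G, T, the count n_k, the mean position mu_k, and the normalized second central moment D2_k = sum((p - mu_k)^2) / (n_k * n), where n is the sequence length. The function names `natural_vector`, `euclidean`, and `distance_matrix`, the use of Euclidean distance, and the toy sequences are assumptions made for illustration, not the author's code or data.

```python
import math

NUCLEOTIDES = "ACGT"


def natural_vector(seq):
    """Return a 12-dimensional natural vector for a DNA sequence.

    Layout (an assumption of this sketch):
    [n_A, n_C, n_G, n_T, mu_A, mu_C, mu_G, mu_T, D2_A, D2_C, D2_G, D2_T]
    """
    seq = seq.upper()
    n = len(seq)  # total length; ambiguous symbols such as N still count here
    counts, means, moments = [], [], []
    for k in NUCLEOTIDES:
        # 1-based positions at which nucleotide k occurs
        positions = [i + 1 for i, base in enumerate(seq) if base == k]
        n_k = len(positions)
        if n_k == 0:
            # Nucleotide absent: use zeros for all three components
            counts.append(0.0)
            means.append(0.0)
            moments.append(0.0)
            continue
        mu_k = sum(positions) / n_k
        d2_k = sum((p - mu_k) ** 2 for p in positions) / (n_k * n)
        counts.append(float(n_k))
        means.append(mu_k)
        moments.append(d2_k)
    return counts + means + moments


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def distance_matrix(seqs):
    """Pairwise distance matrix between the natural vectors of the sequences."""
    vectors = [natural_vector(s) for s in seqs]
    return [[euclidean(u, v) for v in vectors] for u in vectors]


if __name__ == "__main__":
    # Hypothetical toy sequences, not data from the paper
    toy = ["ACGTACGT", "ACGTTCGA", "GGGGCCCC"]
    for row in distance_matrix(toy):
        print(["%.3f" % d for d in row])
```

Because each sequence is mapped to a fixed-length vector independently of the others, no alignment step is needed; the resulting symmetric matrix is what would then be passed to a tree-building tool such as Mega.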

References

[1] Amano K, Nakamura H, Ichikawa H. Self-organizing clustering: a novel non-hierarchical method for clustering large amount of DNA sequences[J]. Genome Informatics, 2003, 14:575-576.
[2] Emrich S J, Kalyanaraman A, Aluru S. Algorithms for large-scale clustering and assembly of biological sequence data[J]. Handbook of Computational Molecular Biology, 2006:13.1-13.30.
[3] FitzGerald P C, Shlyakhtenko A, Mir A A, et al. Clustering of DNA sequences in human promoters[J]. Genome research, 2004, 14(8):1562-1574.
[4] Waterman M S. Introduction to computational biology: maps, sequences and genomes[M]. CRC Press, 1995.
[5] Abe T, Kanaya S, Kinouchi M, et al. Informatics for unveiling hidden genome signatures[J]. Genome research, 2003, 13(4):693-702.
[6] Chuzhanova N A, Jones A J, Margetts S. Feature selection for genetic sequence classification[J]. Bioinformatics (Oxford, England), 1998, 14(2):139-143.
[7] Karlin S, Ladunga I. Comparisons of eukaryotic genomic sequences[J]. Proceedings of the National Academy of Sciences, 1994, 91(26):12832-12836.
[8] Nakashima H, Ota M, Nishikawa K, et al. Genes from nine genomes are separated into their organisms in the dinucleotide composition space[J]. DNA Research, 1998, 5(5):251-259.
[9] Yau S S T, Wang J, Niknejad A, et al. DNA sequence representation without degeneracy[J]. Nucleic acids research, 2003, 31(12):3078-3080.
[10] Liu L, Ho Y, Yau S. Clustering DNA sequences by feature vectors[J]. Molecular phylogenetics and evolution, 2006, 41(1):64-69.
[11] Yau S S T, Yu C, He R. A protein map and its application[J]. DNA and cell biology, 2008, 27(5):241-250.
[12] Carr K, Murray E, Armah E, et al. A rapid method for characterization of protein relatedness using feature vectors[J]. PLoS One, 2010, 5(3):e9550.
[13] Yu C, Liang Q, Yin C, et al. A novel construction of genome space with biological geometry[J]. DNA research, 2010, 17(3):155-168.
[14] Larkin M A, Blackshields G, Brown N P, et al. Clustal W and Clustal X version 2.0[J]. Bioinformatics, 2007, 23(21):2947-2948.
[15] Edgar R C. MUSCLE: a multiple sequence alignment method with reduced time and space complexity[J]. BMC bioinformatics, 2004, 5(1):113.
[16] Katoh K, Misawa K, Kuma K, et al. MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform[J]. Nucleic acids research, 2002, 30(14):3059-3066.
[17] Wang L, Jiang T. On the complexity of multiple sequence alignment[J]. Journal of computational biology, 1994, 1(4):337-348.
[18] Musto H, Cacciò S, Rodríguez-Maseda H, et al. Compositional constraints in the extremely GC-poor genome of Plasmodium falciparum[J]. Memórias do Instituto Oswaldo Cruz, 1997, 92(6):835-841.
[19] Deng M, Yu C, Liang Q, et al. A novel method of characterizing genetic sequences: genome space with biological distance and applications[J]. PLoS One, 2011, 6(3):e17293.
