Research of CUDA-based acceleration of MS-Alignment for identification of post-translational modifications

Original title (Chinese): 基于CUDA的蛋白质翻译后修饰鉴定MS-Alignment算法加速研究
Journal: Application Research of Computers (计算机应用研究)
Authors: ZHAI Yan-tang (翟艳堂; affil. 1, 2); TU Qiang (涂强; affil. 1); LANG Xian-yu (郎显宇; affil. 1); LU Zhong-hua (陆忠华; affil. 1); CHI Xue-bin (迟学斌; affil. 1)
Affiliations: 1. Supercomputing Center, Computer Network Information Center, Chinese Academy of Sciences, Beijing 100190, China; 2. Graduate University of Chinese Academy of Sciences, Beijing 100049, China
Keywords: identification of post-translational modifications; MS-Alignment; graphics processing unit (GPU); compute unified device architecture (CUDA)
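Publication date: 2010-09-15
Institutions: Supercomputing Center, Computer Network Information Center, Chinese Academy of Sciences; Graduate University of Chinese Academy of Sciences
Year: 2010
Issue: 09
Publisher: Application Research of Computers (计算机应用研究)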

Abstract

An analysis of the MS-Alignment algorithm shows that it struggles to meet the identification-speed requirements of large-scale data. The analysis also shows that the algorithm repeats the same computation on different data, which provides the basis for data partitioning. Using the CUDA (Compute Unified Device Architecture) programming model, this paper accelerates the database search and candidate peptide generation step on a graphics processing unit (GPU) and presents an implementation of that step on a single GPU. Test results show an average speedup of more than 30 times, so the method can meet the need for fast computation on large-scale data in the identification of post-translational modifications.
