Genetic Disease Search System Based on In-Memory Computing

Published English title: Genetic disease search system based on memory computing
Authors: 杨勤 (YANG Qin); 臧天仪 (ZANG Tianyi)
Affiliation: School of Computer Science and Technology, Harbin Institute of Technology
Keywords: genotype; disease phenotype; big data; TrustRank
Journal code: DLXZ
Journal: Intelligent Computer and Applications (智能计算机与应用)
Publication date: 2019-01-01
Publisher: Intelligent Computer and Applications
Year: 2019
Issue: v.9
Language: Chinese
Pages: DLXZ201901067
Page count: 5
CN: 01
ISSN: 23-1573/TN
Classification code: 275-279

Abstract

Modern sequencing technology generates massive volumes of biological data, and the rapidly developing field of bioinformatics continues to uncover the biological information hidden in them. Studying the associations between genotypes and disease phenotypes through biological networks makes it possible to predict disease-causing genes and to identify the diseases that a gene gives rise to. Based on the modularity of disease genes, this paper integrates a protein interaction network, a disease phenotype similarity network, and a disease-gene association network into a heterogeneous biological network, and improves the web page ranking algorithm TrustRank to prioritize candidate genes and diseases, thereby providing a prediction function. The paper also develops a genetic disease search system on the Spark platform, with data stored in HBase, forming a big-data solution for storage, processing, and analysis that offers new ideas for clinical diagnosis and disease treatment.
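The ranking idea in the abstract can be illustrated with a minimal sketch of seed-biased PageRank, the core of the original TrustRank algorithm, run on a toy heterogeneous network. This is not the improved algorithm proposed in the paper, whose details the abstract does not give. The function name `trustrank`, the node labels (`G1` to `G4`, `D1`, `D2`), the edge weights, and the damping value are all hypothetical, and the seed nodes are assumed to be present in the graph.

```python
from collections import defaultdict


def trustrank(edges, seeds, damping=0.85, iterations=50):
    """Seed-biased PageRank (TrustRank-style) on an undirected weighted graph.

    Assumes every seed is a node of the graph and all weights are positive.
    """
    # Build a symmetric adjacency map: node -> {neighbour: weight}.
    adj = defaultdict(dict)
    for u, v, w in edges:
        adj[u][v] = w
        adj[v][u] = w
    nodes = sorted(adj)
    # Teleport vector: uniform over the trusted seed nodes, zero elsewhere.
    teleport = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    score = dict(teleport)
    for _ in range(iterations):
        # Each round restarts at the seeds with probability 1 - damping ...
        nxt = {n: (1.0 - damping) * teleport[n] for n in nodes}
        # ... and otherwise spreads each node's score over its neighbours
        # in proportion to the edge weights.
        for u in nodes:
            total = sum(adj[u].values())
            for v, w in adj[u].items():
                nxt[v] += damping * score[u] * w / total
        score = nxt
    return score


# Hypothetical toy network mixing the three edge types named in the abstract.
edges = [
    ("G1", "G2", 1.0),  # protein interaction
    ("G2", "G3", 1.0),  # protein interaction
    ("G3", "G4", 1.0),  # protein interaction
    ("D1", "D2", 0.8),  # disease phenotype similarity
    ("D1", "G1", 1.0),  # known disease-gene association
]
# Seeds: the query disease and its known causal gene.
scores = trustrank(edges, seeds={"D1", "G1"})
candidates = ["G2", "G3", "G4"]
ranking = sorted(candidates, key=scores.get, reverse=True)
print(ranking)
```

Candidate genes closer to the seeds in the combined network receive more of the propagated score, which is what lets a network-based method prioritize them for the query disease.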

References

[1] LI Yongjin, LI Jinyan. Disease gene identification by random walk on multigraphs merging heterogeneous genomic and phenotype data[J]. BMC Genomics, 2012, 13(S7): S27.
[2] ZHAO Z Q, HAN G S, YU Z G, et al. Laplacian normalization and random walk on heterogeneous networks for disease-gene prioritization[J]. Computational Biology & Chemistry, 2015, 57(C): 21-28.
[3] KÖHLER S, BAUER S, HORN D, et al. Walking the interactome for prioritization of candidate disease genes[J]. American Journal of Human Genetics, 2008, 82(4): 949-958.
[4] LAGE K, KARLBERG E O, STØRLING Z M, et al. A human phenome-interactome network of protein complexes implicated in genetic disorders[J]. Nature Biotechnology, 2007, 25(3): 309-316.
[5] MASINO A J, DECHENE E T, DULIK M C, et al. Clinical phenotype-based gene prioritization: An initial study using semantic similarity and the human phenotype ontology[J]. BMC Bioinformatics, 2014, 15(1): 248.
[6] WEI Chunshui. Disease-gene association algorithm based on random walk[D]. Xi'an: Xidian University, 2012. (in Chinese)
[7] LE D H, DANG V T. Ontology-based disease similarity network for disease gene prediction[J]. Vietnam Journal of Computer Science, 2016, 3(3): 197-205.
[8] RAMOS E M, HOFFMAN D, JUNKINS H A, et al. Phenotype-Genotype Integrator (PheGenI): Synthesizing genome-wide association study (GWAS) data with existing genomic resources[J]. European Journal of Human Genetics, 2014, 22(1): 144-147.
[9] VAN DRIEL M A, BRUGGEMAN J, VRIEND G, et al. A text-mining analysis of the human phenome[J]. European Journal of Human Genetics, 2006, 14(5): 535-542.
[10] SCOTT A F, AMBERGER J, BRYLAWSKI B, et al. OMIM: Online Mendelian Inheritance in Man[M]//LETOVSKY S. Bioinformatics: Databases and Systems. Boston, MA: Springer, 2002: 57-61.
[11] GYÖNGYI Z, GARCIA-MOLINA H, PEDERSEN J. Combating web spam with TrustRank[C]//Proceedings of the 30th International Conference on Very Large Data Bases - Volume 30. Toronto, Canada: Morgan Kaufmann, 2004: 576-587.
[12] GUO Xingli, GAO Lin, WEI Chunshui, et al. A computational method based on the integration of heterogeneous networks for predicting disease-gene associations[J]. PLoS ONE, 2011, 6(9): e24171.
