Network Property-Based Prediction of Antineoplastic Drug Targets and Its Application
  • English title: Predicting Antineoplastic Drug Targets Based on Network Properties
  • Authors: 范馨月 (Fan Xinyue); 崔雷 (Cui Lei)
  • Affiliation: School of Medical Informatics, China Medical University
  • Keywords: PPI Network; Machine Learning; Decision Tree; Antineoplastic Drug Targets Prediction
  • Journal code: XDTQ
  • Journal: 数据分析与知识发现 (Data Analysis and Knowledge Discovery)
  • Publication date: 2018-12-25
  • Year: 2018
  • Volume: v.2; cumulative issue No.24
  • Funding: One of the research outcomes of the CERNET Next Generation Internet Technology Innovation Project "A Medical Imaging Teaching Platform for Higher Education Institutions" (Project No. NGII20150503)
  • Language: Chinese
  • Issue: 12
  • Article ID: XDTQ201812011
  • Pages: 102-112 (11 pages)
  • CN: 10-1478/G2
Abstract
[Objective] This paper aims to identify potential targets of antineoplastic drugs and to provide references for future clinical work and experimental validation. [Methods] We retrieved antineoplastic drug targets from the DrugBank database and combined them with protein-protein interaction data from the HPRD database. Using Cytoscape, we built a PPI network of the drug targets and calculated the topological properties of its nodes. We then selected topological attribute variables with univariate analysis in SPSS and the information gain criterion in Weka, applied the SMOTE algorithm to handle the imbalanced dataset, and built a prediction model for antineoplastic drug targets with a decision tree. Finally, we compared the model's performance with that of models built with three other common machine learning classification algorithms. [Results] The decision tree model reached a prediction accuracy of 73.18%. Validation in cBioPortal showed that the 16 predicted targets with scores of 0.9 or higher are mutated and amplified in multiple tumor types; NR5A1 is analyzed in detail as an example. [Limitations] The model uses only the PPI network properties of antineoplastic drug targets; features such as target function and sequence attributes were not included. [Conclusions] Predicting potential antineoplastic drug targets with machine learning based on the topological properties of the PPI network is effective and can provide some reference for antineoplastic drug development and clinical work.
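The abstract states that node topological properties were calculated in Cytoscape but does not list which ones. As a rough illustration only, the Python sketch below computes two properties commonly reported for PPI networks, degree and local clustering coefficient, from an undirected edge list. The function names and the toy edges are hypothetical, and this is not the authors' Cytoscape workflow; a library such as networkx (`networkx.clustering`) computes the same clustering quantity.

```python
from itertools import combinations


def build_adjacency(edges):
    """Undirected adjacency sets from (protein_a, protein_b) pairs; self-loops are skipped."""
    adj = {}
    for a, b in edges:
        if a == b:
            continue
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def degree(adj):
    """Number of distinct interaction partners per protein."""
    return {node: len(nbrs) for node, nbrs in adj.items()}


def clustering_coefficient(adj):
    """Local clustering: share of a node's neighbour pairs that also interact."""
    coeffs = {}
    for node, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            coeffs[node] = 0.0  # undefined for fewer than two neighbours; reported as 0
            continue
        links = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
        coeffs[node] = 2.0 * links / (k * (k - 1))
    return coeffs


# Hypothetical toy interactions (not HPRD data).
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]
adj = build_adjacency(edges)
print(degree(adj))                  # {'A': 2, 'B': 2, 'C': 3, 'D': 1}
print(clustering_coefficient(adj))  # {'A': 1.0, 'B': 1.0, 'C': 0.3333333333333333, 'D': 0.0}
```

Each resulting per-node value would become one column of the feature table that the later selection and classification steps consume.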
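Screening features by information gain scores each attribute by how much it reduces class entropy: IG = H(class) - H(class | attribute). The sketch below assumes the topological values have already been discretized into bins (Weka's information gain evaluator discretizes numeric attributes internally; here the binning is left to the caller). It also computes the gain ratio used by C4.5-style decision trees; whether the paper's tree used gain ratio is an assumption, since the abstract only says "decision tree". The function names, labels, and bins are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """H(class) - H(class | feature) for a discretized feature."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def gain_ratio(feature_values, labels):
    """C4.5-style gain ratio: information gain divided by the feature's split information."""
    split_info = entropy(feature_values)
    if split_info == 0:
        return 0.0  # a feature with a single value cannot split the data
    return information_gain(feature_values, labels) / split_info


# Hypothetical data: 1 = known drug target, 0 = non-target.
labels = [1, 1, 1, 0, 0, 0]
degree_bin = ["high", "high", "high", "low", "low", "low"]
constant = ["mid"] * 6
print(information_gain(degree_bin, labels))  # 1.0
print(information_gain(constant, labels))    # 0.0
print(gain_ratio(degree_bin, labels))        # 1.0
```

Attributes would then be ranked by their score and low-scoring ones dropped; the cut-off is a modelling choice the abstract does not specify.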
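SMOTE (Synthetic Minority Over-sampling Technique) balances a dataset by synthesizing new minority-class samples instead of duplicating existing ones: each synthetic point lies at a random position on the line segment between a minority sample and one of its k nearest minority-class neighbours. The pure-Python sketch below assumes numeric feature vectors and Euclidean distance; `k=5` follows the usual default, and the function name, seed, and toy points are illustrative. In practice a library implementation such as imbalanced-learn's `SMOTE` would normally be used.

```python
import random
from math import dist


def smote(minority, n_synthetic, k=5, seed=0):
    """Return n_synthetic new minority samples interpolated towards k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)

    # Precompute each sample's k nearest minority-class neighbours (Euclidean distance).
    neighbours = []
    for i, x in enumerate(minority):
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: dist(x, minority[j]))
        neighbours.append(others[:k])

    synthetic = []
    for s in range(n_synthetic):
        i = s % len(minority)  # cycle through minority samples so each seeds roughly equally many
        x = minority[i]
        nn = minority[rng.choice(neighbours[i])]
        gap = rng.random()  # position along the segment from x to its neighbour
        synthetic.append([xi + gap * (ni - xi) for xi, ni in zip(x, nn)])
    return synthetic


# Hypothetical two-feature minority class (e.g. scaled degree and clustering values).
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
new_points = smote(minority, n_synthetic=6, k=2, seed=42)
print(len(new_points))  # 6
```

Oversampling is normally applied to the training data only, so that synthetic points do not leak into the samples used to measure accuracy.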