Screen efficiency comparisons of decision tree and neural network algorithms in machine learning assisted drug design
  • English title: Screen efficiency comparisons of decision tree and neural network algorithms in machine learning assisted drug design
  • Authors: Qiumei Pu; Yinghao Li; Hong Zhang; Haodong Yao; Bo Zhang; Bingji Hou; Lin Li; Yuliang Zhao; Lina Zhao
  • Keywords: drug design; affinity prediction; protein-ligand binding; machine learning
  • Journal (Chinese name): JBXG
  • Journal (English name): Science China Chemistry (English edition)
  • Affiliations: CAS Key Laboratory for Biomedical Effects of Nanomaterials and Nanosafety, Institute of High Energy Physics, Chinese Academy of Sciences; School of Information Engineering, Minzu University of China; University of Chinese Academy of Sciences; School of Computer, Beijing Institute of Technology
  • Publication date: 2019-02-25 14:37
  • Publisher: Science China (Chemistry)
  • Year: 2019
  • Issue: v.62
  • Funding: Supported by the National Natural Science Foundation of China (31571026, 21727817)
  • Language: English
  • Pages: JBXG201904016
  • Page count: 9
  • CN: 04
  • ISSN: 11-5839/O6
  • Classification number: 110-118
Abstract
Given the huge search space in drug design, and with the development of artificial intelligence technology, machine learning has become a powerful method for predicting the affinity between small-molecule drugs and their target proteins. However, the variety of machine learning algorithms, each with many different parameters, makes choosing a prediction framework quite difficult. In this work, we took a recent drug design competition (from the company XtalPi on the DataCastle platform) as a typical case to find the optimized parameters for different machine learning algorithms and to identify the most effective algorithm. After parameter optimization, we compared typical machine learning methods, namely decision trees (XGBoost, LightGBM) and artificial neural networks (MLP, CNN), using root-mean-square error (RMSE) and the coefficient of determination (R²) as evaluation metrics. As a result, decision trees were more effective than neural networks, ranking LightGBM > XGBoost > CNN > MLP, in affinity prediction for this specific drug design problem with ~160,000 samples. For a much larger screening task in a more complicated drug design study, a sophisticated neural network model may surpass the decision tree algorithms once its generalization is enhanced and its overfitting is reduced. The advanced machine learning methods could extract more information about protein-ligand binding than traditional ones and improve the screening efficiency of drug design by up to 200–1000 times.
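The two evaluation metrics named in the abstract, RMSE and R², can be illustrated with a minimal sketch. The function names `rmse` and `r2` and the sample values are hypothetical, and the R² shown is the common "1 minus residual over total sum of squares" definition, which is an assumption here; the record does not state which formulation the authors used.

```python
import math

def rmse(y_true, y_pred):
    """Root-mean-square error between measured and predicted affinities."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)

def r2(y_true, y_pred):
    """Coefficient of determination, assumed here to be 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Hypothetical affinity values, for illustration only.
measured = [1.0, 2.0, 3.0, 4.0]
predicted = [2.0, 3.0, 4.0, 5.0]
print(rmse(measured, predicted), r2(measured, predicted))
```

A lower RMSE and a higher R² both indicate a better fit, which is the sense in which the abstract ranks the four models.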