When drug discovery meets web search: Learning to Rank for ligand-based virtual screening
  • Authors: Wei Zhang (1)
    Lijuan Ji (2)
    Yanan Chen (1)
    Kailin Tang (1)
    Haiping Wang (1) (4)
    Ruixin Zhu (1)
    Wei Jia (3)
    Zhiwei Cao (1)
    Qi Liu (1)

    1. Department of Central Laboratory, Shanghai Tenth People's Hospital, School of Life Sciences and Technology, Tongji University, Shanghai, China
    2. Huai'an Second People's Hospital affiliated to Xuzhou Medical College, Huai'an, China
    3. R & D Information, AstraZeneca, Shanghai, China
    4. Department of Computer Science, Hefei University of Technology, Hefei, 230009, China
  • Keywords: Learning to Rank; Virtual screening; Drug discovery; Data integration
  • Journal: Journal of Cheminformatics
  • Publication year: 2015
  • Publication date: December 2015
  • Volume: 7
  • Issue: 1
  • Full-text size: 2,266 KB
  • Journal category: Physics and Astronomy
  • Journal subjects: Computer Applications in Chemistry
    Theoretical and Computational Chemistry
    Computational Biology/Bioinformatics
    Documentation and Information in Chemistry
  • Publisher: Chemistry Central Ltd
  • ISSN: 1758-2946
Abstract
Background: The rapid emergence of novel chemical substances creates a substantial demand for more sophisticated computational methodologies for drug discovery. In this study, the idea of Learning to Rank from web search was applied to drug virtual screening. It offers two distinctive capabilities: (1) identifying compounds for novel targets when not enough training data are available for those targets, and (2) integrating heterogeneous data when compound affinities are measured on different platforms.

Results: A standard pipeline was designed to carry out Learning to Rank in virtual screening. Six Learning to Rank algorithms were investigated based on two public datasets collected from Binding Database and the newly published Community Structure-Activity Resource benchmark dataset. The results demonstrate that Learning to Rank is an efficient computational strategy for drug virtual screening, particularly because of its novel use in cross-target virtual screening and heterogeneous data integration.

Conclusions: To the best of our knowledge, this is the first application of Learning to Rank in virtual screening. The experimental workflow and algorithm assessment designed in this study provide a standard protocol for similar studies. All datasets and the implementations of the Learning to Rank algorithms are available at http://www.tongji.edu.cn/~qiliu/lor_vs.html.

Graphical Abstract: The analogy between web search and ligand-based drug discovery.
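
The two capabilities named in the abstract can be illustrated with a small pairwise ranking sketch. This is not the authors' pipeline or any of the six algorithms they evaluated. It is a minimal illustration that assumes each target (or measurement platform) plays the role of a search query and each compound plays the role of a document. All function names, the toy feature vectors, and the affinity values are hypothetical. Because preference pairs are built only within a group, affinities measured on different scales are never compared directly, and the learned scoring function can be applied to compounds of a target that was not in the training data.

```python
import math
import random

def make_pairs(groups):
    """Build pairwise difference vectors (better minus worse) inside each group.

    Pairs never cross groups, so affinities from different targets or
    platforms are not compared with each other.
    """
    pairs = []
    for compounds in groups.values():
        for xi, yi in compounds:
            for xj, yj in compounds:
                if yi > yj:
                    pairs.append([a - b for a, b in zip(xi, xj)])
    return pairs

def train_pairwise(pairs, n_features, epochs=200, lr=0.1, seed=0):
    """Fit a linear scoring function w.x with a logistic pairwise loss (SGD)."""
    rng = random.Random(seed)
    w = [0.0] * n_features
    pairs = list(pairs)
    for _ in range(epochs):
        rng.shuffle(pairs)
        for d in pairs:
            margin = sum(wk * dk for wk, dk in zip(w, d))
            # Gradient step for log(1 + exp(-margin)); clamp to avoid overflow.
            g = 1.0 / (1.0 + math.exp(min(margin, 50.0)))
            w = [wk + lr * g * dk for wk, dk in zip(w, d)]
    return w

def rank(w, compounds):
    """Return compound indices sorted from highest to lowest predicted score."""
    scores = [sum(wk * xk for wk, xk in zip(w, x)) for x in compounds]
    return sorted(range(len(compounds)), key=lambda i: -scores[i])

def ndcg(order, relevance):
    """NDCG of a ranking, given the true graded relevance of each item."""
    def dcg(rels):
        return sum((2 ** r - 1) / math.log2(pos + 2) for pos, r in enumerate(rels))
    ideal = dcg(sorted(relevance, reverse=True))
    return dcg([relevance[i] for i in order]) / ideal if ideal > 0 else 0.0

# Hypothetical toy data: two groups whose affinities are on different scales.
groups = {
    "target_A_platform_1": [([1.0, 0.2], 3), ([0.5, 0.9], 2), ([0.1, 0.4], 1)],
    "target_B_platform_2": [([0.9, 0.7], 80), ([0.4, 0.1], 35), ([0.2, 0.8], 10)],
}
w = train_pairwise(make_pairs(groups), n_features=2)

# Compounds of an unseen (hypothetical) target, ranked by the learned scorer.
new_target = [[0.8, 0.5], [0.3, 0.5], [0.6, 0.5]]
order = rank(w, new_target)
print(order, ndcg(order, [3, 1, 2]))
```

The pairwise formulation is only one of several Learning to Rank families; it is used here because it makes the "relative order within a group" idea easy to see in a few lines.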
