An Ontology-Enabled Natural Language Processing Pipeline for Provenance Metadata Extraction from Biomedical Text (Short Paper)
  • Keywords: Ontology-based natural language processing; Provenance metadata; Scientific reproducibility; Named entity recognition
  • Journal title: Lecture Notes in Computer Science
  • Publication year: 2016
  • Volume: 10033
  • Issue: 1
  • Pages: 699-708
  • Full-text size: 482 KB
  • Author affiliations: Joshua Valdez (20)
    Michael Rueschman (21)
    Matthew Kim (21)
    Susan Redline (21)
    Satya S. Sahoo (20)

    20. Division of Medical Informatics and Electrical Engineering and Computer Science Department, Case Western Reserve University, Cleveland, OH, USA
    21. Departments of Medicine, Brigham and Women’s Hospital and Beth Israel Deaconess Medical Center, Harvard University, Boston, MA, USA
  • Book series title: On the Move to Meaningful Internet Systems: OTM 2016 Conferences
  • ISBN: 978-3-319-48472-3
  • Publication category: Computer Science
  • Publication subjects: Artificial Intelligence and Robotics
    Computer Communication Networks
    Software Engineering
    Data Encryption
    Database Management
    Computation by Abstract Devices
    Algorithm Analysis and Problem Complexity
  • Publisher: Springer Berlin / Heidelberg
  • ISSN: 1611-3349
  • Volume sort order: 10033
Abstract
Extracting structured information from biomedical literature is a challenging problem because of the complexity of the biomedical domain and the lack of appropriate natural language processing (NLP) techniques. High-quality domain ontologies model both data and metadata at a fine level of granularity, and this model can be used to extract structured information from biomedical text accurately. Extracting provenance metadata, which describes the history or source of information, from published articles is an important task in supporting scientific reproducibility. Reproducibility of results reported by previous research studies is a foundational component of scientific advancement, as highlighted by the recent US National Institutes of Health initiative called “Principles of Rigor and Reproducibility”. In this paper, we describe an effective approach to extracting provenance metadata from published biomedical research literature using an ontology-enabled NLP platform developed as part of the Provenance for Clinical and Healthcare Research (ProvCaRe) project. The ProvCaRe-NLP tool extends the clinical Text Analysis and Knowledge Extraction System (cTAKES) platform using both provenance and biomedical domain ontologies. We demonstrate the effectiveness of the ProvCaRe-NLP tool using a corpus of 20 peer-reviewed publications. The results of our evaluation show that the ProvCaRe-NLP tool has significantly higher recall in extracting provenance metadata than existing NLP pipelines such as MetaMap.
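To make the idea of ontology-driven extraction and recall-based evaluation concrete, the following is a minimal sketch in Python. It is not the ProvCaRe-NLP tool and does not use cTAKES or MetaMap; it substitutes plain dictionary lookup for a real NLP pipeline. The ontology classes, term labels, example sentence, and gold annotations are all hypothetical and chosen only for illustration.

```python
import re

# Hypothetical ontology fragment: class name -> surface labels.
# These classes and labels are illustrative assumptions, not taken from ProvCaRe.
ONTOLOGY_LABELS = {
    "StudyMethod": ["randomized controlled trial", "cohort study"],
    "Instrument": ["polysomnography", "questionnaire"],
    "Intervention": ["cpap", "supplemental oxygen"],
}


def annotate(text, labels=ONTOLOGY_LABELS):
    """Return the set of (ontology_class, label) pairs whose label occurs in text."""
    found = set()
    lowered = text.lower()
    for cls, terms in labels.items():
        for term in terms:
            # Word-boundary match so a label is not found inside a longer token.
            if re.search(r"\b" + re.escape(term) + r"\b", lowered):
                found.add((cls, term))
    return found


def recall(predicted, gold):
    """Recall = correctly extracted annotations / all gold annotations."""
    if not gold:
        return 0.0
    return len(predicted & gold) / len(gold)


# Hypothetical sentence and manually assigned gold annotations.
sentence = (
    "Participants in this randomized controlled trial received CPAP "
    "or supplemental oxygen and underwent polysomnography."
)
gold = {
    ("StudyMethod", "randomized controlled trial"),
    ("Intervention", "cpap"),
    ("Intervention", "supplemental oxygen"),
    ("Instrument", "polysomnography"),
}

predicted = annotate(sentence)
print(sorted(predicted))
print("recall:", recall(predicted, gold))
```

In this sketch, recall rises when the label dictionary covers more of the terms an annotator would mark, which is one plausible reason an ontology with provenance-specific terms could recover more provenance metadata than a general-purpose vocabulary.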
