Concept Recognition in French Biomedical Text Using Automatic Translation
  • Keywords: Entity recognition; Concept identification; Term translation; French terminology
  • Journal: Lecture Notes in Computer Science
  • Publication year: 2016
  • Volume: 9822
  • Issue: 1
  • Pages: 162-173
  • Full-text size: 152 KB
  • Authors and affiliations: Zubair Afzal (21)
    Saber A. Akhondi (21)
    Herman H. H. B. M. van Haagen (21)
    Erik M. van Mulligen (21)
    Jan A. Kors (21)

    21. Department of Medical Informatics, Erasmus University Medical Center, Rotterdam, The Netherlands
  • Book title: Experimental IR Meets Multilinguality, Multimodality, and Interaction
  • ISBN:978-3-319-44564-9
  • Publication category: Computer Science
  • Publication subjects: Artificial Intelligence and Robotics
    Computer Communication Networks
    Software Engineering
    Data Encryption
    Database Management
    Computation by Abstract Devices
    Algorithm Analysis and Problem Complexity
  • Publisher: Springer Berlin / Heidelberg
  • ISSN: 1611-3349
Abstract
We describe the development of a concept recognition system for French documents and its application in task 1b of the 2015 CLEF eHealth challenge. This community challenge comprised three subtasks: recognition of entities in a French medical corpus, normalization of the recognized entities, and normalization of entity mentions that had been manually annotated. Normalization had to be based on the Unified Medical Language System (UMLS). We addressed all three subtasks with a dictionary-based approach using Peregrine, our open-source indexing engine. To increase the coverage of our initial French terminology, we explored the use of two automatic translators, Google Translate and Microsoft Translator, to translate English UMLS terms into French. The corpus consisted of 1665 titles of French Medline abstracts and 6 French drug labels of the European Medicines Agency (EMEA). The corpus was manually annotated with concepts from the UMLS and split into equally sized training and test sets. The best performance on the training set was obtained with a terminology that contained the intersection of the translated terms, in combination with several post-processing steps to reduce the number of false-positive detections. When evaluated on the test set, our system achieved F-scores of 0.756 and 0.665 for entity recognition on the EMEA documents and Medline titles, respectively. For subsequent entity normalization, the F-scores were 0.711 and 0.587. Entity normalization given the manually annotated entity mentions resulted in F-scores of 0.872 and 0.671. Our system obtained the highest F-scores among the systems that participated in the challenge.
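
The idea of keeping only the intersection of the translated terms can be illustrated with a minimal sketch. It assumes that "intersection" means a French translation is retained only when both translators produce the same string for the same English term; the abstract does not spell out the exact matching rule. The function names, the normalization step, the concept identifiers, and the toy data are all hypothetical, and no real translation service or UMLS file is accessed.

```python
import unicodedata


def normalize(term: str) -> str:
    """Lowercase and collapse whitespace so trivial differences do not block a match."""
    return " ".join(unicodedata.normalize("NFC", term).lower().split())


def intersect_translations(translations_a, translations_b):
    """Keep a French translation only when both translators agree on it.

    Each argument maps (concept_id, english_term) -> french_translation.
    Returns a dict mapping concept_id -> set of agreed French terms.
    """
    agreed = {}
    for key, french_a in translations_a.items():
        french_b = translations_b.get(key)
        if french_b is None:
            continue  # the second translator has no output for this term
        if normalize(french_a) == normalize(french_b):
            concept_id, _english = key
            agreed.setdefault(concept_id, set()).add(normalize(french_a))
    return agreed


# Hypothetical toy data: made-up concept identifiers and translations.
translator_a = {
    ("C0000001", "heart attack"): "crise cardiaque",
    ("C0000002", "headache"): "mal de tête",
}
translator_b = {
    ("C0000001", "heart attack"): "Crise cardiaque",
    ("C0000002", "headache"): "céphalée",
}

terminology = intersect_translations(translator_a, translator_b)
print(terminology)
```

In this toy case only the first concept survives, because the two translators disagree on the second term. Requiring agreement trades coverage for precision, which is in line with the stated aim of reducing false-positive detections.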
