Creating a medical English-Swedish dictionary using interactive word alignment
  • Authors: Mikael Nyström (1)
    Magnus Merkel (2)
    Lars Ahrenberg (2)
    Pierre Zweigenbaum (3) (4) (5)
    Håkan Petersson (1)
    Hans Åhlfeldt (1)
  • Journal: BMC Medical Informatics and Decision Making
  • Publication date: December 2006
  • Volume: 6
  • Issue: 1
  • Author affiliations:

    1. Department of Biomedical Engineering, Linköpings universitet, SE-58185, Linköping, Sweden
    2. Department of Computer and Information Science, Linköpings universitet, SE-58183, Linköping, Sweden
    3. Assistance Publique-Hôpitaux de Paris, F-75683, Paris, Cedex 14, France
    4. Inserm, U729, F-75270, Paris, Cedex 06, France
    5. Inalco, CRIM, F-75343, Paris, Cedex 07, France
Background: This paper reports on a parallel collection of rubrics from the medical terminology systems ICD-10, ICF, MeSH, NCSP and KSH97-P and its use for semi-automatic creation of an English-Swedish dictionary of medical terminology. The methods presented are relevant for many West European language pairs other than English-Swedish.

Methods: The medical terminology systems were collected in electronic format in both English and Swedish, and the rubrics were extracted as parallel language pairs. First, interactive word alignment was used to create training data from a sample. The training data were then used in automatic word alignment to generate candidate term pairs. The last step was manual verification of the candidate term pairs.

Results: A dictionary of 31,000 verified entries was created in less than three man-weeks, with considerably less time and effort than a manual approach would need and without compromising quality. As a side effect of the work, 40 different translation problems were found in the terminology systems, which indicates the power of the method for finding inconsistencies in terminology translations. We also report on some factors that may make dictionary creation with similar tools even more expedient. Finally, the contribution is discussed in relation to other ongoing efforts to construct medical lexicons for non-English languages.

Conclusion: In three man-weeks we were able to produce a medical English-Swedish dictionary of 31,000 entries, and we also found hidden translation errors in the medical terminology systems used.
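The idea of generating candidate term pairs from parallel rubrics can be illustrated with a minimal sketch. It does not reproduce the interactive word alignment tools used in the paper; it swaps in a simple Dice co-occurrence score over word pairs. The function name `candidate_term_pairs`, the `min_dice` threshold, and the toy rubrics are all assumptions made for illustration. As in the paper's last step, the resulting candidates would still need manual verification.

```python
from collections import Counter
from itertools import product


def candidate_term_pairs(rubric_pairs, min_dice=0.5):
    """Rank (english, swedish) word pairs by Dice co-occurrence score.

    Hypothetical helper: a stand-in for automatic word alignment,
    not the method used in the paper.
    """
    en_count, sv_count, joint = Counter(), Counter(), Counter()
    for en, sv in rubric_pairs:
        en_words = set(en.lower().split())
        sv_words = set(sv.lower().split())
        en_count.update(en_words)                  # rubrics containing each English word
        sv_count.update(sv_words)                  # rubrics containing each Swedish word
        joint.update(product(en_words, sv_words))  # rubric pairs containing both words
    scored = []
    for (e, s), n in joint.items():
        dice = 2 * n / (en_count[e] + sv_count[s])
        if dice >= min_dice:
            scored.append((e, s, dice))
    # Highest score first, then alphabetical for a stable order.
    return sorted(scored, key=lambda t: (-t[2], t[0], t[1]))


# Toy parallel rubrics (illustrative only, not taken from the paper's data).
rubrics = [
    ("acute bronchitis", "akut bronkit"),
    ("chronic bronchitis", "kronisk bronkit"),
    ("acute sinusitis", "akut sinuit"),
    ("chronic sinusitis", "kronisk sinuit"),
]
candidates = candidate_term_pairs(rubrics, min_dice=0.9)
for english, swedish, score in candidates:
    print(f"{english}\t{swedish}\t{score:.2f}")
```

With a high threshold only word pairs that always occur together survive, which keeps the list a human reviewer has to check short; lowering `min_dice` trades reviewer effort for coverage.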
