Abstract
This paper describes how cTAKES, a natural language processing system for clinical text, was built, covering three aspects: system architecture, corpus construction, and application results. It then offers five recommendations for building natural language processing systems for Chinese clinical text: design the system architecture on an open-source framework, develop modular components, build a clinical corpus, emphasize innovation, and tailor the system to the characteristics of the Chinese language.