Training and Evaluating a Statistical Part of Speech Tagger for Natural Language Applications using Kepler Workflows

Authors: Doug Briesch^a; Reginald Hobbs^a (reginald.hobbs@us.army.mil); Claire Jaja^c; Brian Kjersten^b; Clare Voss^a
Keywords: natural language processing; part of speech tagging; computational linguistics; parallel corpora; machine translation; Arabic NLP; Penn Treebank
Journal: Procedia Computer Science
Year: 2012
Journal code: pc74_18770509
Category: cp
Volume: 9
Issue: Complete
Pages: 1588-1594
File size: 598 KB

Abstract

A core technology of natural language processing (NLP), incorporated into many text processing applications, is the part of speech (POS) tagger: a software component that labels words in text with syntactic tags such as noun, verb, and adjective. These tags may then be used in more complex tasks such as parsing, question answering, and machine translation (MT). In this paper we describe the phases of our work training and evaluating statistical POS taggers on Arabic texts and their English translations using Kepler workflows. Although our original objectives for encapsulating our research code within Kepler workflows were driven by software engineering needs to document and verify the reusability of our software, our research benefited as well: the ease of rapid retraining and testing enabled our researchers to detect reporting discrepancies, document their source, and independently validate the correct results.
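The abstract does not specify the tagging model, so the following is only an illustrative sketch of what "training and evaluating a statistical POS tagger" involves at its simplest. It is not the authors' tagger and not a Kepler workflow. It trains a most-frequent-tag (unigram) baseline on a toy corpus of (word, tag) pairs with Penn Treebank-style tags and scores it by token-level accuracy. The function names, the toy data, the lowercasing of words, and the choice to give unknown words the corpus's most frequent tag are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def train_unigram_tagger(corpus):
    """Learn the most frequent tag per word plus a fallback tag.

    corpus: list of sentences, each a list of (word, tag) pairs.
    Hypothetical helper for illustration; real statistical taggers
    also model tag context (e.g. HMMs or maximum-entropy models).
    """
    counts = defaultdict(Counter)  # lowercased word -> Counter of tags
    all_tags = Counter()           # tag frequencies over the whole corpus
    for sentence in corpus:
        for word, tag in sentence:
            counts[word.lower()][tag] += 1
            all_tags[tag] += 1
    lexicon = {w: c.most_common(1)[0][0] for w, c in counts.items()}
    # Assumption: unseen words get the most frequent tag in training data.
    default_tag = all_tags.most_common(1)[0][0]
    return lexicon, default_tag


def tag(words, lexicon, default_tag):
    """Label each word with its learned tag, or the fallback if unseen."""
    return [(w, lexicon.get(w.lower(), default_tag)) for w in words]


def accuracy(gold_corpus, lexicon, default_tag):
    """Token-level accuracy of the tagger against gold-tagged sentences."""
    correct = total = 0
    for sentence in gold_corpus:
        words = [w for w, _ in sentence]
        predicted = tag(words, lexicon, default_tag)
        for (_, gold), (_, pred) in zip(sentence, predicted):
            correct += gold == pred
            total += 1
    return correct / total if total else 0.0


if __name__ == "__main__":
    # Toy training and test data (invented for this sketch).
    train = [
        [("The", "DT"), ("dog", "NN"), ("chases", "VBZ"), ("the", "DT"), ("cat", "NN")],
        [("Cat", "NN"), ("food", "NN"), ("smells", "VBZ")],
    ]
    test = [[("The", "DT"), ("bird", "NN"), ("sings", "VBZ")]]
    lexicon, default_tag = train_unigram_tagger(train)
    print(tag(["The", "bird", "sings"], lexicon, default_tag))
    print(accuracy(test, lexicon, default_tag))
```

With this toy data the fallback tag is NN, so the unseen verb "sings" is mis-tagged and accuracy comes out to 2/3. Because retraining and re-scoring here is just a pair of function calls, rerunning on a different data split is cheap. This loosely mirrors the kind of quick retrain-and-test loop that the abstract credits with exposing reporting discrepancies.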