Abstract
A core technology of natural language processing (NLP), incorporated into many text processing applications, is the part-of-speech (POS) tagger: a software component that labels the words in a text with syntactic tags such as noun, verb, and adjective. These tags may then be used in more complex tasks such as parsing, question answering, and machine translation (MT). In this paper we describe the phases of our work training and evaluating statistical POS taggers on Arabic texts and their English translations using Kepler workflows. Our original reason for encapsulating our research code within Kepler workflows was a software engineering need: to document and verify the reusability of our software. Our research benefited as well. Because retraining and retesting were quick and easy, our researchers were able to detect reporting discrepancies, document their source, and independently validate the correct results.