文摘
In natural language processing, a crucial subsystem in a wide range of applications is a part-of-speech (POS) tagger, which labels (or classifies) unannotated words of natural language with POS labels corresponding to categories such as noun, verb or adjective. Mainstream approaches are generally corpus-based: a POS tagger learns from a corpus of pre-annotated data how to correctly tag unlabeled data. Presented here is a brief state-of-the-art account on POS tagging. POS tagging approaches make use of labeled corpus to train computational trained models. Several typical models of three kings of tagging are introduced in this article: rule-based tagging, statistical approaches and evolution algorithms. The advantages and the pitfalls of each typical tagging are discussed and analyzed. Some rule-based and stochastic methods have been successfully achieved accuracies of 93–96 %, while that of some evolution algorithms are about 96–97 %.KeywordsNatural language processingPOS taggingHidden markov modelsSupport vector machineNeural networksGene expression programming