Corpus based part-of-speech tagging


Authors: Chengyao Lv ; Huihua Liu ; Yuanxing Dong…
Journal: International Journal of Speech Technology
Publication year: 2016
Publication date: September 2016
Volume: 19
Issue: 3
Pages: 647–654
Full-text size: 498 KB
Journal category: Engineering
Journal subjects: Signal, Image and Speech Processing; Social Sciences; Artificial Intelligence and Robotics
Publisher: Springer Netherlands
ISSN: 1572-8110

Abstract

In natural language processing, a crucial subsystem in a wide range of applications is the part-of-speech (POS) tagger, which labels (or classifies) unannotated words of natural language with POS labels corresponding to categories such as noun, verb or adjective. Mainstream approaches are generally corpus-based: a POS tagger learns from a corpus of pre-annotated data how to correctly tag unlabeled data. Presented here is a brief state-of-the-art account of POS tagging. POS tagging approaches use labeled corpora to train computational models. Several typical models from three kinds of tagging are introduced in this article: rule-based tagging, statistical approaches and evolutionary algorithms. The advantages and pitfalls of each kind of tagger are discussed and analyzed. Some rule-based and stochastic methods have achieved accuracies of 93–96 %, while some evolutionary algorithms reach about 96–97 %.

Keywords: Natural language processing; POS tagging; Hidden Markov models; Support vector machine; Neural networks; Gene expression programming
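To make the corpus-based, statistical idea concrete, here is a minimal sketch (not the article's method) of a bigram hidden Markov model tagger: it counts tag-to-tag transitions and tag-to-word emissions in a pre-annotated corpus, then tags new text with Viterbi decoding. The toy corpus, the DET/NOUN/VERB tag names, the helper names `train_hmm`, `log_prob` and `viterbi`, add-one smoothing, and the omission of an end-of-sentence transition are all illustrative assumptions.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy corpus of (word, tag) sentences; not data from the article.
TRAIN = [
    [("the", "DET"), ("dog", "NOUN"), ("barks", "VERB")],
    [("a", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
    [("the", "DET"), ("cat", "NOUN"), ("sees", "VERB"), ("a", "DET"), ("dog", "NOUN")],
    [("dogs", "NOUN"), ("bark", "VERB")],
]

START = "<s>"  # pseudo-tag marking the start of a sentence


def train_hmm(corpus):
    """Count tag bigrams and word emissions in a tagged corpus."""
    trans = defaultdict(Counter)  # trans[previous_tag][tag]
    emit = defaultdict(Counter)   # emit[tag][word]
    for sentence in corpus:
        prev = START
        for word, tag in sentence:
            trans[prev][tag] += 1
            emit[tag][word.lower()] += 1
            prev = tag
    tags = sorted(emit)
    vocab = {w for counts in emit.values() for w in counts}
    return trans, emit, tags, vocab


def log_prob(counts, key, n_outcomes, alpha=1.0):
    """Add-alpha smoothed log P(key) estimated from a Counter."""
    total = sum(counts.values())
    return math.log((counts[key] + alpha) / (total + alpha * n_outcomes))


def viterbi(words, trans, emit, tags, vocab):
    """Most probable tag sequence for `words` under the bigram HMM."""
    if not words:
        return []
    n_words = len(vocab) + 1  # one extra outcome shared by all unseen words
    n_tags = len(tags)
    # best[i][t] = (log score, backpointer) of the best path ending in tag t at word i
    best = [{t: (log_prob(trans[START], t, n_tags)
                 + log_prob(emit[t], words[0].lower(), n_words), None)
             for t in tags}]
    for i in range(1, len(words)):
        column = {}
        for t in tags:
            e = log_prob(emit[t], words[i].lower(), n_words)
            column[t] = max(
                (best[i - 1][p][0] + log_prob(trans[p], t, n_tags) + e, p)
                for p in tags
            )
        best.append(column)
    # Follow backpointers from the highest-scoring final tag.
    tag = max(tags, key=lambda t: best[-1][t][0])
    path = [tag]
    for i in range(len(words) - 1, 0, -1):
        tag = best[i][tag][1]
        path.append(tag)
    return path[::-1]


trans, emit, tags, vocab = train_hmm(TRAIN)
print(viterbi("the dog sleeps".split(), trans, emit, tags, vocab))
# prints ['DET', 'NOUN', 'VERB']
```

Because the transition probabilities carry context, an unseen word such as "fox" in "the fox barks" is still tagged NOUN, since it sits between a determiner and a verb. Real taggers train on much larger corpora and use richer smoothing and unknown-word models.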
