文摘
Background Part-of-speech tagging is an important preprocessing step in many natural language processing applications. Despite much work already carried out in this field, there is still room for improvement, especially in Portuguese. We experiment here with an architecture based on neural networks and word embeddings, and that has achieved promising results in English. Methods We tested our classifier in different corpora: a new revision of the Mac-Morpho corpus, in which we merged some tags and performed corrections and two previous versions of it. We evaluate the impact of using different types of word embeddings and explicit features as input. Results We compare our tagger’s performance with other systems and achieve state-of-the-art results in the new corpus. We show how different methods for generating word embeddings and additional features differ in accuracy. Conclusions The work reported here contributes with a new revision of the Mac-Morpho corpus and a state-of-the-art new tagger available for use out-of-the-box.