Combining data-driven systems for improving Named Entity Recognition
详细信息查看全文 | 推荐本文 |
摘要
The increasing flow of digital information requires the extraction, filtering and classification of pertinent information from large volumes of texts. All these tasks greatly benefit from involving a Named Entity Recognizer (NER) in the preprocessing stage. This paper proposes a completely automatic NER system. The NER task involves not only the identification of proper names (Named Entities) in natural language text, but also their classification into a set of predefined categories, such as names of persons, organizations (companies, government organizations, committees, etc.), locations (cities, countries, rivers, etc.) and miscellaneous (movie titles, sport events, etc.). Throughout the paper, we examine the differences between language models learned by different data-driven classifiers confronted with the same NLP task, as well as ways to exploit these differences to yield a higher accuracy than the best individual classifier. Three machine learning classifiers (Hidden Markov Model, Maximum Entropy and Memory Based Learning) are trained on the same corpus in order to resolve the NE task. After comparison, their output is combined using voting strategies. A comprehensive study and experimental work on the evaluation of our system, as well as a comparison with other systems has been carried out within the framework of two specialized scientific competitions for NER, CoNLL-2002 and HAREM-2005. Finally, this paper describes the integration of our NER system in different NLP applications, in concrete Geographic Information Retrieval and Conceptual Modelling.

© 2004-2018 中国地质图书馆版权所有 京ICP备05064691号 京公网安备11010802017129号

地址:北京市海淀区学院路29号 邮编:100083

电话:办公室:(+86 10)66554848;文献借阅、咨询服务、科技查新:66554700