Wikipedia-based WSD for multilingual frame annotation

Authors: Sara Tonelli (satonelli@fbk.eu); Claudio Giuliano (giuliano@fbk.eu); Kateryna Tymoshenko (tymoshenko@fbk.eu)
Keywords: Frame annotation; Multilingual FrameNets; Word sense disambiguation; FrameNet–Wikipedia mapping
Journal: Artificial Intelligence
Publication year: 2013
Publication date: January 2013
Volume: 194
Issue: Complete
Pages: 203-221
Full-text size: 408 K

Abstract

Many natural language processing applications have been shown to achieve significant performance gains when exploiting semantic information extracted from high-quality annotated resources. However, the practical use of such resources is often hampered by their limited coverage. Furthermore, they are generally available only for English and a few other languages.

We propose a novel methodology that, starting from the mapping between FrameNet lexical units and Wikipedia pages, automatically acquires new lexical units and example sentences from Wikipedia. The goal is to build a reference data set for the semi-automatic development of new FrameNets. In addition, this methodology can be adapted to perform frame identification in any language available in Wikipedia.

Our approach relies on a state-of-the-art word sense disambiguation system that is first trained on English Wikipedia to assign a page to each lexical unit in a frame. This mapping is then exploited to perform frame identification in English or in any other language available in Wikipedia. Our approach shows high potential in multilingual settings, because it can be applied to languages for which other lexical resources such as WordNet or thesauri are not available.
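
The two-step idea described above can be pictured with a minimal sketch. It is not the authors' implementation: the dictionaries, page titles, frame labels and the function name `identify_frame` are all hypothetical, the use of interlanguage links to reach the English page is an assumption not stated in the abstract, and the word sense disambiguation step is assumed to have already produced a Wikipedia page for the target word.

```python
# Toy illustration only: every mapping below is invented for the example,
# not taken from FrameNet, Wikipedia, or the paper's data.

# Step 1 output (assumed): a WSD system has linked the lexical units of a
# frame to English Wikipedia pages, giving a page title -> frame lookup.
PAGE_TO_FRAME = {
    "Court": "Judicial_body",      # illustrative page title and frame label
    "Purchasing": "Commerce_buy",
}

# Assumed cross-language links: (language, page title) -> English page title.
TO_ENGLISH_PAGE = {
    ("it", "Tribunale"): "Court",
    ("it", "Acquisto"): "Purchasing",
}


def identify_frame(language, disambiguated_page):
    """Return the frame for a word already disambiguated to a Wikipedia page.

    `disambiguated_page` stands in for the output of the WSD system run on
    a sentence in `language`; None is returned when no mapping is known.
    """
    if language == "en":
        english_page = disambiguated_page
    else:
        english_page = TO_ENGLISH_PAGE.get((language, disambiguated_page))
    return PAGE_TO_FRAME.get(english_page)
```

Under these assumptions, `identify_frame("it", "Tribunale")` follows the link to the English page and looks up its frame, while a page with no known link yields `None`. No resource other than Wikipedia is consulted for the target language.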
