Extract conceptual graphs from plain texts in patent claims


Authors: Shih-Yao Yang^a (shih_yao@mail2000.com.tw; yao@cs.nthu.edu.tw); Von-Wun Soo^{a,b}
Keywords: Conceptual graph; Natural language processing; Patent document analysis; Patent claims information extraction; Ontology; Dependency tree
Journal: Engineering Applications of Artificial Intelligence
Year of publication: 2012
Journal code: 51_09521976
Category: cp
Publication date: June 2012
Volume: 25
Issue: 4
Pages: 874-887
File size: 1580 K

Abstract

This paper develops techniques for extracting conceptual graphs from a patent claim using syntactic information (part-of-speech (POS) tags and the dependency tree) and semantic information (a background ontology). Because patent claims contain many technical domain terms and very long sentences, it is difficult to apply an NLP parser directly to their plain text. The paper combines finite state machines, part-of-speech tags, conceptual graphs, a domain ontology and dependency trees to convert a patent claim into a formally defined conceptual graph. A finite state machine splits a lengthy claim sentence into a set of shorter sub-sentences so that the NLP parser can parse them one by one. The part-of-speech tags and dependency tree of a patent claim are then used to build the conceptual graph based on the pre-established domain ontology. The results show that 99% of the sub-sentences split from 1700 patent claims can be parsed efficiently by the NLP parser. A conceptual graph has two types of nodes: concept nodes and relation nodes. Each concept or relation can be extracted directly from a patent claim, and each relation links a fixed number of concepts in the conceptual graph. On 100 patent claims, the average precision and recall of mapping concept classes from the patent claim to the domain ontology are 96% and 89%, respectively, and the average precision and recall of Real relation class mapping are 97% and 98%, respectively. For linking concepts to a relation, the average precision is 79%. The extracted conceptual graphs would facilitate automated comparison and summarization of patents for judging patent infringement.
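
The structure described in the abstract (concept nodes, relation nodes, and relations that link a fixed number of concepts) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the class names, the `arity` field, the `to_linear` output format, and the example ontology class `Component` are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass(frozen=True)
class Concept:
    label: str       # surface text taken from the claim
    onto_class: str  # hypothetical ontology class the text maps to


@dataclass(frozen=True)
class Relation:
    label: str
    arity: int       # fixed number of concepts this relation must link


@dataclass
class ConceptualGraph:
    concepts: List[Concept] = field(default_factory=list)
    edges: List[Tuple[Relation, Tuple[Concept, ...]]] = field(default_factory=list)

    def add_concept(self, label: str, onto_class: str) -> Concept:
        """Add a concept node, reusing an identical existing node."""
        concept = Concept(label, onto_class)
        if concept not in self.concepts:
            self.concepts.append(concept)
        return concept

    def link(self, relation: Relation, *args: Concept) -> None:
        """Attach a relation node to exactly `relation.arity` concept nodes."""
        if len(args) != relation.arity:
            raise ValueError(
                f"{relation.label} links {relation.arity} concepts, got {len(args)}"
            )
        for concept in args:
            if concept not in self.concepts:
                raise ValueError(f"unknown concept: {concept.label}")
        self.edges.append((relation, args))

    def to_linear(self) -> List[str]:
        """Render each relation with its concepts in a simple bracketed form."""
        lines = []
        for relation, args in self.edges:
            parts = " ".join(f"[{c.onto_class}: {c.label}]" for c in args)
            lines.append(f"({relation.label} {parts})")
        return lines


# Illustrative claim fragment (invented): "a sensor attached to the housing"
graph = ConceptualGraph()
sensor = graph.add_concept("sensor", "Component")
housing = graph.add_concept("housing", "Component")
graph.link(Relation("attached_to", arity=2), sensor, housing)
print(graph.to_linear())
```

Enforcing the arity check in `link` mirrors the statement that each relation links a fixed number of concepts; how the paper actually derives the concepts and relations from POS tags and the dependency tree is not shown here.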
