Chinese Patent Entity Relation Extraction Based on SAO Structure

English title: Chinese patent entity relation extraction based on subject action object structure
Authors: ZHANG Yong-zhen (张永真); LYU Xue-qiang (吕学强); SHEN Yan-chun (申闫春); XU Li-ping (徐丽萍)
Keywords: relation extraction; gradient boosting algorithm (XGBoost); SAO structure; syntactic features; semantic features
Chinese journal title (code): SJSJ
English journal title: Computer Engineering and Design
Affiliations: Beijing Key Laboratory of Internet Culture and Digital Dissemination Research, Beijing Information Science and Technology University; Laboratory of VR and System Simulation, Beijing Information Science and Technology University; Beijing Research Center of Urban System Engineering
Publication date: 2019-03-16
Publisher: 计算机工程与设计 (Computer Engineering and Design)
Year: 2019
Issue: v.40; No.387
Funding: National Natural Science Foundation of China (61671070); Beijing Advanced Innovation Center for Imaging Technology (BAICIT-2016003); Major Program of the National Social Science Foundation of China (15ZDB017); Key Project of the State Language Commission (ZDI135-53)
Language: Chinese
File ID: SJSJ201903019
Page count: 7
Issue number: 03
CN: 11-1775/TP
Pages: 113-119

Abstract

Existing approaches to entity relation extraction from Chinese patent text rely on traditional features such as lexical, context, and distance features, which leads to low extraction efficiency. To address this, a method that combines traditional features with syntactic and semantic features is proposed. Relation extraction from Chinese patent text is recast as the recognition of SAO (subject-action-object) structures: after word segmentation and entity tagging, candidate SAO triples are extracted from the patent text; traditional features and syntactic-semantic features are then extracted for each candidate triple; and the XGBoost algorithm is trained on these features and used for prediction, with the effectiveness of the features analyzed experimentally. The experimental results show that the proposed method clearly outperforms methods that use traditional features, which confirms the effectiveness of the syntactic and semantic features.
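
The candidate-generation and feature-extraction steps described in the abstract could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the tag set (`ENT`, `V`), the `CandidateSAO` class, the two function names, the feature names, and the sample sentence are all hypothetical. Only a few simple distance and context counts are shown; the syntactic and semantic features and the XGBoost classifier are left out.

```python
from dataclasses import dataclass
from itertools import product
from typing import Dict, List, Tuple

# Assumption: the sentence is already segmented and tagged.
# "ENT" marks an annotated entity, "V" marks a verb; other tags are context.
Token = Tuple[str, str]  # (word, tag)


@dataclass(frozen=True)
class CandidateSAO:
    """One candidate subject-action-object triple with token positions."""
    subject: str
    action: str
    obj: str
    s_idx: int
    a_idx: int
    o_idx: int


def candidate_sao_triples(tokens: List[Token]) -> List[CandidateSAO]:
    """Enumerate every (entity, verb, entity) triple with S before A before O."""
    entities = [i for i, (_, tag) in enumerate(tokens) if tag == "ENT"]
    verbs = [i for i, (_, tag) in enumerate(tokens) if tag == "V"]
    triples = []
    for s, a, o in product(entities, verbs, entities):
        if s < a < o:
            triples.append(
                CandidateSAO(tokens[s][0], tokens[a][0], tokens[o][0], s, a, o)
            )
    return triples


def traditional_features(tokens: List[Token], c: CandidateSAO) -> Dict[str, object]:
    """Illustrative lexical, context and distance features for one candidate."""
    inner = range(c.s_idx + 1, c.o_idx)  # positions strictly between S and O
    return {
        "action_word": c.action,
        "dist_s_a": c.a_idx - c.s_idx,
        "dist_a_o": c.o_idx - c.a_idx,
        "entities_between": sum(1 for i in inner if tokens[i][1] == "ENT"),
        "other_verbs_between": sum(
            1 for i in inner if tokens[i][1] == "V" and i != c.a_idx
        ),
    }


# Hypothetical tagged sentence: "sensor detects temperature, controller adjusts valve".
sample: List[Token] = [
    ("传感器", "ENT"), ("检测", "V"), ("温度", "ENT"), ("，", "PUNC"),
    ("控制器", "ENT"), ("调节", "V"), ("阀门", "ENT"),
]
triples = candidate_sao_triples(sample)
features = [traditional_features(sample, c) for c in triples]
```

In a full pipeline, each feature dictionary would be converted to a numeric vector and labeled as a true or false SAO structure before being passed to a gradient-boosted classifier. Exhaustive enumeration is used here only because it is simple; it produces many false candidates, which is why a classification step is needed.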

