Abstract
Entity relation extraction from Chinese patent texts currently relies on traditional features such as lexical, context, and distance features, which leads to low extraction efficiency. To address this problem, a method that combines traditional features with syntactic-semantic features is proposed. Relation extraction from Chinese patent texts is recast as the recognition of SAO (Subject-Action-Object) structures. Word segmentation and entity tagging are performed, and candidate SAO triples are extracted from the patent texts. Traditional features and syntactic-semantic features are then extracted for each candidate SAO triple. The XGBoost algorithm is trained on these features and used for prediction, and the effectiveness of the features is analyzed experimentally. Experimental results show that the proposed method clearly outperforms methods that use only traditional features, which verifies the effectiveness of the syntactic-semantic features.
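The candidate-generation and traditional-feature steps can be pictured with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the tag set (`ENT`, `V`, `O`), the rule that a candidate is any verb lying between two tagged entities, the feature names, and the example sentence. The syntactic-semantic features and the XGBoost classifier are omitted.

```python
from itertools import combinations
from typing import NamedTuple


class Token(NamedTuple):
    text: str
    tag: str  # hypothetical tag set: "ENT" (entity), "V" (verb), "O" (other)


class Candidate(NamedTuple):
    subject: int  # token index of the subject entity
    action: int   # token index of the verb
    obj: int      # token index of the object entity


def candidate_sao(tokens):
    """Enumerate (entity, verb, entity) index triples with the verb between the entities."""
    entities = [i for i, t in enumerate(tokens) if t.tag == "ENT"]
    verbs = [i for i, t in enumerate(tokens) if t.tag == "V"]
    return [
        Candidate(s, v, o)
        for s, o in combinations(entities, 2)
        for v in verbs
        if s < v < o
    ]


def traditional_features(tokens, cand):
    """Build simple lexical, context, and distance features for one candidate."""
    def word(i):
        # Positions outside the sentence are padded.
        return tokens[i].text if 0 <= i < len(tokens) else "<PAD>"

    return {
        "subject_word": word(cand.subject),
        "action_word": word(cand.action),
        "object_word": word(cand.obj),
        "left_context": word(cand.subject - 1),
        "right_context": word(cand.obj + 1),
        "subject_action_distance": cand.action - cand.subject,
        "action_object_distance": cand.obj - cand.action,
    }


# Hand-tagged toy sentence (assumed): "sensor detects temperature and controls valve".
sentence = [
    Token("传感器", "ENT"), Token("检测", "V"), Token("温度", "ENT"),
    Token("并", "O"), Token("控制", "V"), Token("阀门", "ENT"),
]
candidates = candidate_sao(sentence)
features = [traditional_features(sentence, c) for c in candidates]
```

In a full pipeline, each feature dictionary would be vectorized and passed, together with a label saying whether the candidate is a true SAO triple, to a classifier such as XGBoost.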