Knowledge Graph Entity Extraction Based on BILSTM_CRF

Original title: 基于BILSTM_CRF的知识图谱实体抽取方法
Authors: Zhai Sheping (翟社平); Duan Hongyu (段宏宇); Li Zhaozhao (李兆兆)
Affiliations: School of Computer Science and Technology, Xi'an University of Posts and Telecommunications; Shaanxi Key Laboratory of Network Data Analysis and Intelligent Processing
Keywords: knowledge graph; entity extraction; neural network; word embedding; BILSTM_CRF model
Journal: Computer Applications and Software (计算机应用与软件)
Journal code: JYRJ
Publication date: 2019-05-12
Publisher: Computer Applications and Software
Year: 2019
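Volume: 36
Issue: 05
Pages: 275-280+286
Page count: 7
Funding: Communication Soft Science Project of the Ministry of Industry and Information Technology (2018-R-26); Natural Science Foundation of Shaanxi Province (2012JM8044); Social Science Foundation of Shaanxi Province (2016N008); Scientific Research Program of the Shaanxi Provincial Department of Education (12JK0733); Graduate Innovation Fund of Xi'an University of Posts and Telecommunications (CXL2016-13)
Language: Chinese
File ID: JYRJ201905047
CN: 31-1260/TP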

Abstract

Traditional knowledge graph entity extraction methods require extensive hand-crafted features and expert knowledge. To address this, we propose a neural network entity extraction method based on the BILSTM_CRF model. The model uses a bidirectional long short-term memory network (BILSTM) to extract features from the text and a conditional random field (CRF) to model the dependencies between labels in the tag sequence. The method first models the input text by converting each word in a sentence into a word embedding, then processes these distributed vectors with the BILSTM to obtain sentence features, and finally uses the CRF to label the sequence and extract entities. Experimental results show that the method achieves higher precision and recall, with the F1 score improved by about 8%, and that it is more widely applicable.
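The final stage of the pipeline described in the abstract (CRF labeling followed by entity extraction) can be illustrated with a minimal sketch. It is not the authors' implementation: the BIO tag set, the example sentence, and all scores below are hypothetical values chosen for illustration. In a real system the emission scores would be produced by the BILSTM from word embeddings and the transition scores would be learned by the CRF layer; here both are hand-set so that the decoding step can be followed on its own.

```python
from typing import List

# Hypothetical BIO tag set: outside, entity start, entity continuation.
TAGS = ["O", "B-ENT", "I-ENT"]

def viterbi_decode(emissions: List[List[float]],
                   transitions: List[List[float]]) -> List[int]:
    """Return the tag-index path maximizing emission + transition scores."""
    n_tags = len(transitions)
    score = list(emissions[0])          # best score ending in each tag so far
    backptrs: List[List[int]] = []      # best previous tag for each step/tag
    for emit in emissions[1:]:
        new_score, ptrs = [], []
        for j in range(n_tags):
            best_i = max(range(n_tags),
                         key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emit[j])
            ptrs.append(best_i)
        score = new_score
        backptrs.append(ptrs)
    # Follow back-pointers from the best final tag to recover the path.
    best = max(range(n_tags), key=lambda j: score[j])
    path = [best]
    for ptrs in reversed(backptrs):
        best = ptrs[best]
        path.append(best)
    path.reverse()
    return path

def extract_entities(tokens: List[str], tags: List[str]) -> List[str]:
    """Group B-ENT/I-ENT runs into entity strings."""
    entities: List[str] = []
    current: List[str] = []
    for token, tag in zip(tokens, tags):
        if tag == "B-ENT":
            if current:
                entities.append(" ".join(current))
            current = [token]
        elif tag == "I-ENT" and current:
            current.append(token)
        else:
            if current:
                entities.append(" ".join(current))
            current = []
    if current:
        entities.append(" ".join(current))
    return entities

# Hand-set stand-ins for BILSTM outputs: one row per token, one column per tag.
TOKENS = ["Xi'an", "University", "is", "in", "Shaanxi"]
EMISSIONS = [
    [0.1, 2.0, 0.3],
    [0.5, 0.6, 1.5],
    [2.0, 0.1, 0.1],
    [2.0, 0.1, 0.1],
    [0.2, 1.8, 0.4],
]
# TRANSITIONS[i][j] scores moving from tag i to tag j; O -> I-ENT is penalized.
TRANSITIONS = [
    [0.5, 0.2, -10.0],
    [0.0, -0.5, 1.0],
    [0.2, -0.5, 0.5],
]

path = viterbi_decode(EMISSIONS, TRANSITIONS)
predicted_tags = [TAGS[i] for i in path]
print(predicted_tags)                            # ['B-ENT', 'I-ENT', 'O', 'O', 'B-ENT']
print(extract_entities(TOKENS, predicted_tags))  # ["Xi'an University", 'Shaanxi']
```

The transition matrix is what lets the CRF account for relationships between neighboring labels: the large penalty on moving from `O` to `I-ENT` keeps the decoder from starting an entity in the middle, which a per-token classifier on its own could not guarantee.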
