Abstract
Traditional entity extraction methods for knowledge graphs require extensive hand-crafted features and expert knowledge. To address this, we propose a neural-network entity extraction method based on the BILSTM_CRF model. The method uses a bidirectional long short-term memory network (BILSTM) to extract features from the text, and a conditional random field (CRF) to model the dependencies between sequence labels. The input text is first modeled by converting every word in a sentence into a word embedding. The BILSTM then processes these distributed vectors to obtain sentence features. Finally, the CRF tags the sequence and extracts the entities to produce the final result. Experimental results show that the method achieves higher precision and recall, improves the F1 score by about 8%, and has broader applicability.
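The abstract names three stages: word embeddings, BiLSTM features, and CRF tagging. The following is a minimal pure-Python sketch of the last stage only. It runs linear-chain CRF Viterbi decoding over per-token tag scores and then reads entities off the decoded tags. Several details are assumptions not stated in the paper:

* the BIO tag scheme and the PER/LOC entity types;
* the emission scores, which are hypothetical stand-ins for the BiLSTM's per-token output;
* the transition scores, which are hand-set so that they only forbid invalid BIO transitions (in a trained BILSTM_CRF the transition matrix is learned).

All function and variable names are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical tag set (BIO scheme); the paper does not specify its labels.
TAGS = ["O", "B-PER", "I-PER", "B-LOC", "I-LOC"]
FORBIDDEN = -1e4  # large negative score that effectively rules out a transition


def bio_constraints(tags: Sequence[str]) -> Tuple[List[List[float]], List[float]]:
    """Hand-set CRF scores (assumption): I-X may only follow B-X or I-X,
    and a sentence may not start with I-X. A real model learns these."""
    def allowed(prev: str, cur: str) -> bool:
        return not cur.startswith("I-") or prev in ("B-" + cur[2:], cur)

    transitions = [[0.0 if allowed(p, c) else FORBIDDEN for c in tags] for p in tags]
    start = [FORBIDDEN if t.startswith("I-") else 0.0 for t in tags]
    return transitions, start


def viterbi_decode(emissions: Sequence[Sequence[float]],
                   transitions: Sequence[Sequence[float]],
                   start: Sequence[float]) -> List[int]:
    """Highest-scoring tag path under a linear-chain CRF.
    emissions[t][j]: score of tag j at token t (e.g. BiLSTM output);
    transitions[i][j]: score of tag i followed by tag j."""
    n_tags = len(start)
    # score[j]: best total score of any path ending in tag j at the current token
    score = [start[j] + emissions[0][j] for j in range(n_tags)]
    backpointers: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_score, best_prev = [], []
        for j in range(n_tags):
            # best predecessor tag i for current tag j
            i_best = max(range(n_tags), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[i_best] + transitions[i_best][j] + emissions[t][j])
            best_prev.append(i_best)
        score = new_score
        backpointers.append(best_prev)
    # start from the best final tag and follow the back-pointers
    path = [max(range(n_tags), key=lambda j: score[j])]
    for best_prev in reversed(backpointers):
        path.append(best_prev[path[-1]])
    path.reverse()
    return path


def extract_entities(tokens: Sequence[str], tags: Sequence[str],
                     sep: str = " ") -> List[Tuple[str, str]]:
    """Collect (type, text) spans from BIO tags; stray I- tags are ignored."""
    entities: List[Tuple[str, str]] = []
    etype: Optional[str] = None
    span: List[str] = []
    for token, tag in zip(tokens, tags):
        if tag.startswith("I-") and etype == tag[2:]:
            span.append(token)  # continue the open entity
            continue
        if etype is not None:  # any other tag closes the open entity
            entities.append((etype, sep.join(span)))
            etype, span = None, []
        if tag.startswith("B-"):
            etype, span = tag[2:], [token]
    if etype is not None:
        entities.append((etype, sep.join(span)))
    return entities


# Hypothetical sentence and emission scores (columns follow TAGS).
TOKENS = ["Alan", "Turing", "visited", "Manchester"]
EMISSIONS = [
    [0.1, 2.0, 0.5, 0.3, 0.0],
    [0.2, 0.4, 1.0, 0.1, 1.2],  # per-token argmax alone would pick I-LOC here
    [2.0, 0.1, 0.1, 0.1, 0.1],
    [0.3, 0.2, 0.1, 1.5, 0.4],
]

if __name__ == "__main__":
    transitions, start = bio_constraints(TAGS)
    tags = [TAGS[i] for i in viterbi_decode(EMISSIONS, transitions, start)]
    print(tags)                             # ['B-PER', 'I-PER', 'O', 'B-LOC']
    print(extract_entities(TOKENS, tags))   # [('PER', 'Alan Turing'), ('LOC', 'Manchester')]
```

If you take the per-token argmax of the emissions alone, "Turing" is tagged I-LOC directly after B-PER, which is an invalid label sequence. The transition scores let the decoder choose the jointly consistent path B-PER, I-PER instead. This illustrates what it means for the CRF layer to model dependencies between adjacent labels rather than scoring each token independently.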