Abstract
A BiLSTM-CRF model is constructed by combining a Conditional Random Field (CRF) with a Bidirectional Long Short-Term Memory (BiLSTM) network to extract three kinds of entity information (tenderer, bidding agent and bidding number) from commercial bidding text sequences. The normalized bidding text sequence is vectorized character by character, the forward and backward text features of the serialized text are obtained by the BiLSTM network, and the corresponding entities are extracted from the bidirectional text features by the CRF layer. Experimental results show that, compared with the traditional machine learning algorithm CRF, the precision, recall and F1 value of the three entity types under the proposed model are improved by 15.21%, 12.06% and 13.70% on average, respectively.
[24] BAHDANAU D,CHO K,BENGIO Y.Neural machine translation by jointly learning to align and translate[EB/OL].[2018-09-15].https://arxiv.org/pdf/1409.0473.pdf.
[25] RATNAPARKHI A.A maximum entropy model for part-of-speech tagging[C]//Proceedings of Conference on Empirical Methods in Natural Language Processing.Washington D.C.,USA:IEEE Press,1996:133-142.
[26] MCCALLUM A,FREITAG D,PEREIRA F C N.Maximum entropy Markov models for information extraction and segmentation[C]//Proceedings of the 17th International Conference on Machine Learning.Washington D.C.,USA:IEEE Press,2000:591-598.
[27] LAFFERTY J D,MCCALLUM A,PEREIRA F C N.Conditional random fields:probabilistic models for segmenting and labeling sequence data[C]//Proceedings of the 18th International Conference on Machine Learning.[S.l.]:Morgan Kaufmann Publishers Inc.,2001:282-289.