Abstract
Medical named entity recognition plays an important role in advancing medical research. To address the low computational efficiency and limited accuracy of existing methods, we propose a recognition method based on Attention Iterated Dilated Convolutions (AIDC). An iterated dilated convolutional neural network computes the hidden states, a multi-head attention mechanism is incorporated to capture sentence structure, and a conditional random field (CRF) computes the optimal tag sequence. On the NCBI disease and BC5CDR chemical datasets, AIDC runs 1.9 times faster than a bidirectional long short-term memory (BiLSTM) network while also achieving high F1 scores of 0.856 and 0.901, respectively.
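To make two of the components named above more concrete, here is a minimal pure-Python sketch. It is an illustration under stated assumptions, not the authors' AIDC implementation. It shows (a) how stacking width-3 dilated convolutions widens the context each token sees, and (b) how a CRF layer's Viterbi decoding picks the highest-scoring tag sequence from per-token emission scores and tag-to-tag transition scores. The dilation schedule `[1, 2, 4]`, the scalar (one-dimensional) features, the BIO tag set, and all scores are hypothetical. The multi-head attention step is omitted.

```python
from typing import List, Sequence, Tuple


def dilated_conv1d(x: Sequence[float], kernel: Sequence[float], dilation: int) -> List[float]:
    """Zero-padded 'same'-length 1-D dilated convolution over scalar features.

    Output t sums kernel[k] * x[t + (k - half) * dilation]; out-of-range inputs count as 0.
    """
    half = len(kernel) // 2
    out = []
    for t in range(len(x)):
        acc = 0.0
        for k, w in enumerate(kernel):
            idx = t + (k - half) * dilation
            if 0 <= idx < len(x):
                acc += w * x[idx]
        out.append(acc)
    return out


def receptive_field(dilations: Sequence[int], width: int = 3, iterations: int = 1) -> int:
    """Tokens visible to one output after `iterations` passes of a dilated block.

    Each layer adds (width - 1) * dilation positions. Iterating the same block
    (shared parameters, as in iterated dilated CNNs) repeats that growth.
    """
    return 1 + iterations * sum((width - 1) * d for d in dilations)


def viterbi(
    emissions: Sequence[Sequence[float]],
    transitions: Sequence[Sequence[float]],
    start: Sequence[float],
) -> Tuple[List[int], float]:
    """Return the best tag path and its score for a linear-chain CRF.

    emissions[t][j]: score of tag j at token t; transitions[i][j]: score of i -> j;
    start[j]: score of starting with tag j.
    """
    n_tags = len(transitions)
    score = [emissions[0][j] + start[j] for j in range(n_tags)]
    backptrs: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_score, ptrs = [], []
        for j in range(n_tags):
            # Best previous tag i for ending in tag j at token t.
            best_i = max(range(n_tags), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emissions[t][j])
            ptrs.append(best_i)
        score = new_score
        backptrs.append(ptrs)
    # Trace back from the best final tag.
    best_last = max(range(n_tags), key=lambda j: score[j])
    path = [best_last]
    for ptrs in reversed(backptrs):
        path.append(ptrs[path[-1]])
    path.reverse()
    return path, score[best_last]
```

As a usage example, feeding an impulse through layers with dilations 1, 2, and 4 spreads it over 15 positions, which matches `receptive_field([1, 2, 4])`. With hypothetical tags O=0, B=1, I=2, setting O→I and start→I to a large negative score forces Viterbi to return only valid BIO sequences. Using convolutions rather than a recurrent network lets all tokens in a layer be computed in parallel, which is the usual explanation for the speed advantage of dilated CNN taggers over BiLSTMs.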