Research on Method of Intelligent Q&A for Rice Pests and Diseases Based on word2vec and Attention-Seq2Seq
  • English title: Research on Method of Intelligent Q&A for Rice Pests and Diseases Based on word2vec and Attention-Seq2Seq
  • Authors: XU Tong-yu; ZHAO Dong-xue; ZHOU Yun-cheng; FENG Shuai; WANG Hao-ri-qin
  • Affiliation: School of Information and Electrical Engineering / Liaoning Agricultural Information Engineering Technology Center, Shenyang Agricultural University
  • Keywords: rice pests and diseases; word2vec; attention mechanism; Seq2Seq; intelligent Q&A
  • Journal code: SYNY
  • Journal: Journal of Shenyang Agricultural University
  • Publication date: 2019-06-15
  • Publisher: Journal of Shenyang Agricultural University
  • Year: 2019
  • Volume/issue: v.50; No.200
  • Funding: National Key Research and Development Program of China (2018YFD0300309); Shenyang Science and Technology Plan Project (17174300)
  • Language: Chinese
  • Record ID: SYNY201903019
  • Page count: 7
  • Issue: 03
  • CN: 21-1134/S
  • Pages: 128-134
Abstract
To improve the accuracy, speed, and intelligence of question answering on rice pests and diseases, this study builds a Seq2Seq (sequence-to-sequence) question-answering model optimized with word2vec and an attention mechanism. More than 20,000 question-answer records were collected from the web with a crawler, segmented with the Jieba word segmentation tool, and cleaned of stop words and meaningless symbols. To improve model accuracy, the Skip-Gram model in word2vec converts the words of each sentence into word vectors that carry semantic information, and these vectors are used to train a Seq2Seq model with an attention mechanism. The experiment used 20,000 rice pest and disease question-answer records, randomly split 7:1:2 into training, validation, and test sets. The proposed model was compared with a plain Seq2Seq model and with a Seq2Seq model that adds attention only, using the BLEU score and question-answering accuracy as evaluation criteria. The proposed model scored higher than the other two on both criteria: its BLEU score and question-answering accuracy were 33.58% and 71%, which is 22.34% and 9.51% higher in BLEU score and 28% and 14% higher in accuracy than the other two models, respectively. The proposed model markedly improves question-answering accuracy and can help farmers resolve problems encountered in rice cultivation and production.
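The evaluation protocol described in the abstract (a random 7:1:2 split and BLEU scoring) can be illustrated with a minimal Python sketch. The abstract does not say which BLEU variant, n-gram order, or smoothing the authors used, so the sketch assumes unsmoothed sentence-level BLEU with up to 4-grams against a single reference; the function names `split_7_1_2` and `sentence_bleu` and the placeholder data are hypothetical and are not taken from the paper.

```python
import math
import random
from collections import Counter


def split_7_1_2(pairs, seed=0):
    """Shuffle Q&A pairs and split them 7:1:2 into train/validation/test."""
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)
    n_train = len(shuffled) * 7 // 10
    n_val = len(shuffled) // 10
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])


def ngram_counts(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(reference, candidate, max_n=4):
    """Unsmoothed sentence-level BLEU against a single reference (assumed variant)."""
    if not candidate:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand = ngram_counts(candidate, n)
        ref = ngram_counts(reference, n)
        total = sum(cand.values())
        # Clip each candidate n-gram count by its count in the reference.
        clipped = sum(min(count, ref[gram]) for gram, count in cand.items())
        if total == 0 or clipped == 0:
            return 0.0  # without smoothing, any zero precision gives BLEU 0
        log_precisions.append(math.log(clipped / total))
    # Brevity penalty: penalize candidates shorter than the reference.
    if len(candidate) > len(reference):
        bp = 1.0
    else:
        bp = math.exp(1 - len(reference) / len(candidate))
    return bp * math.exp(sum(log_precisions) / max_n)


# Placeholder data standing in for the 20,000 Q&A records.
pairs = [(f"q{i}", f"a{i}") for i in range(20000)]
train_set, val_set, test_set = split_7_1_2(pairs)
print(len(train_set), len(val_set), len(test_set))  # 14000 2000 4000
```

For Chinese answers, the token lists passed to `sentence_bleu` would be the output of a word segmenter such as Jieba; a corpus-level or smoothed BLEU would give different numbers from this unsmoothed sentence-level version.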
