Abstract
By analyzing the questions commonly raised during war gaming, a question classification model is designed for a QA (question answering) system oriented to this specific scenario. The model is based on statistical methods: word vectors are generated with the Word2vec tool, and word weights are generated by the TextRank algorithm combined with IDF values; together they form the question representation. To balance computational complexity against the accuracy of question similarity computation, the final classification combines two different question similarity models through an improved KNN (K-Nearest Neighbor) algorithm. The WMD (Word Mover's Distance) algorithm computes question similarity from word vectors comparatively accurately, but its computational complexity is high. This paper therefore uses the improved KNN algorithm to combine WMD with a traditional similarity algorithm, so as to better accomplish the required question classification task.
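The abstract does not give the exact weighting formula, so the following is only a minimal sketch. It assumes that a word's weight is the product of a plain TextRank score (words linked when they co-occur within a small window) and a smoothed IDF computed over the question corpus, normalised to sum to 1, and that the cheaper "traditional" similarity is the cosine between weighted sums of word vectors. All function names are hypothetical, and in practice the vectors would come from a trained Word2vec model (for example gensim's `Word2Vec`) rather than a hand-written dict.

```python
import math
from collections import defaultdict

def idf_table(corpus):
    """Smoothed IDF over tokenized questions: log(N / (1 + df)) + 1 (assumed variant)."""
    n = len(corpus)
    df = defaultdict(int)
    for q in corpus:
        for w in set(q):
            df[w] += 1
    return {w: math.log(n / (1 + c)) + 1.0 for w, c in df.items()}

def textrank(tokens, window=2, d=0.85, iters=50):
    """Plain TextRank: words co-occurring within `window` tokens are linked."""
    nbrs = defaultdict(set)
    for i, w in enumerate(tokens):
        for j in range(i + 1, min(i + window, len(tokens))):
            if tokens[j] != w:
                nbrs[w].add(tokens[j])
                nbrs[tokens[j]].add(w)
    score = {w: 1.0 for w in tokens}
    for _ in range(iters):
        # The new dict is built from the previous iteration's scores.
        score = {w: (1 - d) + d * sum(score[u] / len(nbrs[u]) for u in nbrs[w])
                 for w in score}
    return score

def word_weights(tokens, idf):
    """Weight = TextRank score x IDF, normalised to sum to 1 (combination rule assumed)."""
    default = max(idf.values())  # unseen words are treated as rare
    tr = textrank(tokens)
    raw = {w: tr[w] * idf.get(w, default) for w in tr}
    total = sum(raw.values())
    return {w: v / total for w, v in raw.items()}

def question_vector(weights, vecs):
    """Weighted sum of word vectors; out-of-vocabulary words are skipped."""
    dim = len(next(iter(vecs.values())))
    out = [0.0] * dim
    for w, a in weights.items():
        if w in vecs:
            out = [o + a * x for o, x in zip(out, vecs[w])]
    return out

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    return sum(x * y for x, y in zip(u, v)) / (nu * nv) if nu and nv else 0.0
```

Because cosine similarity is scale-invariant, the weighted sum needs no further averaging; two questions sharing most high-weight words (for instance two "where is unit X" questions) score close to 1, while an unrelated question scores much lower.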
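WMD treats the word weights of two questions as distributions and finds the cheapest way to move one onto the other, using the Euclidean distance between word vectors as the per-unit cost. The sketch below solves that transportation problem exactly with a small successive-shortest-path min-cost flow using only the standard library. It is illustrative rather than the paper's implementation: `wmd` is a hypothetical name, and it assumes each question's weights already sum to 1 (they could be normalised word frequencies or TextRank x IDF weights; the abstract does not say which). For real use, an existing solver such as gensim's `KeyedVectors.wmdistance` (which uses normalised frequencies) would be the more practical choice.

```python
import math

def _euclid(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def wmd(d1, d2, vecs):
    """Word Mover's Distance between two word-weight dicts that each sum to 1.

    Solves the transportation problem exactly by successive shortest paths
    (Bellman-Ford on the residual graph). The cost grows quickly with
    question length, which is the complexity drawback noted for WMD.
    Every word is assumed to have a vector in `vecs`.
    """
    a, b = list(d1), list(d2)
    n = len(a)
    S, T = n + len(b), n + len(b) + 1
    graph = [[] for _ in range(T + 1)]  # edge = [to, capacity, cost, reverse index]

    def add_edge(u, v, cap, cost):
        graph[u].append([v, cap, cost, len(graph[v])])
        graph[v].append([u, 0.0, -cost, len(graph[u]) - 1])

    for i, w in enumerate(a):
        add_edge(S, i, d1[w], 0.0)  # supply: weight of w in question 1
        for j, x in enumerate(b):
            add_edge(i, n + j, math.inf, _euclid(vecs[w], vecs[x]))
    for j, x in enumerate(b):
        add_edge(n + j, T, d2[x], 0.0)  # demand: weight of x in question 2

    flow, total, eps = 0.0, 0.0, 1e-12
    while flow < 1.0 - 1e-9:
        dist, prev = [math.inf] * len(graph), [None] * len(graph)
        dist[S] = 0.0
        for _ in range(len(graph) - 1):  # Bellman-Ford shortest path from S
            changed = False
            for u in range(len(graph)):
                if dist[u] == math.inf:
                    continue
                for k, (v, cap, cost, _) in enumerate(graph[u]):
                    if cap > eps and dist[u] + cost < dist[v] - eps:
                        dist[v], prev[v], changed = dist[u] + cost, (u, k), True
            if not changed:
                break
        if dist[T] == math.inf:
            break
        push, v = math.inf, T  # bottleneck residual capacity along the path
        while v != S:
            u, k = prev[v]
            push, v = min(push, graph[u][k][1]), u
        v = T
        while v != S:  # augment the flow along the path
            u, k = prev[v]
            edge = graph[u][k]
            edge[1] -= push
            graph[v][edge[3]][1] += push
            v = u
        flow += push
        total += push * dist[T]
    return total
```

Each augmentation runs a full Bellman-Ford pass over a graph with one edge per word pair, so the cost rises steeply with question length; this is why it pays to avoid computing WMD against every stored question.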
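The abstract does not spell out how the improved KNN combines the two similarity models. One plausible reading, offered purely as an assumption, is a cascade: the cheap similarity (such as the cosine of weighted question vectors) shortlists the `m` most similar training questions, only those are re-ranked with the expensive distance (such as WMD), and the `k` nearest then cast distance-weighted votes. `cascade_knn`, `m`, `k` and the 1/distance voting rule are all hypothetical; the two measures are passed in as plain callables so the sketch does not depend on any particular implementation of them.

```python
from collections import defaultdict

def cascade_knn(query, train, cheap_sim, fine_dist, m=10, k=3):
    """Two-stage KNN classifier (an assumed reading of the 'improved KNN').

    train:     list of (question, label) pairs
    cheap_sim: fast similarity, higher = more similar (stage 1 filter)
    fine_dist: accurate but slow distance, lower = closer (stage 2 re-rank)
    """
    # Stage 1: keep the m training questions the cheap measure ranks highest.
    shortlist = sorted(train, key=lambda qa: cheap_sim(query, qa[0]), reverse=True)[:m]
    # Stage 2: evaluate the expensive distance only on the shortlist; keep k nearest.
    nearest = sorted(((fine_dist(query, q), label) for q, label in shortlist),
                     key=lambda t: t[0])[:k]
    # Distance-weighted vote; the small constant avoids division by zero.
    votes = defaultdict(float)
    for dist, label in nearest:
        votes[label] += 1.0 / (dist + 1e-9)
    return max(votes, key=votes.get)
```

With `m` much smaller than the training set, the expensive distance is evaluated `m` times per query instead of once per stored question, which is where a cascade of this kind would trade a little recall for a large saving in computation.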