Tibetan Sentence Semantic Chunking Method (藏文句子语义块识别方法)

English title: Tibetan Sentence Semantic Chunking Method
Authors: 柔特; 色差甲; 才让加
Authors (English): ROU Te; SE Chajia; CAI Rangjia; Department of Computer Science, Qinghai Normal University; Provincial Key Laboratory of Tibetan Intelligent Information Processing and Machine Translation, Qinghai Normal University
Keywords: Tibetan; semantic chunk; semantic segmentation; semantic analysis
English keywords: Tibetan; semantic chunk; semantic segmentation; semantic analysis
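Journal (Chinese): MESS
Journal (English): Journal of Chinese Information Processing
Institutions: School of Computer Science, Qinghai Normal University; Qinghai Provincial Key Laboratory of Tibetan Information Processing and Machine Translation, Qinghai Normal University
Publication date: 2019-06-15
Publisher: Journal of Chinese Information Processing (中文信息学报)
Year: 2019
Issue: v.33
Funding: National Key R&D Program of China (2017YFB1402200); National Natural Science Foundation of China (61662061); National Social Science Fund of China (14BYY132, 15BYY167, 16YY167)
Language: Chinese
Pages: MESS201906006
Page count: 8
CN: 06
ISSN: 11-2325/N
Classification number: 47-54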

Abstract

Semantic understanding is a key task in natural language understanding. Conventionally, grammar-centered techniques such as lexical and syntactic analysis are used to parse sentence meaning. This paper proposes a new method that analyzes the meaning of Tibetan sentences through semantic chunks. Tibetan semantic chunk recognition is modeled with two neural network architectures, Bi-LSTM and ID-CNN, which are then compared. In experiments, the two models perform well on the test set, with average F1 scores of 89% and 92%, respectively. The authors state that this semantic chunk analysis and recognition technique can serve as a reasonably good substitute for tasks such as word sense disambiguation and semantic role labeling.
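The abstract reports results as F1 scores but this record does not say how chunks are encoded or scored. As a minimal sketch only, assuming a BIO tagging scheme and exact-match chunk scoring (both are assumptions, as are the hypothetical labels "A" and "B" and the helper names `extract_chunks` and `chunk_f1`), a chunk-level F1 could be computed as follows:

```python
from typing import List, Set, Tuple

# (start index, end index exclusive, chunk label); labels here are hypothetical
Span = Tuple[int, int, str]


def extract_chunks(tags: List[str]) -> Set[Span]:
    """Collect (start, end, label) spans from a BIO tag sequence (assumed scheme)."""
    chunks: Set[Span] = set()
    start, label = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes a trailing chunk
        continues = tag.startswith("I-") and label == tag[2:]
        if not continues:
            if label is not None:
                chunks.add((start, i, label))  # close the chunk that was open
            start, label = None, None
            if tag.startswith("B-"):
                start, label = i, tag[2:]  # open a new chunk
            # an I- tag with no matching open chunk is ignored
    return chunks


def chunk_f1(gold: List[str], pred: List[str]) -> float:
    """Exact-match chunk F1: a predicted chunk counts only if span and label both match."""
    gold_chunks = extract_chunks(gold)
    pred_chunks = extract_chunks(pred)
    correct = len(gold_chunks & pred_chunks)
    precision = correct / len(pred_chunks) if pred_chunks else 0.0
    recall = correct / len(gold_chunks) if gold_chunks else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


gold = ["B-A", "I-A", "O", "B-B", "I-B"]
pred = ["B-A", "I-A", "O", "B-B", "O"]
score = chunk_f1(gold, pred)
```

In this toy example one of two predicted chunks matches one of two gold chunks exactly, so precision and recall are both 0.5. The paper itself may use a different tag set or a different averaging scheme.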
