Research on Frame Semantic Structure Analysis Techniques for Chinese Sentences
Abstract
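Semantic analysis is the most important and the most difficult problem in natural language processing. How to perform effective, deep, automatic semantic analysis of sentences has long been a major goal for researchers in China and abroad. Taking frame semantics as its theoretical basis and drawing on the Chinese FrameNet semantic resource of Shanxi University, this thesis studies the core techniques for analyzing the semantic structure of Chinese sentences: frame semantic structure modeling, target word identification, frame disambiguation, and frame semantic role labeling. It also studies an application, a tourism question-answering system based on Chinese frame semantic analysis. The main results are as follows: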
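     (1) For the semantic structure of Chinese sentences, the thesis systematically analyzes their frame semantic structure and proposes a Chinese frame semantic dependency graph model comprising the single-frame semantic dependency graph, the complete frame semantic dependency graph, and the core frame semantic dependency graph, offering a new method for representing the semantic structure of Chinese sentences.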
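     (2) For target word identification, the thesis proposes methods for identifying unknown (out-of-lexicon) target words based on similarity computation and the maximum entropy model. The methods take word sense information, dependency features, and context into full account, effectively solve the identification of unknown target words, and lay the groundwork for accurate frame disambiguation.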
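     (3) For frame disambiguation, the thesis proposes a method based on T-CRF that improves Chinese frame disambiguation by adding long-distance dependency relations to the dependency features. Comparative experiments against SVM-based and maximum-entropy-based disambiguation methods confirm the effectiveness of the T-CRF approach.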
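     (4) For frame semantic role labeling, after summarizing and comparing the current mainstream algorithms, the thesis proposes a labeling method based on the T-CRF model and raises labeling accuracy by adding dependency features. Building on frame semantic role labeling, it then computes sentence similarity from the perspective of frame semantics and proposes a sentence semantic similarity method based on multiple frames and their importance. Experimental results confirm that frame semantic roles are effective for computing sentence semantic similarity.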
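     (5) As an application of the Chinese FrameNet semantic resource and the semantic analysis methods, the thesis designs and implements an experimental prototype question-answering system for the Shanxi tourism domain. Taking the scenic area Wutai Mountain as an example, the system applies full-text frame semantic role labeling to the introductory text of each scenic spot. It comprises question input, question analysis, and answer extraction, and demonstrates the feasibility of building question-answering applications on frame semantic analysis.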
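     The results of this thesis further enrich the theory and methods of frame semantic structure analysis for Chinese sentences, offer a new route to deep semantic analysis of Chinese sentences, and provide new technical support for application systems in natural language processing that are based on semantic analysis.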
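To make the single-frame semantic dependency graph of item (1) more concrete, here is a minimal sketch. It is not the thesis's actual model. It assumes a graph rooted at the target word (the word that evokes the frame), with edges labeled by frame element names and pointing at the head words of the arguments. The class name `FrameGraph`, the example sentence, and the frame `Arriving` with elements `Theme` and `Goal` are illustrative assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class FrameGraph:
    """One frame instance: a target word plus labeled edges to argument heads.

    Hypothetical structure for illustration; not taken from the thesis.
    """
    frame: str    # name of the frame evoked by the target word
    target: str   # the target (frame-evoking) word
    edges: dict = field(default_factory=dict)  # frame element -> argument head word

    def add_role(self, element: str, head: str) -> None:
        """Attach an argument head to the target under a frame element label."""
        self.edges[element] = head

    def triples(self):
        """Return (target, frame element, head) triples, sorted by element name."""
        return sorted((self.target, fe, head) for fe, head in self.edges.items())


# Illustrative sentence: "游客 到达 五台山" (The tourists arrive at Wutai Mountain).
g = FrameGraph(frame="Arriving", target="到达")
g.add_role("Theme", "游客")
g.add_role("Goal", "五台山")

for triple in g.triples():
    print(triple)
```

Storing the graph as labeled triples keeps the frame's role structure separate from the syntactic parse, which is one plausible reading of "dependency graph" here.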
Semantic analysis is the most significant and most difficult problem in natural language processing. How to achieve effective, deep, and automatic semantic analysis of sentences is a major goal for scholars at home and abroad. Frame semantic structure analysis of Chinese sentences is grounded in Frame Semantics and formally represents the semantic structure of sentences with the aid of the Chinese FrameNet of Shanxi University. This thesis investigates the core technologies of frame semantic structure modeling for Chinese sentences, target word identification, frame disambiguation, and frame semantic role labeling, and also studies the application of a tourism question-answering system based on Chinese frame semantic analysis. The main research results are as follows:
     (1) For the semantic structure of Chinese sentences, this thesis systematically analyzes the frame semantic structure of Chinese sentences and puts forward frame semantic dependency graph models, including the single-frame semantic dependency graph, the complete frame semantic dependency graph, and the core frame semantic dependency graph, providing a novel way to represent the semantic structure of Chinese sentences.
     (2) For target word identification, this thesis proposes methods for identifying unknown target words based on similarity computation and the maximum entropy model. By taking full account of word senses, dependency features, and context features, the methods solve unknown target word identification and enable automatic extension of lexical units.
     (3) For frame disambiguation, this thesis proposes a method based on T-CRF. It improves the performance of frame disambiguation by adding long-distance dependency relations to the dependency features. In addition, it is compared with frame disambiguation methods based on SVM and maximum entropy, which verifies the effectiveness of the T-CRF-based method.
     (4) For frame semantic role labeling, after summarizing and contrasting currently popular algorithms, this thesis puts forward frame semantic role labeling based on T-CRF, which improves labeling precision by adding dependency features. In addition, based on frame semantic role labeling, it computes sentence similarity from the viewpoint of frame semantics and puts forward a sentence semantic similarity computation based on multiple frames and their importance. The experimental results verify the effectiveness of frame semantic roles for computing the semantic similarity of sentences.
     (5) As an application of the Chinese FrameNet semantic resource and the semantic analysis methods, this thesis designs and implements a question-answering system for the tourism domain of Shanxi Province. Taking Wutai Mountain as an example, the system applies full-text frame semantic role labeling to the introduction of each scenic spot. The system includes question input, question analysis, and answer extraction, which verifies the feasibility of a tourism question-answering system based on Chinese frame semantic analysis.
     The results of this thesis further enrich the theory and methods of frame semantic structure analysis for Chinese sentences, provide a novel way to achieve deep semantic understanding of Chinese sentences, and provide new technical support for application systems based on semantic analysis in natural language processing.
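The sentence similarity based on multiple frames and their importance, described in item (4), can be illustrated with a small sketch. The abstract does not give the formula, so everything here is an assumption: each sentence is a list of annotated frames, each frame carries a hypothetical importance weight, frames are matched by name, and a matched pair is scored by the Jaccard overlap of its role fillers. The function names and the example annotations are illustrative only.

```python
def role_overlap(roles_a, roles_b):
    """Jaccard overlap of (frame element, filler) pairs; an assumed per-frame score."""
    a, b = set(roles_a.items()), set(roles_b.items())
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def sentence_similarity(sent_a, sent_b):
    """Importance-weighted average of per-frame role overlap (assumed scheme).

    Each sentence is a list of (frame_name, importance, roles) tuples, where
    roles maps frame element -> filler. A frame found in only one sentence
    adds its weight to the denominator and nothing to the numerator.
    If a frame name repeats within a sentence, only its last entry is kept.
    """
    frames_a = {name: (w, roles) for name, w, roles in sent_a}
    frames_b = {name: (w, roles) for name, w, roles in sent_b}
    total = 0.0
    score = 0.0
    for name in set(frames_a) | set(frames_b):
        w_a, r_a = frames_a.get(name, (0.0, {}))
        w_b, r_b = frames_b.get(name, (0.0, {}))
        weight = max(w_a, w_b)
        total += weight
        if name in frames_a and name in frames_b:
            score += weight * role_overlap(r_a, r_b)
    return score / total if total else 0.0


# Hypothetical annotations: same frame, same Theme, different Goal.
s1 = [("Arriving", 2.0, {"Theme": "游客", "Goal": "五台山"})]
s2 = [("Arriving", 2.0, {"Theme": "游客", "Goal": "太原"})]

print(sentence_similarity(s1, s2))
```

In this sketch the two sentences share one of three distinct role fillers, so the score is one third. A real system would likely compare fillers by word similarity instead of exact match.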
