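Research on Tree Kernel-Based Semantic Role Labeling in Chinese

Original title: 基于树核方法的中文语义角色标注研究
Author: Wu Fanglei (吴方磊)
Degree level: Master's
Discipline: Computer Application Technology
Keywords: Semantic Role Labeling; Semantic Role Classification; Tree Kernel; Composite Kernel
Degree year: 2011
Advisors: Zhu Qiaoming (朱巧明); Li Peifeng (李培峰)
Discipline code: 081203
Degree-granting institution: Soochow University (苏州大学)
Thesis submission date: 2011-05-01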

Abstract

Semantic role labeling (SRL) is a form of shallow semantic analysis. Given a sentence, the task is to identify, for each predicate, the constituents that serve as its semantic arguments and to label each with a semantic role such as agent, patient, instrument, or an adjunct (for example, time or place).
     Most recent SRL research relies on feature-vector methods, which achieve good performance. However, their limitations have become increasingly apparent: more effective features are hard to engineer, and important structural information is lost. A current trend is therefore to explore kernel-based SRL methods, which can ease the feature-engineering bottleneck. This thesis investigates tree kernel-based SRL for Chinese in depth, focusing on the semantic role classification stage.
     First, we study how kernel methods apply to Chinese SRL. We implement a feature-vector-based semantic role classification system that uses a second-degree polynomial kernel to combine features automatically. We then explore semantic role classification based on the convolution tree kernel and, by extending the minimum syntactic tree structure, define two further syntactic structures. On the Chinese PropBank corpus, the tree kernel-based classifier reaches an accuracy of 91.53%. Using a composite kernel to combine the tree kernel-based and feature-based methods raises classification accuracy to 94.23%.
     Next, we examine more closely which structured information is effective for Chinese SRL and how much structured features matter for semantic role classification. To account for the influence among the arguments of the same predicate, we propose the All-Arguments Predicate Feature (AAPF) space, which captures the dependencies among arguments. We also incorporate flat-feature information into the structured features and propose three methods inspired by flat features, which raise classification accuracy to 92.54%. Combining the best tree kernel method, FIT, with the feature vector through a composite kernel yields a classification accuracy of 95.21%, outperforming comparable SRL systems at the time of writing.
     Finally, we make a preliminary exploration of tree kernel methods for semantic role classification of Chinese nominal predicates. Experimental results on the Chinese NomBank indicate that the approach has considerable potential.
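
The composite-kernel idea summarized in the abstract can be illustrated with a short sketch. This is a minimal illustration, not the thesis's implementation. It assumes parse trees encoded as nested tuples, uses the standard Collins and Duffy subtree-counting recursion as the convolution tree kernel, and combines it linearly with a second-degree polynomial kernel over flat feature vectors. The decay factor lam, the mixing weight alpha, the kernel normalization, and all function names are assumptions chosen for illustration; the abstract does not specify them.

```python
import math

# Assumed encoding: an internal node is a tuple (label, child, ...);
# a leaf (word) is a plain string.


def nodes(tree):
    """Return every internal node of a nested-tuple tree."""
    if isinstance(tree, str):
        return []
    found = [tree]
    for child in tree[1:]:
        found.extend(nodes(child))
    return found


def production(node):
    """Grammar production at a node: its label plus its children's labels/words."""
    children = tuple(
        ("word", c) if isinstance(c, str) else ("label", c[0]) for c in node[1:]
    )
    return (node[0],) + children


def delta(n1, n2, lam):
    """Decay-weighted count of common subtree fragments rooted at n1 and n2."""
    if production(n1) != production(n2):
        return 0.0
    score = lam
    # Equal productions guarantee the children align one to one.
    for c1, c2 in zip(n1[1:], n2[1:]):
        if not isinstance(c1, str):
            score *= 1.0 + delta(c1, c2, lam)
    return score


def tree_kernel(t1, t2, lam=0.4):
    """Convolution tree kernel: sum delta over all pairs of nodes."""
    return sum(delta(a, b, lam) for a in nodes(t1) for b in nodes(t2))


def poly_kernel(x, y, degree=2):
    """Polynomial kernel over flat feature vectors (second degree by default)."""
    return (1.0 + sum(a * b for a, b in zip(x, y))) ** degree


def normalized(kernel, a, b):
    """Scale a kernel so that every instance has self-similarity 1."""
    denom = math.sqrt(kernel(a, a) * kernel(b, b))
    return kernel(a, b) / denom if denom else 0.0


def composite_kernel(inst1, inst2, alpha=0.5):
    """Linear combination of the tree kernel and the feature-vector kernel.

    Each instance is a (tree, feature_vector) pair; alpha is an assumed weight.
    """
    (t1, x1), (t2, x2) = inst1, inst2
    return (alpha * normalized(tree_kernel, t1, t2)
            + (1.0 - alpha) * normalized(poly_kernel, x1, x2))


if __name__ == "__main__":
    dog = ("NP", ("DT", "the"), ("NN", "dog"))
    cat = ("NP", ("DT", "the"), ("NN", "cat"))
    # With lam=1.0 the tree kernel counts shared subtree fragments.
    print(tree_kernel(dog, dog, lam=1.0))  # 6.0
    print(tree_kernel(dog, cat, lam=1.0))  # 3.0
    print(composite_kernel((dog, [1.0, 0.0]), (cat, [1.0, 1.0])))
```

A kernel of this shape could be handed to a kernel machine such as an SVM as a precomputed Gram matrix over the training instances. The linear combination is only one possible way to build a composite kernel.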

