Research on Automatic Semantic Role Labeling for Chinese FrameNet
Abstract
To provide an automatic labeling tool for constructing large-scale Chinese frame-semantic resources, this thesis builds on the Chinese FrameNet knowledge base (CFN) developed at Shanxi University. Given a sentence, a target word in it, and the frame that the target word evokes, the automatic labeling of its semantic roles (frame elements) is converted, via the IOB strategy, into a word-level sequence labeling problem over the entire sentence. Using the conditional random fields (CRF) model and orthogonal-array experimental designs from statistics, the thesis studies automatic labeling models for Chinese frame-semantic roles.
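The IOB conversion described above can be sketched as follows. This is an illustrative sketch only: the role abbreviations and span indices are made up, not drawn from the CFN corpus.

```python
# Sketch of the IOB conversion: gold frame-element spans over a segmented
# sentence are rewritten as one tag per word, so that role labeling becomes
# ordinary word-level sequence labeling. Role names here are hypothetical.

def spans_to_iob(num_words, spans):
    """spans: list of (start, end_inclusive, role) over word indices."""
    tags = ["O"] * num_words
    for start, end, role in spans:
        tags[start] = f"B-{role}"          # first word of the role span
        for i in range(start + 1, end + 1):
            tags[i] = f"I-{role}"          # continuation of the same span
    return tags

# A made-up example: 5-word sentence, a two-word Agent and a one-word Theme.
print(spans_to_iob(5, [(0, 1, "Agt"), (3, 3, "Thm")]))
# ['B-Agt', 'I-Agt', 'O', 'B-Thm', 'O']
```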
     All experiments use 6,692 example sentences covering 25 frames, selected from the existing CFN corpus. The corpus is divided evenly into four parts, from which three groups of 2-fold cross-validation are formed; the average F1-score over the three cross-validation groups serves as the system performance measure. The thesis also gives a variance estimate for this measure and a significance test for comparing the performance of two labeling systems.
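A minimal sketch of this evaluation measure, assuming made-up per-fold precision/recall values in place of real experiment output (the thesis's own variance estimator is not reproduced here; a plain sample variance across the three groups stands in for it):

```python
# The corpus is split into 4 parts; pairing them 3 ways yields 3 different
# 2-fold cross-validations. The performance measure is the mean F1 over the
# 3 groups. The numbers below are illustrative, not the thesis's results.
from statistics import mean, variance

def f1(precision, recall):
    return 2 * precision * recall / (precision + recall)

# Hypothetical per-fold (precision, recall) for 3 groups x 2 folds.
groups = [
    [(0.66, 0.58), (0.64, 0.60)],
    [(0.65, 0.59), (0.63, 0.61)],
    [(0.67, 0.57), (0.62, 0.60)],
]
group_f1 = [mean(f1(p, r) for p, r in folds) for folds in groups]
avg_f1 = mean(group_f1)      # the system performance measure
var_f1 = variance(group_f1)  # naive spread estimate across the 3 groups
```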
     With the word as the basic labeling unit, the labeling procedure consists of three steps: 1) boundary identification, 2) role classification, and 3) post-processing. Two labeling strategies are examined: performing boundary identification and role classification jointly, and performing identification first and classification second. In the post-processing step, the output label sequence is required to satisfy the IOB legality constraint over the whole sentence, and the legal sequence with the highest probability is taken as the final output.
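The legality constraint and the selection of the most probable legal sequence can be sketched as below. The n-best interface is an assumption for illustration; the exact decoding procedure used in the thesis is not specified here.

```python
# An IOB sequence is legal when every "I-X" tag continues a preceding
# "B-X" or "I-X" of the same role X. Post-processing keeps only legal
# sequences and returns the most probable one.

def is_legal_iob(tags):
    prev = "O"
    for tag in tags:
        if tag.startswith("I-") and prev not in (f"B-{tag[2:]}", tag):
            return False  # "I-X" without a matching span start
        prev = tag
    return True

def best_legal(nbest):
    """nbest: list of (tag_sequence, probability) candidates."""
    legal = [(seq, p) for seq, p in nbest if is_legal_iob(seq)]
    return max(legal, key=lambda x: x[1])[0] if legal else None
```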
     In total, 26 features are extracted, each with several candidate windows; their combinations form the feature templates of the CRF model. To select good templates, a template-selection method based on statistical orthogonal arrays is proposed, and three experimental schemes are carried out. Scheme I uses 11 word-level features (the word, its part of speech, its position relative to the target word, the target word itself, etc.), arranged in the orthogonal array L32(4^9×2^4). Scheme II uses all 26 features, i.e., the 11 word-level features plus 15 base-chunk features such as syntactic and structural tags, arranged in the orthogonal array L54(2^1×3^25); the base-chunk features are extracted with Zhou Qiang's automatic parser from Tsinghua University. Scheme III runs the orthogonal-array experiments in batches: starting from the best template found with L32(4^9×2^4) on the 11 word-level features, the 15 base-chunk features are added using L54(2^1×3^25), with the array levels chosen so that performance does not fall below that of the previous batch. Each scheme's experiments are analyzed in detail.
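A toy illustration of how an orthogonal array assigns feature-window levels to a small number of runs. The L4(2^3) array, feature names, and window choices below are hypothetical and far smaller than the L32/L54 arrays actually used in the thesis.

```python
# Each row of the array is one experiment; each column is a feature whose
# level selects a context window. L4(2^3) covers every pairwise combination
# of levels for 3 two-level factors in only 4 runs instead of all 8.

L4 = [
    [0, 0, 0],
    [0, 1, 1],
    [1, 0, 1],
    [1, 1, 0],
]
features = ["word", "pos", "target_dist"]  # illustrative feature names
windows = {0: (-1, 1), 1: (-2, 2)}         # level -> context window

def templates_from_array(array):
    """Expand array rows into concrete feature-window templates."""
    return [{feat: windows[lvl] for feat, lvl in zip(features, row)}
            for row in array]

for run in templates_from_array(L4):
    print(run)  # one CRF feature template per experiment
```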
     The orthogonal-array template-selection method is compared with the traditional greedy feature-selection algorithm. The word-level sequence labeling approach is compared with an approach based on full syntactic parse trees, and different labeling models, such as the support vector machine (SVM) and maximum entropy (ME) models, are compared as well.
     The experimental results show that:
     (1) With the 11 word-level features (Scheme I), the best result (average F1-score) reaches 61.61%, significantly higher than the approach that treats role labeling as classification of constituents in a full syntactic parse tree. Compared with traditional greedy feature selection, the orthogonal-array method shows no significant difference in labeling performance, but it is computationally simpler and better suited to choosing a general-purpose template.
     (2) Adding the 15 base-chunk features (Scheme II) significantly improves the model's performance. These features mainly benefit role classification; their effect on boundary identification is not significant.
     (3) The batched orthogonal-array experiments (Scheme III) significantly outperform Scheme II.
     (4) With one model trained per frame, joint boundary identification and role classification shows no significant difference in labeling performance from the two-step identify-then-classify strategy, but the joint strategy yields a smaller variance in the performance measure.
     (5) The CRF-based results show no significant difference from the SVM-based results, but both are significantly better than those of the maximum entropy (ME) model.
     (6) Across all experiments on the 25 frames, the best boundary-identification result (average F1-score) is 71.68%; given the role boundaries, the best role-classification result (average precision) is 84.08%; given the target word and its frame, the best overall result (average F1-score) is 63.26%.
     The main contributions of this thesis are the first systematic study of automatic semantic role labeling models for Chinese FrameNet and a template-selection method based on orthogonal arrays that is computationally simpler than greedy template selection. The orthogonal-array method also applies to feature selection in general sequence labeling. In labeling performance, the results surpass semantic role labeling based on full syntactic parse trees.