Research on Fast and Exact Structured Machine Learning Methods
Abstract
Structured learning models owe a great part of their success to their ability to exploit structural information. However, they are more time-consuming than non-structured learning models. Although approximate algorithms reduce the computational complexity, they degrade accuracy to some extent, partially canceling the benefit of the structural information. Exploring fast exact algorithms therefore plays an important role in structured learning.
     In this thesis, we improve two key components of structured learning: inference and feature extraction. First, for sequence labeling tasks, we propose sparse higher-order Conditional Random Fields (SHO-CRFs), motivated by the sparseness of higher-order features in many real applications, together with a novel exact and tractable inference algorithm that handles local and sparse higher-order features simultaneously. Thanks to feature sparseness, SHO-CRFs are very efficient in practice. On an optical character recognition task, using word affixes as higher-order features, SHO-CRFs achieve the highest reported accuracy. On a Chinese organization name recognition task, we convert manually extracted rules into higher-order features and achieve the second highest F1 score on the Microsoft Research Asia corpus. Both experiments show that, with the same feature set, SHO-CRFs significantly outperform other state-of-the-art methods.
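The sparsity argument behind SHO-CRFs can be illustrated with a toy decoder. This is a sketch of the general idea only, not the thesis's actual inference algorithm: the decoding state tracks just the longest label-history suffix that is a prefix of some higher-order pattern, so when patterns fire rarely, the number of distinct states stays far below the full higher-order state space. All labels, scores, and patterns below are hypothetical.

```python
def prefixes_of(patterns):
    """All non-empty prefixes of every higher-order pattern."""
    P = set()
    for p in patterns:
        for k in range(1, len(p) + 1):
            P.add(p[:k])
    return P

def advance(state, y, prefix_set):
    """New state = longest suffix of state+(y,) that is a prefix of some pattern."""
    h = state + (y,)
    for i in range(len(h)):
        if h[i:] in prefix_set:
            return h[i:]
    return ()

def decode(local, labels, patterns):
    """Viterbi-style decoding with sparse higher-order bonuses.

    local[t][y]: local score of label y at position t.
    patterns: dict mapping a label tuple (higher-order pattern) to a bonus score.
    """
    prefix_set = prefixes_of(patterns)
    # beam maps state -> (score, previous state, last label)
    beam = {(): (0.0, None, None)}
    history = []
    for t in range(len(local)):
        new_beam = {}
        for s, (sc, _, _) in beam.items():
            for y in labels:
                h = s + (y,)
                # A pattern fires iff it is a suffix of the (compressed) history.
                bonus = sum(b for p, b in patterns.items() if h[-len(p):] == p)
                ns = advance(s, y, prefix_set)
                cand = sc + local[t][y] + bonus
                if ns not in new_beam or cand > new_beam[ns][0]:
                    new_beam[ns] = (cand, s, y)
        history.append(new_beam)
        beam = new_beam
    # Backtrack from the best final state.
    s = max(beam, key=lambda k: beam[k][0])
    out = []
    for t in range(len(local) - 1, -1, -1):
        _, prev, y = history[t][s]
        out.append(y)
        s = prev
    return list(reversed(out))
```

On a toy three-position input, a single pattern such as ("A", "A", "A") with a large bonus flips the best path toward the pattern, while with no patterns at all the state space collapses to a single state and the decoder reduces to ordinary local decoding.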
     Second, we propose a novel feature string indexing structure for fast feature extraction, which reduces decoding time. Many modern structured learning methods use templates to generate millions of features. Complicated templates produce abundant features that lead to higher accuracy, but they also require more feature extraction time, which greatly slows down decoding. We therefore propose the two-dimensional Trie (2D Trie), a novel data structure that exploits the relationships between templates for fast feature extraction: a feature string generated by a template is a prefix of the feature strings generated by its extended templates, so the index of the former can be reused to retrieve the latter. We apply this data structure to graph-based dependency parsing. Experimental results on the Chinese Treebank 6 corpus show that 2D Trie feature extraction is about 5 times faster than a traditional Trie, making overall parsing decoding 4.3 times faster.
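The prefix-reuse idea behind the 2D Trie can be sketched with a toy dict-based trie. This is an illustration of the principle only, not the thesis's implementation (which would typically build on a compact double-array structure), and the template and feature strings below are made up. Each node has an integer id, and a lookup may resume from any node, so the node reached for a base template's feature string becomes the starting point for its extended template's string instead of rescanning the shared prefix from the root.

```python
class Trie:
    """Minimal trie whose nodes have ids; lookups can resume from any node."""

    def __init__(self):
        self.children = [{}]       # children[node_id]: char -> child node_id
        self.feature_id = [None]   # feature id stored at each node, if any

    def _new_node(self):
        self.children.append({})
        self.feature_id.append(None)
        return len(self.children) - 1

    def insert(self, s, fid):
        """Insert feature string s with feature id fid."""
        node = 0
        for ch in s:
            nxt = self.children[node].get(ch)
            if nxt is None:
                nxt = self._new_node()
                self.children[node][ch] = nxt
            node = nxt
        self.feature_id[node] = fid

    def walk(self, start, suffix):
        """Resume matching from node `start`; return the final node id or None."""
        node = start
        for ch in suffix:
            node = self.children[node].get(ch)
            if node is None:
                return None
        return node
```

For example, if a base template yields the string "w0=the" and its extended template yields "w0=the/w1=cat", the node reached for the first lookup is handed to `walk` as the start node for the remaining "/w1=cat", so the shared prefix is traversed only once.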
