Research on Joint Learning Approaches for Sequence Labeling Problems in Natural Language Processing
Abstract
Sequence labeling is one of the fundamental problems in natural language processing. It can be divided into two categories: the single sequence labeling problem (SSLP), which predicts one output label sequence, and multiple sequence labeling problems (MSLPs), which predict several output label sequences. MSLPs are usually handled with a cascaded approach, which treats an MSLP as a series of SSLPs and processes them one by one in a pipeline; this approach suffers from error propagation and a lack of information sharing among the problems. A joint learning approach overcomes these drawbacks by processing the SSLPs contained in an MSLP simultaneously, which promotes information exchange among the problems, as the toy sketch below illustrates. This thesis examines the different types of sequence labeling problems and studies both single sequence labeling approaches and joint learning approaches, with joint learning as its main focus.
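The following sketch makes the contrast concrete. It is purely illustrative: `seg_model`, `pos_model`, `candidates`, and `score` are hypothetical stand-ins for real models, not interfaces from the thesis.

```python
# Toy contrast between the cascaded (pipeline) and joint treatment of two
# coupled labeling problems, e.g. word segmentation followed by POS tagging.
# All model functions here are hypothetical placeholders.

def pipeline(sentence, seg_model, pos_model):
    """Cascaded: the tagger only ever sees the 1-best segmentation,
    so a segmentation error propagates and can never be repaired."""
    seg = seg_model(sentence)           # stage 1 commits to one answer
    return seg, pos_model(seg)          # stage 2 is built on that answer

def joint(sentence, candidates, score):
    """Joint: search over whole (segmentation, tags) pairs with a single
    score, so evidence from both problems interacts during decoding."""
    return max(candidates(sentence), key=score)
```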
The specific research topics are as follows:
     1. Traditional sequence labeling approaches generally use only the neighboring information of the prediction unit as model features and rarely consider global information in the sequence, so their predictions are not accurate enough. To address this, the thesis proposes a cascaded reranking approach that fuses global information. For an SSLP, the cascaded reranking approach introduces models carrying sequence-level global information and syntactic information: first, a linear reranking step combines these models; then, features extracted from the models' predictions are used to train a structured perceptron reranking model; finally, the two reranking steps are cascaded to select the optimal label sequence (see the sketch after this paragraph). For MSLPs, the cascaded reranking approach can exploit both the global information of each SSLP and combination information across the problems; the thesis calls this the cascaded reranking joint learning approach. Experimental results show that cascaded reranking improves recognition accuracy on Chinese pinyin-to-character conversion and Mandarin speech recognition by incorporating part-of-speech and syntactic information, outperforming each single reranking approach, and that cascaded reranking joint learning outperforms the cascaded approach and the tag combination approach on English part-of-speech tagging and chunking.
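A minimal sketch of the two-stage cascade, under assumed interfaces: each base model is a callable returning a score for a candidate, `feats` is a hypothetical feature extractor, and the pruning width `k` is arbitrary rather than the thesis's actual setting.

```python
# Stage 1: a linear reranker combines the base models' scores and prunes
# the n-best list; stage 2: a structured-perceptron reranker picks the
# final label sequence from the survivors.

def linear_rerank(nbest, models, weights, k=5):
    """Score each candidate as a weighted sum of base-model scores
    (e.g. language-model, POS and syntax models) and keep the top k."""
    scored = sorted(nbest, key=lambda c: -sum(w * m(c)
                                              for m, w in zip(models, weights)))
    return scored[:k]

def perceptron_rerank(candidates, feats, w):
    """Score each surviving candidate with learned feature weights."""
    return max(candidates, key=lambda c: sum(w.get(f, 0.0) for f in feats(c)))

def perceptron_update(gold, pred, feats, w):
    """Standard structured-perceptron update on a reranking mistake:
    promote the gold candidate's features, demote the predicted one's."""
    if pred != gold:
        for f in feats(gold):
            w[f] = w.get(f, 0.0) + 1.0
        for f in feats(pred):
            w[f] = w.get(f, 0.0) - 1.0

def cascade(nbest, models, weights, feats, w):
    """The full cascade: linear reranking feeds the perceptron reranker."""
    return perceptron_rerank(linear_rerank(nbest, models, weights), feats, w)
```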
     2. Compared with a single learning approach, a joint decoding approach can improve prediction performance by combining several single models during decoding. For MSLPs, the thesis proposes supervised and semi-supervised joint decoding joint learning approaches. The supervised variant combines multiple joint learning models by probability weighting during decoding; the semi-supervised variant first labels an unannotated corpus with two joint learning models, takes the sentences on which the two models predict identical label sequences as new training data, and finally trains a semi-supervised model on the original and new training data together (both variants are sketched below). Applied to joint Chinese word segmentation and part-of-speech tagging, the supervised joint decoding approach outperforms single supervised approaches, and the semi-supervised joint decoding approach outperforms other state-of-the-art semi-supervised approaches.
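A compact sketch of both variants follows. The `prob` and `decode` methods are assumed model interfaces, and the weighted sum of log-probabilities is one plausible reading of "probability weighting", not the thesis's exact formulation.

```python
import math

def supervised_joint_decode(sentence, models, weights, candidates):
    """Supervised variant: among candidate labelings, pick the one with
    the highest weighted sum of log-probabilities under the joint models."""
    def score(labels):
        return sum(w * math.log(m.prob(sentence, labels) + 1e-12)
                   for m, w in zip(models, weights))
    return max(candidates(sentence), key=score)

def agreement_corpus(unlabeled, model_a, model_b):
    """Semi-supervised variant: keep only sentences on which the two joint
    models predict identical label sequences; these pairs are then added
    to the original training data to train the final model."""
    new_data = []
    for sentence in unlabeled:
        labels = model_a.decode(sentence)
        if labels == model_b.decode(sentence):
            new_data.append((sentence, labels))
    return new_data
```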
     3. When the SSLPs within an MSLP have inconsistent training sets, neither the cascaded reranking joint learning approach nor the joint decoding approach can be applied. To address this, the thesis proposes an iterative joint learning approach that lets the SSLPs in an MSLP exchange information through feature propagation (a sketch of the iteration follows this paragraph). In each iteration, every SSLP first uses a structured perceptron method to ensemble its basic model with models that carry information from the other problems, and then predicts with the resulting ensemble model. Experimental results on English part-of-speech tagging and chunking, and on Chinese word segmentation, part-of-speech tagging, and named entity recognition, show that the iterative joint learning approach is effective, outperforming the pipeline approach, the tag combination approach, and other ensemble methods.
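The iteration can be sketched as below. The per-problem interface (`name`, `train`, `predict`) and the fixed iteration count are illustrative assumptions.

```python
def iterative_joint_learning(problems, n_iters=3):
    """Each problem is assumed to expose .name, .train(extra_feats=...) and
    .predict(). In every round, each tagger is retrained on its own training
    set plus features propagated from the other problems' latest predictions;
    inside .train(), a structured perceptron would ensemble the basic model
    with the models that use the cross-problem features."""
    predictions = {p.name: None for p in problems}
    for _ in range(n_iters):
        for p in problems:
            others = {q.name: predictions[q.name]
                      for q in problems if q is not p}
            p.train(extra_feats=others)         # inject propagated features
            predictions[p.name] = p.predict()   # shared with the rest next round
    return predictions
```

Because each problem keeps its own training set and only predictions cross problem boundaries, this scheme does not require the training corpora of the SSLPs to be consistent.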
     4. Traditional Chinese sequence labeling approaches train models on discrete features such as characters and words, which makes the models very large and requires manual feature selection. To address this, the thesis first proposes a deep neural network model based on word-boundary character embeddings and applies it to Chinese SSLPs. In the character representation layer, each input Chinese character is represented as a combination of four word-boundary character embeddings; in the tag inference layer, a second-order tag transition matrix strengthens the constraints between neighboring tags (both are sketched below). A deep neural network joint learning approach then handles Chinese MSLPs by sharing the character representation layers of the single sequence labeling models, promoting information exchange among the problems. Experimental results on Chinese word segmentation, part-of-speech tagging, and named entity recognition show that the model with word-boundary character embeddings outperforms the model with basic character embeddings, and that the deep neural network joint learning approach further improves prediction performance. Finally, the four joint learning approaches proposed in the thesis are compared and analyzed experimentally.
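The representation layer can be sketched as follows. The table sizes, the use of concatenation as the "combination" of the four boundary-specific vectors, and the shared-layer wiring are illustrative assumptions rather than the thesis's exact architecture.

```python
import numpy as np

VOCAB, DIM = 5000, 50
BOUNDARIES = "BMES"   # begin / middle / end of a word, single-character word

# One embedding table per word-boundary tag, so the same character can
# contribute a different vector for each position it takes inside a word.
tables = {b: 0.01 * np.random.randn(VOCAB, DIM) for b in BOUNDARIES}

def char_repr(char_id):
    """Combine the four boundary-specific vectors of one character;
    concatenation is one plausible form of the 'combination'."""
    return np.concatenate([tables[b][char_id] for b in BOUNDARIES])

def sentence_repr(char_ids):
    """Stack per-character vectors into the representation layer. In joint
    learning this layer (the tables above) is shared by the segmentation,
    POS tagging and NER networks, so every problem updates the same
    embeddings and information flows between the problems."""
    return np.stack([char_repr(c) for c in char_ids])

# Second-order tag transitions: one score per tag trigram, letting the tag
# inference layer constrain neighboring labels more tightly than a
# first-order (bigram) transition matrix would.
N_TAGS = 4
trans2 = np.zeros((N_TAGS, N_TAGS, N_TAGS))
```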