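A Study on Automatic Mispronunciation Detection Based on Statistical Pattern Recognition

Original title: 基于统计模式识别发音错误自动检测的研究
Author: Zhang Feng (张峰)
Degree level: Doctoral
Discipline: Signal and Information Processing
Keywords: Automatic Mispronunciation Detection ; Statistical Speech Recognition ; Scaled Log Posterior Probability (SLPP) ; Selective Maximum Likelihood Linear Regression (SMLLR) ; Discriminative Training (DT) ; Back-end Processing ; Machine Learning ; Semi-supervised Clustering
Degree year: 2009
Advisors: Wang Renhua (王仁华) ; Dai Lirong (戴礼荣)
Discipline code: 081002
Degree-granting institution: University of Science and Technology of China
Thesis submission date: 2009-05-01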

Abstract

Automatic mispronunciation detection is a key technology of computer-assisted language learning (CALL) systems and largely determines their performance. Reliable detection lets a CALL system assess a learner's proficiency, identify pronunciation defects, give targeted advice, and supply matching learning materials, thereby raising the learner's language level effectively. This thesis analyzes in depth the mainstream approach to automatic mispronunciation detection, which is based on statistical pattern recognition, carries out targeted research on both acoustic modeling and back-end processing, and builds a mispronunciation detection system with stable performance. The specific work and findings are summarized below.
     First, the thesis surveys automatic mispronunciation detection technology and, from an analysis of its background and current state, adopts a strategy based on statistical speech recognition as the basic detection method. In introducing the baseline detection system, it focuses on the algorithms that compute the mispronunciation detection score. To address the shortcomings of the existing scoring algorithms in practical use, the thesis proposes the scaled log posterior probability (SLPP) algorithm, whose detection performance is clearly better than that of the existing algorithms. In introducing the experimental databases, it analyzes the consistency of several experts' detection results on them, which gives a measure of human detection performance and shows how challenging the automatic task is.
     Second, to improve the acoustic model, the thesis introduces adaptation techniques from statistical speech recognition. Applied to the test data, adaptation reduces the mismatch between test and training data; applied to the training data, it allows a speaker-independent canonical model to be estimated effectively. For the test data, the thesis first adopts maximum likelihood linear regression (MLLR), a mature algorithm in speech recognition. Because the goals of mispronunciation detection and speech recognition differ, MLLR does not necessarily improve detection performance, so the thesis proposes selective maximum likelihood linear regression (SMLLR), an adaptation technique designed for the detection goal. For the training data, the thesis adopts speaker adaptive training (SAT) from speech recognition to produce a canonical acoustic model and improve detection performance. Because a canonical model on its own increases the mismatch with the test data, SAT and SMLLR need to be used together to improve the detection system effectively.
     Third, still within acoustic modeling, the thesis applies the idea of discriminative training from speech recognition and sets an acoustic-modeling objective function that is consistent with the goal of mispronunciation detection. By reviewing the discriminative training methods used in speech recognition, it shows how each of them is consistent with the objective of raising recognition accuracy. It then analyzes the objective function of the detection task and the corresponding discriminative training strategies, and argues that discriminative training for mispronunciation detection must be consistent with the detection scoring algorithm. It further argues that the training database must contain mispronounced samples in addition to correctly pronounced ones; otherwise discriminative training may have little effect.
     Fourth, beyond acoustic modeling, the thesis proposes two back-end processing strategies: three-dimensional back-end normalization and machine-learning-based back-end processing. The three dimensions are the speaker level, the content level, and the time level. From an analysis of expert ratings and experimental data, the thesis proposes a speaker-level feature that captures the speaker's overall pronunciation level. From an analysis of text-dependent posterior probabilities, it proposes a content-level feature based on phoneme class. From an analysis of interference problems in practical use, it proposes a time-level feature based on the scores of the preceding and following context. Combining the three features yields the three-dimensional back-end normalization strategy, which substantially improves system performance. This strategy still has problems, such as the handling of multi-dimensional features. To solve them, the thesis proposes a more reliable machine-learning-based back-end strategy that uses a support vector machine (SVM) to optimize the handling of multi-dimensional features.
     Finally, the work above makes it possible to build a mispronunciation detection system with fairly stable performance. On this basis, the thesis proposes an automatic update strategy for the acoustic model: the system acquires unlabeled raw data, processes the mispronounced samples in it, and so keeps improving its detection performance. The thesis first analyzes the algorithm that generates the detection score and shows why mispronunciations need to be modeled. Then, from an analysis of the characteristics of mispronunciations and of unsupervised parameter estimation, it proposes several strategies for modeling mispronunciations, among which semi-supervised clustering performs best. Building on the reliable detection system and the mispronunciation modeling algorithm, the proposed update strategy can handle unlabeled raw data, improve the modeling space of the acoustic model, and raise the performance of the detection system.
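
The speaker-level, content-level, and time-level back-end features described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact feature definitions, so everything below is an assumption made for illustration: the field names (`speaker`, `phone_class`, `score`), the use of a per-speaker mean as the "overall pronunciation level", the integer coding of phoneme classes, and the choice of one preceding and one following segment as context are all hypothetical and are not the thesis's actual method.

```python
from collections import defaultdict
from statistics import mean


def neighbor_score(segments, i, offset):
    """Score of the segment `offset` positions away, if it exists and
    belongs to the same speaker; otherwise fall back to segment i's own score."""
    j = i + offset
    if 0 <= j < len(segments) and segments[j]["speaker"] == segments[i]["speaker"]:
        return segments[j]["score"]
    return segments[i]["score"]


def backend_features(segments):
    """Build one feature vector per segment (hypothetical layout):
    [raw score, speaker mean score, phoneme-class id, previous score, next score].

    `segments` is a time-ordered list of dicts with the assumed keys
    'speaker', 'phone_class', and 'score' (e.g. a posterior-based score).
    """
    # Speaker level: mean score over all segments of each speaker,
    # used here as a stand-in for the speaker's overall pronunciation level.
    scores_by_speaker = defaultdict(list)
    for seg in segments:
        scores_by_speaker[seg["speaker"]].append(seg["score"])
    speaker_mean = {spk: mean(vals) for spk, vals in scores_by_speaker.items()}

    # Content level: map each phoneme class to a stable integer id.
    classes = sorted({seg["phone_class"] for seg in segments})
    class_id = {name: idx for idx, name in enumerate(classes)}

    # Time level: scores of the preceding and following segments.
    features = []
    for i, seg in enumerate(segments):
        features.append([
            seg["score"],
            speaker_mean[seg["speaker"]],
            class_id[seg["phone_class"]],
            neighbor_score(segments, i, -1),
            neighbor_score(segments, i, +1),
        ])
    return features
```

Vectors of this kind could then be passed to a classifier such as an SVM, which is the role the abstract assigns to the machine-learning-based back end; a real system would likely encode the phoneme class differently (for example one-hot) and tune the context width.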
