Discriminative Training of Acoustic Models and Its Application in LVCSR Systems
Abstract
Discriminative training of acoustic models has been one of the most active research topics in speech recognition in recent years, and it has become one of the most important model training techniques in state-of-the-art speech recognition systems, especially large vocabulary continuous speech recognition (LVCSR) systems. This thesis focuses on discriminative training of acoustic models and its application in LVCSR systems. It also touches on another important module of speech recognition systems, confidence measures.
     First, this thesis proposes a novel optimization algorithm, called constrained line search (CLS), for updating CDHMM parameters in discriminative training for speech recognition. The CLS method can be used for model updating under a unified framework of discriminative training criteria, including MMI, MCE and MWE/MPE. In this method, discriminative training of HMMs is first formulated as a constrained optimization problem, where the constraint between models is quantified directly by the Kullback-Leibler divergence (KLD) between them. Then, based on the simple idea of line search, we show that once this model constraint is expressed in a quadratic form, a closed-form solution for the updated model parameters is easily obtained. The CLS method can be used to optimize all parameters of a CDHMM, including Gaussian means, covariance matrices and mixture weights.
     Second, this thesis further analyzes and extends the trust region (TR) based parameter update method for discriminative training that we proposed previously. The TR method converts MMI-based discriminative training into a standard problem in optimization theory, so that the global optimum of the function being optimized can be found accurately and efficiently. With the above inter-model constraint imposed, the TR method can optimize the auxiliary function of discriminative training exactly. However, in discriminative training, optimizing the auxiliary function does not guarantee an improvement of the original objective function. Through a deeper theoretical analysis of the trust region problem, we therefore propose to construct a new auxiliary function, called the bounded trust region auxiliary function. This auxiliary function is still a valid approximation of the objective function and, more importantly, it is a lower bound of the original objective function whenever the inter-model constraint is satisfied. This property guarantees that optimizing the auxiliary function also improves the objective function. Moreover, the new auxiliary function can still be optimized directly by the standard TR method, so its global optimum can be found quickly. Experiments show that the bounded trust region method outperforms both the conventional EBW algorithm and the original TR method.
     Third, this thesis also discusses several practical issues in real LVCSR systems, including the computational demands and resulting efficiency bottlenecks when processing massive training corpora, as well as the generalization problem commonly encountered in discriminative training. On this basis, we build a new discriminative training pipeline that combines high-quality word graphs generated by a WFST-based decoder with the conventional HTK tools for computing the statistics required by discriminative training. Compared with a pipeline built entirely on HTK, the new pipeline not only greatly improves training efficiency but also yields some gains in recognition performance.
     Finally, this thesis presents work on confidence measures (CM), one of the important modules of a speech recognition system. Based on the recognizer output, we first define so-called "target regions" and "non-target regions" and choose appropriate confidence measures for each type of region. We attempt to exploit the extra information contained in the non-target regions in order to complement conventional CMs, which are computed only from the target regions. Experimental results show that confidence measures based on non-target regions complement those based on target regions very well. We then further use the Bayesian information criterion to locate the speech boundaries absorbed into the non-target regions, and the confidence measures based on the located boundaries bring additional performance gains.
In the past few decades, discriminative training (DT) has been a very active research area in automatic speech recognition (ASR). Discriminative training of acoustic models has become one of the most important training methods for state-of-the-art speech recognition systems, especially large vocabulary continuous speech recognition (LVCSR) systems. This thesis focuses on discriminative training of acoustic models and its application in LVCSR tasks. It also covers another important module in speech recognition, confidence measures (CM).
     Firstly, this thesis proposes a novel optimization algorithm called constrained line search (CLS) for discriminative training (DT) of Gaussian mixture CDHMMs in speech recognition. The CLS method is formulated under a general framework for optimizing any discriminative objective function, including MMI, MCE, MPE/MWE, etc. In this method, discriminative training of HMMs is first cast as a constrained optimization problem, where the Kullback-Leibler divergence (KLD) between models is explicitly imposed as a constraint during optimization. Based upon the idea of line search, we show that a simple closed-form update of the HMM parameters can be found by expressing the KLD constraint between the HMMs of two successive iterations in a quadratic form. The proposed CLS method can be applied to optimize all model parameters in Gaussian mixture CDHMMs, including means, covariances, and mixture weights.
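     To make the closed-form step concrete, the following is a minimal sketch of a CLS-style update for a single Gaussian mean with diagonal covariance. It assumes the search direction is the gradient of the discriminative objective with respect to the mean and that the per-Gaussian KLD budget is a constant rho; with the covariance held fixed, the KLD between successive models along the search line is quadratic in the step size, so the largest admissible step has a closed form. The function name and the exact choice of direction are illustrative, not the thesis's implementation.

    import numpy as np

    def cls_mean_update(mu, sigma_diag, direction, rho):
        # mu:         current Gaussian mean, shape (D,)
        # sigma_diag: diagonal covariance of the Gaussian, shape (D,)
        # direction:  search direction, e.g. the gradient of the
        #             discriminative objective w.r.t. the mean, shape (D,)
        # rho:        KLD budget between two successive models
        #
        # With a fixed covariance, the KLD between the updated and current
        # Gaussian along mu' = mu + eps * direction equals
        # 0.5 * eps**2 * sum(direction**2 / sigma_diag), which is quadratic
        # in eps, so the largest step with KLD <= rho is available in closed form.
        quad = float(np.sum(direction ** 2 / sigma_diag))  # d^T Sigma^{-1} d
        if quad <= 0.0:
            return mu.copy()
        eps = np.sqrt(2.0 * rho / quad)
        return mu + eps * direction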
     Secondly, based on a theoretical analysis of the original trust region (TR) optimization method we proposed previously, this thesis proposes a new way to construct an auxiliary function for discriminative training of HMMs in speech recognition. In the original TR method, MMI-based discriminative training is treated as a standard trust region problem in optimization theory, whose global optimum can be obtained efficiently. However, optimizing that auxiliary function does not guarantee an increase of the original objective function. The proposed new auxiliary function still serves as a first-order approximation of the original objective function but, more importantly, it is also a lower bound of the original objective function. Due to this lower-bound property, the optimal point found is theoretically guaranteed to increase the original discriminative objective function. Furthermore, the TR method can still be applied to find the globally optimal point of the new auxiliary function. The proposed bounded trust region method has been investigated on several LVCSR tasks, and experimental results show that the bounded TR method based on the new auxiliary function outperforms both the conventional EBW method and the original TR method based on the old auxiliary function.
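     The claim that the global optimum of a standard trust region problem can be computed efficiently can be illustrated with a small solver for the subproblem: minimize g^T p + 0.5 p^T B p subject to ||p|| <= delta, using the usual characterization p = -(B + lambda I)^{-1} g. This is a generic sketch of the textbook subproblem, not the thesis's formulation of MMI training; the so-called hard case is ignored for brevity.

    import numpy as np

    def trust_region_subproblem(g, B, delta, tol=1e-10):
        # Globally solve  min  g^T p + 0.5 p^T B p   s.t.  ||p|| <= delta.
        # Optimality: p = -(B + lam I)^{-1} g with lam >= 0, B + lam I PSD,
        # and lam * (delta - ||p||) = 0.  The hard case is not handled here.
        w, Q = np.linalg.eigh(B)          # B = Q diag(w) Q^T
        gt = Q.T @ g

        def p_norm(lam):
            return np.linalg.norm(gt / (w + lam))

        # Interior solution: B positive definite and the Newton step fits.
        if w.min() > 0 and p_norm(0.0) <= delta:
            return Q @ (-gt / w)

        # Otherwise find lam with ||p(lam)|| = delta by bisection.
        lam_low = max(0.0, -w.min()) + tol
        lam_high = lam_low + 1.0
        while p_norm(lam_high) > delta:
            lam_high *= 2.0
        for _ in range(200):
            lam = 0.5 * (lam_low + lam_high)
            if p_norm(lam) > delta:
                lam_low = lam
            else:
                lam_high = lam
        return Q @ (-gt / (w + lam_high))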
     Thirdly, this thesis investigates several practical problems in LVCSR systems, such as the computational and efficiency bottlenecks encountered when discriminatively training HMMs on large corpora, and the generalization problem in LVCSR systems. We propose a new discriminative training procedure for LVCSR systems that combines word graphs generated by a WFST-based decoder with the statistics-accumulation tools from HTK. When discriminative training is conducted under this new procedure, not only is the efficiency significantly improved, but better recognition performance is also achieved.
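     The abstract only states that the word graphs come from a WFST-based decoder and that the DT statistics are accumulated with HTK tools, so the orchestration sketch below uses hypothetical wrapper scripts (wfst_decode.sh, graph2slf.sh, htk_dt_update.sh) purely to show the shape of one training iteration; the real commands, lattice formats and options are not specified here.

    import subprocess
    from pathlib import Path

    def run(cmd):
        # Hypothetical helper: print and execute one pipeline step.
        print("+", " ".join(cmd))
        subprocess.run(cmd, check=True)

    def dt_iteration(train_list, mmf_in, mmf_out, lattice_dir):
        # 1) Decode the training set with the WFST decoder to obtain word graphs.
        run(["wfst_decode.sh", train_list, str(lattice_dir)])
        # 2) Convert the word graphs into HTK-style lattices.
        run(["graph2slf.sh", str(lattice_dir)])
        # 3) Accumulate numerator/denominator statistics and update the HMMs
        #    with the HTK discriminative-training tool chain.
        run(["htk_dt_update.sh", mmf_in, str(lattice_dir), mmf_out])

    if __name__ == "__main__":
        dt_iteration("train.scp", "hmm0/MMF", "hmm1/MMF", Path("lattices"))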
     Lastly, this thesis investigates appropriate confidence measures (CMs) for Mandarin command word recognition in both the so-called target region and the non-target region. Here the target region refers to the recognized speech part of the command word, while the non-target region refers to the recognized silence part. It is shown that exploiting the extra information in the non-target region can effectively complement the traditional CM, which usually focuses on the target region. Furthermore, when the non-target region is analyzed in a more principled way, using the Bayesian information criterion (BIC) to locate more precise boundaries within it, even more improvement is achieved.
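     As an illustration of a BIC-based boundary search, the sketch below computes the standard delta-BIC change-point statistic over frame-level feature vectors and picks the candidate frame with the largest value. The features, penalty weight and search range used in the thesis are not given in the abstract, so the values here are placeholders.

    import numpy as np

    def delta_bic(X, t, lam=1.0):
        # Delta-BIC for splitting the frame sequence X (N x d) at frame t.
        # A positive value favors modelling X[:t] and X[t:] with two separate
        # full-covariance Gaussians instead of one, i.e. it supports a
        # speech/silence boundary at t.
        N, d = X.shape
        n1, n2 = t, N - t

        def logdet_cov(Y):
            cov = np.cov(Y, rowvar=False, bias=True) + 1e-6 * np.eye(d)
            return np.linalg.slogdet(cov)[1]

        penalty = 0.5 * (d + 0.5 * d * (d + 1)) * np.log(N)
        return (0.5 * N * logdet_cov(X)
                - 0.5 * n1 * logdet_cov(X[:t])
                - 0.5 * n2 * logdet_cov(X[t:])
                - lam * penalty)

    def locate_boundary(X, margin=10):
        # Pick the candidate frame with the largest delta-BIC.
        return max(range(margin, len(X) - margin), key=lambda t: delta_bic(X, t))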
