Research on Automatic Evaluation Methods of Mandarin Pronunciation (汉语普通话发音自动评测方法的研究)

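English title: Research on Automatic Evaluation Methods of Mandarin Pronunciation
Author: Zhang Ru (张茹)
Degree level: Doctoral
Discipline: Computer Application Technology
Keywords: automatic pronunciation evaluation (发音自动评测); perception of model (模型感知度); fundamental frequency (基频); tone recognition (声调识别); formant (共振峰)
Degree year: 2013
Advisor: Han Jiqing (韩纪庆)
Discipline code: 081203
Degree-granting institution: Harbin Institute of Technology (哈尔滨工业大学)
Thesis submission date: 2013-01-01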

Abstract

China's steady economic growth has attracted a growing number of foreigners to learn Mandarin, but restricted learning environments and a shortage of professional teachers have hindered the international spread of the language. Developing computer-assisted systems for teaching Mandarin as a foreign language has therefore become a pressing need. Automatic pronunciation evaluation is one of the key technologies of such systems: its results are the basis for teaching feedback, and that feedback can effectively improve learners' efficiency. Mandarin is a monosyllabic tonal language whose phonology differs markedly from that of the Indo-European languages. It is built from single syllables, each of which can be decomposed into an initial (consonant), a final (vowel), and a tone, and the tone, as a linguistically distinctive feature, carries meaning.
     Evaluating Mandarin pronunciation therefore requires more than handling the variability and redundancy of the speech signal, which every pronunciation evaluation method faces. It must also take into account actual teaching needs and the pronunciation characteristics of foreign learners. Moreover, the three-part structure of the Mandarin syllable makes learners' pronunciation errors more diverse and complex, so a single acoustic feature rarely yields satisfactory results across multiple error types. Several auxiliary acoustic features related to the error types should be used wherever possible.
     In summary, research on automatic pronunciation evaluation methods for teaching Mandarin as a foreign language has considerable theoretical significance and practical value for the international promotion of Mandarin.
     This thesis takes the Mandarin pronunciation of foreign learners as its object of study and investigates automatic pronunciation evaluation methods in depth. Using corpora of standard and nonstandard Mandarin pronunciation as the data basis, and aiming to reduce the influence of native-language transfer and to promote the establishment of new phonetic rules, it extracts evaluation features and builds the corresponding evaluation models. Performance is improved in four respects: initials (consonants), finals (vowels), tones, and the fusion of evaluation results. The specific work is as follows:
     (1) An automatic pronunciation evaluation method based on the perception of phoneme models is studied, to address learners' substitution of similar native-language sounds for Mandarin sounds. Following the principle of minimizing the Bayes error, the method increases the distance between the probability distributions of acoustic features belonging to standard and nonstandard phonemes, and thereby reduces evaluation errors on the test speech. First, a perception matrix is built from the distances between phoneme models in the standard and nonstandard Mandarin corpora. Second, the elements of the perception matrix are added, in descending order, to the candidate set of recognition models for the corresponding phoneme. Finally, the candidate sets are used to reduce the search space when recognizing the test speech and to make the phoneme model output describe it more accurately. Experimental results show that selecting candidate models for similar pronunciations by perception not only reduces the computation of the unconstrained loop network in speech recognition, but also increases the correlation between posterior-probability scores and expert scores and improves the detection rate of pronunciation errors.
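The candidate-set construction described in (1) can be illustrated with a minimal sketch. The abstract does not specify the distance measure or how distance is turned into a perception score, so everything concrete here is an assumption: each phoneme model is reduced to a single diagonal-covariance Gaussian, the distance is a symmetric Kullback-Leibler divergence, and the mapping `1 / (1 + distance)` and the function names are hypothetical.

```python
import math


def sym_kl_diag_gauss(mean_p, var_p, mean_q, var_q):
    """Symmetric KL divergence between two diagonal-covariance Gaussians."""
    total = 0.0
    for mp, vp, mq, vq in zip(mean_p, var_p, mean_q, var_q):
        diff2 = (mp - mq) ** 2
        kl_pq = 0.5 * (math.log(vq / vp) + (vp + diff2) / vq - 1.0)
        kl_qp = 0.5 * (math.log(vp / vq) + (vq + diff2) / vp - 1.0)
        total += kl_pq + kl_qp
    return total


def perception_matrix(standard, nonstandard):
    """matrix[a][b] is high when nonstandard model b lies close to standard model a.

    Assumption: perception is modelled as 1 / (1 + distance); the thesis may
    define it differently.
    """
    return {
        a: {
            b: 1.0 / (1.0 + sym_kl_diag_gauss(*model_a, *model_b))
            for b, model_b in nonstandard.items()
        }
        for a, model_a in standard.items()
    }


def candidate_sets(matrix, top_k):
    """For each phoneme, keep the top_k models in descending order of perception."""
    return {
        a: sorted(row, key=row.get, reverse=True)[:top_k]
        for a, row in matrix.items()
    }


# Toy models: phoneme -> (mean vector, variance vector). Values are made up.
standard = {
    "zh": ([1.0, 0.0], [1.0, 1.0]),
    "z": ([0.0, 0.0], [1.0, 1.0]),
    "j": ([3.0, 3.0], [1.0, 1.0]),
}
nonstandard = {
    "zh": ([0.8, 0.1], [1.0, 1.0]),
    "z": ([0.2, 0.0], [1.0, 1.0]),
    "j": ([3.0, 2.9], [1.0, 1.0]),
}
candidates = candidate_sets(perception_matrix(standard, nonstandard), top_k=2)
```

Restricting recognition of each target phoneme to its candidate list is what shrinks the search space relative to an unconstrained phoneme loop.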
     (2) Two stable tone representations are studied, to address the problem that tone-category features are affected by intonation and other factors in tone teaching. The first is based on piecewise linear fitting. It analyzes the fundamental frequency (F0) contour within each syllable segment by segment and represents the tone movement by the linear fitting parameters instead of the raw contour, which reduces the influence of context in continuous speech. Hypothesis testing is then used to improve the accuracy of the tone evaluation results. The second is based on prosody separation. It uses the Fujisaki model to separate prosodic features at different levels from the F0 contour of continuous speech. First, linguistic knowledge is used to compute the prosodic boundaries for the Fujisaki model. Next, the tone feature contour is obtained while preserving the long-term variation of intonation. Finally, a linguistic classification and regression tree is built to classify tone variation in continuous speech. Experimental results show that the first method has a distinct advantage in detecting errors in the third tone (shangsheng). In the second method, adding linguistic knowledge improves the accuracy with which the Fujisaki model fits the F0 contour, and combining the Fujisaki model with the linguistic classification and regression tree effectively improves the accuracy of tone evaluation, outperforming conventional evaluation methods based on the F0 contour.
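The first tone representation in (2) can be sketched as follows. This is only an illustration under stated assumptions: the syllable's voiced F0 samples are split into equal-sized segments (the abstract does not say how segments are chosen), each segment is fitted by ordinary least squares, and the function names and the default of two segments are hypothetical.

```python
def fit_line(times, values):
    """Ordinary least-squares line through (time, value) points.

    Returns (slope, intercept). Times are assumed to be strictly increasing.
    """
    n = len(times)
    t_mean = sum(times) / n
    v_mean = sum(values) / n
    sxx = sum((t - t_mean) ** 2 for t in times)
    sxy = sum((t - t_mean) * (v - v_mean) for t, v in zip(times, values))
    slope = sxy / sxx
    return slope, v_mean - slope * t_mean


def piecewise_linear_tone_features(times, f0, n_segments=2):
    """Represent a syllable's F0 contour by one (slope, intercept) per segment.

    Assumption: segments are contiguous and of (nearly) equal length.
    """
    if len(times) != len(f0):
        raise ValueError("times and f0 must have the same length")
    if len(f0) < 2 * n_segments:
        raise ValueError("need at least two F0 samples per segment")
    bounds = [i * len(f0) // n_segments for i in range(n_segments + 1)]
    return [
        fit_line(times[start:end], f0[start:end])
        for start, end in zip(bounds, bounds[1:])
    ]


# Toy dipping contour (falls, then rises), loosely shaped like a third tone.
frame_times = [0, 1, 2, 3, 4, 5, 6, 7]
contour = [200, 190, 180, 170, 180, 190, 200, 210]
features = piecewise_linear_tone_features(frame_times, contour, n_segments=2)
```

For the toy contour the first segment has a negative slope and the second a positive one, which is the kind of compact description a later classifier or hypothesis test could work on. In practice the contour would usually be expressed on a logarithmic or semitone scale rather than in raw Hz.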
     (3) An automatic pronunciation evaluation method based on a formant structure feature is studied, to detect pronunciation problems caused by entrenched articulatory habits. It defines a formant structure feature that exploits the ability of formants to characterize vocal tract shape, addressing the sensitivity of formants to environmental noise and speaker vocal tract length during evaluation. Based on the principle of affine invariance, the structural distortion between the test pronunciation and the standard pronunciation is computed. Experimental results show that the method outperforms evaluation based on plain formant features in both vowel error detection and correlation with expert scores.
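One way to picture the affine-invariance idea in (3) is the sketch below. It is not the thesis's feature definition, which the abstract does not give. As assumptions, each vowel is modelled by a diagonal-covariance Gaussian over (F1, F2), the structure is the set of pairwise Bhattacharyya distances between vowels, and the distortion is the root-mean-square difference between two structures. The formant values and function names are made up.

```python
import math
from itertools import combinations


def bhattacharyya_diag(mean_a, var_a, mean_b, var_b):
    """Bhattacharyya distance between two diagonal-covariance Gaussians."""
    dist = 0.0
    for ma, va, mb, vb in zip(mean_a, var_a, mean_b, var_b):
        v = 0.5 * (va + vb)
        dist += 0.125 * (ma - mb) ** 2 / v + 0.5 * math.log(v / math.sqrt(va * vb))
    return dist


def formant_structure(vowels):
    """Pairwise distances between all vowel models of one speaker."""
    return {
        (a, b): bhattacharyya_diag(*vowels[a], *vowels[b])
        for a, b in combinations(sorted(vowels), 2)
    }


def structural_distortion(structure_test, structure_ref):
    """Root-mean-square difference between two structures over shared vowel pairs."""
    if structure_test.keys() != structure_ref.keys():
        raise ValueError("structures must cover the same vowel pairs")
    squared = [(structure_test[k] - structure_ref[k]) ** 2 for k in structure_ref]
    return math.sqrt(sum(squared) / len(squared))


def scale_speaker(vowels, factor):
    """Scale all formants by one factor, a crude stand-in for vocal tract length."""
    return {
        name: ([m * factor for m in mean], [v * factor ** 2 for v in var])
        for name, (mean, var) in vowels.items()
    }


# Illustrative (F1, F2) means in Hz with variances; not measured data.
reference = {
    "a": ([800.0, 1200.0], [2500.0, 10000.0]),
    "i": ([300.0, 2300.0], [2500.0, 10000.0]),
    "u": ([350.0, 800.0], [2500.0, 10000.0]),
}
same_vowels_other_speaker = scale_speaker(reference, 1.15)
mispronounced = dict(reference, i=([400.0, 1800.0], [2500.0, 10000.0]))

ref_structure = formant_structure(reference)
d_speaker = structural_distortion(formant_structure(same_vowels_other_speaker), ref_structure)
d_error = structural_distortion(formant_structure(mispronounced), ref_structure)
```

Because the Bhattacharyya distance is unchanged when both distributions undergo the same invertible affine map, `d_speaker` is numerically zero while `d_error` is not, which is the property that makes a structure-based comparison less sensitive to speaker differences than raw formant values.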
     (4) A fuzzy, hierarchical method for fusing objective evaluation results is studied. First, a hierarchical fusion model is built that inserts a pronunciation error analysis layer between the objective evaluation layer and the automatic scoring layer, turning the direct mapping from objective evaluation results to automatic scores into an indirect one. Fuzzy functions are then used to compute the fused results at each level. Experimental results show that the error analysis layer not only provides pronunciation error analysis but also improves the accuracy of automatic scoring by exploiting the correlation between error types and expert scores, and that using fuzzy logic to simulate the expert's subjective judgment improves the accuracy of the fused objective evaluation results.
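The three-layer arrangement in (4) can be sketched very roughly as below. The abstract does not state which fuzzy functions, measures, or thresholds are used, so every concrete choice here is hypothetical: the measure names, the linear ramp membership function, the thresholds, the use of `max` as a fuzzy OR, and the weighted average that produces a 0 to 100 score.

```python
def ramp(x, low, high):
    """Membership that rises linearly from 0 at `low` to 1 at `high`."""
    if high <= low:
        raise ValueError("high must be greater than low")
    return min(1.0, max(0.0, (x - low) / (high - low)))


def error_layer(measures):
    """Middle layer: turn objective measures into error degrees in [0, 1].

    All thresholds are placeholders, not values from the thesis.
    """
    return {
        "initial_error": 1.0 - ramp(measures["initial_posterior"], 0.2, 0.8),
        # Fuzzy OR (max): the final is suspect if either cue signals a problem.
        "final_error": max(
            1.0 - ramp(measures["final_posterior"], 0.2, 0.8),
            ramp(measures["formant_distortion"], 1.0, 5.0),
        ),
        "tone_error": 1.0 - ramp(measures["tone_confidence"], 0.3, 0.9),
    }


def score_layer(errors, weights):
    """Top layer: weighted aggregation of error degrees into a 0-100 score."""
    total_weight = sum(weights.values())
    penalty = sum(weights[name] * degree for name, degree in errors.items())
    return 100.0 * (1.0 - penalty / total_weight)


# Hypothetical weights, standing in for error-type importance learned from
# expert scores.
weights = {"initial_error": 1.0, "final_error": 1.0, "tone_error": 2.0}
measures = {
    "initial_posterior": 0.9,
    "final_posterior": 0.9,
    "formant_distortion": 0.5,
    "tone_confidence": 0.3,
}
errors = error_layer(measures)
score = score_layer(errors, weights)
```

The intermediate `errors` dictionary is the point of the design: it can be shown to the learner as diagnostic feedback, and the final score is computed from it rather than directly from the raw objective measures.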

