Minority-Language-Accented Mandarin Chinese (Putonghua) Speech Recognition Based on Pronunciation Dictionary Adaptation
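Abstract
Non-native accents and minority-language accents are a problem that any application of continuous Mandarin Chinese (Putonghua) speech recognition must face. Taking the Naxi accent as a case study, this dissertation investigates how to exploit the regular patterns of pronunciation variation in minority-language accents to convert a standard Putonghua recognizer into a minority-accented Putonghua recognizer, at low cost and in a way that is easy to extend.
     The main work of the dissertation is as follows:
     (1) A standard Putonghua speech recognizer was trained on the HTK platform with the 863 standard Putonghua speech database, to serve as the baseline system.
     (2) Acoustic model adaptation to minority-accented speech data was carried out with the MLLR and MAP methods.
     (3) The adapted recognizer was used to recognize minority-accented speech data, and confusion matrices for initials, finals and syllables were computed from the recognition results.
     (4) The variation patterns of initials, finals and syllables in minority-accented Putonghua were studied. Using a data-driven method guided by expert knowledge, a new multi-pronunciation dictionary generation strategy was designed, so that the multi-pronunciation dictionary of an accent (or of a speaker) can be built automatically from the syllable confusion matrix of that accent (or speaker).
     (5) The effectiveness of speaker-dependent and accent-dependent pronunciation dictionaries was verified experimentally, both with and without a language model.
     The experimental results show that, with a language model and ignoring tone, the best recognition rate of the baseline system on Naxi-accented speech is 50.26%, and that introducing MLLR+MAP acoustic model adaptation raises it to 80.56%. On top of acoustic model adaptation, introducing speaker-dependent and accent-dependent pronunciation dictionaries brings the best recognition rate to 85.15% and 82.59%, respectively.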
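Step (3) turns recognition output into confusion matrices. The abstract does not give the exact procedure, so the following is only a minimal sketch, assuming that each recognized syllable sequence is aligned to its reference by minimum edit distance with unit costs, and that only aligned pairs (matches and substitutions) are counted. The function names `align` and `confusion_counts` and the example syllables are hypothetical.

```python
from collections import Counter


def align(ref, hyp):
    """Align two syllable sequences by dynamic programming (unit edit costs).

    Returns a list of (ref_syllable, hyp_syllable) pairs, where None marks
    a deletion (no hypothesis syllable) or an insertion (no reference one).
    """
    n, m = len(ref), len(hyp)
    # cost[i][j] = minimum edit cost to align ref[:i] with hyp[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = cost[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            cost[i][j] = min(diag, cost[i - 1][j] + 1, cost[i][j - 1] + 1)

    # Trace back from the bottom-right corner to recover the alignment.
    pairs = []
    i, j = n, m
    while i > 0 or j > 0:
        if (i > 0 and j > 0
                and cost[i][j] == cost[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])):
            pairs.append((ref[i - 1], hyp[j - 1]))  # match or substitution
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            pairs.append((ref[i - 1], None))        # deletion
            i -= 1
        else:
            pairs.append((None, hyp[j - 1]))        # insertion
            j -= 1
    pairs.reverse()
    return pairs


def confusion_counts(utterances):
    """Count (reference, recognized) syllable pairs over aligned utterances."""
    counts = Counter()
    for ref, hyp in utterances:
        for r, h in align(ref, hyp):
            if r is not None and h is not None:
                counts[(r, h)] += 1
    return counts
```

Calling `confusion_counts([(["zhi", "dao", "shi"], ["zi", "dao", "si"])])` counts one occurrence each of `("zhi", "zi")`, `("dao", "dao")` and `("shi", "si")`. Initial and final confusion matrices could be obtained the same way after splitting each syllable into its initial and final.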
This dissertation concentrates on Chinese speech recognition for non-native speakers, a problem that is almost unavoidable in LVCSR (Large Vocabulary Continuous Speech Recognition). Taking Putonghua spoken by native speakers of Naxi as the target, we attempt to build accent-specific speech recognizers from an available standard Putonghua speech recognizer, based on the Initial-Final structure of the Chinese language and on the regular patterns of pronunciation variation in this minority accent. The contributions of this dissertation are as follows:
     (1) Baseline hidden Markov models (the baseline system) were trained on the Project 863 standard Mandarin corpus using the HTK platform.
     (2) For the speech of the Naxi minority in Yunnan, non-native Mandarin speech recognition was investigated using the general speaker adaptation methods MLLR and MAP.
     (3) The non-native speech data from the Naxi area of Yunnan was first transcribed with the adapted baseline HMMs. The transcription was then aligned with the reference transcription through dynamic programming (DP), and the confusion matrices of base syllables, initials and finals were calculated from the alignment.
     (4) The variation patterns of initials, finals and syllables in minority-accented Putonghua were studied using a data-driven method guided by expert knowledge. A novel strategy for building multi-pronunciation lexicons, which can easily be extended to other accents, was proposed to automatically construct the multi-pronunciation lexicon of a given accent (or speaker) from its syllable confusion matrix.
     (5) The effectiveness of speaker-dependent and accent-dependent pronunciation dictionaries was verified experimentally, both with and without a language model.
     Experimental results show that, with a language model, the highest base-syllable correct rate of the baseline system was 50.26%. Using MLLR+MAP raised the base-syllable correct rate to 80.56%. After acoustic model adaptation, using the speaker-dependent and accent-dependent pronunciation dictionaries gave better recognition rates of 85.15% and 82.59%, respectively.
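Contribution (4) builds a multi-pronunciation lexicon from a syllable confusion matrix. The abstract does not describe the strategy itself, so the sketch below is only one plausible reading under stated assumptions: each canonical syllable keeps its own pronunciation plus the most frequent confusions whose relative frequency reaches a threshold. The function name `build_multi_pron_lexicon`, the parameters `min_prob` and `max_variants`, and the example counts are all hypothetical. The expert-knowledge guidance mentioned in the abstract is not modeled here.

```python
from collections import defaultdict


def build_multi_pron_lexicon(confusions, min_prob=0.1, max_variants=3):
    """Derive pronunciation variants from (canonical, recognized) -> count.

    Returns {canonical: [(variant, relative_frequency), ...]} with the
    canonical pronunciation always listed first.
    """
    # Group the counts into one row per canonical syllable.
    rows = defaultdict(dict)
    for (canon, recog), n in confusions.items():
        rows[canon][recog] = rows[canon].get(recog, 0) + n

    lexicon = {}
    for canon, row in rows.items():
        total = sum(row.values())
        if total == 0:
            continue  # no observations for this syllable
        # Alternative pronunciations that are frequent enough to keep,
        # ordered by decreasing frequency (ties broken alphabetically).
        alts = sorted(
            ((r, n / total) for r, n in row.items()
             if r != canon and n / total >= min_prob),
            key=lambda item: (-item[1], item[0]),
        )[: max(max_variants - 1, 0)]
        lexicon[canon] = [(canon, row.get(canon, 0) / total)] + alts
    return lexicon


if __name__ == "__main__":
    # Made-up counts for illustration only; not data from the dissertation.
    example = {
        ("zhi", "zhi"): 60, ("zhi", "zi"): 35, ("zhi", "ji"): 5,
        ("shi", "shi"): 88, ("shi", "si"): 12,
    }
    for syllable, variants in build_multi_pron_lexicon(example).items():
        print(syllable, variants)
```

With the made-up counts above, `zhi` keeps the variant `zi` while the rare confusion `ji` falls below the threshold and is dropped. Running the same function on a per-speaker confusion matrix rather than a pooled per-accent one would give a speaker-dependent lexicon instead of an accent-dependent one.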