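Research on Chinese Whispered Speech Recognition Based on Probabilistic Neural Networks

Original title: 基于概率神经网络的汉语耳语音识别的研究
English title: Study of Identification for Chinese Whispered Speech Based on Probabilistic Neural Network
Author: Rong Wei (荣薇)
Degree level: Master's
Discipline: Detection Technology and Automatic Equipment
Chinese keywords: 耳语音; 语音识别; 概率神经网络
English keywords: whispered speech; speech recognition; probabilistic neural network
Degree year: 2008
Advisor: Tao Zhi (陶智)
Discipline code: 081102
Degree-granting institution: Soochow University (苏州大学)
Thesis submission date: 2008-05-01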

Abstract

Whispered speech is a special mode of spoken communication. It is widely used in places where speaking loudly is not allowed, such as conference venues, concert halls, libraries, and theaters. As mobile phones have become widespread, people often need to make calls in public, and they may whisper to avoid disturbing others or to keep a call private, so mobile phones need to be able to recognize whispered speech. In addition, for aphonic patients with laryngeal damage, automatically recognizing the breathy sounds they produce and converting them into normal speech would clearly make verbal communication easier for them.
     However, whispered speech recognition is still at an early research stage both in China and abroad, so few existing results are available to build on. Combined with characteristics such as the low signal-to-noise ratio of whispered speech, this makes recognition difficult.
     With the goal of implementing Chinese whispered speech recognition and improving the recognition rate, this thesis makes the following main contributions:
     1. By studying the acoustic characteristics of whispered speech and analyzing how different feature parameters apply to whispered speech recognition, it proposes whispered speech recognition systems based on two models: dynamic time warping (DTW) and the probabilistic neural network (PNN).
     2. The model training and recognition algorithms are simulated in software, a whispered speech database is built for training and testing, the real-time performance and accuracy of the recognition algorithms are tested, and the main simulation results and conclusions are given.
     Finally, directions for further research and improvement on this topic are proposed.
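
The abstract names two recognition models, DTW and PNN, without describing how they were implemented. As a rough illustration only, the following is a minimal sketch of the textbook forms of both techniques, not the thesis's actual system. The function names, the toy feature vectors, the smoothing parameter sigma, and the assumption of equal class priors are all hypothetical choices made for this example.

```python
import math


def dtw_distance(a, b):
    """Textbook dynamic time warping between two sequences of feature vectors.

    a and b are lists of equal-dimension vectors (lists of floats); the
    sequences themselves may differ in length.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal accumulated distance aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])  # local Euclidean distance
            cost[i][j] = d + min(cost[i - 1][j],      # stretch b
                                 cost[i][j - 1],      # stretch a
                                 cost[i - 1][j - 1])  # advance both
    return cost[n][m]


def pnn_classify(x, train, sigma=0.5):
    """Textbook probabilistic neural network decision (Parzen-window form).

    train maps each class label to a list of fixed-length training vectors.
    Assumes equal class priors; sigma is a hypothetical smoothing width.
    """
    scores = {}
    for label, patterns in train.items():
        # Pattern layer: one Gaussian kernel per stored training vector.
        kernel_sum = sum(
            math.exp(-math.dist(x, p) ** 2 / (2 * sigma ** 2))
            for p in patterns
        )
        # Summation layer: average kernel response for the class.
        scores[label] = kernel_sum / len(patterns)
    # Decision layer: pick the class with the largest density estimate.
    return max(scores, key=scores.get)


# Toy usage with made-up features (hypothetical data, not from the thesis).
templates = {"a": [[0.0, 0.0], [0.2, 0.1]], "b": [[1.0, 1.0], [0.9, 1.2]]}
label = pnn_classify([0.1, 0.0], templates)
warp = dtw_distance([[0.0], [1.0], [2.0]], [[0.0], [1.0], [1.0], [2.0]])
```

In this sketch DTW compares variable-length feature sequences directly, while the PNN needs a fixed-length feature vector per utterance; how the thesis bridges that difference is not stated in the abstract.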

