Research on the Application of Mathematical Morphology in Speech Recognition
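Abstract
Noise in real-world environments seriously degrades speech recognition rates, which makes the study of noisy speech recognition particularly important. Starting from the nonlinear theory of speech signals, this thesis explores how mathematical morphology can improve the noise robustness of speech recognition. It studies the key problems of noisy speech recognition: speech enhancement, feature parameter extraction, and the recognition method. The main research contents are as follows:
     1. Speech enhancement based on morphological filtering is studied. Noisy speech is enhanced with different morphological filters and structuring elements, the output signal-to-noise ratio (SNR) is measured in each case, and the influence of the shape and length of the structuring element on the enhancement is analyzed.
     2. Morphological filtering is combined with the wavelet transform to form a morphology-wavelet filter, which is applied to speech corrupted by different kinds of noise. Experimental results show that this filter preserves the shape of the speech waveform well while enhancing the signal, and that it outperforms the morphological filter alone.
     3. Based on the idempotency of morphological filters, a morphological predistortion method is used to extract mel-cepstral and other parameters from clean speech. The distances between the feature parameters of clean, noisy, denoised, and predistorted speech are analyzed and compared, which demonstrates the feasibility of the predistortion method.
     4. Building on morphological filtering, pitch period detection is studied. Based on the characteristics of the short-time average magnitude difference function (AMDF) and the modified autocorrelation function (MACF), a pitch period detection method using a filtered, weighted MACF is designed. The method weights the MACF by an exponential form of the normalized AMDF and achieves pitch period detection for noisy speech.
     5. Predistorted feature parameters are used as training data for hidden Markov model (HMM) recognition. This improves the match between training and recognition conditions and yields a substantially higher recognition rate than the conventional approach.
     6. An improved radial basis function (RBF) neural network speech recognition method based on predistortion parameters is designed. The selection of hidden-layer centers, the computation of the weights, and methods for optimizing the network structure are studied; the influence of different criteria on structure optimization is analyzed, and an improved scheme is chosen. Experiments compare the recognition rates on noisy speech of the standard RBF neural network and of the improved method using predistortion parameters.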
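Item 1 relies on grayscale morphological operators applied to the one-dimensional speech waveform. The Python sketch below is illustrative only and is not the thesis's implementation: it assumes a symmetric, odd-length structuring element given as a list of offsets (all zeros for a flat element), skips samples beyond the signal ends, and uses the open-close and close-open cascades and their average, which are common choices in the morphological filtering literature. The function names (`open_close`, `close_open`, `oc_co_average`, `flat_se`, `snr_db`) are hypothetical.

```python
import math
from typing import List, Sequence


def erode(x: Sequence[float], se: Sequence[float]) -> List[float]:
    """Grayscale erosion: out[n] = min over k in [-h, h] of x[n+k] - se[k+h].

    Samples outside the signal are skipped, which acts like padding with +inf.
    """
    if len(se) % 2 != 1:
        raise ValueError("structuring element length must be odd")
    h, n_samples = len(se) // 2, len(x)
    return [min(x[n + k] - se[k + h]
                for k in range(-h, h + 1) if 0 <= n + k < n_samples)
            for n in range(n_samples)]


def dilate(x: Sequence[float], se: Sequence[float]) -> List[float]:
    """Grayscale dilation: out[n] = max over k of x[n-k] + se[k+h]; outside samples skipped."""
    if len(se) % 2 != 1:
        raise ValueError("structuring element length must be odd")
    h, n_samples = len(se) // 2, len(x)
    return [max(x[n - k] + se[k + h]
                for k in range(-h, h + 1) if 0 <= n - k < n_samples)
            for n in range(n_samples)]


def opening(x, se):
    # Erosion then dilation: removes positive peaks narrower than the element.
    return dilate(erode(x, se), se)


def closing(x, se):
    # Dilation then erosion: fills negative valleys narrower than the element.
    return erode(dilate(x, se), se)


def open_close(x, se):
    return closing(opening(x, se), se)


def close_open(x, se):
    return opening(closing(x, se), se)


def oc_co_average(x, se):
    # Averaging the two cascades offsets the opposite biases of each one.
    return [(a + b) / 2 for a, b in zip(open_close(x, se), close_open(x, se))]


def flat_se(length: int) -> List[float]:
    return [0.0] * length


def snr_db(clean: Sequence[float], estimate: Sequence[float]) -> float:
    """Output SNR in dB of an enhanced signal relative to the clean reference."""
    signal_energy = sum(s * s for s in clean)
    error_energy = sum((s - e) ** 2 for s, e in zip(clean, estimate))
    if error_energy == 0:
        return math.inf
    return 10.0 * math.log10(signal_energy / error_energy)
```

With a flat element of length 3, `open_close([0, 0, 0, 5, 0, 0, 0], flat_se(3))` removes the isolated spike, while a plateau three samples wide passes through unchanged; lengthening the element widens the features it removes, which is the trade-off that item 1 examines through the output SNR.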
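The abstract does not say how the morphological and wavelet stages of item 2 are connected. The Python sketch below shows one plausible arrangement, stated as an assumption: a flat open-close filter first removes narrow impulsive components, then a one-level Haar wavelet transform is soft-thresholded with the Donoho-Johnstone universal threshold, with the noise level estimated from the median absolute detail coefficient. The Haar wavelet, the single decomposition level, and all function names are illustrative choices, not the thesis's design.

```python
import math
import statistics
from typing import List, Sequence, Tuple


def _erode(x: List[float], h: int) -> List[float]:
    # Flat erosion over a window of 2h+1 samples, truncated at the edges.
    return [min(x[max(0, i - h):i + h + 1]) for i in range(len(x))]


def _dilate(x: List[float], h: int) -> List[float]:
    return [max(x[max(0, i - h):i + h + 1]) for i in range(len(x))]


def open_close(x: List[float], h: int) -> List[float]:
    opened = _dilate(_erode(x, h), h)
    return _erode(_dilate(opened, h), h)


def haar_forward(x: Sequence[float]) -> Tuple[List[float], List[float]]:
    s = math.sqrt(2.0)
    approx = [(x[2 * k] + x[2 * k + 1]) / s for k in range(len(x) // 2)]
    detail = [(x[2 * k] - x[2 * k + 1]) / s for k in range(len(x) // 2)]
    return approx, detail


def haar_inverse(approx: Sequence[float], detail: Sequence[float]) -> List[float]:
    s = math.sqrt(2.0)
    out: List[float] = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) / s, (a - d) / s])
    return out


def soft_threshold(c: float, t: float) -> float:
    # Shrink the coefficient toward zero by t; values inside [-t, t] become 0.
    return math.copysign(max(abs(c) - t, 0.0), c)


def morph_wavelet_denoise(x: Sequence[float], h: int = 1) -> List[float]:
    y = open_close(list(x), h)
    if len(y) % 2:
        y.append(y[-1])  # Haar needs an even length; repeat the last sample
    approx, detail = haar_forward(y)
    sigma = statistics.median([abs(c) for c in detail]) / 0.6745
    t = sigma * math.sqrt(2.0 * math.log(len(y)))
    detail = [soft_threshold(c, t) for c in detail]
    return haar_inverse(approx, detail)[:len(x)]
```

Placing the morphological stage first lets it remove impulses that would otherwise inflate the median-based noise estimate; the reverse order is equally easy to try with the same helpers.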
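Item 3 rests on idempotency: an open-close filter ψ satisfies ψ(ψ(x)) = ψ(x), so filtering a signal a second time changes nothing. One reading of the predistortion idea, offered as an interpretation rather than the thesis's exact procedure, is that clean training speech is passed through the same filter applied to noisy test speech before features are extracted, so both sides of the comparison carry the same filter distortion. The Python sketch below checks idempotency on a pseudo-random signal and provides a mean frame-wise Euclidean distance of the kind used to compare feature sets; the feature extractor itself (for example mel-cepstral coefficients) is left out, and `predistort` and `mean_frame_distance` are hypothetical names.

```python
import math
import random
from typing import List, Sequence


def _erode(x: List[float], h: int) -> List[float]:
    return [min(x[max(0, i - h):i + h + 1]) for i in range(len(x))]


def _dilate(x: List[float], h: int) -> List[float]:
    return [max(x[max(0, i - h):i + h + 1]) for i in range(len(x))]


def open_close(x: List[float], h: int) -> List[float]:
    """Flat open-close filter with a window of 2h+1 samples."""
    opened = _dilate(_erode(x, h), h)
    return _erode(_dilate(opened, h), h)


def predistort(training_signals: Sequence[List[float]], h: int) -> List[List[float]]:
    """Apply the test-time filter to clean training speech (hypothetical helper)."""
    return [open_close(s, h) for s in training_signals]


def mean_frame_distance(a: Sequence[Sequence[float]],
                        b: Sequence[Sequence[float]]) -> float:
    """Average Euclidean distance between corresponding feature frames."""
    if len(a) != len(b) or not a:
        raise ValueError("feature sequences must be non-empty and equally long")
    return sum(math.dist(u, v) for u, v in zip(a, b)) / len(a)


if __name__ == "__main__":
    rng = random.Random(0)
    x = [rng.uniform(-1.0, 1.0) for _ in range(200)]
    once = open_close(x, 2)
    print("idempotent:", open_close(once, 2) == once)
```

Because the operators use only `min` and `max`, the equality check is exact rather than approximate: an already filtered signal is a fixed point of the filter.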
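Item 4 combines two classical pitch detectors: the AMDF, which has a valley at the pitch period, and the modified autocorrelation function, which has a peak there. The abstract states only that the MACF is weighted by an exponential form of the normalized AMDF. The Python sketch below assumes the weight exp(-β·D(τ)/max D) with β a hypothetical tuning constant, computes both functions over a frame of N samples against an extended window of N + τmax samples, and omits the morphological pre-filtering and any voicing decision.

```python
import math
from typing import List, Sequence


def amdf(x: Sequence[float], n: int, lags: range) -> List[float]:
    """Average magnitude difference D(tau) over n samples of an extended window."""
    return [sum(abs(x[i] - x[i + tau]) for i in range(n)) / n for tau in lags]


def macf(x: Sequence[float], n: int, lags: range) -> List[float]:
    """Modified autocorrelation: every lag sums exactly n products, so values
    do not shrink with lag as in the standard short-time autocorrelation."""
    return [sum(x[i] * x[i + tau] for i in range(n)) for tau in lags]


def weighted_pitch_period(x: Sequence[float], n: int, min_lag: int,
                          max_lag: int, beta: float = 5.0) -> int:
    """Return the lag (samples) maximizing MACF(tau) * exp(-beta * D(tau) / max D)."""
    if len(x) < n + max_lag:
        raise ValueError("need at least n + max_lag samples")
    lags = range(min_lag, max_lag + 1)
    d = amdf(x, n, lags)
    d_max = max(d) or 1.0  # guard against an all-zero AMDF (constant input)
    r = macf(x, n, lags)
    scores = [rv * math.exp(-beta * dv / d_max) for rv, dv in zip(r, d)]
    best = max(range(len(scores)), key=scores.__getitem__)
    return lags[best]
```

At a sampling rate of 8 kHz, a lag of 80 samples corresponds to a 10 ms period, i.e. a 100 Hz fundamental; a lag range of 20 to 150 samples covers roughly 53 Hz to 400 Hz. The weight leaves the MACF peak almost untouched where the AMDF valley is deep and suppresses peaks where it is not.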
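Item 5 changes only the training data; recognition itself follows the usual isolated-word HMM procedure of scoring the observation sequence against each word model and choosing the most likely word. The Python sketch below assumes discrete HMMs over vector-quantized feature indices (an assumption; the abstract does not say whether the models are discrete or continuous) and shows log-domain forward-algorithm scoring only. Baum-Welch training, which in the thesis's scheme would run on predistorted features, is not shown, and the names are hypothetical.

```python
import math
from typing import Dict, Sequence, Tuple

# (initial probabilities pi, transition matrix A, emission matrix B)
Model = Tuple[Sequence[float], Sequence[Sequence[float]], Sequence[Sequence[float]]]


def _log(p: float) -> float:
    return math.log(p) if p > 0.0 else -math.inf


def _logsumexp(values: Sequence[float]) -> float:
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def forward_log_likelihood(obs: Sequence[int], pi, A, B) -> float:
    """log P(obs | model), computed with the forward algorithm in the log domain."""
    n_states = len(pi)
    alpha = [_log(pi[j]) + _log(B[j][obs[0]]) for j in range(n_states)]
    for o in obs[1:]:
        alpha = [_logsumexp([alpha[i] + _log(A[i][j]) for i in range(n_states)])
                 + _log(B[j][o]) for j in range(n_states)]
    return _logsumexp(alpha)


def recognize(obs: Sequence[int], models: Dict[str, Model]) -> str:
    """Return the word whose model assigns the highest log-likelihood."""
    return max(models, key=lambda word: forward_log_likelihood(obs, *models[word]))
```

Working in the log domain avoids the underflow that plain probability products cause on utterances of realistic length.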
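Item 6 concerns three design choices in an RBF network: where the hidden-layer centers go, how the output weights are computed, and how large the network should be. The Python sketch below fixes the simplest version of the first two, as assumptions rather than the thesis's improved scheme: Gaussian units with a shared width, centers supplied by the caller (in the test, the training vectors themselves), and linear output weights obtained by least squares with a small ridge term that keeps the normal equations solvable. Structure optimization is not shown; the names, the ridge value, and the one-output-per-class layout are hypothetical.

```python
import math
from typing import List, Sequence


def gaussian(x: Sequence[float], c: Sequence[float], width: float) -> float:
    d2 = sum((a - b) ** 2 for a, b in zip(x, c))
    return math.exp(-d2 / (2.0 * width ** 2))


def hidden_layer(x, centers, width) -> List[float]:
    # One Gaussian unit per center plus a constant bias unit.
    return [gaussian(x, c, width) for c in centers] + [1.0]


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a square linear system by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        if abs(m[col][col]) < 1e-12:
            raise ValueError("singular system")
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for k in range(col, n + 1):
                    m[r][k] -= f * m[col][k]
    return [m[i][n] / m[i][i] for i in range(n)]


def train_rbf(samples, labels, centers, width, n_classes, ridge=1e-6):
    """Ridge least-squares output weights, one vector per class (one-hot targets)."""
    h = [hidden_layer(x, centers, width) for x in samples]
    p = len(h[0])
    gram = [[sum(row[i] * row[j] for row in h) + (ridge if i == j else 0.0)
             for j in range(p)] for i in range(p)]
    weights = []
    for k in range(n_classes):
        rhs = [sum(row[i] for row, y in zip(h, labels) if y == k) for i in range(p)]
        weights.append(solve(gram, rhs))
    return weights


def predict(x, centers, width, weights) -> int:
    phi = hidden_layer(x, centers, width)
    scores = [sum(w * v for w, v in zip(wk, phi)) for wk in weights]
    return max(range(len(scores)), key=scores.__getitem__)
```

On the four XOR points, which no linear classifier separates, placing one center on each training vector lets the network classify them correctly and extend the decision to nearby inputs; center selection and pruning, the subjects of item 6, determine how far such a network can be shrunk without losing accuracy.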