Research on Speech Enhancement Based on RBF Networks
Abstract
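In practice, speech signals are inevitably corrupted by various kinds of noise. Noise degrades speech quality and intelligibility, can sharply degrade the performance of speech processing systems, and may even prevent the whole system from working properly. To suppress this interference, speech processing systems widely adopt speech enhancement techniques to improve speech quality and intelligibility and to raise system performance. Research on speech enhancement is therefore of considerable importance.
     This thesis studies speech enhancement algorithms based on radial basis function (RBF) networks, focusing on a frequency-domain algorithm that uses two RBF networks. It presents the basic principles, the implementation, and the enhancement results of the algorithms. The main contributions are:
     1. Building on an analysis of conventional time-domain neural-network speech enhancement, an improved method is proposed that lightens the load on the network and shortens training time. The algorithm was simulated in Matlab; the results show that it suppresses noise effectively and substantially raises the signal-to-noise ratio (SNR). Under a variety of added noise types, the algorithm enhances speech well, works over a wide range of input SNRs, and is simple to implement.
     2. In the frequency domain, two trained RBF networks process the linear prediction coefficients and the formant parameters of the noisy speech, respectively. These parameters are used to correct the spectral envelope, from which the speech is then reconstructed. The method has little effect on speech features such as pitch frequency, spectral tilt, and formants, so it preserves the spectral structure of the speech well and does not degrade speech quality. Experiments show a marked improvement in perceived quality.
     3. The Mel cepstrum distance (MCD) is used to measure the enhancement effect. Experiments show that, compared with the conventional SNR, MCD reflects more intelligibility information and evaluates the quality and effective operating range of an enhancement algorithm more accurately, making it a more reasonable metric for speech enhancement algorithms.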
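The abstract does not give the structure of the time-domain network in item 1, so the following is only a minimal sketch of how a Gaussian RBF network can map a window of noisy samples to one clean sample. The function names (`sliding_windows`, `rbf_features`, `train_output_weights`, `predict`), the fixed centers, the shared width, and the LMS training rule are all assumptions made for illustration; they are not taken from the thesis, whose simulation was done in Matlab.

```python
import math

def sliding_windows(samples, size):
    """Split a sample sequence into overlapping windows of `size` samples."""
    return [samples[i:i + size] for i in range(len(samples) - size + 1)]

def rbf_features(x, centers, width):
    """Gaussian activation of input vector x for each center, plus a bias term."""
    feats = []
    for c in centers:
        d2 = sum((xi - ci) ** 2 for xi, ci in zip(x, c))
        feats.append(math.exp(-d2 / (2.0 * width ** 2)))
    feats.append(1.0)  # bias input for the linear output layer
    return feats

def train_output_weights(inputs, targets, centers, width, lr=0.1, epochs=200):
    """Fit the linear output layer with the LMS (delta) rule.

    Centers and width stay fixed (an assumption of this sketch); only the
    output weights are learned.
    """
    weights = [0.0] * (len(centers) + 1)
    for _ in range(epochs):
        for x, t in zip(inputs, targets):
            phi = rbf_features(x, centers, width)
            err = t - sum(w * p for w, p in zip(weights, phi))
            weights = [w + lr * err * p for w, p in zip(weights, phi)]
    return weights

def predict(x, centers, width, weights):
    """Network output for one input window."""
    phi = rbf_features(x, centers, width)
    return sum(w * p for w, p in zip(weights, phi))
```

In a denoising setup of this kind, each input would be a window of the noisy signal and each target the clean sample at the window's center, with the centers chosen, for example, as a subset of the training windows.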
In general, speech signals are inevitably corrupted by various kinds of noise. Noise degrades the quality and intelligibility of speech, and in severe cases prevents speech processing systems from working properly. To minimize the effect of noise on system performance, speech enhancement is applied in a wide range of speech processing systems. The study of speech enhancement is therefore significant.
     This thesis discusses speech enhancement techniques based on radial basis function (RBF) networks, focusing on a frequency-domain technique that uses two RBF networks. The principles and implementation of the methods, and their improved forms, are presented. The main work of the thesis is as follows:
     1. After a review of traditional methods, a new time-domain speech enhancement method based on RBF networks is proposed. The method reduces the load on the RBF networks and the training time. The algorithm is simulated in Matlab, and the simulation results show that the proposed method effectively suppresses noise and increases the signal-to-noise ratio (SNR). The experimental results indicate that the method greatly improves the quality and intelligibility of noisy speech, and has further advantages such as a wide applicable SNR range and a low computational load.
     2. In the frequency domain, two RBF networks are used to remove the noise component. The first is trained on the formant coefficients and the second on the LPC coefficients. The modified spectral envelope is then estimated from these coefficients, and finally the denoised signal is reconstructed. The algorithm has a distinct advantage: it largely preserves the speech waveform, so the speech is retained well and its quality is clearly improved.
     3. The Mel cepstrum distance (MCD) is proposed for evaluating the enhancement effect. Experiments show that MCD correlates more closely with intelligibility and outperforms the traditional SNR, because it offers more information about the conditions under which an enhancement approach is applicable and about the relative effectiveness of different approaches.
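The two evaluation measures named above, SNR and MCD, can be sketched as follows. This is an illustrative sketch, not the thesis's evaluation code: the function names are hypothetical, the mel-cepstral coefficients are assumed to have been extracted already (one list per frame, with the zeroth coefficient excluded), and the MCD expression is the commonly used dB form, which may differ in detail from the variant used in the thesis.

```python
import math

def snr_db(clean, processed):
    """SNR in dB of `processed` measured against the reference `clean`."""
    if len(clean) != len(processed):
        raise ValueError("signals must have the same length")
    signal_power = sum(s * s for s in clean)
    noise_power = sum((s - p) ** 2 for s, p in zip(clean, processed))
    if noise_power == 0.0:
        return math.inf  # identical signals: no residual noise
    return 10.0 * math.log10(signal_power / noise_power)

def mel_cepstral_distance(ref_frames, test_frames):
    """Mean per-frame mel-cepstral distance in dB.

    Each frame is a list of mel-cepstral coefficients c_1..c_K
    (c_0 excluded); coefficient extraction is assumed done elsewhere.
    """
    if not ref_frames or len(ref_frames) != len(test_frames):
        raise ValueError("need the same non-zero number of frames")
    scale = 10.0 / math.log(10.0)
    total = 0.0
    for ref, test in zip(ref_frames, test_frames):
        squared = sum((r - t) ** 2 for r, t in zip(ref, test))
        total += scale * math.sqrt(2.0 * squared)
    return total / len(ref_frames)
```

A higher `snr_db` and a lower `mel_cepstral_distance` both indicate output closer to the clean reference. SNR compares waveforms sample by sample, whereas MCD compares spectral envelopes on a perceptually motivated frequency scale.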
