Research on Speech Signal Separation Algorithms Based on Independent Component Analysis
Abstract
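Speech signal separation has become an active research topic in signal processing in recent years, with many applications in teleconferencing, hearing aids and portable devices, and machine speech recognition. Blind signal processing methods are often used for speech separation; "blind" means that no knowledge is available about the source signals themselves or about the transmission channel. The theoretical basis of blind separation is independent component analysis (ICA), which is widely applicable to many kinds of signals, including communications, image, speech, biomedical, radar, seismic, and sonar signals. This thesis studies the basic theory of speech separation in detail, focusing on speech separation algorithms within the frequency-domain framework for convolutive mixtures. The main work is as follows: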
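     Blind separation algorithms based on the real-valued, time-domain instantaneous mixing model have been studied fairly thoroughly. In practice, however, speech signals are usually convolutively mixed, and in frequency-domain separation methods the signals are complex-valued. This thesis studies instantaneous-mixture blind separation algorithms that exploit the properties of complex-valued signals, implements several complex-domain blind separation algorithms, studies the permutation ambiguity in frequency-domain blind separation of convolutive mixtures, proposes a permutation alignment algorithm based on separation matrix initialization, and compares and analyzes performance experimentally.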
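     The main idea of frequency-domain blind separation of convolutively mixed speech is to use the Fourier transform to convert the complicated convolution of the time-domain mixing model into comparatively simple multiplication in the frequency domain, so that the time-domain convolutive mixing problem reduces to an instantaneous mixing separation problem at each frequency bin. Relatively mature instantaneous blind separation algorithms can then be applied to obtain the separation matrix at each bin, and an inverse Fourier transform conveniently yields the time-domain deconvolution filters, from which the source signals are recovered. Frequency-domain blind separation, however, introduces scaling (amplitude) ambiguity and permutation ambiguity, which are generally hard to resolve and degrade separation quality. These ambiguities are inherent to blind separation algorithms; for time-domain signals they do not seriously affect the separation result. For frequency-domain signals, however, if the separation matrices of the individual bins are inverse Fourier transformed directly after per-bin separation, there is no guarantee that each output channel corresponds to components of the same source signal. Components of other sources are likely to be mixed in, and deconvolution then fails.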
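The reduction from time-domain convolution to per-bin multiplication described above can be checked numerically. The following is a minimal sketch, not the thesis's implementation: the function name `convolutive_mix`, the 16-tap random mixing filters, and the Laplacian stand-ins for speech are all assumptions made for illustration. It uses a single zero-padded DFT over the whole signal, for which the identity is exact; a short-time Fourier transform with finite frames, as used in practice, satisfies it only approximately.

```python
import numpy as np

def convolutive_mix(sources, filters):
    """Mix sources with FIR filters: x_i(t) = sum_j (a_ij * s_j)(t)."""
    n_mics, n_src, n_taps = filters.shape
    n_out = sources.shape[1] + n_taps - 1          # length of a full linear convolution
    mixed = np.zeros((n_mics, n_out))
    for i in range(n_mics):
        for j in range(n_src):
            mixed[i] += np.convolve(sources[j], filters[i, j])
    return mixed

rng = np.random.default_rng(0)
sources = rng.laplace(size=(2, 256))       # hypothetical super-Gaussian stand-ins for speech
filters = rng.normal(size=(2, 2, 16))      # hypothetical 16-tap mixing filters a_ij

mixed = convolutive_mix(sources, filters)
n_fft = mixed.shape[1]                     # zero-pad everything to the mixture length

S = np.fft.fft(sources, n=n_fft, axis=1)   # shape (n_src, n_fft)
A = np.fft.fft(filters, n=n_fft, axis=2)   # shape (n_mics, n_src, n_fft)
X = np.fft.fft(mixed, axis=1)              # shape (n_mics, n_fft)

# Instantaneous mixing in every bin f: X[:, f] = A[:, :, f] @ S[:, f]
X_per_bin = np.einsum("ijf,jf->if", A, S)
max_err = np.max(np.abs(X - X_per_bin))
print(f"largest per-bin mismatch: {max_err:.2e}")
```

Each bin `f` then carries its own complex mixing matrix `A[:, :, f]`, which is why a separate complex-valued instantaneous separation problem has to be solved per bin, and why the source order recovered in one bin need not match the order recovered in another.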
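     Methods for resolving the permutation ambiguity are also called permutation alignment algorithms. Traditional algorithms, such as the correlation parameter method and the direction-of-arrival (DOA) estimation method, have poor robustness and limited accuracy. They cannot treat separation and alignment jointly, and when aligning bin by bin their results depend strongly on the alignment of the previous bin and on the quality of the first-step separation. This thesis studies the correlation coefficient method and the DOA estimation method, implements the DOA-based permutation alignment algorithm, and analyzes its performance and its effect on the final separation result. It then examines the problems of these alignment algorithms and, on that basis, proposes an algorithm based on separation matrix initialization. This algorithm also uses geometric information, but it achieves permutation alignment during separation, is more robust and more accurate than the traditional algorithms, and treats separation and alignment jointly as a whole. It therefore gives better final separation results under both subjective and objective evaluation measures.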
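To make the correlation-based alignment idea concrete, here is a minimal sketch of bin-by-bin alignment using the correlation of amplitude envelopes between adjacent bins. It illustrates the traditional approach in its simplest form, not the initialization-based algorithm proposed in the thesis. The function name `align_permutations` and the synthetic data (a shared random envelope per source with mild per-bin jitter) are assumptions made for illustration.

```python
import itertools
import numpy as np

def align_permutations(Y):
    """Align source order across bins by envelope correlation.

    Y: complex array (n_bins, n_src, n_frames) of per-bin separated outputs.
    Returns the reordered array and the permutation applied at each bin.
    """
    n_bins, n_src, _ = Y.shape
    aligned = Y.copy()
    perms = [tuple(range(n_src))]               # bin 0 defines the reference order
    for f in range(1, n_bins):
        ref = np.abs(aligned[f - 1])            # envelopes of the previous, already aligned bin
        env = np.abs(Y[f])
        # Exhaustive search: pick the ordering whose envelopes best match the reference.
        best = max(
            itertools.permutations(range(n_src)),
            key=lambda p: sum(np.corrcoef(ref[k], env[p[k]])[0, 1] for k in range(n_src)),
        )
        aligned[f] = Y[f, list(best)]
        perms.append(best)
    return aligned, perms

# Synthetic test data (assumption): each source has one envelope shared by all bins.
rng = np.random.default_rng(1)
n_bins, n_src, n_frames = 32, 2, 200
envelopes = rng.gamma(2.0, size=(n_src, n_frames))
jitter = rng.uniform(0.8, 1.2, size=(n_bins, n_src, n_frames))
phase = np.exp(2j * np.pi * rng.random((n_bins, n_src, n_frames)))
Y_true = envelopes[None] * jitter * phase

# Scramble the source order independently in every bin, as per-bin ICA may do.
true_perms = [rng.permutation(n_src) for _ in range(n_bins)]
scrambled = np.stack([Y_true[f, true_perms[f]] for f in range(n_bins)])

aligned, perms = align_permutations(scrambled)
# After alignment every bin should share the (arbitrary) source order of bin 0.
consistent = np.allclose(aligned, Y_true[:, true_perms[0]])
print("consistent order across bins:", consistent)
```

Because each bin is matched only against the previous one, a single wrong decision propagates to all later bins. This is the dependence on the previous bin's alignment noted above. The exhaustive search over orderings is also only practical for a small number of sources.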
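     The permutation ambiguity in frequency-domain blind separation remains a difficult problem in practice. This thesis also introduces a new frequency-domain blind separation model that exploits dependencies across the frequency components of a signal, which in theory can avoid the permutation problem at its root.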
Speech separation has been a hot topic in the signal processing community in recent years, with many applications in teleconferencing, hearing aids, portable devices, and speech recognition. Blind signal processing is a useful approach to speech separation; the term “blind” means that the sources themselves and the transmission channel are unknown. Independent component analysis (ICA) is the theoretical basis of blind source separation (BSS) and can be used in many signal processing fields, including communications, image, speech, biomedical, radar, seismic, and sonar processing. The thesis begins with the basic theory of BSS and then turns to speech separation algorithms based on the convolutive model in the frequency domain. The main contributions are as follows.
     Instantaneous mixing models in the real-valued time domain have been studied extensively. In the real world, however, speech signals are convolutively mixed, and frequency-domain methods transform them into complex-valued signals. We study algorithms that use this complex-valued information to separate complex signals, study the permutation problem in frequency-domain methods for the convolutive model, propose a new scheme based on separation matrix initialization, and compare performance experimentally.
     The main idea of frequency-domain convolutive speech separation is to use the Fourier transform to turn convolution in the time domain into multiplication in the frequency domain. The complicated time-domain convolutive model thus becomes an instantaneous mixing model in each frequency bin, and well-developed instantaneous algorithms can be used to estimate the separating matrix in each bin. Transforming these matrices back to the time domain yields the FIR deconvolution filters and hence the estimated sources. However, the frequency-domain method introduces the permutation problem, which degrades separation performance: after separation in each frequency bin, we cannot guarantee that each output channel contains components from only one source. Simply transforming the separating matrices back to the time domain therefore causes deconvolution to fail.
     Solving the permutation problem is called permutation alignment. Traditional methods, such as those based on correlation parameters or on geometric direction-of-arrival (DOA) estimation, are affected by the alignment of the previous frequency bin and by the quality of the first-step separation. Because they treat separation and permutation alignment as isolated steps, their robustness and accuracy are limited. The thesis studies permutation alignment algorithms based on correlation coefficients and on DOA estimation, implements the DOA-based method, and analyzes its performance and its influence on the final separation results. We examine the limitations of these alignment algorithms and then propose a new scheme based on separation matrix initialization. It is also a geometric approach, but it considers separation and alignment jointly and resolves the permutation during separation. Because it is more robust and more accurate, the final estimated sources are better under both objective and subjective evaluation measures.
     The permutation problem remains difficult and still needs better solutions. The thesis also introduces a new frequency-domain blind separation scheme that exploits dependencies across frequencies, which in theory can avoid the permutation problem.
