Research on Monaural Speech Separation Based on Computational Auditory Scene Analysis
Abstract
A monaural speech separation system extracts the target speech from noisy background interference within a single channel and commonly serves as the front end of speech recognition and speaker recognition systems. A separation system based on Computational Auditory Scene Analysis (CASA) accomplishes this task by using a computer to simulate how the human ear perceives and tracks a target voice. Because its separation process closely resembles the way the human auditory system perceptually segregates mixed speech, CASA has become a research hotspot in the speech separation field in recent years.
     This dissertation studies computational auditory scene analysis in depth, describes the structure and development of CASA-based monaural speech separation systems, and proposes an improved separation system built on the conventional CASA framework. The main contributions are as follows:
     (1) Effective-energy feature extraction with an improved threshold. Energy is an important signal feature when extracting and separating the voiced portions of natural speech. Conventional CASA systems apply one constant threshold when computing the effective-energy feature, but noise is uncertain and diverse: when the distribution of the noise in the mixture is unknown, the background interferes with each frequency channel's effective-energy feature to a different degree, and a constant threshold cannot reliably discard the corrupted units. This dissertation therefore thresholds the time-frequency response energy of each channel with an improved, per-channel threshold based on the average channel energy, improving the accuracy of effective-energy feature extraction, as sketched below.
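     As a rough illustration, the following Python sketch labels effective-energy time-frequency units with a threshold derived from each channel's own average response energy. It assumes the cochleagram energies are already available as a 2-D array; the scale factor `theta` is a hypothetical tuning parameter, not a value taken from the dissertation.

```python
import numpy as np

def label_effective_energy(cochleagram, theta=1.0):
    """Label T-F units whose energy exceeds a per-channel adaptive threshold.

    cochleagram : (channels, frames) array of T-F unit response energies.
    theta       : hypothetical scale factor on each channel's mean energy.
    Returns a boolean mask of effective-energy units.
    """
    # Each channel's average energy serves as its own reference level,
    # so a channel dominated by strong noise gets a correspondingly
    # higher threshold instead of one global constant.
    channel_mean = cochleagram.mean(axis=1, keepdims=True)
    return cochleagram > theta * channel_mean
```

     Compared with a single constant threshold, the per-channel reference adapts automatically when noise energy is concentrated in particular frequency bands.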
     (2) An iterative pitch estimation algorithm based on target-source units. Conventional pitch estimation does not discard interference units before estimating pitch; it computes the pitch frequency directly from the autocorrelation responses of all units in a channel, which introduces error into the result. The improved algorithm computes pitch only from units already labeled as target: it first removes the units marked as interference, extracts the pitch from the estimated target units alone, and then relabels the target units according to the estimated pitch contour. The two steps, target-unit labeling and pitch estimation, are iterated until the per-frame fundamental frequency of each voiced segment stabilizes (see the sketch after this paragraph). Experiments show that the algorithm makes pitch estimation more robust and improves on conventional pitch extraction in noisy conditions.
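     The iteration can be sketched as follows, assuming normalized per-unit autocorrelations are available as a 3-D array. The lag search range, the 0.85 agreement ratio used for relabeling, and the function name are illustrative assumptions rather than the dissertation's exact formulation.

```python
import numpy as np

def iterative_pitch_tracking(acf, target_mask, lag_range=(32, 200), max_iter=10):
    """Alternate pitch estimation and target-unit labeling until stable.

    acf         : (channels, frames, lags) normalized autocorrelation of
                  every T-F unit (acf[..., 0] == 1).
    target_mask : (channels, frames) boolean mask of initial target units.
    lag_range   : assumed plausible pitch-lag search range in samples.
    Returns (per-frame pitch lags, final target mask).
    """
    lo, hi = lag_range
    pitch = np.zeros(acf.shape[1], dtype=int)
    for _ in range(max_iter):
        # 1) Estimate pitch from target units only: sum the ACFs of the
        #    labeled units in each frame and pick the strongest common lag.
        summary = np.where(target_mask[:, :, None], acf, 0.0).sum(axis=0)
        new_pitch = lo + summary[:, lo:hi].argmax(axis=1)
        # 2) Relabel: a unit is target if its ACF at the estimated pitch
        #    lag is close to its own in-range maximum (assumed 0.85 ratio).
        at_pitch = acf[:, np.arange(acf.shape[1]), new_pitch]
        target_mask = at_pitch > 0.85 * acf[:, :, lo:hi].max(axis=2)
        if np.array_equal(new_pitch, pitch):  # pitch contour is stable
            break
        pitch = new_pitch
    return pitch, target_mask
```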
     (3) An improved unvoiced separation method based on spectral subtraction. After the voiced speech, which carries pitch-period structure, has been extracted, the unvoiced speech must still be recovered from the residual interference. Because the noise distribution is uncertain and non-stationary, this dissertation estimates the noise energy contained in each unvoiced unit with a distance-weighted residual-noise estimation algorithm, then applies spectral subtraction to each unvoiced unit and labels it, discarding residual noise units and extracting the unvoiced speech. The method estimates time-varying residual noise more accurately and makes unvoiced separation more effective; a sketch follows.
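     A minimal sketch of the idea in Python: the residual-noise energy seen by each frame is estimated as a distance-weighted average over frames already judged to be noise-only, and units whose energy survives the subtraction are kept as unvoiced speech. The inverse-distance weighting and the over-subtraction factor `alpha` are illustrative assumptions.

```python
import numpy as np

def segregate_unvoiced(unit_energy, noise_frames, alpha=1.0):
    """Label unvoiced T-F units by spectral subtraction (a sketch).

    unit_energy  : (channels, frames) energies of the units remaining
                   after voiced segregation.
    noise_frames : indices of frames judged to contain only residual noise.
    alpha        : hypothetical over-subtraction factor.
    Returns a boolean mask of units kept as unvoiced speech.
    """
    frames = np.arange(unit_energy.shape[1])
    noise_frames = np.asarray(noise_frames)
    # Distance weighting: noise-only frames nearer in time contribute
    # more, so the estimate tracks time-varying residual noise.
    w = 1.0 / (1.0 + np.abs(frames[None, :] - noise_frames[:, None]))
    w /= w.sum(axis=0, keepdims=True)
    noise_est = unit_energy[:, noise_frames] @ w   # (channels, frames)
    # Subtract the local noise estimate; units with positive residual
    # energy are labeled as unvoiced target speech.
    return unit_energy - alpha * noise_est > 0
```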
     (4) Mask smoothing based on morphological image processing. The binary mask obtained after grouping is used for the final speech resynthesis. Because pitch extraction and target labeling are inevitably imperfect under noise, the binary mask often contains scattered residual noise points and broken speech segments, which markedly degrade the quality and intelligibility of the resynthesized speech. To reduce this effect, this dissertation smooths the grouped binary mask with morphological image post-processing: an effective combination of dilation and erosion operations removes noise from and repairs the mask without destroying its detail, further improving the quality of the separated speech. A sketch with standard morphological operators appears below.
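     As a sketch using the standard morphological operators in SciPy (the structuring-element size is an assumed choice, not the dissertation's):

```python
import numpy as np
from scipy.ndimage import binary_closing, binary_opening

def smooth_mask(mask, structure=None):
    """Morphologically smooth a grouped binary T-F mask (a sketch).

    Opening (erosion followed by dilation) removes isolated residual
    noise points; closing (dilation followed by erosion) fills small
    holes in broken speech segments. Combined, they clean the mask
    while leaving its larger contours, hence the speech detail,
    largely intact.
    """
    if structure is None:
        # Hypothetical 2x3 structuring element: 2 channels x 3 frames.
        structure = np.ones((2, 3), dtype=bool)
    return binary_closing(binary_opening(mask, structure), structure)
```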