Supervised single-channel speech enhancement using ratio mask with joint dictionary learning

Authors: Long Zhang^a,b (lonzhang@mail.ustc.edu.cn); Guangzhao Bao^a,b (gzbao@mail.ustc.edu.cn); Jing Zhang^a,b (zhj336@mail.ustc.edu.cn); Zhongfu Ye^a,b,c (yezf@ustc.edu.cn)
Keywords: Single-channel speech enhancement; Ratio mask; Joint dictionary learning; Joint sparse coding; Ideal binary mask; Soft mask
Journal: Speech Communication
Publication year: 2016
Publication date: September 2016
Volume: 82
Issue: Complete
Pages: 38-52
Full-text size: 1010 K

Abstract

This paper proposes a novel structure for single-channel speech enhancement that combines the advantages of the ratio mask (RM) and joint dictionary learning (JDL). The structure makes full use of the training data and overcomes some shortcomings of the generative dictionary learning (GDL) algorithm. RMs of the speech and the interferer provide discriminative information in both the training stage and the enhancement stage. In the training stage, the signals and their corresponding ideal RMs (IRMs) are used to learn the signal and IRM dictionaries jointly with the K-SVD algorithm. In the enhancement stage, the mixture signal and the mixture RM are sparsely represented over composite dictionaries composed of the learned signal and IRM dictionaries, which formulates a joint sparse coding (JSC) problem. The estimated RMs (ERMs) of the speech and the interferer in the mixture are then calculated and used to develop two soft mask (SM) filters. The proposed SM filters combine the ideal binary mask technique with a Wiener-type filter to make full use of the discriminative information provided by the ERMs, and they both strengthen the speech and suppress the interferer in the mixture. The proposed algorithms improve both speech intelligibility and quality. Experimental evaluations show that they achieve performance comparable to a deep neural network (DNN) based mask estimator at lower computational cost and outperform the other tested algorithms.
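
The abstract gives no formulas, so the following is only a minimal sketch of the two ideas it names: an ideal ratio mask and sparse coding of a signal frame and its mask with one shared code. Everything in it is an assumption for illustration, not the paper's method. The magnitude-domain mask definition, the function names (`ideal_ratio_masks`, `omp`, `joint_sparse_code`), and the use of orthogonal matching pursuit as the sparse solver are all hypothetical choices. K-SVD training and the soft mask filters are not shown.

```python
import numpy as np

def ideal_ratio_masks(speech_mag, interferer_mag, eps=1e-12):
    """Ratio masks from magnitude spectra (assumed definition, not from the paper).

    Each mask entry is that source's share of the summed magnitudes, so the
    speech and interferer masks add up to (almost exactly) one per bin.
    """
    total = speech_mag + interferer_mag + eps
    return speech_mag / total, interferer_mag / total

def omp(D, y, k):
    """Orthogonal matching pursuit: approximate y with at most k columns of D."""
    residual = y.astype(float).copy()
    support = []
    sol = np.zeros(0)
    for _ in range(k):
        if np.linalg.norm(residual) < 1e-10:
            break  # y is already fully explained
        idx = int(np.argmax(np.abs(D.T @ residual)))
        if idx in support:
            break  # no new atom improves the fit
        support.append(idx)
        # Refit all selected atoms jointly by least squares.
        sol, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ sol
    coef = np.zeros(D.shape[1])
    if support:
        coef[support] = sol
    return coef

def joint_sparse_code(D_signal, D_mask, y_signal, y_mask, k):
    """Code a signal frame and its mask frame with a single shared sparse vector.

    The signal and mask dictionaries are stacked row-wise so that atom j of
    D_signal is tied to atom j of D_mask. Columns are normalised for the
    greedy search and the coefficients are rescaled afterwards.
    """
    D = np.vstack([D_signal, D_mask])
    norms = np.linalg.norm(D, axis=0)
    norms[norms == 0] = 1.0
    y = np.concatenate([y_signal, y_mask])
    return omp(D / norms, y, k) / norms
```

In this sketch, `D_signal` and `D_mask` could themselves be composite, with speech atoms and interferer atoms concatenated column-wise; multiplying the mask dictionary of one source by its part of the shared code would then give a mask estimate for that source. Tying the atoms together is what lets the mask carry discriminative information into the code.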
