An unsupervised approach for co-channel speech separation using Hilbert–Huang transform and Fuzzy C-Means clustering


Authors: M. K. Prasanna Kumar; R. Kumaraswamy
Keywords: BSS; SCSS; Fuzzy C-Means; EMD; IMF; TFR; IBM; HHT
Journal: International Journal of Speech Technology
Publication year: 2017
Publication date: March 2017
Volume: 20
Issue: 1
Pages: 1-13
Journal category: Engineering
Journal subjects: Signal, Image and Speech Processing; Social Sciences, general; Artificial Intelligence (incl. Robotics)
Publisher: Springer US
ISSN: 1572-8110

Abstract

In this paper we discuss an unsupervised approach for co-channel speech separation, where two speakers speak simultaneously over the same channel. We propose a two-stage separation process in which the initial stage is based on empirical mode decomposition (EMD) and the Hilbert transform, together known as the Hilbert–Huang transform (HHT). EMD decomposes the mixed signal into oscillatory functions known as intrinsic mode functions (IMFs). The Hilbert transform is applied to find the instantaneous amplitudes, and Fuzzy C-Means clustering is applied to group the speakers at the initial stage. In the second stage of separation, the speaker groups are transformed into the time–frequency domain using the short-time Fourier transform (STFT). Time–frequency ratios are computed as ratios between the STFT matrix of the mixed speech signal and the STFT matrices of the stage 1 recovered speech signals. A histogram of the ratios obtained can be used to estimate the ideal binary mask for each speaker. These masks are applied to the speech mixture to estimate the underlying speakers. The masks are estimated from the speech mixture and help impute the values missing after the stage 1 grouping of speakers. The results show significant improvement in objective measures over other existing single-channel speech separation methods.
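
The first-stage idea of clustering components by their instantaneous amplitude can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the EMD step is omitted, synthetic amplitude-modulated sinusoids stand in for IMFs, the single feature per component (mean envelope) is an assumption made only for illustration, and the function names `instantaneous_amplitude` and `fuzzy_c_means` are hypothetical.

```python
import numpy as np


def instantaneous_amplitude(x):
    """Envelope |x + j*H{x}| from the FFT-based analytic signal."""
    n = len(x)
    spectrum = np.fft.fft(x)
    # Keep DC (and Nyquist for even n), double positive frequencies,
    # zero out negative frequencies.
    gain = np.zeros(n)
    gain[0] = 1.0
    if n % 2 == 0:
        gain[n // 2] = 1.0
        gain[1:n // 2] = 2.0
    else:
        gain[1:(n + 1) // 2] = 2.0
    return np.abs(np.fft.ifft(spectrum * gain))


def fuzzy_c_means(points, n_clusters=2, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Standard Fuzzy C-Means; returns (centers, membership matrix)."""
    rng = np.random.default_rng(seed)
    u = rng.random((len(points), n_clusters))
    u /= u.sum(axis=1, keepdims=True)  # each row of memberships sums to 1
    for _ in range(n_iter):
        w = u ** m
        # Centers are membership-weighted means of the points.
        centers = (w.T @ points) / w.sum(axis=0)[:, None]
        dist = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        dist = np.maximum(dist, 1e-12)  # avoid division by zero
        # u_ij is proportional to d_ij^(-2/(m-1)), normalized per point.
        inv = dist ** (-2.0 / (m - 1.0))
        new_u = inv / inv.sum(axis=1, keepdims=True)
        converged = np.max(np.abs(new_u - u)) < tol
        u = new_u
        if converged:
            break
    return centers, u


# Toy stand-ins for IMFs (assumption): two strong and two weak components.
t = np.arange(1000) / 1000.0
amplitudes = [1.0, 0.9, 0.2, 0.25]
frequencies = [50, 120, 200, 310]
components = [
    a * (1 + 0.2 * np.sin(2 * np.pi * 3 * t)) * np.sin(2 * np.pi * f * t)
    for a, f in zip(amplitudes, frequencies)
]

# One illustrative feature per component: its mean instantaneous amplitude.
features = np.array([[instantaneous_amplitude(c).mean()] for c in components])
centers, memberships = fuzzy_c_means(features, n_clusters=2)
labels = memberships.argmax(axis=1)  # hard assignment from soft memberships
```

Unlike k-means, Fuzzy C-Means keeps a soft membership for every component in every cluster, so a component whose amplitude sits between the two groups is not forced into either one until the final `argmax`.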
