An unsupervised approach for co-channel speech separation using Hilbert–Huang transform and Fuzzy C-Means clustering
  • Authors: M. K. Prasanna Kumar; R. Kumaraswamy
  • Keywords: BSS; SCSS; Fuzzy C-Means; EMD; IMF; TFR; IBM; HHT
  • Journal: International Journal of Speech Technology
  • Publication year: 2017
  • Publication date: March 2017
  • Year: 2017
  • Volume: 20
  • Issue: 1
  • Pages: 1-13
  • Journal category: Engineering
  • Journal subjects: Signal, Image and Speech Processing; Social Sciences, general; Artificial Intelligence (incl. Robotics)
  • Publisher: Springer US
  • ISSN: 1572-8110
  • Volume (sort key): 20
Abstract
In this paper we discuss an unsupervised approach to co-channel speech separation, in which two speakers talk simultaneously over the same channel. We propose a two-stage separation process. The first stage is based on empirical mode decomposition (EMD) and the Hilbert transform, together known as the Hilbert–Huang transform (HHT). EMD decomposes the mixed signal into oscillatory functions known as intrinsic mode functions (IMFs). The Hilbert transform is applied to find the instantaneous amplitudes, and Fuzzy C-Means clustering is applied to group the speakers in this initial stage. In the second stage, the speaker groups are transformed into the time–frequency domain using the short-time Fourier transform (STFT). Time–frequency ratios are computed by dividing the STFT matrix of the mixed speech signal by the STFT matrices of the speech signals recovered in stage 1. The histogram of these ratios can be used to estimate the ideal binary mask (IBM) for each speaker. The masks are applied to the speech mixture to estimate the underlying speakers. Because the masks are estimated from the speech mixture itself, they help impute the values missing after the stage-1 grouping of speakers. The results show a significant improvement in objective measures over other existing single-channel speech separation methods.
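
The first stage can be sketched in Python as follows. This is a minimal, hypothetical illustration and not the authors' implementation. It assumes the IMFs have already been produced by some EMD routine, which is not shown. It also assumes that each IMF is represented by its frame-averaged Hilbert instantaneous amplitude, normalised to unit length, and is then assigned to the Fuzzy C-Means cluster in which it has the highest membership. The abstract does not specify the clustering features. The names `fuzzy_c_means`, `envelope_features`, `group_imfs` and `frame_len` are therefore illustrative choices.

```python
# Hypothetical sketch of stage 1: names and the feature choice are assumptions,
# not taken from the paper. The IMFs are assumed to come from an EMD routine.
import numpy as np
from scipy.signal import hilbert


def fuzzy_c_means(X, c=2, m=2.0, max_iter=300, tol=1e-6, seed=0):
    """Standard Fuzzy C-Means on the rows of X (N x d).

    Returns (centers, U): centers is c x d and U is the c x N membership
    matrix whose columns sum to 1.
    """
    rng = np.random.default_rng(seed)
    U = rng.random((c, X.shape[0]))
    U /= U.sum(axis=0, keepdims=True)
    p = 2.0 / (m - 1.0)
    for _ in range(max_iter):
        Um = U ** m
        # Membership-weighted cluster centers.
        centers = (Um @ X) / Um.sum(axis=1, keepdims=True)
        # Distance from every center to every sample, shape (c, N).
        D = np.linalg.norm(X[None, :, :] - centers[:, None, :], axis=2)
        D = np.fmax(D, 1e-12)  # guard against a sample lying on a center
        # Membership update: u_ki = d_ki^(-p) / sum_j d_ji^(-p).
        inv = D ** (-p)
        U_new = inv / inv.sum(axis=0, keepdims=True)
        converged = np.max(np.abs(U_new - U)) < tol
        U = U_new
        if converged:
            break
    return centers, U


def envelope_features(imfs, frame_len=256):
    """Frame-averaged Hilbert instantaneous amplitude of each IMF (rows of imfs).

    Assumption: each envelope is normalised to unit length so that IMFs are
    compared by the time course of their energy rather than by their level.
    """
    amp = np.abs(hilbert(imfs, axis=1))  # instantaneous amplitude
    n_frames = imfs.shape[1] // frame_len
    amp = amp[:, : n_frames * frame_len]
    feats = amp.reshape(imfs.shape[0], n_frames, frame_len).mean(axis=2)
    return feats / (np.linalg.norm(feats, axis=1, keepdims=True) + 1e-12)


def group_imfs(imfs, frame_len=256, seed=0):
    """Assign each IMF to one of two clusters and sum the IMFs per cluster."""
    _, U = fuzzy_c_means(envelope_features(imfs, frame_len), c=2, seed=seed)
    labels = U.argmax(axis=0)  # hard assignment from fuzzy memberships
    return np.stack([imfs[labels == k].sum(axis=0) for k in range(2)])
```

With two clusters, `group_imfs` returns two stage-1 speaker estimates. Each estimate is the sum of the IMFs assigned to that cluster. The hard `argmax` assignment is used only for simplicity. The fuzzy memberships `U` could instead weight each IMF's contribution to both groups.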

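The second stage can likewise be sketched under stated assumptions. The abstract says only two things: the mixture STFT is divided by the STFT of each stage-1 estimate, and the histogram of these ratios is used to estimate the binary masks. This sketch uses the log-magnitude ratio log(|X|/|S_k|). It picks the threshold from the histogram with Otsu's method, a technique swapped in for illustration. Bins where speaker k dominates give ratios close to 1 (log ratio near 0), while bins where that speaker is absent give large ratios. The functions `otsu_threshold`, `estimate_masks` and `apply_masks`, and the parameters `nperseg` and `eps`, are hypothetical.

```python
# Hypothetical sketch of stage 2: the log-ratio form, Otsu thresholding and all
# names/parameters are illustrative assumptions, not the paper's exact method.
import numpy as np
from scipy.signal import stft, istft


def otsu_threshold(values, bins=128):
    """Otsu's method: the histogram split maximising between-class variance."""
    hist, edges = np.histogram(values, bins=bins)
    centers = 0.5 * (edges[:-1] + edges[1:])
    w0 = np.cumsum(hist).astype(float)       # count in bins 0..k
    w1 = w0[-1] - w0                         # count in bins k+1..end
    s0 = np.cumsum(hist * centers)
    mu0 = s0 / np.maximum(w0, 1.0)
    mu1 = (s0[-1] - s0) / np.maximum(w1, 1.0)
    between = w0 * w1 * (mu0 - mu1) ** 2
    return edges[np.argmax(between) + 1]     # upper edge of the lower class


def estimate_masks(mixture, stage1, nperseg=256, eps=1e-12):
    """One binary mask per stage-1 estimate, from log(|X| / |S_k|)."""
    _, _, X = stft(mixture, nperseg=nperseg)
    masks = []
    for s in stage1:
        _, _, S = stft(s, nperseg=nperseg)
        # Near 0 where speaker k dominates a bin, large where it is absent.
        log_ratio = np.log(np.abs(X) + eps) - np.log(np.abs(S) + eps)
        masks.append(log_ratio <= otsu_threshold(log_ratio.ravel()))
    return X, masks


def apply_masks(X, masks, length, nperseg=256):
    """Apply each mask to the mixture STFT and resynthesise with the ISTFT."""
    outs = []
    for M in masks:
        _, y = istft(X * M, nperseg=nperseg)
        outs.append(y[:length])
    return outs
```

Each mask is thresholded independently, so the two masks may overlap or leave some bins unassigned. Forcing the masks to be complementary is an alternative design choice.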