Single-channel Speech Separation Using Dictionary-updated Orthogonal Matching Pursuit and Temporal Structure Information


Authors: Haiyan Guo; Xiaoxiong Li; Lin Zhou; Zhenyang Wu
Keywords: Single-channel speech separation (SCSS); Sparse decomposition; Orthogonal matching pursuit (OMP); Dictionary
Journal: Circuits, Systems, and Signal Processing
Year: 2015
Publication date: December 2015
Volume: 34
Issue: 12
Pages: 3861-3882
Full-text size: 1,125 KB
Affiliations: Haiyan Guo (1) (2)
Xiaoxiong Li (1)
Lin Zhou (1)
Zhenyang Wu (1)

1. School of Information Science and Engineering, Southeast University, Nanjing, 210096, China
2. College of Internet of Things, Nanjing University of Posts and Telecommunications, Nanjing, 210003, China
Journal category: Engineering
Journal subject: Electronic and Computer Engineering
Publisher: Birkhäuser Boston
ISSN: 1531-5878

Abstract

In this paper, we propose a two-stage, sparse-decomposition-based method for single-channel speech separation in the time domain. First, we propose a dictionary-updated orthogonal matching pursuit (DUOMP) algorithm, which is used in both separation stages. In DUOMP, all atoms of each source-specific dictionary are updated by subtracting the current approximation of each source from the original atoms. We prove that DUOMP more quickly confines the separated sources to a region in which they are statistically uncorrelated. Second, we propose an adaptive dictionary generation method, followed by a frame labeling method, to perform a second-stage separation on mixed frames that have a certain temporal structure. Experiments show that, overall, the proposed method outperforms a separation method using sparse non-negative matrix factorization (SNMF), a separation method using OMP, and a source-filter-based method using pitch information. The factors that affect the performance of the proposed method are also examined.
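
For orientation, the OMP-based separation that the abstract uses as a baseline can be sketched roughly as follows. This is plain OMP over two concatenated source-specific dictionaries, not the DUOMP algorithm: the abstract does not specify the dictionary-update step in enough detail to reproduce it, so that step is omitted. The function name `omp_separate`, its parameters, and the toy dictionaries are assumptions made for illustration only.

```python
import numpy as np


def omp_separate(x, D1, D2, n_atoms):
    """Plain OMP over [D1 | D2]; a baseline sketch, NOT the paper's DUOMP.

    x       : mixed frame, shape (n,)
    D1, D2  : hypothetical source-specific dictionaries, shape (n, k1), (n, k2)
    n_atoms : maximum number of atoms to select
    Returns the estimated frames of source 1 and source 2.
    """
    x = np.asarray(x, dtype=float)
    D = np.hstack([D1, D2]).astype(float)
    D = D / np.linalg.norm(D, axis=0)      # normalise atoms to unit length
    k1 = D1.shape[1]

    residual = x.copy()
    support = []                           # indices of selected atoms
    coef = np.zeros(0)
    for _ in range(n_atoms):
        corr = np.abs(D.T @ residual)      # correlation with each atom
        if support:
            corr[support] = 0.0            # never reselect an atom
        j = int(np.argmax(corr))
        if corr[j] < 1e-12:                # residual already explained
            break
        support.append(j)
        # Orthogonal step: refit all selected atoms by least squares.
        coef, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        residual = x - D[:, support] @ coef

    # Assign each selected atom's contribution to the source it came from.
    s1 = np.zeros_like(x)
    s2 = np.zeros_like(x)
    for c, j in zip(coef, support):
        if j < k1:
            s1 += c * D[:, j]
        else:
            s2 += c * D[:, j]
    return s1, s2


# Toy usage: two orthogonal dictionaries taken from the identity matrix.
I4 = np.eye(4)
mix = 2.0 * I4[:, 0] + 3.0 * I4[:, 3]
est1, est2 = omp_separate(mix, I4[:, :2], I4[:, 2:], n_atoms=2)
```

In this toy case the dictionaries are orthogonal, so the split is exact. With real, correlated speech dictionaries, atoms from one source can absorb energy from the other, which is the kind of cross-source leakage the dictionary update in DUOMP appears intended to reduce.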