Single-channel Speech Separation Using Dictionary-updated Orthogonal Matching Pursuit and Temporal Structure Information
  • Authors: Haiyan Guo; Xiaoxiong Li; Lin Zhou; Zhenyang Wu
  • Keywords: Single-channel speech separation (SCSS); Sparse decomposition; Orthogonal matching pursuit (OMP); Dictionary
  • Journal: Circuits, Systems, and Signal Processing
  • Publication year: 2015
  • Publication date: December 2015
  • Volume: 34
  • Issue: 12
  • Pages: 3861-3882
  • Full-text size: 1,125 KB
  • Author affiliations: Haiyan Guo (1) (2)
    Xiaoxiong Li (1)
    Lin Zhou (1)
    Zhenyang Wu (1)

    1. School of Information Science and Engineering, Southeast University, Nanjing, 210096, China
    2. College of Internet of Things, Nanjing University of Posts and Telecommunications, Nanjing, 210003, China
  • Journal category: Engineering
  • Journal subject: Electronic and Computer Engineering
  • Publisher: Birkhäuser Boston
  • ISSN: 1531-5878
Abstract
In this paper, we propose a two-stage, sparse decomposition-based method for single-channel speech separation in the time domain. First, we propose a dictionary-updated orthogonal matching pursuit (DUOMP) algorithm, which is used in both separation stages. In DUOMP, all atoms of each source-specific dictionary are updated by subtracting the current approximation of each source from the original atoms. We prove that DUOMP more quickly confines the separated sources to a region where they are statistically uncorrelated. Second, we propose an adaptive dictionary generation method, followed by a frame labeling method, to perform a second-stage separation on the mixed frames that have a certain temporal structure. Experiments show that, overall, the proposed method outperforms a separation method using sparse non-negative matrix factorization (SNMF), a separation method using OMP, and a source-filter-based method using pitch information. We also show which factors affect the performance of the proposed method.
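For orientation, the plain OMP baseline that the abstract compares against can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' DUOMP: the abstract does not give the dictionary-update rule in enough detail to reproduce, so that step is deliberately left out. The function names (`omp`, `separate`), the toy dictionaries, and the choice to concatenate the two source-specific dictionaries and split the coefficients afterwards are all assumptions made for this example.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def solve(A, b):
    """Solve a small square system A x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def omp(atoms, y, n_iter, tol=1e-10):
    """Plain OMP. `atoms` is a list of unit-norm vectors, `y` is one signal frame.
    Returns a dict mapping selected atom index -> coefficient."""
    residual = y[:]
    support, coef = [], []
    for _ in range(min(n_iter, len(atoms))):
        if math.sqrt(dot(residual, residual)) < tol:
            break
        # Greedy step: pick the unused atom most correlated with the residual.
        best = max((i for i in range(len(atoms)) if i not in support),
                   key=lambda i: abs(dot(atoms[i], residual)))
        support.append(best)
        # Orthogonal step: least-squares refit of y on all selected atoms
        # (normal equations; assumes the selected atoms are linearly independent).
        gram = [[dot(atoms[i], atoms[j]) for j in support] for i in support]
        rhs = [dot(atoms[i], y) for i in support]
        coef = solve(gram, rhs)
        approx = [sum(c * atoms[i][t] for c, i in zip(coef, support))
                  for t in range(len(y))]
        residual = [a - b for a, b in zip(y, approx)]
    return dict(zip(support, coef))


def separate(dict_a, dict_b, mixture, n_iter):
    """Decompose the mixture over the concatenated dictionary, then rebuild each
    source from the atoms that belong to its own dictionary."""
    atoms = dict_a + dict_b
    coefs = omp(atoms, mixture, n_iter)
    est_a = [0.0] * len(mixture)
    est_b = [0.0] * len(mixture)
    for i, c in coefs.items():
        target = est_a if i < len(dict_a) else est_b
        for t in range(len(mixture)):
            target[t] += c * atoms[i][t]
    return est_a, est_b


# Toy dictionaries (hypothetical): each source occupies its own two coordinates.
dict_a = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
dict_b = [[0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]
mixture = [2.0, 0.0, 0.0, -3.0]
est_a, est_b = separate(dict_a, dict_b, mixture, n_iter=2)
print(est_a, est_b)
```

In this toy case the two dictionaries are mutually orthogonal, so the split is exact. With real speech dictionaries the atoms of the two sources are correlated, which is the situation the dictionary update described in the abstract is meant to address.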