Content-based singer classification on compressed domain audio data
  • Authors: Tsung-Han Tsai (1)
    Yu-Siang Huang (1)
    Pei-Yun Liu (1)
    De-Ming Chen (1)

    1. Department of Electrical Engineering, National Central University, No. 300, Jhongda Rd., Jhongli City, Taoyuan County 32001, Taiwan, People's Republic of China
  • Keywords: MP3; MDCT; MFCC; GMM
  • Journal: Multimedia Tools and Applications
  • Publication date: February 2015
  • Volume: 74
  • Issue: 4
  • Pages: 1489-1509
  • Full-text size: 1,590 KB
  • Journal category: Computer Science
  • Journal subjects: Multimedia Information Systems
    Computer Communication Networks
    Data Structures, Cryptology and Information Theory
    Special Purpose and Application-Based Systems
  • Publisher: Springer Netherlands
  • ISSN: 1573-7721
Abstract
In this paper, we propose a singer identification approach that automatically identifies the singer of unknown MP3 audio data. Unlike previous research on singer identification in the MP3 compressed domain, we use Mel-Frequency Cepstral Coefficients (MFCCs) as features instead of MDCT (modified discrete cosine transform) coefficients. Although MFCCs are widely used in music classification and speaker recognition, they cannot be obtained directly from compressed music data such as MP3 files. We therefore introduce a modified method for calculating MFCC vectors in the MP3 compressed domain. A Gaussian mixture model (GMM) is used to describe the distribution of the MFCC vectors. To find the nearest singer, we use maximum likelihood classification (MLC) to assign each input MFCC vector to its nearest group. The experimental results verify the feasibility of the proposed approach.
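The abstract does not spell out the paper's modified MFCC computation, so the following is only a minimal sketch of one common way to approximate MFCCs in the MP3 domain. It assumes that dequantised MDCT coefficients for each 576-line long-block granule are already available from a partial MP3 decoder (that decoding step is not shown). The squared MDCT values stand in for the DFT power spectrum, a triangular mel filterbank is applied, and a DCT-II of the log filter energies yields the cepstral coefficients. The function names (`mfcc_from_mdct`, `mel_filterbank`, `dct_ii`), the 44.1 kHz sample rate, and the 26-filter / 13-coefficient settings are illustrative assumptions, not the authors' parameters.

```python
import numpy as np

# Hypothetical helper names throughout; this is not the paper's exact method.


def hz_to_mel(f):
    """Mel scale in the common 2595 * log10(1 + f / 700) form."""
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)


def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_bins, sample_rate):
    """Triangular mel filters over n_bins MDCT lines spanning 0 .. sample_rate / 2."""
    nyquist = sample_rate / 2.0
    # Assumed centre frequency of MDCT line k: (k + 0.5) * nyquist / n_bins.
    bin_freqs = (np.arange(n_bins) + 0.5) * nyquist / n_bins
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(nyquist), n_filters + 2)
    hz_points = mel_to_hz(mel_points)
    fb = np.zeros((n_filters, n_bins))
    for i in range(n_filters):
        left, centre, right = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (bin_freqs - left) / (centre - left)
        falling = (right - bin_freqs) / (right - centre)
        # Triangle: peaks near 1 at the centre, zero outside [left, right].
        fb[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return fb


def dct_ii(x, n_out):
    """Unnormalised DCT-II along the last axis, keeping the first n_out coefficients."""
    n = x.shape[-1]
    k = np.arange(n_out)[:, None]
    j = np.arange(n)[None, :]
    basis = np.cos(np.pi * k * (2 * j + 1) / (2 * n))
    return x @ basis.T


def mfcc_from_mdct(mdct_granules, sample_rate=44100, n_filters=26, n_ceps=13, eps=1e-10):
    """MFCC-like vectors from dequantised MDCT coefficients, one row per granule.

    mdct_granules: array of shape (n_granules, 576) for MP3 long blocks.
    """
    mdct = np.atleast_2d(np.asarray(mdct_granules, dtype=float))
    power = mdct ** 2  # squared MDCT values stand in for the DFT power spectrum
    fb = mel_filterbank(n_filters, mdct.shape[1], sample_rate)
    log_energies = np.log(power @ fb.T + eps)  # eps avoids log(0) in silent granules
    return dct_ii(log_energies, n_ceps)
```

For example, `mfcc_from_mdct(np.zeros((2, 576)))` returns a (2, 13) array, one vector per granule. Short-block granules (three 192-line windows) would need separate handling, which this sketch omits. Using the MDCT energy directly avoids the inverse transform back to PCM, which is the motivation for working in the compressed domain.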
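The GMM modelling and maximum likelihood classification described in the abstract can be sketched as follows: one GMM is fitted to each singer's training MFCC vectors, and every input vector is assigned to the singer whose model gives it the highest log-likelihood. The abstract does not give the model details, so diagonal covariances, plain EM training from randomly chosen initial means, and the default of 4 components are assumptions. The song-level majority vote in the hypothetical `classify_song` is also an assumption, since the abstract only says that each vector is assigned to its nearest group.

```python
from collections import Counter

import numpy as np


def logsumexp(a, axis):
    """Numerically stable log(sum(exp(a))) along one axis."""
    m = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(a - m), axis=axis))


def gmm_log_components(X, weights, means, variances):
    """log(w_k * N(x | mu_k, diag(var_k))) for every vector and component, shape (n, K)."""
    d = X.shape[1]
    diff = X[:, None, :] - means[None, :, :]                    # (n, K, d)
    mahal = np.sum(diff ** 2 / variances[None, :, :], axis=2)   # (n, K)
    log_det = np.sum(np.log(variances), axis=1)                 # (K,)
    return np.log(weights)[None, :] - 0.5 * (d * np.log(2 * np.pi) + log_det[None, :] + mahal)


def fit_gmm(X, n_components=4, n_iter=100, reg=1e-6, seed=0):
    """Fit a diagonal-covariance GMM with plain EM; returns (weights, means, variances)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    means = X[rng.choice(n, n_components, replace=False)].copy()
    variances = np.tile(X.var(axis=0) + reg, (n_components, 1))
    weights = np.full(n_components, 1.0 / n_components)
    for _ in range(n_iter):
        # E-step: posterior responsibility of each component for each vector.
        log_comp = gmm_log_components(X, weights, means, variances)
        resp = np.exp(log_comp - logsumexp(log_comp, axis=1)[:, None])
        # M-step: re-estimate weights, means and diagonal variances.
        nk = resp.sum(axis=0) + 1e-12
        weights = nk / n
        means = (resp.T @ X) / nk[:, None]
        diff = X[:, None, :] - means[None, :, :]
        variances = np.einsum("nk,nkd->kd", resp, diff ** 2) / nk[:, None] + reg
    return weights, means, variances


def frame_log_likelihood(X, gmm):
    """log p(x | singer GMM) for each row of X."""
    return logsumexp(gmm_log_components(X, *gmm), axis=1)


def classify_frames(X, singer_models):
    """Assign each MFCC vector to the singer whose GMM gives it the highest likelihood."""
    names = list(singer_models)
    scores = np.stack([frame_log_likelihood(X, singer_models[s]) for s in names], axis=1)
    return [names[i] for i in np.argmax(scores, axis=1)]


def classify_song(X, singer_models):
    """Song-level label by majority vote over frame decisions (an assumed combination rule)."""
    return Counter(classify_frames(X, singer_models)).most_common(1)[0][0]
```

In use, one would train a model per singer, e.g. `models = {name: fit_gmm(mfccs) for name, mfccs in training_data.items()}` with a hypothetical `training_data` dictionary of MFCC arrays, and then call `classify_song(test_mfccs, models)` to obtain the predicted singer name. Summing frame log-likelihoods per singer instead of voting is an equally common alternative.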
