Automatic Speech Feature Learning for Continuous Prediction of Customer Satisfaction in Contact Center Phone Calls
  • Keywords: Feature learning; End-to-end learning; Convolutional neural networks; Conflict speech retrieval; Automatic tagging
  • Journal: Lecture Notes in Computer Science
  • Publication year: 2016
  • Volume: 10077
  • Issue: 1
  • Pages: 255-265
  • Full-text size: 383 KB
  • Author affiliations: Carlos Segura (21)
    Daniel Balcells (21) (22)
    Martí Umbert (21) (23)
    Javier Arias (21)
    Jordi Luque (21)

    21. Telefonica Research, Edificio Telefonica-Diagonal 00, Barcelona, Spain
    22. Department Signal Theory and Communications, Universitat Politècnica de Catalunya, Barcelona, Spain
    23. Music Technology Group, Universitat Pompeu Fabra, Barcelona, Spain
  • Series title: Advances in Speech and Language Technologies for Iberian Languages
  • ISBN: 978-3-319-49169-1
  • Publication category: Computer Science
  • Publication subjects: Artificial Intelligence and Robotics
    Computer Communication Networks
    Software Engineering
    Data Encryption
    Database Management
    Computation by Abstract Devices
    Algorithm Analysis and Problem Complexity
  • Publisher: Springer Berlin / Heidelberg
  • ISSN: 1611-3349
  • Volume order: 10077
Abstract
Speech-related processing tasks have commonly been tackled using engineered features, also known as hand-crafted descriptors. These features have usually been optimized over the years by a research community that constantly seeks the most meaningful, robust, and compact audio representations for a specific domain or task. In recent years, great interest has arisen in developing architectures that can learn such features by themselves, thus bypassing the required engineering effort. In this work we explore the possibility of using Convolutional Neural Networks (CNNs) directly on raw audio signals to automatically learn meaningful features. Additionally, we study how well the learned features generalize to a different task. First, a CNN-based continuous conflict detector is trained on audio extracted from televised political debates in French. Then, while keeping the previously learned features, we adapt the last layers of the network to target another concept using completely unrelated data. Concretely, we predict self-reported customer satisfaction from call center conversations in Spanish. Reported results show that our proposed approach, using raw audio, obtains results similar to those of a CNN using classical Mel-scale filter banks. In addition, transferring the learning from the conflict detection task to satisfaction prediction shows a successful generalization of the features learned by the deep architecture.
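
The transfer idea described in the abstract (keep the convolutional features, re-fit only the output stage on a new task) can be illustrated with a minimal NumPy sketch. Everything here is an assumption made for illustration and not the authors' implementation: the filters are random placeholders standing in for filters learned on a source task, the waveforms and scores are synthetic, the helper names `conv1d_features`, `fit_head`, and `predict` are hypothetical, and the new output stage is a simple ridge regression rather than adapted network layers.

```python
import numpy as np


def conv1d_features(wave, filters, stride):
    """Strided 1-D convolution on a raw waveform, ReLU, then max pooling over time."""
    n_filters, width = filters.shape
    n_frames = 1 + (len(wave) - width) // stride
    # Index matrix that slices the waveform into overlapping frames: (n_frames, width)
    idx = np.arange(width)[None, :] + stride * np.arange(n_frames)[:, None]
    frames = wave[idx]
    activations = np.maximum(frames @ filters.T, 0.0)  # (n_frames, n_filters)
    return activations.max(axis=0)                     # one value per filter


def fit_head(features, targets, ridge=1e-3):
    """Fit a ridge-regression output stage (with bias) on top of fixed features."""
    X = np.hstack([features, np.ones((len(features), 1))])
    A = X.T @ X + ridge * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ targets)


def predict(features, head):
    """Apply the fitted output stage to a batch of feature vectors."""
    X = np.hstack([features, np.ones((len(features), 1))])
    return X @ head


rng = np.random.default_rng(0)

# ASSUMPTION: random placeholders for filters learned on the source task.
filters = rng.standard_normal((8, 64))
frozen = filters.copy()  # kept only to check that the filters are never modified

# ASSUMPTION: synthetic target-task data (20 waveforms, scores in [1, 5]).
waves = rng.standard_normal((20, 4000))
scores = rng.uniform(1.0, 5.0, size=20)

# Transfer step: reuse the fixed filters, fit only the new output stage.
feats = np.stack([conv1d_features(w, filters, stride=32) for w in waves])
head = fit_head(feats, scores)
preds = predict(feats, head)
```

In this sketch only `head` depends on the target-task labels; `filters` is read but never updated, which mirrors the idea of keeping previously learned features while adapting the final stage.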
