Effective word count estimation for long duration daily naturalistic audio recordings
Abstract
The ability to count words in extended audio sequences allows researchers to explore characteristics of speakers (e.g., leading, following, task responsibility, personal engagement), as well as the dynamics of two-way or multi-subject conversation scenarios. As such, counting the number of words spoken by a person offers a rich information source for several applications such as health monitoring (e.g., autism, Parkinson’s, and Alzheimer’s), second language learning, and language development studies. However, developing robust word count systems that achieve high performance at low computational cost is very challenging because of the uncertain and dynamic conditions encountered in audio recordings. In this study, we address the problem for large-scale naturalistic audio recordings based on a 100-day audio collection entitled Prof-Life-Log. This corpus contains continuously recorded audio from one person using a mobile LENA audio recording device (LENA, 2015). The device captures audio for an entire workday, which can last up to 16 hours. Our proposed word count framework consists of five main components: (i) Speech Activity Detection (SAD) to remove non-speech parts of the signal, (ii) Speech Enhancement to suppress the effects of background noise, (iii) Primary vs. Secondary Speaker Detection to remove secondary speaker segments, (iv) Syllable Rate Estimation to estimate the syllable rate for the primary speaker, and (v) Linear Minimum Mean Square Error (LMMSE) Estimation to find the linear mapping between syllable rate and word rate in spontaneous speech. Despite its simplicity, the framework proves very effective in real scenarios, with good performance on various datasets. As an indication of performance, the error of the framework for an entire 16-hour daily audio file can be as low as 1% in terms of cumulative Word Count Error.
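
The final stage of the framework, the linear mapping from syllable rate to word rate, can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so the code below assumes the textbook scalar linear MMSE estimator built from sample moments (slope = Cov(x, y) / Var(x), intercept = E[y] - slope * E[x]). It also assumes that cumulative Word Count Error means the relative difference between total estimated and total reference word counts. The function names `fit_lmmse`, `estimate_words`, and `cumulative_word_count_error` are hypothetical and not taken from the paper.

```python
from typing import List, Sequence, Tuple


def fit_lmmse(syllables: Sequence[float], words: Sequence[float]) -> Tuple[float, float]:
    """Fit a scalar linear MMSE mapping words ~ slope * syllables + intercept.

    Assumption: sample means, variance, and covariance stand in for the true
    moments; this is not necessarily the paper's exact procedure.
    """
    n = len(syllables)
    if n != len(words) or n < 2:
        raise ValueError("need two equally long sequences with at least 2 items")
    mean_x = sum(syllables) / n
    mean_y = sum(words) / n
    var_x = sum((x - mean_x) ** 2 for x in syllables) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(syllables, words)) / n
    if var_x == 0:
        raise ValueError("syllable values must not all be identical")
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


def estimate_words(syllables: Sequence[float], slope: float, intercept: float) -> List[float]:
    """Apply the fitted linear mapping to per-segment syllable values."""
    return [slope * x + intercept for x in syllables]


def cumulative_word_count_error(estimated: Sequence[float], reference: Sequence[float]) -> float:
    """Relative error between total estimated and total reference word counts.

    Assumption: this definition is inferred from the metric's name only.
    """
    total_reference = sum(reference)
    if total_reference == 0:
        raise ValueError("reference word count must be non-zero")
    return abs(sum(estimated) - total_reference) / total_reference
```

Under this assumed definition, per-segment over- and under-estimates can cancel when summed over a full day, which is one plausible reason a cumulative error can be much lower than the per-segment error.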
