Abstract
This paper presents a query-by-example keyword search algorithm based on acoustic template matching. Audio queries (spoken templates of a keyword) are matched against the speech to be searched using dynamic time warping (DTW), which yields the positions at which the corresponding keyword occurs. To improve match quality, the queries are selected and preprocessed to produce a set of templates that represents the keyword better than the original queries, and this set is used as the matching unit. Compared with matching the original queries directly, the proposed query selection and preprocessing give a relative improvement of 55.0%.
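The matching step described above can be illustrated with a minimal subsequence DTW sketch. It is not the paper's implementation: the Euclidean frame distance, the three-way step pattern, the normalization by query length, and the names `euclidean` and `subsequence_dtw` are all assumptions chosen for illustration, and the query selection and preprocessing stages are not shown.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two feature frames (assumed local distance)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def subsequence_dtw(query, utterance, dist=euclidean):
    """Find the best match of `query` anywhere inside `utterance`.

    Both arguments are sequences of feature frames (lists of floats).
    Returns (normalized_cost, start_frame, end_frame), with inclusive indices.
    """
    n, m = len(query), len(utterance)
    if n == 0 or m == 0:
        raise ValueError("query and utterance must be non-empty")

    inf = float("inf")
    # acc[i][j]: lowest accumulated cost of aligning query[0..i] to a stretch
    # of the utterance that ends at frame j.
    acc = [[inf] * m for _ in range(n)]
    # start[i][j]: utterance frame at which that best alignment begins.
    start = [[0] * m for _ in range(n)]

    # The first query frame may align to any utterance frame at no extra
    # cost, which is what lets the match begin anywhere in the utterance.
    for j in range(m):
        acc[0][j] = dist(query[0], utterance[j])
        start[0][j] = j

    for i in range(1, n):
        acc[i][0] = acc[i - 1][0] + dist(query[i], utterance[0])
        start[i][0] = start[i - 1][0]
        for j in range(1, m):
            # Assumed step pattern: diagonal, vertical, horizontal.
            candidates = [
                (acc[i - 1][j - 1], start[i - 1][j - 1]),
                (acc[i - 1][j], start[i - 1][j]),
                (acc[i][j - 1], start[i][j - 1]),
            ]
            best_cost, best_start = min(candidates, key=lambda c: c[0])
            acc[i][j] = best_cost + dist(query[i], utterance[j])
            start[i][j] = best_start

    # The match may end at any utterance frame: pick the cheapest end point.
    end = min(range(m), key=lambda j: acc[n - 1][j])
    return acc[n - 1][end] / n, start[n - 1][end], end


if __name__ == "__main__":
    query = [[1.0], [2.0], [3.0]]
    utterance = [[0.0], [0.0], [1.0], [2.0], [3.0], [0.0]]
    print(subsequence_dtw(query, utterance))
```

With several templates for one keyword, one simple option (again an assumption, not the paper's stated method) is to run this function once per template and keep the detection with the lowest normalized cost.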