A Keyword Search Algorithm for Speech Based on Dynamic Time Warping (基于动态时间规整的语音关键词检索算法)

English title: A Keyword Search Algorithm for Speech Based on Dynamic Time Warping
Authors: ZHANG Ge (张舸); ZHANG Pengyuan (张鹏远); LIU Jian (刘建); YAN Yonghong (颜永红)
Affiliations: Key Laboratory of Speech Acoustics and Content Understanding, Institute of Acoustics, Chinese Academy of Sciences; University of Chinese Academy of Sciences; Xinjiang Laboratory of Minority Speech and Language Information Processing, Xinjiang Technical Institute of Physics & Chemistry, Chinese Academy of Sciences
Keywords: keyword search; dynamic time warping; query matching; query preprocessing
Journal name (Chinese): WJSY
Journal name (English): Journal of Network New Media
Publication date: 2019-01-15
Publisher: 网络新媒体技术 (Journal of Network New Media)
Year: 2019
Issue: v.8; No.43
Funding: National Natural Science Foundation of China (U1536117, 11590770-4); National Key R&D Program of China, Key Special Project (2016YFB0801203, 2016YFB0801200); Major Science and Technology Project of Xinjiang Uygur Autonomous Region (2016A03007-1)
Language: Chinese
Page: WJSY201901003
Page count: 6
CN: 01
ISSN: 10-1055/TP
Classification number: 22-27

Abstract

This paper presents an audio-query-based keyword search algorithm for speech that works by acoustic template matching. Dynamic time warping (DTW) is used to match each audio query (template) against the speech to be searched, which yields the positions at which the keyword corresponding to the query occurs. To improve match quality, the audio queries are selected and preprocessed to produce a set of templates that represents the keyword better than the original queries, and this set is used as the matching unit. Compared with matching directly against the original queries, the proposed query selection and preprocessing give a relative improvement of 55.0%.
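The matching step described in the abstract can be illustrated with a small subsequence DTW sketch. This is not the authors' implementation: the function name `subsequence_dtw`, the Euclidean frame distance, the three-way step pattern, and the unnormalized cost are all assumptions made for illustration, and the record does not say which features, local distance, or score normalization the paper uses. The sketch lets the query start and end anywhere in the utterance and returns the frame span with the lowest accumulated cost.

```python
import math


def euclidean(a, b):
    """Frame-level distance between two feature vectors (assumed choice)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def subsequence_dtw(query, utterance, dist=euclidean):
    """Find the utterance segment that best matches the whole query.

    Returns (start_frame, end_frame, cost) with 0-based, inclusive frames.
    Hypothetical helper for illustration only.
    """
    n, m = len(query), len(utterance)
    if n == 0 or m == 0:
        raise ValueError("query and utterance must be non-empty")
    inf = float("inf")
    # acc[i][j]: lowest accumulated cost of aligning query[:i] with some
    # utterance segment that ends at frame j - 1.
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    # start[i][j]: first utterance frame of the path that achieves acc[i][j].
    start = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if i == 1:
                # Free start: a path may begin at any utterance frame at no cost.
                candidates = [(0.0, j - 1), (acc[1][j - 1], start[1][j - 1])]
            else:
                candidates = [
                    (acc[i - 1][j - 1], start[i - 1][j - 1]),  # diagonal step
                    (acc[i - 1][j], start[i - 1][j]),          # query advances
                    (acc[i][j - 1], start[i][j - 1]),          # utterance advances
                ]
            best_cost, best_start = min(candidates)
            acc[i][j] = dist(query[i - 1], utterance[j - 1]) + best_cost
            start[i][j] = best_start
    # Free end: the match may stop at any utterance frame.
    end = min(range(1, m + 1), key=lambda j: acc[n][j])
    return start[n][end], end - 1, acc[n][end]


# Toy one-dimensional "features": the query pattern 1, 2, 3 sits at frames 2-4.
utterance = [[0.0], [0.0], [1.0], [2.0], [3.0], [0.0], [0.0]]
query = [[1.0], [2.0], [3.0]]
print(subsequence_dtw(query, utterance))  # -> (2, 4, 0.0)
```

In a detection setting the returned cost would typically be compared with a threshold to decide whether the keyword is present. How the paper scores candidate matches and combines results from multiple templates is not stated in this record.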

References

[1] Shao J. (邵健). 面向大规模电话交谈语音的汉语语音检索 [Chinese Speech Retrieval for Large-Scale Conversational Telephone Speech][D]. Beijing: Institute of Acoustics, Chinese Academy of Sciences, 2008.
[2] Ney H. The Use of a One-stage Dynamic Programming Algorithm for Connected Word Recognition[J]. IEEE Transactions on Acoustics, Speech, and Signal Processing, 1984, 32(2): 263.
[3] Müller M. Information Retrieval for Music and Motion[M]. Heidelberg: Springer, 2007.
[4] Deza M. M., Deza E. Encyclopedia of Distances[M]. Heidelberg: Springer, 2009.
[5] Rodriguez-Fuentes L. J., Varona A., Penagarikano M., et al. High-Performance Query-By-Example Spoken Term Detection on the SWS 2013 Evaluation[C]//Florence, Italy: IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2014: 7819.
[6] Chen G., Parada C., Sainath T. N. Query By Example Keyword Spotting Using Long Short-Term Memory Network[C]//Brisbane, Australia: IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2015: 5236.
[7] Hazen T. J., Shen W., White C. Query-By-Example Spoken Term Detection Using Phonetic Posteriorgram Templates[C]//Merano, Italy: IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU), 2009: 421.
[8] Wang H., Lee T., Leung C. C. Unsupervised Spoken Term Detection with Acoustic Segment Model[C]//Hsinchu City, Taiwan: International Conference on Speech Database and Assessments (Oriental COCOSDA), 2011: 106.
[9] Xu J., Zhang G., Yan Y. Effective Utilization of Multiple Examples in Query-By-Example Spoken Term Detection[C]//Shanghai, China: IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2016: 5440.
[10] Wang Y., Metze F. An In-Depth Comparison of Keyword Specific Thresholding and Sum-To-One Score Normalization[C]//Singapore: 15th Annual Conference of the International Speech Communication Association (Interspeech), 2014: 2474.
