Exploration and Application of Artificial Intelligence on Spoken Language Interaction

Chinese title: 人工智能技术在语音交互领域的探索与应用
Authors: Chen Zhigang (陈志刚); Liu Quan (刘权)
Keywords: speech interaction; dialog system; semantic understanding; speech recognition
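Journal code (CNKI): DZBZ
Journal: Information Technology & Standardization (信息技术与标准化)
Affiliation: iFLYTEK Co., Ltd. (科大讯飞股份有限公司)
Publication date: 2019-02-10
Year: 2019
Issue: Z1 (combined issue, Nos. 409 and 410)
Language: Chinese
Pages: 18-22 (5 pages)
CN: 11-4753/TN
CNKI article ID: DZBZ2019Z1012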

Abstract

This paper proposes a dialogue system architecture for natural speech interaction. To address the pain points of traditional speech interaction in speech recognition, semantic understanding, and system response, we design an interaction-flow framework that offers more natural human-computer interaction and smarter product characteristics. The framework supports intelligent responses to diverse speech interaction demands and represents the direction in which artificial intelligence is being applied to speech interaction.
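The abstract names three stages that a spoken dialogue turn passes through: speech recognition, semantic understanding, and system response. The sketch below is a minimal, hypothetical illustration of how data might flow between those stages; it is not the framework proposed in the paper, whose internals are not described here. Every name in it (`recognize`, `understand`, `respond`, `DialogueState`), the n-best input format, the 0.5 confidence threshold, and the toy weather domain are assumptions made for illustration only. Recognition is simulated by choosing the most confident hypothesis from a given n-best list, and understanding is a simple keyword matcher standing in for a trained model.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Hypothetical stand-ins for the three stages named in the abstract:
# speech recognition -> semantic understanding -> system response.
# None of these names come from the paper; they only illustrate data flow.

@dataclass
class Semantics:
    intent: Optional[str]
    slots: Dict[str, str] = field(default_factory=dict)

@dataclass
class DialogueState:
    # Slots and intent remembered across turns, so a follow-up such as
    # "and tomorrow" can reuse the city and intent from the previous turn.
    slots: Dict[str, str] = field(default_factory=dict)
    last_intent: Optional[str] = None

def recognize(nbest: List[Tuple[str, float]], min_conf: float = 0.5) -> Optional[str]:
    """Toy recognizer: pick the most confident (text, confidence) hypothesis,
    rejecting it when confidence is below min_conf (an assumed threshold)."""
    if not nbest:
        return None
    text, conf = max(nbest, key=lambda h: h[1])
    return text if conf >= min_conf else None

CITIES = {"beijing", "hefei", "shanghai"}
DAYS = {"today", "tomorrow"}

def understand(text: str) -> Semantics:
    """Toy keyword-based understanding; a real system would use a trained model."""
    words = text.lower().replace("?", "").split()
    slots: Dict[str, str] = {}
    for w in words:
        if w in CITIES:
            slots["city"] = w
        elif w in DAYS:
            slots["day"] = w
    intent = "weather" if "weather" in words else None
    return Semantics(intent, slots)

def respond(sem: Semantics, state: DialogueState) -> str:
    """Merge new slots into the dialogue state and produce a reply."""
    # Carry the previous intent over only when the user supplied new slots.
    intent = sem.intent or (state.last_intent if sem.slots else None)
    if intent is None:
        return "Sorry, what would you like to do?"
    state.last_intent = intent
    state.slots.update(sem.slots)
    if "city" not in state.slots:
        return "Which city?"
    day = state.slots.get("day", "today")
    return f"Looking up the {intent} in {state.slots['city'].title()} for {day}."

def turn(nbest: List[Tuple[str, float]], state: DialogueState) -> str:
    """Run one user turn through all three stages."""
    text = recognize(nbest)
    if text is None:
        return "Sorry, I didn't catch that."
    return respond(understand(text), state)

if __name__ == "__main__":
    state = DialogueState()
    print(turn([("what's the weather in hefei", 0.9)], state))
    print(turn([("and tomorrow", 0.8)], state))
    print(turn([("mumble", 0.2)], state))
```

Run as a script, the three turns print "Looking up the weather in Hefei for today.", then "Looking up the weather in Hefei for tomorrow." (the second turn reuses the city and intent held in `DialogueState`), and finally "Sorry, I didn't catch that." because the last hypothesis falls below the assumed confidence threshold. Keeping state separate from the per-turn `Semantics` is one simple way to support multi-turn follow-ups; a production system would replace each stub with its own model.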
