Research and Implementation of a Multi-Channel Parallel Real-Time Speaker Recognition Algorithm
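Abstract
With the rapid development of communication technology, telephone communication has increasingly become a platform for personal contact and information exchange, and multi-channel parallel speaker recognition based on telephone speech has become a widely studied topic. Many current speech recognition systems are built on computer software or on DSP platforms. Such implementations are flexible, but their real-time performance is relatively poor for multi-channel parallel telephone-speech applications.
     FPGA (Field-Programmable Gate Array) chips offer high clock frequencies and low internal latency, with all control logic implemented in hardware; they are fast and efficient and well suited to high-speed transfer control of large data volumes. A speaker recognition system built on DSP+FPGA can exploit the strengths of both chips. Preprocessing and feature-parameter extraction handle large amounts of data and demand high processing speed, but their computational structure is relatively simple, which makes them suitable for hardware implementation on an FPGA. Template matching handles comparatively little data but is structurally complex, which makes it suitable for a DSP chip with high computing speed, flexible addressing modes, and strong communication mechanisms. The DSP+FPGA architecture therefore balances speed and flexibility.
     The main work of this paper is as follows:
     (1) To meet the high demands that a multi-channel parallel real-time speaker recognition system places on data throughput, computing speed, and resource usage, a system implementation scheme based on an FPGA+DSP platform is proposed;
     (2) The feature parameters and recognition methods commonly used in speaker recognition are studied and, in light of the characteristics of a parallel speaker recognition system, a multi-channel parallel speaker recognition system based on MFCC+VQ is designed as a trade-off between resource usage and computational complexity;
     (3) The classical VQ algorithm for speaker-recognition template matching is studied and, to meet the accuracy and speed requirements of a multi-channel parallel recognition system, improved in two respects: for recognition accuracy, a codeword-separability-weighted VQ algorithm is proposed; for template-matching speed, a mean-inequality fast search algorithm (ENNS) is proposed for the codeword-separability-weighted VQ algorithm. Simulation tests show that both recognition accuracy and template-matching speed improve to some extent;
     (4) The DSP-side design is completed, including the design and testing of the data interfaces (the HPI interface between the DSP and the host, and the EMIFA interface between the DSP and the FPGA) and the configuration of the DSP system registers. The processing flow for multi-channel parallel real-time speaker recognition is designed; the improved VQ algorithm is designed, implemented, and optimized on the TI DSP6455 platform; the relative error between the fixed-point results on the platform and the VC floating-point results is measured experimentally; and the template-matching time is tested;
     (5) The speaker recognition system is jointly debugged and its performance is tested. The results show that the system can process 32 parallel channels of telephone speech in real time with relatively high recognition accuracy, meeting the design requirements.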
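The VQ template matching named in items (2) and (3) can be illustrated with a minimal sketch of the classical scheme: each enrolled speaker is represented by a codebook, and a test utterance is assigned to the speaker whose codebook yields the lowest average quantization distortion over the utterance's feature frames. The function names, the toy two-dimensional "features", and the codebooks below are illustrative assumptions. Real MFCC extraction, codebook training, and the codeword-separability weighting proposed in this work are not reproduced.

```python
from typing import Dict, List, Sequence

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def average_distortion(frames: List[Vector], codebook: List[Vector]) -> float:
    """Mean squared distance from each frame to its nearest codeword."""
    total = sum(min(squared_distance(f, c) for c in codebook) for f in frames)
    return total / len(frames)


def identify_speaker(frames: List[Vector], codebooks: Dict[str, List[Vector]]) -> str:
    """Return the speaker whose codebook quantizes the frames with least distortion."""
    return min(codebooks, key=lambda name: average_distortion(frames, codebooks[name]))


# Toy data (hypothetical): two enrolled speakers, two codewords each.
codebooks = {
    "speaker_a": [(0.0, 0.0), (1.0, 1.0)],
    "speaker_b": [(5.0, 5.0), (6.0, 6.0)],
}
test_frames = [(0.1, -0.1), (0.9, 1.2), (0.2, 0.1)]
best = identify_speaker(test_frames, codebooks)
```

In this toy case the test frames lie close to the codewords of `speaker_a`, so that speaker is selected. In a multi-channel setting the same matching would simply be repeated per channel.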
With the rapid evolution of communication technology, telephone communication has become the main platform for contact and information exchange between people, and multi-channel parallel speaker recognition based on telephone speech has become a widely studied research topic. At present, many speaker recognition systems are based on computer software or a DSP chip. These offer flexible system implementation but comparatively poor real-time performance for telephone communication.
     The FPGA (Field-Programmable Gate Array) chip has the advantages of a high clock frequency, small internal delay, and control logic implemented entirely in hardware; it is fast and efficient and is suitable for high-speed transmission control of large data streams. The DSP+FPGA structure has two characteristics: first, it is flexible, general-purpose, and suited to modular design, so it can raise computational efficiency and is applicable to practical processing systems; second, it has a short development period, and the system is easy to maintain and upgrade.
     The main work of this paper includes:
     (1) In view of the data-throughput, resource, and computing-speed requirements of multi-channel parallel speaker recognition, this paper proposes a system based on DSP and FPGA.
     (2) Based on a study of feature parameters and matching models, and in accordance with the characteristics of multi-channel parallel speaker recognition, this paper designs a recognition algorithm based on Mel-frequency cepstral coefficients (MFCC) and vector quantization (VQ).
     (3) For the speaker template-matching method, and to meet the system's requirements on identification rate and processing speed, this paper introduces a VQ algorithm improved with code-vector separability, which raises recognition performance; to improve matching speed, equal-average nearest neighbor search (ENNS) is added to the improved VQ algorithm. Simulation shows that both recognition performance and matching speed improve.
     (4) This paper then completes the related DSP design, including the data interfaces, namely HPI (DSP-host communication) and EMIF (DSP-FPGA communication), and the related DSP register settings; designs and optimizes the improved VQ algorithm on the TI 6455; and measures the relative error between the DSP fixed-point and VC floating-point results, as well as the matching time.
     (5) Experiments show that the system can process multi-channel parallel telephone speech and that recognition performance is good.
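The equal-average nearest neighbor search (ENNS) mentioned in item (3) rests on a mean inequality: for k-dimensional vectors x and c with component means m_x and m_c, the squared Euclidean distance satisfies d^2(x, c) >= k * (m_x - m_c)^2, so a codeword whose mean lies too far from that of the input can be rejected without computing the full distance. The sketch below applies this test to a plain, unweighted codebook. How the separability weighting is combined with it is not stated in this abstract, and all names and data here are illustrative assumptions.

```python
from bisect import bisect_left
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def mean(v: Vector) -> float:
    return sum(v) / len(v)


def squared_distance(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def build_sorted_codebook(codebook: List[Vector]) -> List[Tuple[float, Vector]]:
    """Precompute each codeword's mean once and sort the codewords by mean."""
    return sorted(((mean(c), c) for c in codebook), key=lambda t: t[0])


def enns_search(x: Vector, sorted_cb: List[Tuple[float, Vector]]):
    """Return (nearest codeword, squared distance, full distances computed).

    Assumes a non-empty codebook prepared by build_sorted_codebook.
    """
    k = len(x)
    m_x = mean(x)
    means = [m for m, _ in sorted_cb]
    # Walk outward from the position where m_x would sit among the sorted means.
    hi = bisect_left(means, m_x)
    lo = hi - 1
    best, best_d, evaluated = None, float("inf"), 0
    while lo >= 0 or hi < len(sorted_cb):
        left_gap = m_x - means[lo] if lo >= 0 else float("inf")
        right_gap = means[hi] - m_x if hi < len(sorted_cb) else float("inf")
        # Visit the unvisited codeword whose mean is closest to m_x.
        if left_gap <= right_gap:
            gap, idx = left_gap, lo
            lo -= 1
        else:
            gap, idx = right_gap, hi
            hi += 1
        # Mean inequality: d^2 >= k * gap^2, and every remaining codeword
        # has an equal or larger gap, so none of them can beat best_d.
        if k * gap * gap >= best_d:
            break
        d = squared_distance(x, sorted_cb[idx][1])
        evaluated += 1
        if d < best_d:
            best, best_d = sorted_cb[idx][1], d
    return best, best_d, evaluated


# Toy codebook (hypothetical values).
codebook = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0), (6.0, 6.0), (10.0, 10.0)]
x = (0.9, 1.2)
nearest, dist, evaluated = enns_search(x, build_sorted_codebook(codebook))
```

The search returns the same codeword as an exhaustive scan while computing fewer full distances whenever the codeword means are spread out; when all codewords share nearly the same mean, the test prunes little.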
