Study on Modeling Method of Within-Speaker Variability in Forensic Voice Comparison
  • Authors: WANG Huapeng; JIANG Nan; LIU En; CHAO Yadong
  • Affiliation: Department of Audio-Visual Data Inspection Technology, Criminal Investigation Police University of China
  • Keywords: likelihood ratio; evidence strength; modeling; Mel Frequency Cepstral Coefficients (MFCC); Gammatone Frequency Cepstral Coefficients (GFCC)
  • Journal: Computer Engineering and Applications (计算机工程与应用); CNKI journal code: JSGG
  • Online publication date: 2018-12-12
  • Year: 2019; Volume: 55; Issue: 8 (cumulative No. 927)
  • Pages: 116-121+220 (7 pages)
  • Document ID: JSGG201908017
  • Funding: 2016 National Social Science Fund of China Key Project (No. 16AYY015); Liaoning Province Key Research and Development Program Project (No. 2017231006); Ministry of Public Security Policing Theory and Soft Science Project (No. 2017231006)
  • Language: Chinese
Abstract
To address the shortage of voice samples from the person under examination in forensic speaker recognition, this paper proposes an alternative method for modeling within-speaker variability, together with a corresponding variance-control algorithm. A reference database recorded under matching conditions is used to build the recognition system's same-speaker score models, in place of the score models that would otherwise require comparisons among multiple non-contemporaneous recordings of the person under examination, so that a statistical model reflecting within-speaker variability can still be obtained. The effectiveness of the method is verified with MFCC (Mel Frequency Cepstral Coefficients) and GFCC (Gammatone Frequency Cepstral Coefficients) features under the current likelihood-ratio framework for evaluating the strength of forensic evidence, and feature-level and decision-level fusion of the two feature sets are also examined. Experimental results show that the method achieves high recognition accuracy and stability in both clean and noisy speech conditions, and that feature-level fusion further improves system performance.
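The core step described above, using same-speaker comparison scores drawn from a reference database recorded under matching conditions in place of repeat recordings of the person under examination, can be illustrated with a minimal score-based likelihood-ratio sketch in Python. The Gaussian score models, the NumPy/SciPy calls, and all function names below are illustrative assumptions rather than the authors' implementation; the paper's variance-control algorithm and the MFCC/GFCC feature extraction are not shown.

    # Minimal sketch (assumed, not the authors' code): score-based likelihood ratio
    # with the same-speaker score model estimated from a reference database.
    import numpy as np
    from scipy.stats import norm

    def fit_gaussian(scores):
        # Fit a univariate Gaussian to a set of comparison scores.
        scores = np.asarray(scores, dtype=float)
        return scores.mean(), scores.std(ddof=1)

    def likelihood_ratio(case_score, same_speaker_scores, diff_speaker_scores):
        # same_speaker_scores: scores of same-speaker trials from the reference
        #   database; they stand in for the missing non-contemporaneous recordings
        #   of the person under examination (within-speaker variability model).
        # diff_speaker_scores: scores of different-speaker trials from the
        #   relevant population (between-speaker variability model).
        mu_ss, sd_ss = fit_gaussian(same_speaker_scores)
        mu_ds, sd_ds = fit_gaussian(diff_speaker_scores)
        return norm.pdf(case_score, mu_ss, sd_ss) / norm.pdf(case_score, mu_ds, sd_ds)

    # Toy usage with simulated reference scores and one casework comparison score.
    rng = np.random.default_rng(0)
    ss_scores = rng.normal(2.0, 0.8, 200)     # same-speaker reference scores
    ds_scores = rng.normal(-1.0, 1.2, 2000)   # different-speaker reference scores
    lr = likelihood_ratio(1.5, ss_scores, ds_scores)
    print(f"LR = {lr:.2f}, log10(LR) = {np.log10(lr):.2f}")

In the same spirit, feature-level fusion would concatenate the MFCC and GFCC vectors of each frame before scoring, while decision-level fusion would combine the likelihood ratios of the two separate systems, for example with the logistic-regression fusion and calibration implemented in the FoCal toolkit cited in [16].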
References
[1] Rose P. Likelihood ratio-based forensic voice comparison with higher level features: research and reality[J]. Computer Speech & Language, 2017, 45: 475-502.
    [2] Ishihara S. Sensitivity of likelihood-ratio based forensic voice comparison under mismatched conditions of within-speaker sample sizes across databases[J]. Australian Journal of Forensic Sciences, 2018, 50(4): 307-322.
    [3] Saks M J, Koehler J J. The coming paradigm shift in forensic identification science[J]. Science, 2005, 309(5736): 892-895.
    [4] Morrison G S. The impact in forensic voice comparison of lack of calibration and of mismatched conditions between the known-speaker recording and the relevant-population sample recordings[J]. Forensic Science International, 2018, 283.
    [5] Report to the President, forensic science in criminal courts: ensuring scientific validity of feature-comparison methods[Z]. Executive Office of the President of the United States, President's Council of Advisors on Science and Technology, 2016.
    [6] Morrison G S. Admissibility of forensic voice comparison testimony in England and Wales[J]. Criminal Law Review, 2018(1).
    [7] Meuwly D, Ramos D, Haraksim R. A guideline for the validation of likelihood ratio methods used for forensic evidence evaluation[J]. Forensic Science International, 2017, 276: 142-153.
    [8] Leegwater A J, Meuwly D, Sjerps M, et al. Performance study of a score-based likelihood ratio system for forensic fingermark comparison[J]. Journal of Forensic Sciences, 2017, 62(3): 626-640.
    [9] Wang H, Zhang C. Forensic automatic speaker recognition based on likelihood ratio using acoustic-phonetic features measured automatically[J]. Journal of Forensic Science and Medicine, 2015, 1(2): 119-123.
    [10] Wang Huapeng, Yang Hongchen. Research on the extraction method of MFCC features for voiceprint recognition[J]. Journal of People's Public Security University of China (Science and Technology), 2008(1): 28-30. (in Chinese)
    [11] Hasan T, Saeidi R, Hansen J H L, et al. Duration mismatch compensation for i-vector based speaker recognition systems[C]//2013 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2013: 7663-7667.
    [12] Wang Huapeng. Quantification of forensic voice evidence features based on auditory models[J]. Journal of Criminal Investigation Police University of China, 2018(1): 119-122. (in Chinese)
    [13] Mao Zhengchong, Wang Zhengchuang, Huang Fang. Research on a noise-robust speaker recognition system based on GFCC and RLS[J]. Computer Engineering and Applications, 2015, 51(10): 215-218. (in Chinese)
    [14] Garcia-Romero D, McCree A. Supervised domain adaptation for i-vector based speaker recognition[C]//2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2014: 4047-4051.
    [15] Xiong Bingfeng, Zeng Yicheng, Xie Xiaojuan. An improved auditory feature parameter applied to speaker recognition[J]. Journal of Computer Applications, 2016, 36(S1): 82-85. (in Chinese)
    [16] Brümmer N. FoCal multi-class: toolkit for evaluation, fusion and calibration of multi-class recognition scores: tutorial and user manual[EB/OL]. (2007). http://sites.google.com/site/nikobrummer/focalmulticlass.
    [17] Willis S M, McKenna L, McDermott S, et al. ENFSI guideline for evaluative reporting in forensic science[R]. European Network of Forensic Science Institutes, 2015.
