Research on Speaker Identification of Whispered Speech Based on Joint Factor Analysis
Abstract
Speaker identification, an important branch of biometric recognition, has wide applications in public security and judicial work, biomedical engineering, military security systems, and other fields. With the rapid development of computer and network technology, speaker identification has made great progress. Whispering is a special mode of spoken communication used in many situations. Because whispered speech differs considerably from normally phonated speech, speaker identification for whispers cannot simply reuse the methods developed for normal speech, and many problems remain to be solved.
     This dissertation investigates text-independent speaker identification for whispered speech. The main difficulties are as follows. First, whispered-speech databases are incomplete: for normal speech, the National Institute of Standards and Technology (NIST) provides standard corpora for speaker-identification research, whereas comparable resources for whispered speech are scarce. Second, feature representation is problematic: owing to the particular way whispers are produced, some commonly used feature parameters cannot be extracted, and spectral parameters are harder to obtain than for normal speech. Third, whispered speech is produced with a breathy excitation at a low sound level, so it is easily corrupted by noise; moreover, whispering is often used during mobile-phone calls and is therefore sensitive to the channel environment. Finally, whispering is constrained by the speaking environment, which limits emotional expression and alters the speaking state and psychological condition, so whispered speech is more strongly affected by the speaker's psychological factors, emotions, and speaking state. In short, compared with normal phonation, the main challenges for speaker identification of whispered speech are that feature parameters are harder to extract, the signal is more affected by the speaker's own state, and it is more sensitive to channel variation.
     To address these problems, this dissertation carries out the following work:
     1. Algorithms are proposed for extracting parameters that characterize whispered speakers. Whispered speech has no fundamental frequency and its source characteristics are hard to capture, so the reliability of formant extraction, which characterizes the vocal tract, is especially important. A formant-extraction algorithm for whispered speech based on spectral segmentation is proposed: it segments the spectrum dynamically, obtains the filter parameters by selective linear prediction, and derives the formants through parallel inverse-filter control. This provides an effective way to handle the formant shifting, merging, and flattening caused by whispered phonation. In addition, based on the property that the centroid and flatness of a statistical variable measure signal stability, and combined with a model of human auditory perception, the Bark subband spectral centroid and Bark subband spectral flatness are defined; together with other spectral variables they form a feature set that effectively characterizes speakers in the whispered mode.
     2. A speaker-identification method for whispered speech under atypical emotional states is proposed, based on feature mapping and speaker model synthesis, which largely resolves the mismatch in speaker state between training and test utterances. Because whispered speech conveys emotion less effectively than normal speech and cannot be classified into clear emotion categories, the speaker's state is classified by A-V (arousal and valence) factors, relaxing the one-to-one correspondence to specific emotions. In the test stage, speaker-state detection is applied to each utterance as a front-end step, followed by compensation in the feature or model domain. Experiments show that this state-compensation approach not only reflects the particular nature of whispered speech but also effectively raises identification accuracy for whispered speech under atypical emotion.
     3. A speaker-identification method for whispered speech under atypical emotion is proposed based on latent factor analysis, providing an effective route to speaker-state compensation. Factor analysis does not concern itself with the physical meaning of the common factors; it merely finds representative factors among many variables, and the complexity of the algorithm can be tuned by increasing or decreasing the number of factors. Under the latent-factor theory, the whispered-speech feature supervector is decomposed into a speaker supervector and a speaker-state supervector; the speaker and speaker-state subspaces are estimated separately from balanced training data, and in the test stage the speaker factor of each utterance is estimated before the decision is made. Latent factor analysis avoids explicit speaker-state classification during testing and therefore improves identification accuracy over compensation algorithms that depend on such classification.
     4. A speaker-identification method for whispered speech under atypical emotion and multiple channels is proposed based on joint factor analysis, achieving simultaneous channel and speaker-state compensation. Under joint factor analysis, the speech feature supervector is decomposed into a speaker supervector, a speaker-state supervector, and a channel supervector. Because whispered training data are insufficient to estimate the speaker, speaker-state, and channel subspaces simultaneously, the method first obtains a universal background model (UBM), computes the Baum-Welch statistics of the speech, estimates the speaker subspace, and then estimates the speaker-state and channel subspaces in parallel. In the test stage, the channel and speaker-state offsets are subtracted from the feature vectors, and the transformed features are used for identification. Experimental results show that the method based on joint factor analysis compensates for channel and speaker state at the same time and achieves better identification performance than the other algorithms.
Speaker identification (SI), an important part of biometric identification technology, is widely used in public safety, the judicial system, biomedical engineering, and related fields. It has made great progress with the rapid development of computer science and network technology. Nowadays, the study of whispered speech covers not only fundamental research but also applications. Speaker identification of whispered speech is an interesting yet challenging task, and many issues remain unresolved because of its particular articulation.
     This dissertation focuses on text-independent speaker identification of whispered speech. The difficulties are as follows. First, the whispered-speech database is inadequate, unlike that for normal speech, for which NIST provides corpora for SI research. Second, owing to the characteristics of whisper, some parameters are unavailable and others are more difficult to extract. Moreover, as the excitation of whispered speech is exhalation, it is more easily corrupted by noise; and since whispering is often used in mobile communication, it is often affected by the channel. Finally, when whispering, the speaker may be constrained by the surroundings, which changes the speaking mode and psychological state; hence whispered speech is more strongly affected by the speaker's state. In short, the obstacles to SI for whispered speech are the difficulty of extracting parameters and the influence of both the channel and the speaker's state.
     The contributions of this dissertation to speaker identification of whispered speech are as follows:
     1. Algorithms are proposed for extracting parameters that represent the characteristics of whispered speakers. As whispered speech has no fundamental frequency, reliable formant extraction is essential. A formant-estimation method for whispered speech based on spectral segmentation is proposed: it segments the spectrum dynamically and obtains the inverse-filter parameters by selective linear prediction, resolving the merged and shifted formants often encountered in whispered speech. In addition, the Bark subband spectral flatness (SFMB) and Bark subband spectral centroid (SCB) are defined to represent speaker traits in whispered speech, based on the property that the centroid and flatness measure the stability of a signal (a sketch of these features appears after this list).
     2. Speaker identification of whispered speech based on feature mapping and speaker model synthesis (SMS) is proposed. These methods resolve the speaker-state mismatch between the training and test sets. As whispered speech is weaker than normal speech in conveying emotion, a classification of A-V (arousal and valence) factors for whispers is proposed, which can also serve as a pre-processing step for SI of whispers (see the feature-mapping sketch after this list). The experimental results show that the algorithms based on feature mapping and SMS are effective for SI of whispered speech with perceptible mood.
     3. Speaker identification of whispered speech with perceptible mood based on latent factor analysis is proposed, offering an effective route to speaker-state compensation. Factor analysis does not attend to the physical meaning of each factor; it is a mathematical way of finding representative factors among many variables, and the complexity of the algorithm can be adjusted by changing the number of factors. Under the latent-factor theory, the supervector of whispered speech can be decomposed into speaker and speaker-state supervectors (the decomposition is written out after this list). Balanced data are needed to train the corresponding subspaces, and in the test stage the speaker factor is estimated from each session. By avoiding the classification of the speaker's state, the latent-factor algorithm obtains better recognition.
     4. Speaker identification of whispered speech based on joint factor analysis (JFA) is proposed, a compensation algorithm for SI of whispers with perceptible mood over different channels. According to the JFA theory, the supervector of a speech signal can be decomposed into speaker, speaker-state, and channel supervectors. As the training set is not large enough to estimate all three subspaces simultaneously, the procedure is: train the UBM, compute the Baum-Welch statistics (a sketch appears after this list), estimate the speaker subspace, and then estimate the speaker-state and channel subspaces in parallel. In testing, the extracted features are transformed by subtracting the channel and speaker-state offsets. The experimental results show the superiority of this algorithm over other compensation methods.
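As a minimal illustration of the Bark subband features in contribution 1, the Python sketch below computes one spectral centroid and one spectral flatness per Bark band for a single frame. The band edges are a common tabulation of the critical bands, and the centroid and flatness definitions are the standard ones; both are assumptions for illustration, not the thesis's exact implementation.

import numpy as np

# Approximate critical-band (Bark) edges in Hz up to 7.7 kHz; a common
# tabulation, assumed here for illustration.
BARK_EDGES_HZ = [0, 100, 200, 300, 400, 510, 630, 770, 920, 1080, 1270,
                 1480, 1720, 2000, 2320, 2700, 3150, 3700, 4400, 5300, 6400, 7700]

def bark_subband_features(frame, fs):
    """Per-Bark-band spectral centroid and flatness of one speech frame."""
    spectrum = np.abs(np.fft.rfft(frame * np.hanning(len(frame)))) ** 2
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / fs)
    centroids, flatnesses = [], []
    for lo, hi in zip(BARK_EDGES_HZ[:-1], BARK_EDGES_HZ[1:]):
        mask = (freqs >= lo) & (freqs < hi)
        band = spectrum[mask] + 1e-12           # guard against log(0)
        # Subband spectral centroid: power-weighted mean frequency of the band.
        centroids.append(np.sum(freqs[mask] * band) / np.sum(band))
        # Subband spectral flatness: geometric mean over arithmetic mean;
        # near 1 for noise-like (flat) bands, near 0 for peaky bands.
        flatnesses.append(np.exp(np.mean(np.log(band))) / np.mean(band))
    return np.array(centroids), np.array(flatnesses)

For a 16 kHz signal and 512-sample frames, every band above contains at least one FFT bin, so the two arrays give 21 centroid and 21 flatness values per frame.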
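Contribution 2 compensates in the feature domain after a front-end detector has identified the speaker state of an utterance. The sketch below follows the classical feature-mapping formulation, shifting and rescaling each vector from the top-scoring Gaussian of the detected state-dependent GMM back to the corresponding Gaussian of a state-independent root GMM; the array layout and the assumption that component k of the state model is adapted from component k of the root model are illustrative, not the thesis's exact design.

import numpy as np

def feature_map(x, state_gmm, root_gmm):
    """Map one feature vector from a detected speaker-state GMM back to a
    state-independent root GMM (diagonal covariances assumed). Each GMM is a
    dict with 'w' (K,), 'mu' (K, D) and 'var' (K, D)."""
    # Score x against every state-model component (diagonal Gaussian log-likelihoods).
    diff = x - state_gmm['mu']
    logp = (np.log(state_gmm['w'])
            - 0.5 * np.sum(np.log(2 * np.pi * state_gmm['var']), axis=1)
            - 0.5 * np.sum(diff ** 2 / state_gmm['var'], axis=1))
    i = np.argmax(logp)  # top-scoring component under the detected state
    # Shift and rescale x so it looks as if emitted by the root model.
    return ((x - state_gmm['mu'][i])
            * np.sqrt(root_gmm['var'][i] / state_gmm['var'][i])
            + root_gmm['mu'][i])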
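The latent-factor decomposition in contribution 3 can be written compactly; the symbols below are illustrative notation for the model the abstract describes:

\[
\mathbf{M} = \mathbf{m} + \mathbf{V}\mathbf{y} + \mathbf{U}\mathbf{x}
\]

where M is the GMM mean supervector of an utterance, m is the UBM mean supervector, V and U are the low-rank speaker and speaker-state subspace matrices estimated from balanced training data, y is the speaker factor estimated for each test session, and x is the speaker-state factor. Because identification scores the speaker factor y directly, no explicit speaker-state classification is needed at test time.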
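Contribution 4 extends this decomposition with a channel term, M = m + Vy + Ux + Wz in the same illustrative notation (speaker, speaker-state, and channel subspaces), and estimates the subspaces from Baum-Welch statistics collected against the UBM. A minimal sketch of those statistics, assuming a diagonal-covariance UBM stored as plain NumPy arrays:

import numpy as np

def baum_welch_stats(frames, ubm):
    """Zeroth- and first-order Baum-Welch statistics of one utterance (frames
    of shape (T, D)) against a diagonal-covariance UBM with 'w' (K,),
    'mu' (K, D) and 'var' (K, D)."""
    # Log-likelihood of every frame under every UBM component.
    diff = frames[:, None, :] - ubm['mu'][None, :, :]                   # (T, K, D)
    logp = (np.log(ubm['w'])
            - 0.5 * np.sum(np.log(2 * np.pi * ubm['var']), axis=1)
            - 0.5 * np.sum(diff ** 2 / ubm['var'][None, :, :], axis=2))  # (T, K)
    # Posterior responsibility of component k for frame t (stable softmax).
    gamma = np.exp(logp - logp.max(axis=1, keepdims=True))
    gamma /= gamma.sum(axis=1, keepdims=True)
    N = gamma.sum(axis=0)        # zeroth order: occupation counts, shape (K,)
    F = gamma.T @ frames         # first order: weighted feature sums, shape (K, D)
    return N, F

N and F are the sufficient statistics from which the speaker subspace is estimated first, after which the speaker-state and channel subspaces are estimated in parallel, as the abstract describes.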