Research on Text-Independent Speaker Recognition with Short Utterances
Abstract
In recent years, driven by application demands and the development of related theory, research on speaker recognition has made great progress, and research institutions at home and abroad are actively advancing new theories, experimenting with new methods, and pushing the technology toward practical use. Within this effort, training and recognition from short utterances have attracted particular attention.
     Since 2004, the Speaker Recognition Evaluations (SRE) organized by NIST (the U.S. National Institute of Standards and Technology) have divided their test conditions by speech duration; in the shortest condition, both the training and the test utterances are no longer than 10 seconds. The evaluation results show that, compared with the longer-duration conditions, performance in this condition degrades severely. The main reason is that current speaker recognition systems are built on probabilistic statistical models, so recognition performance depends heavily on how well the training and test speech match. The short-time cepstral features in common use carry both speaker information and linguistic content information, and differences in content degrade the match between training and test. Text-dependent speaker recognition outperforms text-independent recognition precisely because it guarantees that the content of the training and test speech is identical. In text-independent recognition, by contrast, if the training and test utterances are too short, their content may be severely mismatched, and since existing speech signal processing techniques cannot separate the content information from the speaker information in speech, this mismatch is a major factor limiting the performance of text-independent speaker recognition.
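     The dependence on train/test matching can be made concrete with the standard GMM-UBM verification score: the average per-frame log-likelihood ratio between the speaker model and the background model. Below is a minimal sketch, assuming a UBM and a speaker model fitted with scikit-learn's GaussianMixture; the function and variable names are illustrative, not the thesis's.

import numpy as np
from sklearn.mixture import GaussianMixture

# e.g. ubm = GaussianMixture(n_components=512, covariance_type="diag").fit(bg_frames)
def llr_score(features, spk_gmm, ubm):
    # Average per-frame log-likelihood ratio over a (n_frames, n_dims) array.
    # With fewer frames (shorter utterances) this average gets noisier, and
    # the frames' linguistic content dominates where they fall under each model.
    return float(np.mean(spk_gmm.score_samples(features)
                         - ubm.score_samples(features)))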
     To study the influence of utterance length on speaker recognition performance and to improve recognition from short utterances, this thesis proceeds along two lines. First, it studies how to overcome the effect on recognition performance of the content mismatch between training and test speech under short-utterance conditions, proposing separate solutions for the two applications of speaker identification and speaker verification. Second, it studies how to extract as many speech features as possible from speech of limited length, enriching the description of the speaker and thereby improving short-utterance recognition performance.
     The main contributions and novelties of this thesis are as follows:
     1) A feature transformation method based on "speaker attribute constraints" is proposed. By suppressing the influence of content information on the distribution of short-time cepstral features relative to the speaker information, it makes the features of one speaker more concentrated and the separation between different speakers more distinct, raising the identification rate of short-utterance speaker identification. Exploiting the fact that speech features lie on an intrinsic nonlinear manifold, the method builds neighborhood-relation bundles from the local geometric structure of the features in feature space, applies the speaker-attribute-constrained transformation to reduce the influence of the content information carried by short-time cepstral features on identification, and derives an explicit transformation matrix. The transformation was tested on a baseline speaker identification system based on GMM-UBM (Gaussian Mixture Model-Universal Background Model). On the same data set, compared with existing feature transformation methods, with 10-second training utterances and test utterances of 10, 8, 5, 3, and 2 seconds, the method (SAC-LPP) lowers the identification error rate by a relative 13.48%, 9.58%, 8.75%, 9.90%, and 11.92%, respectively.
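     A plausible reading of the speaker-attribute-constrained transformation, following the general recipe of locality preserving projections, is sketched below: only near neighbors that share a speaker label are linked in the affinity graph, and the explicit projection matrix comes from the resulting generalized eigenproblem. This is an illustrative reconstruction under that assumption, not the thesis's exact algorithm; all names are ours.

import numpy as np
from scipy.linalg import eigh
from sklearn.neighbors import NearestNeighbors

def sac_lpp(X, labels, k=10, dim=30):
    # X: (n_frames, n_dims) cepstral features; labels: speaker id per frame.
    n = X.shape[0]
    _, idx = NearestNeighbors(n_neighbors=k + 1).fit(X).kneighbors(X)
    W = np.zeros((n, n))                      # dense affinity, for clarity
    for i in range(n):
        for j in idx[i, 1:]:                  # skip the point itself
            if labels[i] == labels[j]:        # the speaker-attribute constraint
                W[i, j] = W[j, i] = 1.0
    D = np.diag(W.sum(axis=1))
    L = D - W                                 # graph Laplacian
    # Explicit transform: minimize w'X'LXw subject to w'X'DXw = 1, i.e. keep
    # the eigenvectors of the smallest generalized eigenvalues.
    A = X.T @ L @ X
    B = X.T @ D @ X + 1e-6 * np.eye(X.shape[1])   # regularized for stability
    _, vecs = eigh(A, B)
    return vecs[:, :dim]                      # project with X @ sac_lpp(...)

Applied this way, frames of the same speaker are pulled together regardless of what was said, which matches the intended behavior described above.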
     2) A text-independent speaker verification method based on UBM (Universal Background Model) mixture subspaces is proposed. It seeks out the units of the training and test supervectors whose content matches, makes full use of the recognition results from those units, and reduces the influence of the content-mismatched parts of the supervectors, lowering the equal error rate of short-utterance speaker verification. Starting from the facts that text-dependent speaker recognition performs far better than text-independent recognition and that the content mismatch between training and test speech is the main cause of the performance loss with short utterances, the method uses the neighbor relations of the UBM mixtures in feature space to partition the mixtures into subspaces, implicitly converting text-independent recognition into "content-dependent" recognition. The training and test speech are split according to which mixture subspace each feature belongs to, recognition is performed on the sub-supervector within each subspace, and the per-subspace results are finally combined by a dual-confidence fusion scheme that weights each subspace by the feature distribution of the training and test speech and by the subspace's discriminative ability, so that the detailed information in the speech is fully used. On the same data set, compared with an existing subspace-based speaker verification system, with 10-second training utterances and test utterances of 10, 8, 5, 3, and 2 seconds, the proposed method lowers the equal error rate by a relative 8.67%, 10.22%, 6.13%, 5.00%, and 6.10%, respectively.
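     To illustrate the implicit conversion to "content-dependent" recognition, the sketch below groups UBM components with neighboring means into subspaces and routes every frame to the subspace of its top-scoring component; the thesis's actual neighbor-based grouping and dual-confidence weights may differ, and all names here are assumptions.

import numpy as np
from sklearn.cluster import KMeans

def split_by_subspace(features, ubm, n_subspaces=8):
    # Cluster the UBM component means: components that model similar
    # content end up in the same subspace.
    groups = KMeans(n_clusters=n_subspaces, n_init=10).fit_predict(ubm.means_)
    # Route each frame to the subspace of its best-scoring component.
    top = ubm.predict(features)
    return [features[groups[top] == s] for s in range(n_subspaces)]

def fuse_subspaces(scores, weights):
    # Weighted combination of per-subspace scores; `weights` stands in for
    # the dual-confidence weighting described above.
    return float(np.average(scores, weights=weights))

Training and test speech are split with the same routing, so each per-subspace comparison is made between frames of roughly matching content.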
     3) A "biomimetic neural network excitation source" feature is proposed, introducing the idea of biomimetic pattern recognition into the modeling of the speaker's excitation source; the feature's effectiveness for speaker recognition is verified, and combining it with a short-time-cepstral system improves recognition performance. To remedy the shortcomings of the existing AANN (Auto-Associative Neural Network) approach to extracting excitation-source features from the LP (Linear Prediction) residual, a biomimetic-neural-network model of the speaker's LP residual is proposed and used to build an excitation-source feature and a corresponding recognition system. The method avoids the complex iterative training of conventional neural networks, and the biomimetic principle of "recognition by cognition rather than by discrimination" markedly improves performance with small samples, that is, with short utterances. On the same data set, working from LP residual vectors, in speaker identification with 10-second training utterances and test utterances of 10, 8, 5, 3, and 2 seconds, the proposed biomimetic method lowers the identification error rate relative to the existing AANN-based method by 6.98%, 11.59%, 9.67%, 9.00%, and 18.45%, respectively. Since the LP-residual excitation-source feature complements the short-time cepstral feature well in speaker recognition, the thesis then studies short-utterance speaker recognition with the two features fused and designs a confidence-based decision fusion of the cepstral and excitation-source results. By measuring the correlation between the different features, the complementarity of the LP-residual excitation source to the short-time cepstral features is quantified, providing a theoretical basis for fusing their results in speaker recognition. For speaker identification a dynamic fusion is adopted that weights each feature by the reliability of its result in the individual trial, and for speaker verification a static fusion based on each feature's inherent discriminative power. Compared with the short-time cepstral feature alone, with 10-second training utterances and test utterances of 10, 8, 5, 3, and 2 seconds, fusing the two features improves system performance by a relative 13.44%, 11.11%, 10.22%, 10.12%, and 8.95% (speaker identification) and 5.51%, 5.02%, 10.72%, 8.43%, and 2.55% (speaker verification), respectively.
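     For concreteness, the sketch below shows the two standard ingredients this contribution builds on: LP-residual extraction by inverse filtering, and weighted decision-score fusion. The biomimetic modeling itself is not reproduced; librosa/scipy and the default weight are our illustrative choices.

import numpy as np
import librosa
from scipy.signal import lfilter

def lp_residual(frame, order=12):
    # Fit LP coefficients to the frame and inverse-filter it with them;
    # the residual approximates the excitation (glottal source) signal.
    a = librosa.lpc(frame.astype(float), order=order)   # a[0] == 1.0
    return lfilter(a, [1.0], frame)

def fuse_scores(cepstral_score, excitation_score, w=0.7):
    # Static weighted-sum fusion (the verification case); the identification
    # case instead sets the weight per trial from each result's reliability.
    return w * cepstral_score + (1.0 - w) * excitation_score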
