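     (5) Based on the correlations between the prosodic features of speech and emotional arousal, and between voice quality features and valence, an emotion modeling method based on emotion dimensions is proposed. The method uses prosodic features and voice quality features to build, for each emotion, an arousal probability model and a valence probability model respectively. The probability outputs of each emotional speech sample on the 12 dimension models are then used as features to train the emotion category models. Gaussian mixture models (GMM) are used to build the emotion dimension models, and a method for estimating the initial GMM parameters based on cluster analysis of the training samples is proposed. For the final recognition step, a support vector machine (SVM) is used to construct a six-class emotion recognizer. Experiments on Mandarin speech emotion recognition based on this emotion dimension model achieved a higher recognition rate than the emotion field method.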
Speech is one of the most convenient means of communication between people and one of the fundamental ways of conveying emotion as well as semantic information. Emotion, moreover, plays an important role in communication. Emotion information processing in speech signals has therefore gained increasing attention in recent years, as the need for machines to understand humans well in human-machine interaction has grown. As one of the most important branches of emotion information processing in speech, emotion recognition in speech is fundamental to natural human-machine communication. However, research on human emotion is still at an exploratory stage: there is no generally accepted definition of human emotion, and emotion has strong social and cultural characteristics. In addition, speech signals contain complex information. All of these factors pose great challenges for emotion recognition in human speech, which is still in its infancy.
     In order to establish a speaker-independent speech emotion recognition system that does not rely on context or linguistic information, this paper focuses on the establishment of an emotional speech corpus, acoustic feature extraction, the analysis and selection of emotional features, the emotion dimension space, emotion modeling, and emotion recognition. Based on the analysis of an adequate number of emotional speech samples, two emotion modeling methods are presented, which provide a theoretical and technical framework for emotion recognition in spoken language. Based on these studies, two emotion recognition algorithms are implemented and a speaker- and content-independent Mandarin emotion recognition system is completed.
     The innovative points and main contributions of this paper are as follows:
     (1) An algorithm based on the modified cepstrum is presented for estimating the fundamental frequency (F0) of speech signals. Voicing decisions are made with a decision function composed of the cepstral peak, the zero-crossing rate, and the energy of short-time segments of the speech signal. This decision function yields an accurate voiced/unvoiced classification. A dynamic programming method is then used for pitch tracking, with the continuity of F0 fully taken into account in the cost function. The proposed algorithm effectively avoids pitch doubling and pitch halving errors while preserving legitimate doubling and halving of F0. It also offers high accuracy and a smooth F0 contour that needs no further smoothing.
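The cepstral step in item (1) can be illustrated with a minimal sketch. It uses the standard real cepstrum rather than the modified cepstrum of the paper, and it omits the voicing decision and the dynamic programming tracker. The function name `cepstral_f0`, the search range of 60 to 400 Hz, and the synthetic impulse-train frame are assumptions made for illustration only.

```python
import cmath
import math


def cepstral_f0(frame, fs, f_min=60.0, f_max=400.0):
    """Estimate F0 (Hz) of one frame from the peak of its real cepstrum."""
    n = len(frame)
    # Hamming window to reduce spectral leakage.
    windowed = [
        s * (0.54 - 0.46 * math.cos(2.0 * math.pi * i / (n - 1)))
        for i, s in enumerate(frame)
    ]
    # Twiddle factors exp(-2*pi*j*k/n), reused by the forward and inverse DFT.
    twiddle = [cmath.exp(-2j * math.pi * k / n) for k in range(n)]
    # Log-magnitude spectrum via a direct O(n^2) DFT (acceptable for one short frame).
    log_mag = []
    for k in range(n):
        acc = sum(windowed[t] * twiddle[(k * t) % n] for t in range(n))
        log_mag.append(math.log(abs(acc) + 1e-12))
    # Search only quefrencies that correspond to plausible pitch periods.
    q_lo = int(fs / f_max)
    q_hi = min(int(fs / f_min), n // 2)
    best_q, best_val = q_lo, float("-inf")
    for q in range(q_lo, q_hi + 1):
        # Real cepstrum at quefrency q: inverse DFT of the log-magnitude spectrum.
        val = sum(log_mag[k] * twiddle[(k * q) % n].real for k in range(n)) / n
        if val > best_val:
            best_q, best_val = q, val
    return fs / best_q, best_val


# Synthetic test frame (hypothetical): an impulse train with a 40-sample period,
# i.e. a 200 Hz excitation at an 8 kHz sampling rate.
FS = 8000
PERIOD = 40
frame = [1.0 if i % PERIOD == 0 else 0.0 for i in range(512)]
f0, peak = cepstral_f0(frame, FS)
print(round(f0, 1), round(peak, 3))
```

The returned peak value could serve as one input to a voicing decision function alongside zero-crossing rate and short-time energy, as the abstract describes, but the thresholds for that decision are not given in the source.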
     (2) This paper analyzes the relationships between emotion states and the acoustic features of speech, including prosody and voice quality. The limitations of short-time energy for distinguishing emotion states are pointed out. On the other hand, we find that the proportion of energy below 250 Hz in the total energy is a promising feature for emotion recognition in speech. The characteristics of the pitch contour and the pitch derivative are also analyzed for the purpose of emotion recognition. In addition, differences in emotional acoustic features between male and female speech are identified, and a gender discrimination method is developed based on these findings. In this method, the mean, range, and variance of F0 are used as features, and a Fisher linear discriminant function is used to distinguish male speech from female speech. Experimental results show that the proposed method achieves high accuracy.
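The gender discrimination step in item (2) can be sketched as a two-class Fisher linear discriminant on the three F0 statistics. This is a minimal illustration, not the paper's implementation: the helper names, the midpoint threshold, and the toy F0 statistics (mean in Hz, range in Hz, variance in Hz squared) are all hypothetical.

```python
def mean_vector(rows):
    dim = len(rows[0])
    return [sum(r[j] for r in rows) / len(rows) for j in range(dim)]


def scatter_matrix(rows, mu):
    """Within-class scatter: sum of outer products of deviations from the mean."""
    dim = len(mu)
    s = [[0.0] * dim for _ in range(dim)]
    for r in rows:
        d = [r[j] - mu[j] for j in range(dim)]
        for a in range(dim):
            for b in range(dim):
                s[a][b] += d[a] * d[b]
    return s


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [a[i][:] + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        rest = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - rest) / m[i][i]
    return x


def fit_fisher(class_a, class_b):
    """Return the Fisher direction w = Sw^-1 (mu_a - mu_b) and a midpoint threshold."""
    mu_a, mu_b = mean_vector(class_a), mean_vector(class_b)
    s_a, s_b = scatter_matrix(class_a, mu_a), scatter_matrix(class_b, mu_b)
    dim = len(mu_a)
    s_w = [[s_a[i][j] + s_b[i][j] for j in range(dim)] for i in range(dim)]
    w = solve_linear(s_w, [mu_a[i] - mu_b[i] for i in range(dim)])
    proj_a = sum(wi * vi for wi, vi in zip(w, mu_a))
    proj_b = sum(wi * vi for wi, vi in zip(w, mu_b))
    return w, 0.5 * (proj_a + proj_b)


def classify(x, w, threshold, label_a="male", label_b="female"):
    score = sum(wi * xi for wi, xi in zip(w, x))
    return label_a if score > threshold else label_b


# Hypothetical per-utterance F0 statistics: (mean, range, variance).
MALE_F0_STATS = [(120, 60, 300), (110, 50, 250), (130, 80, 420),
                 (125, 70, 380), (105, 55, 260)]
FEMALE_F0_STATS = [(220, 110, 900), (210, 100, 800), (240, 130, 1200),
                   (230, 120, 1000), (200, 95, 760)]

w, threshold = fit_fisher(MALE_F0_STATS, FEMALE_F0_STATS)
print(classify((115, 58, 280), w, threshold))
```

Placing the threshold midway between the projected class means assumes equal priors and similar spread for the two classes; the abstract does not state how the decision boundary was chosen.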
     (3) The concept of an emotion space model based on results from psychological research is presented, and a perceptual experiment is reported. In the experiment, we studied how the six basic emotions of Mandarin are distributed in the emotion space. Furthermore, we studied the relationships between the prosodic and voice quality features and the mean ratings in the two-dimensional space of arousal and valence.
     (4) From the perspective of emotion modeling, the paper uses an emotion field and emotional potency to describe the emotion space, introducing the concepts of data field and potential function into emotion modeling. With this method, any emotion in the emotion space can be seen as a composite of all the basic emotions considered in this research. The contribution of each basic emotion to a given emotion is determined by the emotional potency that the former forms at the latter. The center of each basic emotion is found by a hill-climbing algorithm. The emotion recognition algorithm based on this model performs better than traditional methods.
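The idea in item (4) can be sketched as follows. The abstract does not give the form of the potential function, so this sketch assumes a Gaussian-like kernel of the kind commonly used for data fields; the decision rule (pick the basic emotion with the largest potential at the query point), the two-dimensional toy samples, and all names are likewise assumptions for illustration.

```python
import math


def potential(point, samples, sigma=1.0):
    """Potential formed at `point` by the samples of one basic emotion
    (assumed Gaussian-like kernel with influence factor sigma)."""
    total = 0.0
    for s in samples:
        d2 = sum((p - q) ** 2 for p, q in zip(point, s))
        total += math.exp(-d2 / sigma ** 2)
    return total


def hill_climb(start, samples, sigma=1.0, step=0.5, min_step=1e-3):
    """Coordinate-wise hill climbing toward a local maximum of the potential."""
    current = list(start)
    best = potential(current, samples, sigma)
    while step > min_step:
        improved = False
        for axis in range(len(current)):
            for delta in (step, -step):
                candidate = current[:]
                candidate[axis] += delta
                value = potential(candidate, samples, sigma)
                if value > best:
                    current, best, improved = candidate, value, True
        if not improved:
            step /= 2.0  # refine the search once no neighbor is better
    return current


def classify(point, fields, sigma=1.0):
    """Score a point by the potential each basic emotion forms at it."""
    scores = {name: potential(point, pts, sigma) for name, pts in fields.items()}
    return max(scores, key=scores.get), scores


# Hypothetical samples in a two-dimensional feature space.
FIELDS = {
    "happy": [(1.5, 2.0), (2.5, 2.0), (2.0, 1.5), (2.0, 2.5), (2.0, 2.0)],
    "sad": [(-1.5, -2.0), (-2.5, -2.0), (-2.0, -1.5), (-2.0, -2.5), (-2.0, -2.0)],
}

centers = {name: hill_climb(pts[0], pts) for name, pts in FIELDS.items()}
label, scores = classify((1.0, 1.2), FIELDS)
print(centers, label)
```

The per-emotion scores returned by `classify` express the query point as a weighted mixture of the basic emotions, which is one way to read the statement that any emotion is a composite of all basic emotions.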
     (5) A dimension-based emotion model is presented according to the relationships between the acoustic features of speech and the emotion dimensions. In this modeling method, prosodic features are used to construct statistical arousal models and voice quality features are used to construct statistical valence models. The probability outputs of all these dimension models are then used as features to build the emotion category models. A Gaussian mixture model (GMM) is selected to construct the emotion dimension models, and a new algorithm for estimating the initial GMM parameters is proposed based on a clustering method. A support vector machine (SVM) is used to build the emotion category models. Experimental results indicate that the emotion recognition algorithm based on this model performs better than the emotion field method.
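The pipeline in item (5) can be sketched in one dimension: cluster the training samples, use the clusters to set the initial GMM parameters, refine them with EM, and collect each model's log-likelihood as a feature. The abstract does not name the clustering algorithm, so k-means is an assumption here, as are the function names, the scalar feature, and the toy data. The final SVM stage is omitted; the feature vector would be passed to such a classifier.

```python
import math


def kmeans_1d(data, k, iters=20):
    """Plain 1-D k-means; centers start evenly spaced over the data range."""
    lo, hi = min(data), max(data)
    centers = [lo + (hi - lo) * i / max(k - 1, 1) for i in range(k)]
    groups = [[] for _ in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for x in data:
            nearest = min(range(k), key=lambda j: abs(x - centers[j]))
            groups[nearest].append(x)
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return groups


def init_gmm(data, k, var_floor=1e-3):
    """Initial weights, means, and variances taken from the clusters."""
    weights, means, variances = [], [], []
    for g in kmeans_1d(data, k):
        if not g:
            continue  # an empty cluster contributes no component
        mu = sum(g) / len(g)
        weights.append(len(g) / len(data))
        means.append(mu)
        variances.append(max(sum((x - mu) ** 2 for x in g) / len(g), var_floor))
    return weights, means, variances


def component_logs(x, gmm):
    """Log of weight times Gaussian density, per component."""
    return [
        math.log(w) - 0.5 * (math.log(2.0 * math.pi * v) + (x - m) ** 2 / v)
        for w, m, v in zip(*gmm)
    ]


def log_likelihood(x, gmm):
    terms = component_logs(x, gmm)
    top = max(terms)  # log-sum-exp for numerical stability
    return top + math.log(sum(math.exp(t - top) for t in terms))


def fit_gmm(data, k, em_iters=10, var_floor=1e-3):
    gmm = init_gmm(data, k, var_floor)
    for _ in range(em_iters):
        n_comp = len(gmm[0])
        n, sx, sxx = [0.0] * n_comp, [0.0] * n_comp, [0.0] * n_comp
        for x in data:  # E-step: accumulate responsibilities
            terms = component_logs(x, gmm)
            top = max(terms)
            probs = [math.exp(t - top) for t in terms]
            total = sum(probs)
            for j, p in enumerate(probs):
                r = p / total
                n[j] += r
                sx[j] += r * x
                sxx[j] += r * x * x
        n = [max(v, 1e-12) for v in n]  # guard against vanishing components
        means = [sx[j] / n[j] for j in range(n_comp)]  # M-step
        variances = [max(sxx[j] / n[j] - means[j] ** 2, var_floor)
                     for j in range(n_comp)]
        weights = [n[j] / len(data) for j in range(n_comp)]
        gmm = (weights, means, variances)
    return gmm


# Hypothetical scalar feature values for two emotions.
ANGRY = [1.8, 1.9, 2.0, 2.1, 2.2, 2.8, 2.9, 3.0, 3.1, 3.2]
SAD = [-3.2, -3.1, -3.0, -2.9, -2.8, -2.2, -2.1, -2.0, -1.9, -1.8]

angry_gmm = fit_gmm(ANGRY, 2)
sad_gmm = fit_gmm(SAD, 2)
# Feature vector for one new sample: its log-likelihood under each model.
features = [log_likelihood(2.1, angry_gmm), log_likelihood(2.1, sad_gmm)]
print(features)
```

Starting EM from cluster statistics rather than random values tends to reduce sensitivity to initialization, which is presumably the motivation for the clustering-based estimate described in the abstract.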
     The two emotion modeling methods proposed in this paper, which have sound scientific foundations and perform well, provide a direction for future work on emotion recognition in spoken language.
