Research on Syllable-Lattice-Based Retrieval Methods for Chinese Spoken Documents
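Abstract
With the development of computer and multimedia technology, more and more speech data is being recorded and stored on computers. To access, manage, and use these speech resources more efficiently, spoken document retrieval based on semantic content is required. Spoken document retrieval is the process of searching a speech collection for, and returning, the speech segments or spoken documents relevant to a query entered by the user. It is closely tied to speech recognition, which it always relies on to build a semantic-level index of the collection. However, the high error rates common in recognition output and the misrecognition of out-of-vocabulary (OOV) words directly degrade retrieval performance. Researchers have therefore turned to recognition results in the form of subword lattices: subword units sidestep the OOV problem, and the multiple-candidate lattice form gives the retrieval system more accurate index content. In Chinese spoken document retrieval, syllable-lattice-based retrieval has become the consensus approach among researchers.
     Spoken document retrieval is an immature but highly promising research area with many open problems. A core problem is that a lattice is not a form of data that is easy to index. Its directed-graph structure, and the way correct and incorrect information are mixed together in it, not only cause traditional retrieval methods to perform poorly but also demand considerable storage and search time. Research on Chinese spoken document retrieval methods that suit the characteristics of syllable lattices and balance three performance indicators (retrieval accuracy, index size, and retrieval speed) therefore has significant theoretical and practical value.
     Targeting the characteristics of syllable lattices, this thesis first studies three Chinese spoken document retrieval methods that differ in mechanism and in performance emphasis. Then, addressing the problem that the lower bound on the lattice error rate limits further gains in retrieval accuracy, it studies two effective methods for improving that lower bound. The specific research content is as follows:
     1) A spoken document retrieval method based on word spotting is proposed. It stores the syllable lattices directly as the index and uses word spotting to carry out retrieval. A relevance measure that combines confidence measures with occurrence frequency is proposed, together with a scheme that splits conventional word spotting into an offline stage and an online stage, which speeds up online retrieval. The method achieves good retrieval accuracy, quite close to the accuracy obtained on the best candidates in the lattice, but because the lattice index has to be stored and searched, index size and retrieval speed still need improvement. To address the large size and heavy redundancy of the lattice index, a method for analyzing the effective components of a lattice based on a histogram of syllable posterior probabilities is proposed, and an index redundancy removal method that keeps the effective components and discards the redundant ones is studied. Experimental results show that this method removes a large amount of redundant information from the index at the cost of a small drop in retrieval accuracy.
     2) A spoken document retrieval method based on a syllable inverted index is proposed. It exploits the inverted index form to reduce index size effectively while preserving the main content of the syllable lattice. Matching mechanisms that raise retrieval accuracy by relaxing the path constraints during matching are studied, and two effective mechanisms are proposed: time matching and position matching. In the method that uses position matching, the syllable lattice is interpreted as a concatenation of competition sets, each with a specific position label; the corresponding search and matching method is given, along with a method for computing the posterior probability that a matching path lies at a specific position. A weighting method that adjusts document relevance according to the rank of a syllable candidate within its competition set is studied. Experimental results show that both matching mechanisms yield small gains in retrieval accuracy, with position matching giving the larger gain, and that rank weighting improves accuracy further. A pruning method based on a posterior probability threshold is proposed, which allows retrieval speed to be controlled flexibly.
     3) A spoken document retrieval method based on the neighbor-syllable posterior probability matrix is proposed. Its aim is to improve index size and retrieval speed substantially by building a document-level index, paving the way for retrieval systems over large-scale speech collections. The concept of K-step neighbor syllable pairs is introduced to capture long-distance correlations between syllables in the index. The content of each lattice is represented by its neighbor posterior probability matrix; the matrices of the individual lattices are then combined to compute the posterior probability of each neighbor syllable pair in the spoken document, and the document's neighbor-syllable posterior probability matrix is stored as the document-level index. Experimental results show that, although retrieval accuracy drops by about 5%, index size and retrieval speed essentially reach the level of text retrieval. Methods that use prosodic information in speech to adjust document relevance are studied, and three prosodic weighting methods are given a preliminary trial. Energy weighting is the most effective, improving retrieval accuracy by about 2.7%.
     4) The root causes limiting retrieval accuracy are analyzed. Two methods are proposed that improve retrieval accuracy by lowering the lower bound on the lattice error rate: one based on extended lattices and one based on a word-fragment language model. The former works outside the speech recognition framework: it builds a statistical model of the relationship between recognition results and recognition errors and, based on Dempster-Shafer evidence theory, estimates the probability that a particular syllable was missed by the recognizer; a method for generating extended lattices is studied. Experimental results show that, compared with the original lattice, the extended lattice lowers the error rate lower bound by 1.7% and raises retrieval accuracy by about 4%. The latter works inside the speech recognition framework: it introduces word-fragment units to improve the accuracy of the recognition output. The concept of a word fragment is discussed, an automatic word-fragment selection algorithm based on the maximum mutual information criterion is studied, and experiments show that introducing word fragments helps improve both the recognition rate of the speech recognizer and the retrieval accuracy of the retrieval system.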
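Item 4 names Dempster-Shafer evidence theory as the basis for estimating the probability that a syllable was missed by the recognizer. The abstract does not describe the evidence sources or mass functions used, so the following is only a minimal sketch of Dempster's rule of combination itself. The two-hypothesis frame (`missed` / `not_missed`), the function name `combine`, and all mass values are assumptions made up for illustration.

```python
from itertools import product


def combine(m1, m2):
    """Combine two mass functions with Dempster's rule.

    Each mass function maps a frozenset of hypotheses to its mass,
    and the masses of one function sum to 1.
    """
    combined = {}
    conflict = 0.0
    for (a, mass_a), (b, mass_b) in product(m1.items(), m2.items()):
        common = a & b
        if common:
            # Agreeing evidence supports the intersection of the two sets.
            combined[common] = combined.get(common, 0.0) + mass_a * mass_b
        else:
            # Disjoint sets contribute to the conflict mass.
            conflict += mass_a * mass_b
    if conflict >= 1.0:
        raise ValueError("total conflict: the sources cannot be combined")
    # Renormalize so the remaining masses sum to 1.
    return {h: v / (1.0 - conflict) for h, v in combined.items()}


# Hypothetical frame of discernment for one syllable (illustration only).
MISSED = frozenset({"missed"})
PRESENT = frozenset({"not_missed"})
EITHER = MISSED | PRESENT  # ignorance: mass not committed to either side

# Made-up masses from two hypothetical evidence sources.
evidence_a = {MISSED: 0.6, EITHER: 0.4}
evidence_b = {MISSED: 0.5, PRESENT: 0.2, EITHER: 0.3}

fused = combine(evidence_a, evidence_b)
for hypothesis, mass in fused.items():
    print(sorted(hypothesis), round(mass, 4))
```

The fused mass on `MISSED` could then serve as the estimated probability that the syllable was missed, which is one plausible way to decide whether to add it to an extended lattice. How the thesis actually derives and uses these values is not stated in the abstract.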
With the development of information and multimedia technology, more and more speech data is available worldwide via the internet. To meet the rapidly growing need to organize and analyze this data efficiently, content-based spoken document retrieval is a key technology. The task of spoken document retrieval (SDR) can be described as follows: given the queries of a user, all files or segments containing relevant speech content are found in a large collection of multimedia documents and listed. SDR always uses speech recognition to index documents; however, the high error rate of recognition results and the out-of-vocabulary (OOV) words missing from them limit retrieval performance. Subword-lattice-based retrieval methods are therefore investigated to avoid the OOV problem and to compensate for the performance loss caused by recognition errors. For Chinese, syllable-lattice-based retrieval is widely used by researchers.
     A key problem of the syllable-lattice-based approach is that a lattice is difficult to index. Its directed-graph structure and its mixture of correct and incorrect candidates not only lead to very low retrieval accuracy with traditional retrieval methods but also require much more index space and search time. Retrieval methods that suit syllable lattices and balance retrieval accuracy, index size, and retrieval speed are therefore valuable and important research topics.
     This thesis first proposes three Chinese spoken document retrieval methods with different indexing and search techniques, each with a different performance emphasis. Then, considering that retrieval performance is also restricted by the lower bound on the lattice error rate, two accuracy improvement methods based on a lower error rate bound are studied. Specifically, the thesis is organized as follows:
     1) A word-spotting-based retrieval method is proposed, in which syllable lattices are stored directly as the index, the word spotting algorithm is split into an online part and an offline part to carry out retrieval, and word frequency and word confidence scores are combined in the similarity measure. Although the accuracy achieved is high, even close to the retrieval accuracy on the best alternatives in the lattice, index size and retrieval speed are not good enough for retrieval over large collections. A redundancy removal method is also proposed, which distinguishes useful information from redundant information using a syllable posterior probability histogram and then removes the redundancy from the lattice index. Experiments show that the redundancy removal method yields a smaller index and faster search.
     2) Syllable-inverted-index-based retrieval methods are proposed, in which index size can be reduced efficiently. To improve accuracy, two matching methods that relax the path constraints in the search stage are investigated: time-based matching and position-based matching. In position-based matching, the syllable lattice is interpreted as a sequence of competition sets, and a position-specific posterior probability is calculated for all candidates. A similarity weighting method based on the rank of candidates within their competition sets is studied. Experiments show that both matching methods improve accuracy slightly, that position-based matching is the better of the two, and that rank weighting improves accuracy further. A posterior-probability-based pruning method is also presented to speed up retrieval.
     3) To build the index at the document level, a retrieval method based on the neighbor-syllable posterior probability matrix is proposed, which improves index size and retrieval speed substantially so as to meet the needs of SDR tasks on large-scale corpora. K-step neighbor syllable pairs are introduced to represent long-distance correlations, and the neighbor posterior probability matrix is adopted to represent the content of lattices. The posterior probability of neighbor syllable pairs in a document is calculated, and the neighbor-syllable posterior probability matrix built at the document level is taken as the document index. Experiments show that, although accuracy falls by about 5%, index size and retrieval speed are comparable to those of text retrieval. Prosody is adopted to weight the similarity measure. Three prosodic weighting methods are investigated, of which energy-based weighting gives the best result, improving accuracy by about 2.7%.
     4) The factors limiting accuracy improvement are explored, and two accuracy improvement methods based on a lower lattice error rate bound are proposed: one based on extended lattices, the other on a word-fragment language model. The extended-lattice approach improves the lattice error rate by estimating the probability that a syllable was missed by the recognizer; with it, the lattice error rate falls by 1.7% and accuracy improves by about 4%. The word-fragment approach improves the lattice error rate by introducing a higher-level semantic unit into the speech recognizer.
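The K-step neighbor syllable pair index from item 3 can be illustrated with a toy example. The abstract does not say how the posterior matrix is computed, so this sketch rests on several assumptions: each lattice is flattened into an explicit list of path hypotheses with path posteriors, the index stores expected pair counts rather than true posterior probabilities, and the names `k_step_pair_index` and `score` are hypothetical.

```python
from collections import defaultdict


def k_step_pair_index(lattices, max_k=2):
    """Accumulate expected counts of K-step syllable pairs over a document.

    `lattices` is a list of utterances. Each utterance is a list of
    (posterior, syllable_sequence) path hypotheses whose posteriors sum to 1.
    Returns {(k, first, second): expected_count}.
    """
    index = defaultdict(float)
    for paths in lattices:
        for posterior, syllables in paths:
            for k in range(1, max_k + 1):
                for i in range(len(syllables) - k):
                    # `second` follows `first` by exactly k steps on this path.
                    index[(k, syllables[i], syllables[i + k])] += posterior
    return dict(index)


def score(index, query, max_k=2):
    """Sum the indexed weights of every K-step pair found in the query."""
    total = 0.0
    for k in range(1, max_k + 1):
        for i in range(len(query) - k):
            total += index.get((k, query[i], query[i + k]), 0.0)
    return total


# Toy document with one utterance and two competing paths (made-up values).
document = [
    [(0.7, ["yu", "yin", "jian", "suo"]), (0.3, ["yu", "ying", "jian", "suo"])],
]
index = k_step_pair_index(document)
print(score(index, ["yu", "yin", "jian", "suo"]))
print(score(index, ["ni", "hao"]))
```

Because the index holds one entry per syllable pair rather than whole lattices, its size depends on the syllable inventory instead of the lattice density. This is consistent with the abstract's claim that index size and speed approach those of text retrieval, at some cost in accuracy.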
