Research on Lattice-Based Chinese Spoken Document Retrieval
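Abstract
Spoken document retrieval can effectively help people find the information relevant to their needs within massive collections of speech data, and is the most effective technical means of addressing information overload. As speech recognition technology continues to advance, combining it with conventional text retrieval to perform spoken document retrieval has become a trend. However, the accuracy of the speech recognition system strongly affects retrieval performance. In most cases, because of model mismatch or noise in the corpus, recognition accuracy is unsatisfactory.
     To address the question of how to combine speech recognition and information retrieval effectively, this thesis considers both the representation of spoken documents and the retrieval model, and proposes a new Chinese spoken document retrieval method. For document representation, a syllable-lattice structure is adopted. A lattice carries multiple recognition hypotheses, which reduces to some extent the impact of recognition errors on the retrieval system. In addition, indexing on a subword unit, the syllable, effectively handles out-of-vocabulary (OOV) words in queries. For the retrieval model, the thesis studies related information retrieval techniques and introduces a document length prior probability into the conventional query likelihood retrieval model.
     Experiments show that the syllable-lattice retrieval system performs much better than the conventional one-best system. Introducing the document length prior into the retrieval model gives the syllable-lattice system its best retrieval performance, about 30% higher than the baseline retrieval model. The experiments demonstrate that the proposed method is correct, feasible, and effective.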
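As a rough illustration of how a lattice keeps recognition alternatives available to the index, the Python sketch below computes expected syllable counts from arc posteriors with a forward-backward pass. The toy lattice, its arc probabilities, the pinyin labels, and the function name `expected_syllable_counts` are assumptions made for illustration; the abstract does not say how the thesis builds its index.

```python
from collections import defaultdict


def expected_syllable_counts(arcs, start, final):
    """Expected count of each syllable over all paths of a lattice.

    arcs: list of (from_node, to_node, syllable, probability).
    Assumption: node ids are topologically ordered (from_node < to_node).
    """
    # Forward pass: total probability mass of partial paths reaching each node.
    forward = defaultdict(float)
    forward[start] = 1.0
    for src, dst, _, prob in sorted(arcs, key=lambda a: a[0]):
        forward[dst] += forward[src] * prob

    # Backward pass: total probability mass of paths from each node to the end.
    backward = defaultdict(float)
    backward[final] = 1.0
    for src, dst, _, prob in sorted(arcs, key=lambda a: a[0], reverse=True):
        backward[src] += prob * backward[dst]

    total = forward[final]
    if total <= 0.0:
        raise ValueError("final node is not reachable from start node")

    # Arc posterior = mass of all complete paths through the arc / total mass.
    counts = defaultdict(float)
    for src, dst, syllable, prob in arcs:
        counts[syllable] += forward[src] * prob * backward[dst] / total
    return dict(counts)


# Hypothetical two-segment lattice with two competing syllables per segment.
toy_lattice = [
    (0, 1, "yu", 0.6),
    (0, 1, "ju", 0.4),
    (1, 2, "yin", 0.7),
    (1, 2, "ying", 0.3),
]
print(expected_syllable_counts(toy_lattice, start=0, final=2))
```

A one-best transcript of this toy lattice would contain only `yu yin`, so a query containing `ying` would find nothing; the lattice keeps `ying` in the index with a nonzero expected count.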
Spoken document retrieval technology can effectively help people find relevant information in the flood of information resources. With advances in speech recognition technology, integrating information retrieval with speech recognition to build spoken document retrieval systems has become a trend. However, in most cases, because of model mismatch or the impact of noise, even the best speech recognition results are often not good enough to be used in a spoken document retrieval system.
     To solve this problem, this paper considers the effects of both the retrieval source and the retrieval model, and combines them to realize a new Mandarin spoken document retrieval method. For the retrieval source, a syllable lattice providing multiple hypotheses is adopted, which can mitigate the effect of speech recognition errors on information retrieval. At the same time, the syllable-based approach can effectively solve the out-of-vocabulary problem in queries. For the retrieval model, a document length prior is combined with the traditional query likelihood retrieval model.
     Experimental results show that the lattice-based method outperforms the one-best method. Furthermore, with the document length prior in the retrieval model, the lattice-based approach achieves the best performance, an improvement of about 30%. The experiments show the new method to be correct, feasible, and effective.
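To make the retrieval model more concrete, here is a minimal Python sketch of query likelihood scoring with a document length prior. It rests on assumptions that the abstract does not confirm: Jelinek-Mercer smoothing with weight `lam`, a prior proportional to document length, and documents represented as expected syllable counts. All function names and the toy data are hypothetical.

```python
import math
from collections import Counter


def collection_counts(docs):
    """Sum the per-document (expected) unit counts over the whole collection."""
    total = Counter()
    for counts in docs.values():
        total.update(counts)
    return total


def log_score(query, doc_counts, coll_counts, lam=0.5, length_prior=True):
    """log P(D) + sum over query units of log P(q | D).

    P(q | D) is smoothed with the collection model (Jelinek-Mercer, assumed).
    P(D) is taken proportional to document length (assumed form of the prior).
    """
    doc_len = sum(doc_counts.values())
    coll_len = sum(coll_counts.values())
    if doc_len <= 0.0 or coll_len <= 0.0:
        return float("-inf")
    total = 0.0
    for unit in query:
        p_doc = doc_counts.get(unit, 0.0) / doc_len
        p_coll = coll_counts.get(unit, 0.0) / coll_len
        p = (1.0 - lam) * p_doc + lam * p_coll
        if p <= 0.0:
            return float("-inf")  # unit never seen anywhere in the collection
        total += math.log(p)
    if length_prior:
        total += math.log(doc_len / coll_len)
    return total


def rank(query, docs, lam=0.5, length_prior=True):
    """Return document names sorted from best to worst score."""
    coll = collection_counts(docs)
    scored = [(log_score(query, counts, coll, lam, length_prior), name)
              for name, counts in docs.items()]
    return [name for _, name in sorted(scored, reverse=True)]


# Hypothetical collection: expected syllable counts for two spoken documents.
toy_docs = {
    "short": {"yu": 0.9, "yin": 0.8, "shi": 0.3},
    "long": {"yu": 1.8, "yin": 1.6, "shi": 0.6, "bie": 2.0},
}
print(rank(["yu", "yin"], toy_docs, length_prior=False))
print(rank(["yu", "yin"], toy_docs, length_prior=True))
```

Under this assumed prior, the only change to the baseline score is the added `log(doc_len / coll_len)` term, which favors longer documents; whether that helps depends on the collection, and the figure of about 30% reported above comes from the thesis experiments, not from this sketch.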
