Multi-Granularity Information Extraction for Web-Based Courseware
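Abstract
With modern distance education flourishing, building teaching resources has become all the more important. In building such resources, whether existing material can be searched quickly and accurately, and thus shared and reused sensibly, determines whether the resources can deliver their full value. Information extraction is the foundation of information retrieval and of information sharing and reuse. A large body of web-based courseware has already accumulated online, and accurately extracting useful resources from it, for courseware development or other applications, is the key to this problem. Web-based courseware consists mainly of multimedia content, whereas traditional information extraction techniques mostly target text, so new extraction algorithms are needed for multimedia information. In addition, traditional techniques tend to return a whole document or a whole passage to the user, while querying, sharing, and reuse often call for results that are more precise and more varied in scope. An extraction method that offers a choice of several granularities for its results would help improve both extraction accuracy and the degree of reuse. For this reason, the thesis proposes a multi-granularity information extraction method for web-based courseware.
    As web-based distance education spreads, the associated standardization has drawn wide attention in China and abroad. The relevant national bodies are drafting a series of specifications and standards for web-based courseware, which set out the basic model that courseware should follow. Because these specifications designate XML as the description language, standard-conforming courseware documents will all be structured documents. To remain compatible with the standards, multi-granularity extraction for web-based courseware therefore becomes multi-granularity extraction from structured courseware documents. Building on the standard courseware model, the thesis proposes a courseware information extraction model; it modifies the standard model to foreground the information structure that extraction requires. A multi-granularity extraction process and the associated algorithms are designed on top of this model. The algorithms address two aspects: extraction of multimedia content and extraction at multiple granularities. For multimedia content, relevance is computed by weighting the multimedia content. For multi-granularity extraction, concepts from graph theory are introduced to analyze the logical and semantic structure of the text content, and the corresponding concepts and relevance measures are defined. To realize the idea and test the proposed method, an experimental system was designed and developed. The main parameters likely to affect the results were tested and evaluated in this system, and suggestions for improvement and directions for future research are given based on the experimental results.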
With the rapid development of web-based education, building teaching resources has become increasingly important. Whether those resources can be searched quickly and accurately, and then shared and reused, determines whether they deliver their full value. Information retrieval is the basis of information search, sharing, and reuse. A large amount of courseware already exists on the web, so the key problem is how to retrieve useful information from it for use in building other courseware or in other applications. Web-based courseware is distinctive in that its content consists mainly of multimedia information. Traditional information retrieval methods mostly target text, so a new algorithm for retrieving multimedia information is needed. Traditional methods also return a whole document or a full paragraph as the result, whereas users who query, share, and reuse information usually need something more precise. A retrieval method that offers a choice of granularities for its results can give users more relevant information, reduce the time spent searching again within the results, and provide results better suited to different users and applications. This thesis proposes a multi-granularity information retrieval method for web-based courseware.
    With the spread of distance education, standardization has become a hot topic and is attracting growing attention worldwide. Several institutions are drafting series of rules and standards. China's national standard proposes a standard courseware model and specifies XML as the description language for web-based courseware. In the near future, all web-based courseware will follow the standard, and courseware documents will be structured documents. To stay compatible with the future standard, multi-granularity information retrieval for web-based courseware in effect becomes multi-granularity information retrieval for structured documents. Based on the standard model for web-based courseware, the thesis proposes a retrieval model that modifies the standard model to some degree. On top of the retrieval model, it designs a multi-granularity retrieval procedure and the related algorithm for web-based courseware. The algorithm covers two aspects: multi-granularity retrieval and multimedia retrieval. Multimedia retrieval uses a threshold, and multi-granularity retrieval draws on methods from graph theory. To implement and evaluate the method, an experimental system was developed, and experiments were run to observe the effect of changing the main variables. The experiments show that the method produces more precise results, and some suggestions are made for improvement in future research.
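The abstract describes the approach only at a high level, so the following is a minimal sketch of the general idea rather than the thesis's algorithm. It assumes a hypothetical XML layout (`course`, `chapter`, `section`, `text`, `image`, `video`), invented media weights, and plain term counting as the relevance measure. Scores are summed up the document tree so that the caller can ask for results at whichever granularity is wanted. The graph-based analysis of logical and semantic structure mentioned in the abstract is not reproduced here.

```python
import xml.etree.ElementTree as ET

# Hypothetical per-media weights; the abstract does not give actual values.
MEDIA_WEIGHTS = {"text": 1.0, "image": 0.6, "video": 0.8}

# Hypothetical courseware document; tag and attribute names are assumptions.
SAMPLE = """
<course title="Data Structures">
  <chapter title="Trees">
    <section title="Binary trees">
      <text>A binary tree has at most two children per node.</text>
      <image desc="binary tree diagram"/>
    </section>
    <section title="Heaps">
      <text>A heap is a complete tree with an ordering property.</text>
      <video desc="heap insertion walkthrough"/>
    </section>
  </chapter>
  <chapter title="Graphs">
    <section title="Traversal">
      <text>Breadth-first search visits vertices level by level.</text>
    </section>
  </chapter>
</course>
"""


def count_terms(content, terms):
    """Count exact, case-insensitive, whitespace-delimited term matches."""
    words = content.lower().split()
    return sum(words.count(term) for term in terms)


def leaf_score(elem, terms):
    """Score one content element: term hits scaled by its media weight."""
    content = (elem.text or "") + " " + elem.get("desc", "")
    return MEDIA_WEIGHTS[elem.tag] * count_terms(content, terms)


def score_tree(elem, terms, out):
    """Return the subtree score and record every structural node in out."""
    if elem.tag in MEDIA_WEIGHTS:
        return leaf_score(elem, terms)
    total = count_terms(elem.get("title", ""), terms)
    total += sum(score_tree(child, terms, out) for child in elem)
    out.append((elem.tag, elem.get("title", ""), total))
    return total


def retrieve(xml_text, query, granularity):
    """Return (title, score) pairs at one granularity, best first."""
    terms = query.lower().split()
    out = []
    score_tree(ET.fromstring(xml_text.strip()), terms, out)
    hits = [(title, s) for tag, title, s in out if tag == granularity and s > 0]
    return sorted(hits, key=lambda hit: -hit[1])


if __name__ == "__main__":
    for level in ("section", "chapter", "course"):
        print(level, retrieve(SAMPLE, "tree", level))
```

Calling `retrieve` with `"section"`, `"chapter"`, or `"course"` returns the same query's matches at progressively coarser granularity. A real system would replace the term counting with a proper relevance measure and tune the weights experimentally.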
