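Research on Retrieval of Network Multimedia Educational Resources Database

Original title (Chinese): 网络多媒体教育资源数据库检索研究
Author: Yuan Jiali (原佳丽)
Degree level: Master's
Discipline: Educational Technology
Chinese keywords: 多媒体主题词 ; 索引数据库 ; 中文分词 ; 相似度
English keywords: Multimedia Keyword ; Index Database ; Chinese Word Segmentation ; Similarity
Degree year: 2009
Advisor: Meng Xiangzeng (孟祥增)
Discipline code: 040110
Degree-granting institution: Shandong Normal University (山东师范大学)
Thesis submission date: 2009-04-10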

Abstract

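As society advances, it places ever higher demands on education. As a modern teaching method, multimedia instruction has effectively promoted the informatization of education and actively driven educational reform and development. Multimedia instruction depends on multimedia educational resources, and the Internet has become the world's largest repository of such resources. Search engines are the usual tool for obtaining information online, but general-purpose search engines mostly rely on keyword-based retrieval and are often inefficient at finding the varied media resources needed for teaching and learning. Building on a study of content-based multimedia retrieval, this thesis improves a network multimedia database retrieval system aimed at basic education, with the goal of providing primary and secondary school teachers, students, and other related users with an efficient, specialized retrieval service for network multimedia resources.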
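     This thesis organizes multimedia subject terms for basic education according to primary and secondary school textbooks, and searches for and downloads multimedia educational resources related to those terms from the Internet. It then analyzes and extracts the relevant attributes of the media and builds an attribute index database of multimedia educational resources. It studies content-based retrieval of image, animation (Flash), video, and audio databases, and implements a retrieval system for a network multimedia educational resources database using ASP.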
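     The retrieval system is the main subject of this thesis. When retrieval begins, the system processes two pieces of query text submitted by the user: the description of the media content and the description of its color. The thesis proposes a new Chinese word segmentation algorithm, the fast bidirectional segmentation algorithm, and develops a segmentation module based on it to segment the content-description query text. Filtering out words with no real meaning and the default words configured in the system from the segmentation result yields the key information describing the target media content. The system uses this information to compute the content-description similarity between the target media and the media in the database. In addition, the system converts the color names in the color query text into HSI color model values so that it can compute the color similarity between the target media and the database media.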
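The abstract does not spell out the fast bidirectional segmentation algorithm, so the following is only a minimal sketch of classic bidirectional maximum matching, a dictionary-based technique that also scans the text in both directions, followed by stop-word filtering. It is swapped in for illustration and may differ from the thesis's algorithm. The lexicon, the stop-word list, the tie-breaking rule, and all function names are assumptions.

```python
def forward_max_match(text, lexicon, max_len):
    """Scan left to right, taking the longest lexicon word at each position."""
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # A single character is always accepted so the scan cannot stall.
            if size == 1 or piece in lexicon:
                tokens.append(piece)
                i += size
                break
    return tokens


def backward_max_match(text, lexicon, max_len):
    """Scan right to left, taking the longest lexicon word ending at each position."""
    tokens, j = [], len(text)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            piece = text[j - size:j]
            if size == 1 or piece in lexicon:
                tokens.append(piece)
                j -= size
                break
    tokens.reverse()
    return tokens


def segment(text, lexicon):
    """Run both scans and keep the result that looks less fragmented.

    Assumed tie-break: fewer tokens, then fewer single-character tokens,
    then the backward result.
    """
    max_len = max(map(len, lexicon), default=1)
    fwd = forward_max_match(text, lexicon, max_len)
    bwd = backward_max_match(text, lexicon, max_len)

    def cost(tokens):
        return (len(tokens), sum(1 for t in tokens if len(t) == 1))

    return fwd if cost(fwd) < cost(bwd) else bwd


def key_terms(text, lexicon, stop_words):
    """Segment the query text and drop stop words (hypothetical helper)."""
    return [t for t in segment(text, lexicon) if t not in stop_words]


# Hypothetical lexicon and stop-word list, for illustration only.
LEXICON = {"研究", "研究生", "生命", "起源", "的"}
STOP_WORDS = {"的"}
print(key_terms("研究生命的起源", LEXICON, STOP_WORDS))
# prints ['研究', '生命', '起源']
```

In this example the forward scan greedily takes "研究生" and leaves the stray character "命", while the backward scan yields "研究 / 生命 / 的 / 起源". Comparing the two directions is what lets the sketch reject the more fragmented reading.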
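     Images, animations, video, and audio each have their own characteristics and attributes. The retrieval system sets retrieval conditions according to their main attributes, and these conditions correspond to the main fields of the multimedia tables in the database. The thesis uses similarity to measure the gap between the target media and the media in the database: the system compares the query information that the user supplies for each retrieval condition with the corresponding field values of the records in the database tables. Different retrieval conditions use different similarity calculations. For simple conditions such as format and size, the system uses Boolean retrieval: the similarity is 1 only when the user-supplied value strictly matches the stored value, and 0 otherwise. For more complex conditions such as content and color, the system uses fuzzy retrieval, with a different fuzzy algorithm for each condition. For example, the system compares the processed content-description query text with the content-description field value of each record and defines their synonym ratio as the content similarity between the target media and the database media. The total similarity of a media item is the product of its individual similarities.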
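The similarity scheme described above can be sketched as follows. The abstract states only that Boolean conditions score 1 or 0, that content similarity is a synonym ratio, and that the total is a product. The exact ratio (here, matched query terms divided by all query terms), the synonym table layout, and every name in the code are assumptions for illustration.

```python
import math


def boolean_similarity(query_value, stored_value):
    """Return 1.0 on a strict match and 0.0 otherwise."""
    return 1.0 if query_value == stored_value else 0.0


def content_similarity(query_terms, stored_terms, synonyms):
    """Share of query terms matched by a stored term, directly or via a synonym.

    `synonyms` maps a word to a canonical form (hypothetical structure).
    Dividing by the number of query terms is an assumption; the abstract
    does not give the exact definition of the synonym ratio.
    """
    if not query_terms:
        return 0.0

    def canonical(term):
        return synonyms.get(term, term)

    stored = {canonical(t) for t in stored_terms}
    hits = sum(1 for t in query_terms if canonical(t) in stored)
    return hits / len(query_terms)


def total_similarity(similarities):
    """Total similarity is the product of the per-condition similarities."""
    return math.prod(similarities)


# Hypothetical synonym table and record values, for illustration only.
synonyms = {"puppy": "dog", "hound": "dog", "lawn": "grass"}
content = content_similarity(
    ["puppy", "grass", "river"], ["hound", "lawn", "sky"], synonyms
)
fmt = boolean_similarity("jpg", "jpg")
total = total_similarity([fmt, content])
```

Because the total is a product, any condition that scores 0 (for example, a format mismatch) removes the record from the results regardless of how well its content matches.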
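     To improve retrieval efficiency, the system builds indexes on the content-description fields of the tables in the multimedia database, which speeds up retrieval on the content-description condition. Before presenting results to the user, the system places the result record set in a cache, which shortens the time needed to turn pages on the output page. The thesis also studies how to improve the execution efficiency of ASP and improves the program code of the retrieval system accordingly.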
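     After logging in to the retrieval system, the user describes the target media and issues a retrieval request. The system then processes the query information automatically, computes the similarity between each database media item and the target media, and returns the preview images, similarities, and other related information for the records that satisfy the conditions. Preliminary experimental results show that, for records whose attribute information in the multimedia database tables is annotated accurately and in detail, the retrieval results are fairly accurate, and that the use of indexes and caching noticeably increases retrieval speed.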
Society develops continuously and makes increasing demands on education. As a modern teaching method, multimedia instruction has actively and effectively promoted the informatization, reform, and development of education. Multimedia instruction requires multimedia educational resources, and the Internet has become the largest library of such resources. Search engines help people obtain information from the Internet. However, most general search engines are keyword-based and are often inefficient at finding the variety of media needed for teaching and learning. After studying content-based multimedia retrieval, this thesis improves a retrieval system for network multimedia databases in elementary education, with a view to providing teachers, students, and other related users with an efficient, professional retrieval service for network multimedia resources.
     This thesis organizes multimedia keywords according to primary and secondary school textbooks, and then searches for and downloads related multimedia educational resources from the Internet. It analyzes and extracts the properties of those resources and establishes index databases of multimedia educational resources. It studies content-based retrieval of image, Flash, video, and audio databases, and develops a retrieval system for a network multimedia educational resources database with ASP.
     The retrieval system is the main subject of this thesis. At the beginning of retrieval, the system processes the content and color descriptions of the target media. The thesis proposes a new algorithm for Chinese word segmentation, the fast two-way algorithm, and develops a Chinese word segmentation module to split the text of the content description. The system then filters out useless words to obtain the key information of the content description. In addition, the color names in the color description are converted to values in the HSI color model.
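The color-name conversion can be illustrated with the standard RGB-to-HSI formulas. This is a minimal sketch: the name-to-RGB table is hypothetical (the abstract does not list the color names the system recognizes), and the function names are assumptions.

```python
import math

# Hypothetical lookup table from color names to 8-bit RGB values.
COLOR_NAMES = {
    "红色": (255, 0, 0),
    "绿色": (0, 255, 0),
    "蓝色": (0, 0, 255),
    "白色": (255, 255, 255),
}


def rgb_to_hsi(r, g, b):
    """Convert 8-bit RGB to (hue in degrees, saturation 0..1, intensity 0..1)."""
    r, g, b = r / 255.0, g / 255.0, b / 255.0
    intensity = (r + g + b) / 3.0
    if intensity == 0.0:
        return 0.0, 0.0, 0.0  # black: hue and saturation are undefined, use 0
    saturation = 1.0 - min(r, g, b) / intensity
    denominator = math.sqrt(max(0.0, (r - g) ** 2 + (r - b) * (g - b)))
    if denominator == 0.0:
        hue = 0.0  # grey: hue is undefined, use 0 by convention
    else:
        ratio = 0.5 * ((r - g) + (r - b)) / denominator
        ratio = max(-1.0, min(1.0, ratio))  # guard against rounding error
        hue = math.degrees(math.acos(ratio))
        if b > g:
            hue = 360.0 - hue
    return hue, saturation, intensity


def color_name_to_hsi(name):
    """Look up a color name and convert it to HSI (hypothetical helper)."""
    return rgb_to_hsi(*COLOR_NAMES[name])
```

Working in HSI separates hue from brightness, so two colors a user would both call "red" can be compared mainly by hue rather than by raw RGB distance. How the thesis turns HSI values into a color similarity is not stated in the abstract.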
     Image, Flash, video, and audio media each have their own characteristics and properties. The retrieval system sets retrieval conditions according to the main properties of each media type, and these conditions correspond to the main fields of the multimedia database tables. Similarity is used to measure the gap between target media and library media. The system calculates it by comparing the query information with the values of the corresponding fields, and different conditions use different calculation methods. For simple conditions such as format and size, the system uses Boolean matching: the similarity is 1 only when the target media and the library media match strictly, and 0 otherwise. For other conditions, such as content and color, the system uses fuzzy matching, with a different fuzzy calculation for each condition. The total similarity of a media item is the product of its individual similarities.
     To improve retrieval efficiency, the system indexes the content fields of the multimedia database tables, which speeds up retrieval on the content description. Before the system presents results to users, it puts them into a cache, which reduces the time needed for paging. In addition, the thesis discusses how to improve the efficiency of ASP, and the program code is improved accordingly.
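The effect of caching the result set on paging can be sketched as below. The thesis system is written in ASP; this Python version only illustrates the idea, and the class name, the `search_fn` callback, and the page size are assumptions.

```python
class ResultCache:
    """Keep each query's ranked results in memory so paging does not re-run the search."""

    def __init__(self, search_fn, page_size=10):
        self._search_fn = search_fn  # hypothetical function: query -> ranked list
        self._page_size = page_size
        self._cache = {}

    def page(self, query, page_number):
        """Return one page (1-based) of results, searching only on a cache miss."""
        if query not in self._cache:
            self._cache[query] = self._search_fn(query)
        results = self._cache[query]
        start = (page_number - 1) * self._page_size
        return results[start:start + self._page_size]


# Usage with a stand-in search function that records how often it is called.
calls = []


def fake_search(query):
    calls.append(query)
    return list(range(25))


cache = ResultCache(fake_search, page_size=10)
first_page = cache.page("dog on grass", 1)
last_page = cache.page("dog on grass", 3)
```

Only the first request for a query pays the cost of the similarity computation. Later page requests are list slices, which is the saving on page turns that the abstract attributes to the cache.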
     Users describe the target media and submit retrieval requests after they log in. The retrieval system then processes the query information automatically and computes the similarities of the library media. Finally, the results are presented to users, including previews, similarities, and other related information. Experimental results show that retrieval accuracy is higher for records whose index information is accurate and detailed, and that the use of indexes and the cache clearly speeds up retrieval.
