一种原生XML数据库-Xindice的研究与改进
详细信息    本馆镜像全文|  推荐本文 |  |   获取CNKI官网全文
摘要
随着万维网的飞速发展,XML的应用范围不断扩大,支持XML的数据库成为众人瞩目的焦点。关系型数据库不能很好地支持XML。关系型数据库将XML转换成二维表的形式存储,但XML与二维表存储形式的转换存在性能问题。
     原生(Native)XML数据库是当前数据库领域的研究热点之一。XML文档在Native XML数据库中的存储和查询策略,是十分重要的问题。本文以一个开放源代码的Native XML数据库产品—Xindice为对象,深入分析了它的分层存储模型,研究了它的页面存储策略以及B树索引结构。在此基础上,并阐明了Xindice存储策略和查询策略存在的不足。针对这些不足,本文作了以下分析研究工作。
     在存储策略方面,当集合中加入了新的XML文档时,Xindice数据库分配“空闲”页面用于存储文档的数据,如果没有空闲页面,则创建新的页面;但是系统在删除XML文档时,仅仅将其占用的页面标记为“空闲”而并不释放其占用的空间。随着文档的插入和删除,页面文件占用的磁盘空间将会不断的增大。本文在实验的基础之上,分析了Xindice数据库存储策略在“空闲”页面管理上存在不足的原因,提出并实现了尾部页面截断策略和页面移动策略,释放了“空闲”页面占用的磁盘空间,提高了系统对磁盘资源的利用率。而针对页面移动策略页面移动次数过多,本文又提出了尾部页面移动策略,减少了释放“空闲”页面所需要的时间。
     在查询策略方面,Xindice数据库的查询语言是XPath,不支持XQuery查询语言。而XPath的查询功能有限,如不能分组、排序、连接等,不能对多个文档进行联合查询,影响了Xindice数据库查询上的灵活性。本文设计了XQuery表达式的文法,利用JavaCC工具和JJTree工具对XQuery表达式的文法生成词法语法分析器,用来识别输入的XQuery表达式的语法结构,并生成相应的语法树。根据此语法树,对构成该查询的XQuery表达式的各子句分别进行相应的查询处理,得到XML文档的最终查询结果,实现了XQuery查询,提高了Xindice数据库的查询上的灵活性。最后本文通过实例验证了本文设计的XQuery查询在Xindice数据库中的有效性。
With the rapid development of the World Wide Web, the application scope ofXML is expanding, as well as XML databases are becoming the focus of public.However, Relational Database Management System (RDBMS) couldn't support XMLvery well. Because it transforms XML to table, while the table transformation bringsperformance issue.
     Native XML database is the current hot issue in the research field of XMLdatabases. And in Native XML database, query and storage strategy is very important.In this paper, writer studies an open XML database—Xindice, analyze its multi-layerstorage model as well as its page storage strategy and B tree index structures. On thisbasis, the deficiencies of the Xindice storage strategy and the query strategy arepresented. Against its deficiencies, the paper made the following further research.
     In the storage strategy, when adding a new XML document in a collection,Xindice database allocates "idle" pages to store data files. If there are no spare pages,create new ones; however, when deleting XML documents from the system, thedatabase only marks "idle" in the pages to be deleted, without really releasing itsoccupied space. In this paper, based on experiments, In order to resolve this problem,the paper designs two new strategies, tail page truncating strategy and page movingstrategy, and implements them, which help Xindice release the disk space held by thefree page and increase the utilization rate of the disk resources. Also, in order toprevent moving pages quite frequently in page moving strategy, the paper bringsforward tail page moving strategy to reduce time cost in releazing free page.
     In the query strategy, the query language of Xindice is XPath, not supportingXQuery language. XPath is quite limited in query, which is not able to support group,sort, connect functions and can not query connected tables. This deficiency bringsdown the flexibility in Xindice query. This paper designed XQuery expressiongrammar, use JJTree and JavaCC tool for XQuery expression grammar to generatelexical syntax analyzer, which is used to identify imported XQuery expression syntaxstructure, and generate the corresponding syntax tree. According to the syntax tree, the clauses of the query XQuery expression were dealt respectively and then the ultimateXML query result was generated. The efficiency of XQuery expression in this paperhas been proved in experiments.
引文
[1] Tim Bray, Jean Paoli, C. M.Sperberg-McQueen etc. Extensible Markup Language (XML) 1.0(Fourth Edition)[EB/OL]. http://www.w3.org/TR/2006/REC-xml-20060816, 2006.
    [2] James Clark. Comparison of SGML and XML[EB/OL]. http://www.w3.org/TR/NOTE-sgml-xml-971215, 1997.
    [3] 於志勇,杨志义,於志文,李长德.XML数据存储方式的性能评价研究[J].计算机工程与应用,2006,17(42):(171—173).
    [4] 了解Oracle XML数据库[EB/OL]. http://www.oracle.com/technology/global/cn/sample_code/tutorials/cmsxdb/maintoc.htm,2003.
    [5] Ronald Bourret. XML and Databases[EB/OL]. http://www.rpbourret.com/xml/XMLAndDatabases.him, 2005.
    [6] Kimbro Staken. Introduction to Native XML Databases[M]. O'Reilly, 2001.
    [7] 渠本哲,王潜平.Native XML数据库关键技术综述[J].计算机工程与设计,2007,28(1):(24-28).
    [8] James Clark, Steve DeRose. XML Path Language (XPath) Version 1.0[EB/OL]. http://www.w3.org/TR/1999/REC-xpath-19991116, 1999.
    [9] John Cowan, Richard Tobin. XML Information Set (SecondEdition)[EB/OL]. http://www.w3.org/TR/xml-infoset/, 2000.
    [10] Arnaud Le Hors, Philippe Le Hegaret, Lauren Wood, Gavin Nicol, Jonathan Robie, Mike Champion, Steve Byme. Document Object Model (DOM) Level3 Core Specification Version 1.0[EB/OL]. http://www.w3.org/TR/2004/REC-DOM-Level-3-Core-20040407/,2000.
    [11] David Browned. SAX2[M]. O'Reilly, 2002.
    [12] Jonathan Robie etc. XQL: XML Query Language[EB/OL]. http://www.ibiblio.org/xql/, 1997.
    [13] Andreas Laux, Lars Martin. XML:DB Initiative: XUpdate-XML Update languate[EB/OL].http://xmldb-org.sourceforge.net/xupdate/xupdate-wd.html, 2000.
    [14] Apache Xindice [EB/OL]. http://xml.apache.org/xindice/,2001. Open Source Native XML Database[EB/OL]. http://exist.sourceforge.net/, 2007.
    [15] 邹红艳,易平.Xindice数据库分析[J].计算机辅助工程,2004,(1),60-63.
    [16] Open Source Native XML Database[EB/OL]. http://exist.sourceforge.net/, 2007.
    [17] Software AG Portfolio[EB/OL]. http://www.soffwareag.com/Corporate/products/default.asp,2007.
    [18] 司功闪,王鸿谷,徐捷.以XML为核心的Tamino数据库的研究与分析[J].计算机工程,2004,30(16),78-79,123.
    [19] Readex, a Leading Publisher of Historical Collection[EB/OL]. http://www.readex.com/readex/,2007.
    [20] Ronald Bourret. XML Database Products[EB/OL]. http://www.rpbourret.com/xml/XMLDatabaseProds.htm, 2005.
    [21] 四个主流的Native-XML数据库[EB/OL]. http://www.sqlite.com.cn/bbs/topicdisp.asp?tid=496, 2006
    [22] Scott Boag, Don Chamberlin, Mary E Fernandez, Daniela Florescu, Jonathan Robie, Jerome Simeon. XQuery 1.0: An XML Query Language[EB/OL]. http://www.w3.org/TR/xquery/,2007.
    [23] Xindice 1. I Internals Guide[EB/OL]. http://xml.apache.org/xindice/dev/guide-internals.html,2007
    [24] 何炎祥等.操作系统原理[M].上海科学技术文献出版社,1999.
    [25] 蔡子经,施伯乐.数据结构教程[M].复旦大学出版社,2004.
    [26] James Clark. XSL Transformations (XSLT) Version 1.0[EB/OL]. http://www.w3.org/TR/xslt,1999.
    [27] Steve DeRose, Eve Maler etc. XML Pointer Language (XPointer) Version 1.0[EB/OL].http://www.w3.org/TR/xptr/,2002.
    [28] David C. Fallside, Priscilla Walmsley. XML Schema Part 0: Primer Second Edition[EB/OL].http://www.w3.org/TR/xmlschema-0/,2004.
    [29] Henry S. Thompson, David Beech, Murray Maloney, Noah Mendelsohn. XML Schema Part Ⅰ:Structures Second Edition[EB/OL]. http://www.w3.org/TR/xmlschema-1/, 2004.
    [30] Paul V Biron, Kaiser Permanente, Ashok Malhotra. XML Schema Part 2: Datatypes Second Edition[EB/OL]. http://www.w3.org/TR/xmlschema-2/, 2004.
    [31] Jerome Simeon, Mary Fernanadez, Philip Wadler etc. Galax: XML Algebra & Query Processor[EB/OL]. http://www.galaxquery.org/, 2007.
    [32] Fatdog software. XQEngine[EB/OL]. http://www.fatdog.com, 2005.
    [33] Kenneth C. Louden etc.编译原理及实践[M].机械工业出版社,2000.
    [34] javacc: JavaCC Home[EB/OL]. https://javacc.dev.java.net/, 2007.
    [35] 姚砺,束永安.用JavaCC构造编译器的方法[J].计算机工程,2003,29(9):(39-41).