一种基于流的XML查询算法的设计与实现
详细信息    本馆镜像全文|  推荐本文 |  |   获取CNKI官网全文
摘要
XML在信息管理、电子商务、个性化出版、移动通信、网络教育、电子文档交换等诸多领域得到了广泛应用,已经开始成为Internet上数据描述和交换的事实上的标准。随着XML技术的不断发展及其应用领域的不断扩展,越来越多的数据开始采用XML进行描述、存储、交换和表现。传统的信息管理技术由于XML文档的出现而正面临新的挑战,因此增强面向XML文档信息查询能力变得越来越重要。
     通过对现有的XML文档查询算法分析发现:算法的实现都是把被查询文档全部载入内存之后再进行处理,因此要消耗大量内存,尤其是在XML文档很大以致于无法全部载入内存的情况下,现有的算法就无能为力了。针对这一问题,本文设计并实现了一种新的查询算法。该算法根据XPath查询表达式,生成一个查询自动机;将查询条件隐含在查询自动机的结构和状态中;XML流经过解析转化为事件流,这些事件作为查询自动机的输入,触发状态转换。查询自动机依据不同的输入事件,例如元素开始事件、文本事件和元素结束事件等,在各个状态之间进行转换。文档尽可能少地占用内存,一旦确认某一部分文档完全匹配查询表达式,就输出查询结果。
     论文中详细地介绍了由查询表达式构造查询自动机的步骤;实现了一个基于流的XML文档查询系统的原型,它可以在对XML流的一次单向读取过程中处理XPath,输出查询结果。论文中还对基于内存的XML查询算法和基于流的XML查询算法进行测试、比较,并对结果进行了分析。
     基于流的XML查询算法是为了满足一些数据密集型应用对数据查询处理的需求而引入的,这类应用处理的数据不宜用持久稳定的关系建模,而应采用数据流建模。这类应用的领域包括金融服务,网络监控,电信数据管理,生产制造,传感检测等。本论文的研究对这类实际应用将具有一定的理论意义和使用价值。
In many fields such as Information management, E-business, Personalize publication. Mobile communication, Online Education, and Electronic data interchange, XML has been put to extensive use and has become the defacto standard for data description and exchange. As the evolution of XML technology and the spread of XML application, more and more information has been described, stored, exchanged and presented in XML. Conventional information management technology meets the challenge of XML. It becomes more important to develop the technology of querying information from XML document.
    After investigating the existing algorithm, we find that the implementation is based on the idea that whole XML document must be loaded into memory. While XML document is too large or cann't be loaded into memory, algorithm is of no effect. To solve this problem, we present a different method for query processing in this paper. According to the query expression, a query automaton is build, whose struct and states imply the query predicate. The XML stream is parsed into element tag and text events stream. Those events such as startelement event, textevent or endelement event trigger states transition of query automaton. To reduce memory requirement, the document fragment should be put into output immediately as soon as it meets query expression.
    We present a method of building query automaton and an implementation of XML stream query system. This implementation could evaluate the XPath expression in one-pass scan of xml stream. Finally we make a comparative experiment to investigate the memory use of memory-based algorithm and the steam-based algorithm.
    XML stream querying problem is introduced to meet the querying requirement of some data-intensive application. These applications adapt to stream data model instead of persistent data model. Stream data model can be used in many fields such as financial service, network monitor, communication data management, manufacture, sensor network. Research in this paper contributes to the theory and practice of this kind of applications.
引文
[1] Simon ST.Laurent.XML基础教程.第一版.北京:电子工业出版社,2000.3-68
    [2] Derick Wood. SGML and XML Document Grammars and Exceptions. Information and Computation, 2001, 169(2): 230-251
    [3] 孙一中等.XML 理论和应用基础.第一版.北京:北京邮电大学出版社.2000.173-319
    [4] Steven Holzner.XML使用详解.第二版.北京:机械工业出版社,1999.225-472
    [5] Milo Tova, Suciu Dan, Vianu Victor. Typechecking for XML transformers. Journal of Computer and System Sciences, 2003, 66(1): 66-97
    [6] Tim Bray, Jean Paoli, C.M. Sperberg-McQueen, Eve Maler. Extensible Markup Language (XML) 1.0 (Second Edition). World Wide Web Consortium, 2000. http://www. w3. org/TR/2000/ REC-xml-20001006
    [7] Michael J.Young.XML学习指南.第一版.北京:机械工业出版社,2001.89-131
    [8] Ceri Stefano, Comai Sara, Damiani Ernesto, Fraternali Piero Paraboschi Stefano, Tanca Letizia. XML-GL: a graphical language for querying and restructuring XML documents. Computer Networks, 1999, 31(11-16): 1171-1187
    [9] Charles F. Goldfarb, Paul Prescod. XML 手册.第四版.北京:电子工业出版社 2003. 552-571
    [10] J. Clark, S. DeRose. XML Path Language (XPath)Version 1.0. World Wide Web Consortium. 16 November 1999. http://www, w3. org/TR/xpath
    [11] D. Chamberlin, J. Robie, D. Florescu. Quilt: An XML query language for heterogeneous data sources. In Proceedings of WebDB, 2000. 53-62
    [12] D. Chamberlin. XQuery: An XML query language. IBM SYSTEMS JOURNAL, 2002, 41(4):597-615
    [13] Goldman R, McHugh J, Widom J. From semistructured data to XML: migrating the lore data model and query language. In proceedings of the 2nd International Workshop on the Web and Databases, 1999. 12-20
    [14] A. Deutch, M. Fern andez, D. Florescu, k. Levy, D. Suciu. XML-QL: A query language for XML. Computer Networks, 1999, 31(1116). 1155-1169
    [15] J. Robie, J. Lapp, D. Schach. XML Query Language (XQL).In proceedings of the W3C Query Language Workshop (QL-98), 1998. 1-55
    [16] 张璞,庄成三.XML 查询语言技术与实例分析.计算机应用研究,2000,17(05).109-111
    [17] D. Florescu, D. Rossman. Storing and querying XML data using an RDBMS. IEEE Data Engineering Bulletin, 1999, 22(3): 27-34
    [18] 郑仕辉,周傲英.基于SQL的XML查询的有效实现.计算机研究与发展,2001,38(04):422-429
    
    
    [19] 李京,庄成三.用XML对数据库查询的方法.计算机应用,2000,20(10):21-24
    [20] 施伟斌,孙未未,施伯乐,XML 数据的结构化处理方法.计算机研究与发展,2002,39(7):819-826
    [21] 展霄峡,黄上腾.XML 查询语言 XML-QL 及其查询优化.计算机工程,2002,28(3):111-113
    [22] 袁国栋等.建立特殊索引实现 XML 文档的查询优化.计算机工程,2002,28(3):114-116
    [23] 程雷,朱茂盛.XQuery的实现机制.计算机工程与应用,2002,38(4):79-85
    [24] S. Babu, J. Widom. Continuous queries over data streams. SIGMOD Record, 2001, 20(3):109-120
    [25] B. Babcock, S. Babu, M. Datar, R. Motwani, J. Widom. Models and Issues in Data Stream Systems. In PODS, 2002. 1-16
    [26] Mehmet Altinel, Michael J. Franklin. Efficient filtering of XML documents for selective dissemination of information. In Proceedings of the 26th VLDB Conference, 2000. 53-64
    [27] C. Chan, P. Felber, M. Garofalakis, R. Rastogi. Efficient filtering of XML documents with XPath expressions. In Proceedings of ICDE, 2002. 235-244

© 2004-2018 中国地质图书馆版权所有 京ICP备05064691号 京公网安备11010802017129号

地址:北京市海淀区学院路29号 邮编:100083

电话:办公室:(+86 10)66554848;文献借阅、咨询服务、科技查新:66554700