XML数据查询的关键技术研究
详细信息    本馆镜像全文|  推荐本文 |  |   获取CNKI官网全文
摘要
XML的全称是Extensible Markup Language(可扩展标识语言)由于具有简单、可扩展、互操作性强,开放性强等特点,正迅速成为一种与技术无关的数据交换的标准和传输格式,并逐渐成为当前网络应用中事实的数据表达、交换的标准。鉴于XML在诸多领域有广泛的应用前景,许多关于XML的研究都是前沿和热点课题。例如在数据库领域,从某种意义上说XML作为数据库使用可以自然地表示嵌套型数据,比关系型数据库具有更强的表达能力,但是对XML数据的查询还有很多不完善的地方,其查询准确性与查询速度都需进一步的提高。
     XML数据管理系统主要解决XML数据的存储管理、查询处理、访问控制、数据更新等。XML查询处理与优化包括XML查询代数、查询处理、查询优化等。XML数据查询是XML数据管理一个非常重要的组成部分,是当前学术界研究的一个热点方向。XML查询根据其查询模式的不同可以分为两类:XML Query查询方式和XML IR查询方式。而XML IR方式又可以细分为三类:XML IR/keyword方式、XML IR/query方式和XML IR/fragment方式。
     本文主要研究XML数据集成查询过程中碰到的一些问题,以及所采取的相应解决方案。其中主要包括三部分的内容:第一,由于XPath是当前流行的XML数据查询语言XQuery和XSLT的基础,我们针对XPath语言中的复杂路径表达式,设计了一种路径表达式的最优化方法,用以提高在对XML进行查询时的执行效率;第二,基于当前比较流行的一种查询代数OrientXA,基于代数表达等价原则,设计了一系列的等价转化方法,简化了XML查询路径表达式的代数表示,优化了XML数据的查询效率;第三,针对多XML数据源的集成查询,由于查询过程往往涉及到对多个XML片段中相似重复信息的处理,而我们有时候需要对多XML片段中的共同信息进行提取,由此,本文提出一种XML有向标记树模型,并在此模型上设计了一种相似匹配算法来对共同信息进行挖掘。实验显示,该算法具有很高的可行性及使用价值。
XML, which stands for Extensible Markup Language, with the advantages of simplicity,scalability,interoperability, and strong opening characteristics, is fast becoming a kind of standards for data exchange which is unrelated to the technology and transmission format. In view of XML have broad application prospects in many fields, and many studies about XML query there are forefront and hot topics. For example, in the field of database, in a sense XML as a database can be naturally usto represent nested data, and it have a more stronger ability. but there are many deficiency in XML data query field, the efficiency and speed need to be improved largely.
     XML data management system mainly solve the XML data storage management, query processing, access control, data updates. XML data storage management technology include data storage,data encoding, indexing and other methods. XML query processing and optimization, including XML query algebra, query processing, and query optimization. XML data query is a very important component of XML data management and it is a hot topic of the current academe. According to the different query patterns XML Query can be divided into two main parts:XML Query query mode and XML IR queries. XML IR can also be devided into three parts:XML IR/keyword method, XML IR/query methods and XML IR/fragment method.
     This paper studies some of the key technologies of XML queries, consisting mainly three parts:first, introducing briefly the basic concepts of XML data query, and designing a optimization mehod for Xpath expressions. Secondly, we analyzed the concept and representation of the current popular XML query algebra-OrientXA, then we proposed a new query optimization approach based on this query algebra; Finally, because XML query process often involves several similar XML fragments, and we sometimes want to find out same information in different XML Data, thus, this paper presents a tree model of XML Data with labels, based on this model, we design a matching algorithm, some experiments show that the algorithm has high feasibility.
引文
[1]W3C. Extensible Markup Language(XML)1,0(Third Edition)[EB/ OL]. Http://www.w3.org/TR/REC-XML
    [2]Bray T and Paoli J, sPerberg-McQueen CM, MalerE, Eds.Extcnsible markup Language(XML)1.0(2nd Edition). w3cReCommendation,2000.
    [3]Meng XF, Zhou LX, Wang S. State of the art and trends in database research. Journal of Software,2004,15(12):1822-1836 (in Chinese with English abstract). http://www.jos.org.cn/1000-9825/15/1822.htm
    [4]J.Clark and S. DeRose. XML Path Language (XPath). W3C Recommendation, http://www.w3.org/TR/xpath/,1999.
    [5]张婷,常致全,肖厚新。XPath语义特性及其对XML数据操作的应用研究[J].信息技术,2006,3(05):94-97
    [6]D.Chen, R.Wong.Optimizing the Lazy DFA Approach for XML stream processing[C].In:Porc.of Conference on Database.Ausrtalaisan,2004,10: 131-140.
    [7]孔令波,唐世渭,杨冬青,王腾蛟,高军。XML数据的查询技术Journal of Software, Vol.18, No.6, June 2007, pp.1400-1418
    [8]Deutsch A, Fernandez M, Florescu D, Levy A, Suciu D. A query language for XML. Computer Networks,1999,31(11-16):1155-1169.
    [9]Donald D. Chamberlin, Jonathan Robie, Daniela Florescu, Quilt:An XML Query Language for Heterogeneous Data Sources, Selected papers from the Third International Workshop WebDB 2000 on The World Wide Web and Databases, p.1-25, May 18-19,2000
    [10]Don Chamberlin, Daniela Florescu, Jonathan Robie, Jerome Simeon, and Mugur Stefanescu. XQuery:A Query Language for XML. Working draft, World Wide Web Consortium, February 2001.
    [11]Beech D, Malhotra A, Rys M. A formal data madel and algebra for XML. In:Beech D, alhotra A, Rys M, eds, Note to the W3C XML Query Working Group.1999..1-26.
    [12]Fernandez M, Simeon J, Suciu D, Wadler P. A data model and algebra for XML query.1999.
    [13]Kay M. XSL transformation(XSLT), Version 1.0. W3C Recommendation, 1999. www.w3.org/TR/xslt
    [14]Dietz P F. Maintaining Order in a Linked List. In:Lewis H R et al Eds. Proceedings of the 14th Annual ACM Symposium on Theory of Computing (STOC'82). San Francisco, California, USA. May 5-7,1982. New York:ACM Press,1982.122-127
    [15]Zhang C, Naughton J, Dewitt D, et al. On Supporting Containment Queries in Relational Database Management Systems. In:Meng Xiaofeng et al Eds. Proceedings of the 20th ACM SIGMOD International Conference on Management of Data. Santa Barbara, California, USA. May 21-24,2001. NewYork:ACM Press,2001.426-437
    [16]Li QZ, Moon B. Indexing and querying XML data for regular path expressions. In:Apers PMG, Atzeni P, Ceri S, Paraboschi S, Ramamohanarao K, Snodgrass RT, eds. Proc. of the 27th Int'l Conf. on Very Large Data Bases (VLDB). Roma:Morgan Kaufmann Publishers,2001.361-370.
    [17]S. Ceri, S. Comai, E. Damiani, P. Fraternali, S. Paraboschi and L. Tanca, XML-GL:a graphical language for querying and restructuring XML documents (electronic version), in:Proc. of the Int. World Wide Web Conference (WWW8), Comput. Networks ISDN Syst.31 (11-16) (1999) 1171-1187.
    [18]A. Deutsch, M.F. Fernandez, D. Florescu, A.Y. Levy and D. Suciu, A query language for XML (electronic version), in:Proc. of the Int. World Wide Web Conference (WWW8), Comput. Networks ISDN Syst.31 (11-16) (1999) 1155-1169.
    [19]Yoshikawa M, Amagasa T. XRel:A path-based approach to storage and retrieval of XML documents using relational databases. ACM Trans, on Internet Technology,2001, 1(1):110-141.
    [20]Chen Y, Davidson SB, Zheng YF. BLAS:An efficient XPath processing system. In:Weikum Q Konig AC, DeBloch S, eds. Proc. of the ACM SIGMOD Int'l Conf. on Management of Data (SIGMOD). Paris:ACM Press,2004.47-58.
    [21]Wang W, Wang HZ, Lu HJ, Jiang HF, Lin XM, Li JZ. Efficient processing of XML path queries using the disk-based F&B index. In:Bohm K, Jensen CS, Haas LM, Kersten ML, Larson P, and Ooi BC. eds. Proc. of the 31st Int'l Conf. on Very Large Data Bases (VLDB). Trondheim:ACM Press,2005.145-156.
    [22]S. Chien, V. Tsotras, and C. Zaniolo. Efficient management of multiversion documents by object referencing.In VLDB, pages 291-300, Rome, Italy,2001.
    [23]C.E. Dyreson. Observing transaction-time semantics with TTXPath. In WISE, pages 193-202,2001.
    [24]M. Gergatsoulis and Y Stavrakas. Representing changes in XML documents using dimensions. In XSym, pages 208-222, Berlin, Germany,2003.
    [25]C. Gao and R. Snodgrass. Syntax, semantics and query evaluation in the τXQuery temporal XML query language. Time Center Technical Report TR-72, 2003.
    [26]Albeno O.Mendelzon, FlaVio Rizzolo, Alejandro Vaismall. Indexing Temporal XML Documents. In Proceedings of the 30th VLDB Conference,2004
    [27]Abdullah Uz Tansel, Temporal Relational Data Model, IEEE Transactions On Knowledge and Data Engineering, VOL.9, NO.3, MAY/JUNE 1997, 464-479
    [28]Milo T, Suciu D. Index structures for path expressions. In:Beeri C, Buneman P, eds. Proc. of the 1999 Int'l Conf. on Database Theory (ICDT). LNCS 1540, Jerusalem:Springer-Verlag,1999.277-295.
    [29]Kaushik R, Sheony P, Bohannon P, Gudes E. Exploiting local similarity for efficient indexing of paths in graph structured data. In:Agrawal R, Dittrich K, Ngu AHH, eds. Proc. of the 18th Int'l Conf. on Data Engineering (ICDE). San Jose:IEEE Computer Society,2002.129-140.
    [30]Chung C, Min J, Shim K. APEX:An adaptive path index for XML data. In: Franklin MJ, Moon B, Ailamaki A, eds. Proc. of the 2002 ACM SIGMOD Int'l Conf. on Management of Data (SIGMOD). Madison:ACM Press,2002. 121-132.
    [31]Polyzotis N, Garofalakis M. Statistical synopses for graph-structured XML databases. In:Franklin MJ, Moon B, Ailamaki A, eds. Proc. of the 2002 ACM SIGMOD Int'l Conf. on Management of Data (SIGMOD). Madison:ACM Press, 2002.358-369.
    [32]Chen Q, Lim A, Ong KW. D(k)-index:An adaptive structural summary for graph-structured data. In:Halevy AY, Ives ZG, Doan A, eds. Proc. of the 2003 ACM SIGMOD Int'l Conf. on Management of Data (SIGMOD). San Diego: ACM Press,2003.134-144.
    [33]S. Abiteboul. H. Kaplan,T. Milo. Compact Labeling Schemes for Ancestor Queries[A], In Proceedings ofthe Twelfth Annual Symposium on Discrete Algorithms(SODA)[C],2001:547-556.
    [34]E. Cohen,H. Kaplan,T. Milo. Labeling Dynamic XML Trees[A]。 In Proceedings of the Twenty-first ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems(PODS)[C],2002:271·281.
    [35]X. Wu,M. L. Lee, W. Hsu. A Prime Number Labeling Scheme for Dynamic Ordered XML Trees[A], In Proceedings of the 20th International Conference on Data Engineering (ICDE)[C],2004:66-78.
    [36]I. Tatarinov,S. Viglas. K. S. Beyer,J. Shanmugasundaram,E. J. She kita,C. Zhang. Storing and Querying Ordered XML Using a Relational Dambase System[A], In Proceedings of the 2002 ACM SIGMOD International Conference On Management of Data(SIGMOD)[C],2002:204-215.
    [37]R. Agrawal. A. Borgida. H. V. Jagadish. Efficient Management of Transitive Relationshipsin Large Data and Knowledge Bases[A], In In Proceedings of the 1989 ACM SIGMOD International Conference on Management ofData(SIGMOD)[C],1989:253·262.
    [38]Q. Li, B. Moon. Indexing and Querying XML Data for Regular Path Expressions [A], In the Proceedings of 27th International Conference on Very Large Data Bases(VLDB)[C],2001:361.370.
    [39]M. Duong, Y. Zhang. A New Labeling Scheme for Dynamically Ulxlating XML Data[A], In Proceedings of the 16thAustralasian Dambase Conference(ADC)[C],2005:185493.
    [40]T. Amagasa,M. Yoshikawa,s. Uemura. QRS:A Robust Numbering Scheme for XML Documents [A], In Proceedings of the 19th International Conference on Data Engineering (ICDE)[C],2003:705-707.
    [41]B. Catania,W. Q. Wang, B. C. Ooi, X. Wang. Lazy XML Updates: Laziness as aV'mue of Update and Structural Join Efficiency [A], In Proceedings of the 2005 ACM SIGMOD International Conference on Management of Data(SIGMOD)[C],2005:515-526.
    [42]C. Li, T. W. Ling,M. HtL Efficient Processing of Updates in Dynamic XML Data[A], In Proceedings of the 22nd International Conference on Data Engineering (ICDE)[C],2006:13-13.
    [43]C. Li,T. W. Ling. QED:A Novel Quaternary Encoding tO Completely Avoid Re-labeling in XML Updates[A]. In Proceedings of the 2005 ACM CIKM International Conference on Information and Knowledge Management(CIKM)[C],2005:501-508.
    [44]A. Silberstein,H. He, K. Yi, J. Yang. BOXes:Efficient Maintenance of Order-Based Labeling for Dynamic XML Dam[A], In Proceedings ofthe 21 st International Conference on Data Engineering(ICDE)[C],2005:285-296.
    [45]Tolga Bozkaya, Meral Ozsoyoglu, Indexing Valid Time Intervals NSF grants IRI 92-24660
    [46]D. Draper, P. Fankhauser, M. Fernandex, A. Malhotra, K. Rose, M. Rys, J. Simeon and P. Wadler, "XQuery 1.0 and XPath 2.0 Formal Semantics", October 2004
    [47]D. Florescu and D. Kossmann, "Storing and Querying XML Data Using an RDBMS", Data Eng. Bulletin,22(3),1999
    [48]H. V. Jagadish, S. Al-Khalifa, A. Chapman, L. V. S. Lakshmanan, A. Nierman, S. Paparizos, J. M. Patel, D. Srivastava, N. Wiwatwattana, Y. Wu, C. Yu, TIMBER:A native XML database, The VLDB Journal-The International Journal on Very Large Data Bases, v.11 n.4, p.274-291, December 2002.
    [49]Vanja Josifovski, Marcus Fontoura, Attila Barta, Querying XML streams, The VLDB Journal-The International Journal on Very Large Data Bases, v.14 n.2, p.197-210, April 2005
    [50]Lipyeow Lim, Min Wang, Jeffrey Scott Vitter, CXHist:an on-line classification-based histogram for XML string selectivity estimation, Proceedings of the 31st international conference on Very large data bases, August 30-September02,2005
    [51]Christian Mathis, Theo Harder, Michael Haustein, Locking-aware structural join operators for XML query processing, Proceedings of the 2006 ACM SIGMOD international conference on Management of data, June 27-29, 2006
    [52]K. Beyer, R. Cochrane, M. Hvizdos, V Josifovski, J. Kleewein, G Lapis, G Lohman, R. Lyle, M. Nicola, F. Ozcan, H. Pirahesh, N. Seemann, A. Singh, T. Truong, R. C. Van der Linden, B. Vickery, C. Zhang, G Zhang, DB2 goes hybrid:integratng native XML and XQuery with relational data and SQL, IBM Systems Journal, v.45 n.2, p.271-298, January 2006
    [53]Peter Boncz, Torsten Grust, Maurice van Keulen, Stefan Manegold, Jan Rittinger, Jens Teubner, MonetDB/XQuery:a fast XQuery processor powered by a relational engine, Proceedings of the 2006 ACM SIGMOD international conference on Management of data, June 27-29,2006, Chicago, IL, USA
    [54]Andrey Balmin, Kevin S. Beyer, Fatma Ozcan, Matthias Nicola, On the path to efficient XML queries, Proceedings of the 32nd international conference on Very large data bases, September 12-15,2006, Seoul, Korea
    [55]Alexandru Berlea, On-the-fly tuple selection for XQuery, Proceedings of the 4th international workshop on XQuery implementation, experience and perspectives, p.1-6, June 15-15,2007, Beijing, China

© 2004-2018 中国地质图书馆版权所有 京ICP备05064691号 京公网安备11010802017129号

地址:北京市海淀区学院路29号 邮编:100083

电话:办公室:(+86 10)66554848;文献借阅、咨询服务、科技查新:66554700