支持XML数据查询的F&B索引结构的研究
详细信息    本馆镜像全文|  推荐本文 |  |   获取CNKI官网全文
摘要
XML(可扩展标记语言),作为网络上数据表示和信息交换的工具,以其自描述性、独立于平台等特点,已经成为新一代的网络语言。随着XML的广泛应用,XML上的索引及其相关技术的研究就显得十分重要。
     本文以解决XML最重要的结构索引——F&B索引在实际应用中的问题为目标,就F&B索引的创建、存储、执行查询等问题进行了研究。本文的工作及主要贡献包括如下几个方面:
     首先,从节省内存空间的角度出发,针对XML树模型和有向无环图模型,分别提出了新的F&B索引创建算法SAJ和SAM。理论分析表明树模型上的SAJ算法的空间性能优于现有的算法,有向无环图模型上的SAM算法的时间、空间性能均优于现有的算法。实验结果表明这两个算法是正确、高效的,并且有良好的可扩展性。
     其次,着眼于F&B索引使用中占用内存空间过大的问题,基于聚簇的思想,提出了一种新的基于磁盘的F&B索引结构——EDF&B索引,大大节省了使用F&B索引所占用的空间代价。实验结果表明,该索引结构的冗余量很小适合实际应用。
     最后,将现有的F&B索引查询处理算法扩展到EDF&B索引上,提出了基于EDF&B索引的新的查询处理算法,并用实验验证了该算法的高效性和EDF&B索引的有效性。
XML(eXtensible Markup Language), as the tool of describing data and exchanging information online, has been a new language of Internet, because of its advantages of self-description, independence of platform, and so on. Since XML has been applied wildly, the research of indexes of XML and the related technology become very important.
     To solve the practical problem of F&B index, which is the most important structural index of XML, this paper studies several key problems, such as construction of F&B index, storage of F&B index and query processing. Main contributions of this paper include:
     First, to save the memory, new algorithms for building F&B Index of XML tree model and DAG model, SAJ and SAM, have been proposed. Theoretical analysis shows that SAJ saves more space cost than previous algorithm and SAM is better than previous one over time and space cost. Experiment results show that these algorithms are correct, efficient and well scaled.
     Second, considering the space problem caused loading F&B index into memory, EDF&B index, a new structure of F&B Index stored on disk by clustering, has been proposed, which saves much space cost. Experiment results show it’s feasible in practice.
     Finally, current algorithms of query processing on F&B index are extended to EDF&B index, and new algorithm based EDF&B index is proposed. Experiment results show the efficiency of new algorithm and EDF&B index.
引文
1 S. Abiteboul, S. Cluet, T. Milo. Querying and Updating the File. VLDB 1993: 73-84
    2 A. Renner. XML Data and Object Databases: A Perfect Couple. ICDE 2001: 143-148
    3 Klettke, M. Meyer. XML and Object-Relational Database Systems Enhancing Structural Mappings Based on Statistics. Int. Workshop on the Web and Databases (WebDB), Dallas, May. 2000. 2000:151-170
    4 T. Shimura, M. Yoshikawa, and S. Uemura. Storage and Retrieval of XML Documents Using Object-Relational Databases. In Database and Expert Systems Applications. Springer, 1999:206-217
    5 M. Carey, D. Florescu. XPERANTO: Publishing Object-Relational Data as XML. In Proc. of the Int. Workshop on Web and Databases (WebDB). 2000:105-110
    6 J. Shanmugasundaram, K. Tufte, and G. He. Relational Databases for Querying XML Documents: Limitations and Opportunities. VLDB Conference. September 1999:302-314
    7 Gerti Kappel and Elisabeth Kapsammer. X-Ray Towards Integrating XML and Relational Database Systems. International Conference on Conceptual Modeling. 2000: 339-353
    8 D. Florescu, D. Kossman. A Performance Evaluation of Alternative Mapping Schemes for Storing XML Data in a Relational Database. Rapport de Recherche No. 3680 INRIA, Rocquencourt, France, May 1999:31
    9 I. Tatarinov, Z.G. Ives, and A.Y. Halevy. Updating XML. In Proc. ACM SIGMOD Int. Conf. on Management of Data. 2001: 413-424
    10 J.Shanmugasundaram, E.Shekita. Efficiently Publishing Relational Data as Documents. VLDB, 2000:133-154
    11 J. McHugh, S. Abiteboul, and R. Goldman. Lore: A Database Management System for Semistructured Data. Simod Record26(3),54-66, September 1997
    12 Carl-Christian Kanne, Guido Moerkotte. Efficient Storage of XML Data. ICDE, 2000:198
    13 H. V. Jagadish, Shurug Al-Khalifa, and Laks Lakshmanan. Timber: A native XML database. Technical report, University of Michigan, April 2002
    14 Ning Zhang, M. Tamer Ozsu, Varun Kacholia. A Succinct Physical Storage Scheme for Single-Pass Evaluation of Next-of-Kin Path Queries in XML. ICDE, 2004:54
    15 T.Grust. Accelerating XPath Location Steps. Sigmod. June 2002:109-120
    16 Q.Li, B.Moon. Indexing and Querying XML Data for Regular Path Expressions. In Proceeding of 27th International Conference on Very Large Data Bases. September 2001:361-370
    17 David DeHaan, David Toman, and M. Tamer. A Comprehensive XQuery to SQL Translation Using Dynamic Interval Encoding. Sigmod. 2003:623-634
    18 Igor Tatarinov, Kevin Beyer, and Chun Zhang. Storing and Querying Ordered XML Using a Relational Database System. Sigmod. 2002:204-215
    19 Jiaheng Lu, Tok Wang Ling. From Region Encoding to Extended Dewey: On Efficient Processing of XML Twig Pattern Matching. VLDB. 2005:193-204
    20 R.Goldman, J.Widom. DataGuides: Enable Query Formulation and Optimization in Semistructured Databases. In Proceedings of 23rd International Conference on VLDB, 436-445, August 1997
    21 T.Milo and D.Suciu. Index Structures for Path Expressions. In Proceedings of 7th Conference on Database Theory. 1999: 277-295
    22 B.Cooper, N.Sample. A Fast Index for Semistructured Data. In Proceeding of 27th International Conference on Very Large Data Bases, September 2001:341-350
    23 Wei Wang, Haifeng Jiang, and Hongjun Lu. PBiTree Coding and Efficient Processing of Containment Joins. ICDE. 2003:391-402
    24 Qun Chen, Andrew Lim. D(K)-Index: An Adaptive Structural Summary for Graph-Structured Data. SIGMOD. 2003:134-144
    25 Hao He, Jun Yang. Workload-Aware Multiresolution Indexing for XML. ICDE. 2004:112
    26 R. Kaushik, P. Bohannon, J. F. Naughton. Covering Indexes for Branching Path Queries. Sigmod, Madison Wisconsin USA. ACM Press, 2002:133-144
    27 Wei Wang, Hongzhi Wang, and Jianzhong Li. Efficient Processing of XML Path Queries Using the Disk-based F&B Index. VLDB. 2005:145-156
    28 Raghav Kaushik, Philip Bohannon, Jeff Naughton. Updates for Structure Indexes. VLDB. 2002: 239-250
    29 Ke Yi, Hao He, Jun Yang. Incremental Maintenance of XML Structural Indexes. SIGMOD. 2004:491-502
    30 S. A. Khalifa, H. V. Jagadish, and Nick Koudas. Structural Joins: A Primitive for Efficient XML Query Pattern Matching. ICDE. 2002:263-274
    31 Chun Zhang, Jeffrey Naughton and David DeWitt. On Supporting Containment Queries in Relational Database Management Systems. SIGMOD. 2001:425-436
    32 T. Grust, M. Van Keulen, J. Teubner. Staircase Join: Teach A Relational DBMS to Watch Its (Axis) Steps. VLDB. 2003:524-525
    33 Yuqing Wu, J. M. Patel, and H. V. Jagadish. Structural Join Order Selection for XML Query Optimization. ICDE. 2003:443-454
    34 Wei Wang, Haifeng Jiang, Hongjun Lu. PBiTree Coding and Efficient Processing of Containment Joins. ICDE. 2003:391-402
    35 Shu-Yao Chien, Zografoula Vagena, Donghui Zhang. Efficient Structural Joins on Indexed XML Documents. VLDB. 2002:263-274
    36 H. Jiang, H. Lu. XR-Tree: Indexing XML Data for Efficient Structural Join. ICDE. 2003:253-263
    37 R. Kaushik, J. F. Naughton, R. Ramakrishnan. On the Integration of Structure Indexes and Inverted Lists. SIGMOD. 2004:779-790
    38 Nicolas Bruno, D. Srivastava, N. Koudas. Holistic Twig Joins: Optimal XML Pattern Matching. ACM SIGMOD. 2002:310-321
    39 Haifeng Jiang, Wei Wang, Hongjun Lu. Holistic Twig Joins on Indexed XML Documents. VLDB. 2003:273-284
    40 Haifeng Jiang, Hongjun Lu, Wei Wang. Efficient Processing of XML Twig Queries with OR-Predicates. SIGMOD. 2004:193-204
    41 A. Schmidt, F. Waas, M. J. Carey. XMark: A Benchmark for XML Data Management. VLDB. 2002:974-985
    42 DBLP XML records. Available at http://dblp.uni-trier.de/xml/
    43 TreeBank. Available at http://www.cs.washington.edu/research/xmldatasets/ data/Treebank
    44 R. Paige and R. E. Tarjan. Three Partition refinement algorithms. SIAM Journal on Computing, 16(6): 973-989, December 1987.
    45 Gene Ontology. http://www.geneontology.org.
    46 T. H. Cormen and et. al. Introduction to Algorithms (ISBN 0-262-530-910).MIT Press, 1994.
    47 XMark. The xml-benchmark project. http://www.xml-benchmark.org , Apr. 2001.
    48孔令波,唐世渭,杨冬青等.XML数据索引技术.软件学报.2005,16(12): 2063~2079
    49孟小峰,王宇,王小锋.XML查询优化研究.软件学报.2006,17(10): 2069~2086
    50孔令波,唐世渭,杨冬青.XML数据的查询技术.软件学报.2007,18(6):1400~1418
    51杨卫东,王清明,施伯乐.针对XML流数据的复杂twig Pattern查询处理.软件学报.2007,18(4):893~904
    52 G. Gou, R. Chirkova. Efficiently Querying Large XML Data Repositories: A Survey. IEEE Trans. Knowl. Data Eng. 2007, 19(10):1381–1403
    53 C. Gokhale, N. G. 0003, P. Kumar, et al. Complex Group-by Queries for XML. Proceedings of the 23nd International Conference on Data Engineering, ICDE 2007, April 15-20, 2007, The Marmara Hotel, Istanbul, Turkey. 2007:646–655
    54 N. Wiwatwattana, H. V. Jagadish, L. V. S. Lakshmanan, et al. X3: A Cube Operator for XML Olap. Proceedings of the 23nd International Conference on Data Engineering, ICDE 2007, April 15-20, 2007, The Marmara Hotel, Istanbul, Turkey. 2007:916–925

© 2004-2018 中国地质图书馆版权所有 京ICP备05064691号 京公网安备11010802017129号

地址:北京市海淀区学院路29号 邮编:100083

电话:办公室:(+86 10)66554848;文献借阅、咨询服务、科技查新:66554700