分布式查询优化中基于数据压缩的全归约算法研究与设计
详细信息    本馆镜像全文|  推荐本文 |  |   获取CNKI官网全文
摘要
分布式查询优化的研究领域是分布式数据库中的研究热点。由于集中式数据库和分布式数据库的区别在于,分布式数据库需要站点间的数据传输。所以大多数研究分布式查询优化以减少通讯开销为目标。而分布式数据库中查询优化是NP完全问题,至今都没有得到彻底地解决,里面尚有许多问题值得研究和探讨。既有理论上的问题,也有实际应用中的问题。这些问题在当前显得尤为重要。我正是在前人研究的问题基础上,对分布式查询优化方面的算法进行了深入的研究,取得了显著效果。
     本文的研究主要集中于分布式查询优化策略。本文是以通信传输开销作为主要优化目标,以半联接运算作为主要手段。对于用传统的SDD-1算法来寻找全归约程序进行改进。本文试图从以下方面对这类算法进行改进:(1)在数据传输过程中,引进数据压缩技术,提出不同于现有的压缩编码模式-标记模式,并给出TAG算法的实现。我们只要仔细地想想就会发现数据库中的数据很多都是重复的,因为具有相同特征的数据都被设计到同一个属性段中。数据压缩技术不就是利用了数据的重复性而被提出的吗?数据压缩的有关算法,从最初的Shannon-Fano编码方法,到Huffman编码、Lempel-Ziv编码,直到向极限挑战的算术编码,让人们一次又一次地惊讶于它的压缩效果。把数据压缩的思想带到数据库中来,同样会带给你意想不到的效果。本文正是基于这样的出发点,提出在数据库中使用数据压缩技术,从而减少通讯代价;(2)在数据压缩的研究工作基础上,提出FRD算法,嵌入了TAG算法。FRD算法采用半联接程序,以构造一个全归约程序,最大可能地减少通讯代价。该算法不需要像SDD-1的算法那样去维护一个庞大的数据库静态特征表。更不用穷举所有可能的半联接程序以对其收益进行评估。也避免了由于半联接程序的启动的次序不同而引起的种种问题。
Distributed query optimization research is under way in universities,research laboratories and similar establishments. It is a very active research area in distributed database systems. There is difference between centralized database and distributed database,which distributed database needs data transportation. So most research of distributed query optimization aim at reducing communication cost. The distributed query optimization problem is the key point in distributed database systems,and it is an NP hard one. By keeping the above facts,I delved deeply in distributed query optimization algorithms,made some notable achievement.
    The paper focuses primarily on distributed query optimization. In distributed query optimization,this paper aims at how to reduce communication cost,adopting semi-join program. As for traditional SDD-1 algorithm,this paper tries to improve it in the following aspects:(1) During data transportation,this paper adopts data compression technology. Another encoding model -TAG model- is put forward here which is different from existing models. And TAG algorithm is given here to implement TAG model;(2) Based on above work,this paper also provides FRD algorithm by using TAG algorithm. FRD algorithm uses semi-join program,in order to find a full reducer.
引文
[1]Surajit Chaudhuri "An Overview of Query Optimization in Relational Systems''
    [2]Jo-Mei Chang "A Heuristic Approach to Distributed Query Processing"
    [3]Stephane Lafortune, Eugene Wong "A State Transition Model for Distributed Query Processing"
    [4]Donald Kossmann "The State of the Art in Distributed Query Processing" ACM Computing Surveys, Vol.32, No. 4, December 2000
    [5]S.Ceri, G. Pelagatti "Correctness of Query Execution Strategies in Distributed Databases" ACM. TODS, 8:4 (1983)
    [6]Arbee L.p. Chen, Victor O.K.Li "Optimizing Star Queries In a Distributed Database System"
    [7]Philip A.Bernstein, Nathan Goodman "Query Processing in a System for Distributed Databases(SDD-1)" ACM TODS Vol.6 Dec. 1981
    [8]Phlip A.Bernstein, Dah-Ming W Chiu "Using semi-joins to solve relational query" ACM 28:1 (1981)
    [9]Selinger, P. Astrahan, M.Chamberlin, D.Lorie Access path selection in a relational database management system In Proceeding of the ACM SIGMOD Conference on Management of Data (Bostom, MA, May)
    [10]S.B.Yao "Distributed Query Optimization on Local Area Network" ACM Tran Office Information Sys., Jan (1985)
    [11]G.M.Sacco, S.B.Yao "Query Optimizaiotn in Distributed Database System" in Advances in computer Vol.21 New York: Academic (1982)
    [12]P.M.G. Apers, A.R.Hevner, S.B.Yao "Optimization Algorithms for Distributed Queries" IEEE-TSE 9:1 (1983)
    [13]S.A.Mahmond et al "Distributed Database Partitioning and Query Processing" In Proc. IFIP 1979
    [14]S.Ceri, S.B.Navathe "A Methodology for the Distribution Design of Database" Proc. Compcon 83. San Francisco March 1983
    [15]Apers P "Data allocation in distributed DBMS" ACM Transactions on Database Systems 13,3 (Sept.) 1988
    [16]Du, W., Krishnamurthy, R., Shan, M.C. "Query Optimization in Heterogeneous DBMS" In Proceedings of the Conference on Very Large Databases (VLDB) (Vancouver, Canada,Aug) 1992
    [17]Ganguly S., Goel. A., Silberschatz A. "Efficient and Accurate Cost Models for Prallel Query Optimization" In Proceedings of the ACM SIGMOD/SIGACT Conference on Principles of Database Systems (PODS) (Montreal,Canada, June) 1996
    [18]Haas. L., Kossmann. D., Wimmers. E., Yang. J. "Optimizing Queries across Diverse Data Sources" In Proceedings of the Conference on Very Large Data Bases (VLDB) (Athens, Greece, Aug.) 1997
    [19]Mchugh. J., Widon J. "Query Optimizaton for XML" In Proceedings of the Conference on Very Large Data Bases (VLDB) (Edinburgh, GB, Sept.) 1999
    [20]P. Apers, A. Hevner, and S. B. Yao. Optimization algorithms for distributed queries. IEEE Trans. Software Eng., 9(1), 1983.
    [21]Ceri S., Pelagatti G. Distributed Databases-principles and Systems McGraw Hill Inc., New York, San Francisco, Washington, D.C. 1984
    
    
    [22]Graefe G., Mckenna W. The volcano optimizer generator: Extensibility and efficient search.In Proceeding of the IEEE conference on Data Engineering (Vienna, Austria, April) 209-218 1993
    [23]Graefe G. The cascades framework for query optimization IEEE Data Engineering Bulletin 18.3 (Sept) 1995
    [24]Graefe G., Dewitt D. The EXODUS optimizer generator In Proceedings of the ACM SIGNOD Conference on Management of Data (San Francisco, CA, May) 1987
    [25]Kossmann D., Stocker K. Iterative Dynamic Programming : A New Class of Query Optimization Algorithms ACM Transactions on Database Systems 25,1 (March) 2000
    [26]M.Steinbrunn. G. Moerkotte., A. Kemper. Heuristic and randomized optimization for the join ordering problem The VLDB Journal 6 (3) 1997
    [27]S.Ruth, P. Keutzer Databse Compression for Business Files Datamation, 18, Sept. 1972
    [28]P. Alsberg Space and Time Savings Through Large Database Compression and Dynamic Restructuring Proc. IEEE 63(8), Aug. 1975
    [29]B. Iyer, D. Wihite. Data Compression: A Method to Improve Price Performance for Databases Technical Report Under preparation 1994
    [30]M. Bassiouni Data Compression in scientific and Statistical Databases IEEE Trans. On Software Engineering, SE-11 (10), Oct. 1985
    [31]G. Graefe, L. Shapiro. Data Compression and Database performance Proc. Of ACM/IEEE Computer Science Symp. On Applied Computing, Kansas City, Apr. 1991
    [32]G. Cormack Data Compression in a Databse System Communications of the ACM, 28, 12, Dec. 1985
    [33]A.L.P. Chen, V.D.K.Li "Improvement algorithms for semi-join query processing programs in distributed database systems" IEEE Trans on computers c-33:11 (1984)
    [34]M. Jarke, J.Koch Query Optimization in Database Systems Computing Surveys, June 1984, 16 (2): 227-269
    [35]Zhiyuan Chen, Johannes Gehrke, Flip Korn Query Optimization in Compressed Database Systems ACM SIGMOD 2001 May 21-24
    [36]文思软件工作室 数据压缩教程 http://artpro.533.net/compress/content.htm
    [37]金重奇,王云枫“分布式数据库中等值联接查询优化的一个新算法” 计算机学报第9期 1987年9月
    [38]刘梦赤,石树刚,郑振媚 “分布式关系数据库系统的数据分布与查询处理” 计算机学报 第10期 1987年10月
    [39]郝忠孝 “分布式关系数据库系统物理查询中场地分配算法”计算机研究与发展 1991年第8期
    [40]洪伟“改进的分布式查询优化算法” 第五届全国数据库及第一届全国管理信息学术会议论文 1986年8月
    [41]何亚原“WDDBS全局查询到局部查询的转换与实现” 计算机研究与发展 1988年第9期
    [42]王以和,涂小平 《分布式数据库系统》 电子工业出版社 1988年6月
    [43]郑振媚,于戈,郭敏 《分布式数据库》 科学出版社 1999年7月
    [44]贾焰,王志英,韩伟红,李霖 《分布式数据库技术》 国防工业出版社 2000年7月
    [45]孟小峰,王珊 《数据库系统导论》机械工业出版社 2000年10月

© 2004-2018 中国地质图书馆版权所有 京ICP备05064691号 京公网安备11010802017129号

地址:北京市海淀区学院路29号 邮编:100083

电话:办公室:(+86 10)66554848;文献借阅、咨询服务、科技查新:66554700