基于GPU的查询技术并行化研究

英文题名：The Parallelization Research of Query Technology on GPU
作者：黄玉龙
论文级别：博士
学科专业名称：计算机应用技术
中文关键词：GPU通用计算 ; 线性哈希表批量插入 ; B+树批量构建及插入 ; 并行XPath求值 ; 分段Top-k查询 ; 查询技术
英文关键词：GPGPU ; batch insertion for linear hash table ; B+tree batch build and insertion ; parallel XPath evaluation ; segmented top-k query ; query technology
学位年度：2013
导师：奚建清
学科代码：081203
学位授予单位：华南理工大学
论文提交日期：2013-10-08

摘要

作为数据管理领域一类非常重要的技术，数据库技术在很多行业中得到了广泛应用。其中，查询技术是数据库技术的核心部分。当前，查询技术领域存在如何加快索引更新以及如何设计满足查询任务的高效查询算法这两个研究方向。因此，利用GPU等新兴并行硬件改善索引更新效率以及大规模数据的查询性能受到了越来越多的关注。在索引更新过程中，插入和删除是两个基本并且互逆操作。与此同时，随着互联网应用以及个性化服务的兴起，XML查询以及top-k查询也逐渐成为研究热点。为此，在总结以及分析相关工作的基础上，本文主要研究了以下内容：
     总结了现有的索引构建及插入、大规模数据查询算法。结合本文的工作内容，详细阐述了GPU通用计算技术及其主流开发工具—CUDA。在此基础上，进一步介绍和分析了本文使用的并行操作原语。
     针对当前线性哈希表的并发插入需要加锁的现状，结合GPU在排序方面拥有的显著优势，提出了一种无锁批量插入算法GSInsert4LHT以及相应的存储结构。在批量插入过程中，该算法采用一个线程插入一条记录策略，从而极大地提升了线性哈希表的插入性能。在不同参数设置条件下，与4线程多核算法相比，GSInsert4LHT算法的性能提升了10.26~13.54倍。在此基础上，进一步阐述了如何利用该算法来加快Wetoband系统的资源操作权限判断过程。
     为了优化B+树的构建及插入性能，首先设计了一种键值空间分配策略，在此基础上进一步给出了基于GPU的批量构建及插入算法CUBPT。实验表明，与4线程批量建树算法相比，CUBPT算法的建树性能提升了21.76~25.1倍。此外，由于CUBPT算法在键值的批量插入过程中负载非常均衡，因此该算法在此阶段的性能比4线程PALM算法提升了9.49~19.2倍。由于性能优势明显，在此还给出了如何将该算法集成到Wetoband系统，从而提升资源对象的查找效率。
     研究了基于GPU的XPath并行查询算法。首先分析了当前XPath并行查询算法M2以及GPU-Twig等在存储空间以及线程资源利用率方面的不足。在此基础上，扩展了流标签存储机制中节点的Zhang编码结构，提出了一种基于关系数组的大规模XPath查询算法GXQ。在构建关系数组时，GXQ算法首先计算出与每个关系节点关联的第一个以及第二个XML节点下标。在此基础上，快速构建关系数组。在并行查询过程中，采用对每个步骤的返回结果标记策略，从而减少了空间分配代价以及去重代价。通过实验比较发现，在不考虑M2算法中关系矩阵构建时间的情况下，与4线程M~2算法相比，GXQ算法的性能加速比最高可达16.1倍。
     研究快速top-k查询算法。针对当前并行top-k查询算法存储空间利用率差以及不具通用性等方面的劣势，结合经典的NRA以及LARA算法，设计了一种基于GPU的快速查询算法CNRA。与传统算法一次处理一个有序列表层次不同， CNRA算法在增长以及收缩阶段均采用一次处理STRIDE个层次的策略，从而显著减少了每个对象的分值上下界更新时间以及top-k对象的判断比较时间。实验结果表明算法的性能有了明显提升，最高可达107.6倍。然而当STRIDE值增大时，算法的性能提升效果反而有所下降，这与算法在更新分值下界过程中使用的策略有关。
As a very important technology in database management field, database technology iswidely used in a lot of industries. In here, query technology is the core component. Currently,the query technology field exist two main research directions. One is to find the fast indexupdate algorithm.The other is to design an efficient query algorithm which fits the query task.For this reason, how to leverage the emerging parallel hardware such as GPUs to improve theupdate efficiency of index structure and the query performance of large-scale data set hasdrawn more and more attentions. In the update process of index structure, insertion anddeletion are two basic operations and both are opposite. At the same time, with the widelyapplication of the Internet and recommendation service, XML query and top-k query havegradually become two research hotspots. So, On the basis of summary and analysis of relatedwork, the main content of this paper as follows:
     Summarizes the existing build and insertion method of index structure and the querymethod of large-scale data set. According to the research work of this paper, the GPUgeneral computing technology and its mainstream development tool--CUDA are describedin detail. On this basis, the parallel primitives which used in this paper have been furtherintroduction and analysis;
     For the concurrent insertion of current linear hash table need to lock, a lock-free batchrecords insertion algorithm--GSInsert4LHT and its data structure are proposed in thispaper which combined with the significant advantage in GPU sorting. By using thestrategy in which one thread is responsible for one record insertion, this algorithm cangreatly enhance the insertion performance of linear hash table. Under the condition ofdifferent parameter settings, the performance of this algorithm is improved by10.26~13.54times compared with four threads batch algorithm.On this basis,how to usethis algorithm to improve the judgement efficiency of resource operation authority of“Wetoband” system is proposed in detail.
     For improving the build and insertion performance of B+tree, a space allocation strategyis introduced firstly. On this basis, a batch build and insertion algorithm--CUBPT isfurther proposed. The experimental results show that the build performance of CUBPT is improved by21.76~25.1times compared with4threads algorithm. Furthermore, In theprocess of batch keys insertion, the performance of CUBPT is improved by9.49~19.2times compared with four threads PALM; this is because the workload of CUBPT is verybalanced.Because of the performance advantages, how to use this algorithm to improvethe query efficiency of resources in “Wetoband” system is described in detail.
     In here, the parallel XPathquery method on GPU architecture has been researched. Firstly,the deficiencies of two parallel XPath query methods M2and GPU-Twig in storage andthread use ratio is analyzed. On this basis, the Zhang coding structure of the nodes instream labels storage mechanism have been expanded and an large-scale XPath querymethod GXQ based on relation array is further proposed. In the stage of building relationarray, GXQ firstly calculate the first and second node indexes which associated with eachrelation node. And on this basis, quickly build the relation array. In the stage of parallelquery, GXQ mark the results of each execution step, so the cost of storage allocation anddeleting duplicate nodes is greatly reduced. The experimental results show that withoutconsidering the building time of relation matrix, the performance speedup of GXQ is alsoup to16.1times compared with four threads M2method.
     According to the deficiencies ofcurrent parallel top-k query methods in storage utilizationand generality, a fast top-k query algorithm on GPU architecture CNRA is proposed whichbased on the combination of NRA and LARA algorithm. Different with the traditionalalgorithms which process a data level of the ordered lists in one time, CNRA processSTRIDE data levels in one time both in growth and shrinkage stage, So the upper boundand lower bound updating time of aggregation scores and the comparing time of top-kcandidate objects are greatly reduced. The experimental results show that the performanceof CNRA is greatly improved. But when the value of STRIDE is larger, the performanceof this algorithm has declined. This is related to the updating score lower bound strategyused in this algorithm.

引文

[1] Sergey B, Lawrence P.The Anatomy of a Large-Scale Hypertextual Web Search Engine[J].ComputerNetworks and ISDN Systems,1998,30(1-7):107-117.
    [2] Wei Chu, Seung-Taek Park. Personalized Recommendation on Dynamic Content Using PredictiveBilinear Models[C], In Proceedings of WWW, Madrid, Spain: ACM,2009,691-700.
    [3] Ning Z,Cox AJ.Mullikin JC.SSAHA:a fast search method for large DNA databases[J].Genome Res11:1725-1729.
    [4] Surajit C,Umeshwar D.An Overview of Data Warehousing and OLAP Technology[J].ACM SIGMODRecord,1997,26(1):65-74.
    [5]R.Ramakrishnan, J.Gehrke. DatabaseManagementSystems(3rd Edition)[M],McGraw Hill,2007.
    [6]Samuel K. Moore. Multicore Is Bad News For SuperComputers[J]. IEEE Spectrum,2008,45(11):1-4.
    [7] NVIDIA Corporation. NVIDIA CUDA Programming Guide, version2.3[EB/OL], http://www.cs.unc.edu/~prins/Classes/633/Readings/CUDA_C_Programming_Guide_4.2.pdf,2009:7-15.
    [8] Sun X H. Scalable computing in the multicore era. In: Proceedings of the Inaugural Symposium onParallel Algorithms, Architechures and Programming,2008Sep16-18, Hefei. Hefei: University ofScience and Technology of China Press,2008,1-18.
    [9] J Owens, David L, N Govindaraju,et al. A Survey of General-Purpose Computation on GraphicsHardware[J]. Computer Graphics Forum,2007,26(1):80-113.
    [10]Ossama Y,Sonia F. Distributed Clustering in Ad-hoc Sensor Networks:A Hybrid, Energy-EfficientApproach[J]. IEEE Transactions on Mobile Computing,2004,3(4):366-379.
    [11]陈国良编著.并行计算--结构.算法.编程(修订版)[M].高等教育出版,2003.
    [12]Ruud van der Pas.An Introduction Into OpenMP[EB/OL].http://www.nic.uoregon.edu/iwomp2005/Iwomp2005_tutorial_openmp_rvdp.pdf,2013-5-15.
    [13] James Reinders. Intel Threading Building Blocks: Outfitting C++for Multi-core Processor Parallelism[M],O'Reilly Media,2007:78-79.
    [14] Amdahl, Gene. Validity of the Single Processor Approach to Achieving Large-Scale Computing Capabilities[C].In AFIPS Conference Proceedings,1967:483–485.
    [15]Mark Harris. Data-Parallel Algorithms on GPUs[EB/OL].http://gpgpu.org/static/sc2006/slides/02.harris.data_parallel_algorithms.pdf,2013-7-12.
    [16] W.Litwin. Linear hashing: a new tool for file and table addressing[C], in:International Conference onVery Large Databases,Montreal,1980:212-223.
    [17] Fagin R, Nievergelt J, Nicholas P, et al. Extendible Hashing-A Fast Access Method for Dynamic Files[J].ACM Transactions on Database Systems,1979,4(3):315–344.
    [18]Carla Ellis. Concurrency in linear hashing[J].ACM Transactions on Database Systems,1987,12(2):195-217.
    [19] S Narata, T Miura.On Inserting Bulk Data for Linear Hash Files[J].Communications in Computer andInformation Science,2011(136):409-422.
    [20]陈虎,唐海浩,廖江苗,等.面向批量插入优化的并行存储引擎MTPower[J],计算机学报,2010,33(8):1492-1499.
    [21] H Tang, H Chen,J Peng. Multithreaded Linear Hashing with Disk Buffer[C],in: First InternationalWorkshop on Database Technology,2009:435-439.
    [22] D Zhang,P Larson. LHlf: lock-free linear hashing[C].In: Proceedings of the17th ACM SIGPLANsymposium on Principles and Practice of Parallel Programming,2012:307-308.
    [23] Carla S. Ellis. Extendible hashing for concurrent operations and distributed data[C]. In: Proceedings ofthe2nd ACM SIGACT-SIGMOD symposium on Principles of database systems.1983:106-116.
    [24] V. Kumar. A concurrency control mechanism based on extendible hashing for main memory databasesystems[C].In: Proceedings of the17th conference on ACM Annual Computer Science Conference.1989:109-113.
    [25] Hirano Y, Miura F, Satoh T.Extendible Hashing for Concurrent Insertions and Retrievals[C]. InProceedings of PDP,1996:235-242.
    [26] J Gil,Y Matias.Simple Fast Parallel Hashing by Oblivious Execution[J].SIAM Journal on Computing,1998,27(5):1348-1375.
    [27] Steven van.V, Alfons L.A Parallel Compact Hash Table[C].In:Proceedings of the7th internationalconference on Mathematical and Engineering Methods in Computer Science.2011:191-204.
    [28] Maged M. Michael.High Performance Dynamic Lock-Free Hash Tables and List-Based Sets[C].In:Proceedings of the fourteenth annual ACM symposium on Parallel algorithms and architectures2002:73-82.
    [29] Jason Sanders,Edward Kandrot. CUDA by Example[M].Pearson Education,2010:258-276.
    [30] D.A.Alcantara, A Sharf, F Abbasinejad,et al. Real-Time Parallel Hashing on the GPU[J].ACMTransactions on Graphics.2009,28(5):1-9.
    [31] C.K.Wong,shi-Kuo Chang.Parallel Generation of Binary Search Trees[J].IEEE transactions oncomputers,1974,23(3):268-271.
    [32] J Feng, D.Q.Naiman, B Cooper.A Parallelized Binary Search Tree[J].Information Technology&Software Engineering.2011,1(1):1-5.
    [33] D Christensen,H.O Hansen, L Juul,et al. Efficient Implementation of CSStrees on GPGPUs[EB/OL].http://people.cs.aau.dk/~srba/courses/PDK-10/6/6.pdf,2013-6-23.
    [34] Fang R, He B,Yang K,et al.GPUQP: query co-processing using graphics processors[C]. In: Proc.SIGMOD.2007:1061–1063.
    [35] B. Bayer and E. M. McCreight.Organization and maintenance of large ordered indices[J]. Acta Inform.1972,1(3):173–189.
    [36] R. Bayer and M. Schkolnick. Concurrency of operations on B-trees[J]. Acta Inform.,1977,9(1):1–21.
    [37] P Bernstein, V Hadzilacos, N Goodman. Concurrency control and recovery in database systems[M].1987.
    [38] P. Lehman,S Yao. Efficient locking for concurrent operations on B-trees[J].ACM Transactions onDatabase Systems.1981,6(4):650–670.
    [39] J Sewall,J Chhugani,C Kim,et al.PALM: Parallel Architecture Friendly Latch-Free Modifications toB+Trees on ManyCore Processors[C].In:Processing of VLDB,2011:795-806.
    [40] K.Kaczmarski.Experimental B+-tree for GPU[C]. ADBIS2011Research Communications,2011:232-240.
    [41] Wilschut A.N,Apers P.M.G,Flokstra J.Parallel Query Execution in PRISMA/DB[C]. In: ParallelDatabase Systems, PRISMA Workshop,1990:24-26.
    [42] J Srivastava,G Elsesser.Optimizing Multi-Join Queries in Parallel Relation Databases[C]. In: Proceed-ings of the second international conference on Parallel and distributed information systems.1993:84-92.
    [43] H Lu, M Shan,K Tan.Optimization of Multi-Way Join Queries for Parallel Execution[C]. In:Proceedings of VLDB'91.1991:549-560
    [44] S Wu, F Li, S Mehrotra,et al.Query Optimization for Massively Parallel Data Processing for ParallelExecution[C]. In:Proceedings of the2nd ACM Symposium on Cloud Computing.2011.
    [45] P Bakkum,K Skadron.Accelerating SQL Database Operations on a GPU with CUDA[C]. In: Proceedings of the3rd Workshop on General-Purpose Computation on Graphics Processing Units.2010:94-103.
    [46] B He,K Yang,R Fang,et al.Relational joins on graphics processors[C].In:Proceedings of the ACMSIGMOD.2008:511-524.
    [47] X Wang, A Zhou, J He,et al.MQX: Multi-Query Engine for Compressed XML Data[C].In:theProceeding of ACM SIGIR,2007:897-897.
    [48] J Min,M Park, C Chung.XTREAM: An efficient multi-query evaluation on streaming XML data [J].Information Sciences,2007,177(17):3519-3538.
    [49] R. Bordawekar, L. Lim, O Shmueli. Parallelization of xpath queries using multi-core processors:challenges and experiences[C]. In Proceedings of the12th International Conference on ExtendingDatabase Technology: Advances in Database Technology.2009:180-191.
    [50] R. Bordawekar, L.Lim, A Kementsietsidis,et al.Statistics-based parallelization of xpath queries inshared memory systems[C]. In:Proceedings of the International Conference on Extending DatabaseTechnology.2010:159-170.
    [51]陈荣鑫,廖湖声.M2:一种有效的XPath求值方法[J].计算机科学,2011,38(2):160-165.
    [52] X Si, A Yin,X Huang,et al.Parallel optimization of queries in xml dataset using GPU[C]. In:FourthInternational Symposium on Parallel Architectures, Algorithms and Programming.2011:190-194
    [53] L Shnaiderman, O Shmueli. A Parallel Twig Join Algorithm for XML Processing using a GPGPU[C].In:Third International Workshop on Accelerating Data Management Systems Using Modern Processorand Storage Architectures,2012.
    [54]吴超,孙广中,陈国良.多核平台上Top-k查询的性能优化[J].小型微型计算机系统,2012,33(7):1489-1492.
    [55] T Luo,G Sun,G Chen.Performance Optimization of Top-k Queries on GPU[C].In:Fourth InternationalSymposium on Parallel Architectures, Algorithms and Programming.2011:9–13.
    [56] H Chang,T Qin, X Liu,et al.Top-k Queries ProcessingWith Uncertain Data on Graphics ProcessingUnits[C].In:2011Fourth International Symposium on Parallel Architectures, Algorithms andProgramming,310–314.
    [57]吴恩华.图形处理器用于通用计算的技术、现状与其挑战[J].软件学报.2004,15(10):1493-1504.
    [58] NVIDIA Corporation. NVIDIA CUDA Compute Unified Device Architecture ProgrammingGuide[M],2007:10-13.
    [59] W Fang, K. Lau,M Lu, et al. Parallel Data Mining on Graphics Processors[R]. Technical ReportHKUST-CS08-07, Hong Kong Univ. of Science and Technology (HKUST),2008
    [60]李海峰.基于GPU的闭合频繁项集挖掘方法[J].计算机工程,2011,37(14):59-61.
    [61] Robin M. Weiss. GPU-Accelerated Data Mining with Swarm Intelligence[EB/OL].http://metis logic.net/thesis.pdf,2013-5-25.
    [62] Z Luo, H Liu, X Wu.Artificial Neural Network Computation on Graphic Process Unit[C].In:Proceedings of the International Joint Conference on Neural Networks,2005:622-626
    [63] R.D Prabhu.GNeuron: Parallel Neural Networks with GPU[EB/OL]. http://hipc.org/hipc2007/posters/GNeuron.pdf2013-02-25.
    [64] D Cederman P Tsigas.GPU-Quicksort: A practical Quicksort algorithm for graphics processors[J].Journal of Experimental Algorithmics,2009,14(4):1-22.
    [65] E Sintorn,U Assarsson. Fast Parallel GPU-Sorting Using a Hybrid Algorithm[J].Journal of Parallel andDistributed Computing.2008,68(10):1381-1388.
    [66] L Luo, L juan.Parallel implementation of R-trees on the GPU[C].In:17th Asia and South PacificDesign Automation Conference.2012,353-358.
    [67] Justin Hensley. AMD CTM overview[C].In:proceeding of SIGGRAPH’07.2007.
    [68] C Kim, J Chhugani,N Satish,et al. FAST: Fast Architecture Sensitive Tree Search on Modern CPUsand GPUs[C]. In: Proceedings of the2010international conference on Management of data.2010:339-350.
    [69] J. Fix, A. Wilkes, K Skadron. Accelerating Braided B+Tree Searches on a GPU with CUDA[C]. InProceedings of the2nd Workshop on Applications for Multi and Many Core Processors: Analysis,Implementation, and Performance,2011:1-11.
    [70] T Kaldewey,J Hagen.Parallel search on video cards[C].Proceedings of the First USENIX conferenceon Hot topics in parallelism,2009:9-15.
    [71] J Gilger,J Barnickel, U Meyer. GPU-Acceleration of Block Ciphers in the OpenSSL CryptographicLibrary[C]. In:Proceedings of the15th international conference on Information Security.2012:338-353.
    [72] S Neves.Cryptography in GPUs[EB/OL].http://eden.dei.uc.pt/~sneves/gpucrypto.pdf.2013-7-23.
    [73] W. B. Langdon. PRNG Random Numbers on GPU[EB/OL]. http://www.essex.ac.uk/csee/research/publications/technicalreports/2007/ces-477.pdf.2013-6-25.
    [74] WM Pang,T Wong,P Heng.Generating Massive High-Quality Random Numbers using GPU[C].In:IEEE World Congress on Computational Intelligence.2008:841–847.
    [75] C.J. Fluke, D.G. Barnes, B.R Barsdell,et al.Astrophysical Supercomputing with GPUs: CriticalDecisions for Early Adopters [J].Publications of the Astronomical Society of Australia.2010,28(1)15–27.
    [76] J Michalakes,M Vachharajani. GPU Acceleration of Numerical Weather Prediction[C].In: IEEEInternational Symposium on Parallel and Distributed Processing,2008:1-7.
    [77] P D.Vouzis,N V.Sahinidis. GPU-BLAST: using graphics processors to accelerate protein sequencealignment[J]. BIOINFORMATICS.2011,27(2):182–188.
    [78] J.Hoverock,N.Bell.Thrust::A parallel template library,2011,Version1.4.0.
    [79] Douglas C. Schmidt.The C++Standard Template Library[EB/OL]. http://www.dre.vanderbilt.edu/~schmidt/PDF/stl4.pdf,2013-7-10.
    [80] N Bell,J Hoberock.Thrust: A Productivity-Oriented Library for CUDA[EB/OL].http://sbel.wisc.edu/Courses/ME964/Literature/thrustGPUgems2011.pdf,2013-9-10.
    [81] B He, N K.Govindaraju, Q Luo,et al.Efficient Gather and Scatter Operations on Graphics Processors[C]. In: Proceedings of the2007ACM/IEEE conference on Supercomputing.2007:46.
    [82] PerAke Larson.Dynamic Hash Tables[J].Communications of the ACM.1988,31(4):446–457.
    [83]严蔚敏,吴伟民.数据结构(C语言版)[M].清华大学出版社,2007:238-247.
    [84] Sang-Wook Kim,Hee-Sun Won. Batch-Construction of B+-Trees[C]. In:Proceedings of the2001ACM symposium on Applied computing,2001,231-235.
    [85] J Liao,H Chen,Y Yuan,et al.Parallel Batch B+-tree Insertion on Multi-core Architectures [C].In: FifthInternational Conference on Frontier of Computer Science and Technology,2010,30-35
    [86] L G. Valiant. A bridging model for parallel computation[J]. Communications of the ACM.1990:103-111.
    [87] Roger M,Robert H,Mariam S,et al. Efficient XML Path Filtering Using GPUs[C]. In:InternationalWorkshop on Accelerating Data Management Systems,2011.
    [88] N M.Josuttis.The C++Standard Library(second edition)[M].Addison-Wesley.2011.
    [89]孔令波,唐世渭,杨冬青,等.XML数据的查询技术[J].软件学报.2007,18(6):1400-1418.
    [90] Philip Wadler.XQuery:a typed functional language for querying XML[C].Advanced FunctionalProgramming,2003:188-212.
    [91] T Schwentick. XPath Query Containment[J]. ACM SIGMOD Record.2004,33(1):101-109.
    [92] S Wang.A Data Parallel Approach to XML Parsing and Query[C].In: IEEE International Conferenceon High Performance Computing and Communications,2011:520-527.
    [93] Y Zhang, Y Pan, Chiu K. A Parallel XPath Engine Based on Concurrent NFA Execution[C].In:16thInternational Conference on Parallel and Distributed Systems.2010:314-321.
    [94] Bruno N,Koudas N, Divesh S.Holistic twig joins:optimal XML pattern matching[C].In:Proceedings ofthe SIGMOD Conference,2002:310-321.
    [95] Lu Q,Jeffrey X.Y,Bolin D.TwigList:make twig pattern matching fast[C].In:Proceedings of theDASFAA Conference.2007:850-862.
    [96] R Fagin, A Lotem, M Naor.Optimal Aggregation Algorithms for Middleware[C].In: Proceedings ofthe twentieth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems,2001:102-113.
    [97] Guntzer U, Tubingen U,Balke W.T,et al.Towards Efficient Multi-Feature Queries in HeterogeneousEnvironments [C]. In: Proceedings of the IEEE International Conference on Information Technology:Coding and Computing.2001:622–628.
    [98] Mamoulis N, Yiu M.H, K.H Cheng,et al.Effic ient Top-k Aggregation of Ranked Inputs[J]. ACMTransactions on Database Systems.2007,32(3):19.
    [99] Jing Yuan, Guang-Zhong Sun.Selective-NRA Algorithms for Top-k Queries[C].In:Proceedings ofWAIM Conference.2009:15-26.
    [100] Fagin Ronald, Kumar Ravi, Sivakumar D. Efficient similarity search and classification via rankaggregation[C].In: Proceedings of the ACM SIGMOD.2003:301-312.
    [101]C. Zhang, J. Naughton, D.DeWitt,et al. On supporting containment queries in relational databasemanagement systems[C]. In: Proceedings of ACM SIGMOD Conference,2001:425-436.
    [102] LINC Laboratory University of Pennsylvania.University of Pennsylvania Treebank Project[EB/OL].http://www.cs.washington.edu/research/xmldatasets/www/repository.html,2012-12-25.
    [103] Satish N,Harris M,Garland M,et al. Designing Efficient Sorting Algorithms for Manycore GPUs[C].In: IEEE International Symposium on Parallel&Distributed Processing,2009:1-10.
    [104]周国亮,冯海军,何国明等.图形处理器在数据管理领域的应用研究综述[J].计算机科学与探索.2010,4(4):289-303.
    [105] Harris M,S Sengupta.Parallel Prefix Sum (Scan) with CUDA[M]. In GPU Gems3,Addison-Wesley,2008:851-876.
    [106] Neil Immerman.Expressibility and parallel complexity[J].Siam Journal on Computing,1989,18(3):625-638.
    [107]S.Howley,J.Jones.A Non-Blocking Internal Binary Search Tree[C].In:Proceedings of the24th ACMsymposium on Parallelism in algorithms and architectures,2012:161-171.
    [108]M.Son,H.Im. Parallel Top-k Query Processing using MapReduce[EB/OL]. http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.120.4167,2013-8-25.

地址：北京市海淀区学院路29号邮编：100083

电话：办公室：(+86 10)66554848；文献借阅、咨询服务、科技查新：66554700