云计算系统中索引与查询处理技术研究

英文题名：Research on Indexing and Query Processing in Cloud Computing Systems
作者：王金宝
论文级别：博士
学科专业名称：计算机软件与理论
中文关键词：云计算 ; 空间数据库 ; 查询处理 ; 聚集查询
英文关键词：cloud computing ; spatial database ; query processing ; aggregation
学位年度：2013
导师：高宏
学科代码：081202
学位授予单位：哈尔滨工业大学
论文提交日期：2013-06-01

摘要

云计算是一种新兴的计算模式，它隐藏了计算资源以及计算的执行过程，用户只需要通过浏览器或者应用程序界面提交计算任务或者服务请求，而不必考虑如何构建计算架构；如何组织、调度计算资源；如何使用计算资源完成计算任务或者服务请求。随着越来越多的数据和应用服务从超级服务器迁移到公有云计算系统或私有云计算系统中，如何在云计算系统中有效地进行数据管理并成为一项具有重要意义的研究工作。与传统的数据管理相比，云计算系统中的数据管理需要提供良好的可扩展性以及高效的数据存取能力。查询处理和查询优化也是云计算系统中的数据管理的核心技术，查询性能严重影响用户使用云计算系统的服务质量。索引技术在数据管理系统中能够有效地提高查询处理性能，减少查询使用的CPU时间、磁盘读取等操作，以此提高查询处理性能，云计算系统也需要构建有效的索引结构来提高查询处理性能。综上所述，云计算系统中的查询处理和索引技术是具有重要意义的研究课题。然而现有工作主要针对MapReduce计算框架下的并行数据分析操作，对于其他查询类型还缺乏研究成果。
     本文的研究工作针对云计算系统中的查询处理技术和索引技术，运用数据管理技术、计算复杂性和算法学的理论和知识，针对云计算系统中不同查询类型的处理算法和不同的数据类型的索引技术进行研究。本文的主要研究工作包括以下方面：
     首先，本文提出了云计算系统中的多维索引结构。现有研究工作主要针对单个计算节点或者服务器-客户端模型，在大量计算节点构成的云计算系统中，使用单个计算节点中的多维索引将造成系统性能的瓶颈，无法提供良好的可扩展性。本文提出了一种两层的索引结构，在云计算系统中支持多维数据查询，查询过程中索引结构在大量计算节点之间和单个计算节点的外存中同时为查询提供剪枝能力，提高系统的吞吐量。本文给出了该索引结构的构建方法、维护方法，并在索引维护过程中提出优化策略，进一步提高查询处理的吞吐量。本文针对云计算系统中的多维数据提出点查询、范围查询和k最近邻查询的处理算法，包括计算节点之间的分布式算法和计算节点内部的索引选择算法。真实云计算平台中的实验验证了本文提出的云计算多维索引结构的有效性。
     第二，本文提出了云计算系统中字符串相似性查询的算法。现有的字符串相似性查询技术都针对单个计算节点，在处理大规模字符串数据集合时将导致内存溢出和外存溢出两个问题。针对以上问题，本文提出一种分布式索引结构，在云计算系统中支持字符串相似性查询。为了获取更好的本地查询效率，本文将现有的字符串查询优化技术应用在外存环境中，设计了支持长度过滤器和位置过滤器的外存索引结构，及其构建方法和实现细节。在查询过程中，使用非对称的字符串概要模式，自适应地从查询字符串的数据概要集合中选择一部分元素，用于获取查询使用的倒排表。为了减少查询在系统中使用计算节点的数量，本文设计了基于字符向量的数据划分方法，用于划分字符串数据集合。该方法将相似的字符串划分到相同的计算节点中，并在查询处理过程中确定查询需要访问的计算节点集合。模拟实验结果验证了本文提出的字符串相似性查询算法的有效性。
     第三，本文提出了云计算系统中空间近似关键字查询算法。现有工作集中于单个计算节点中的索引结构，并提出了内存中的精确算法和外存中的近似算法。然而，由于单个计算节点的CPU计算能力和磁盘带宽有限，内存方法乃至外存方法都无法满足系统对性能的要求。本文设计了一种两层索引结构，支持空间近似关键字查询，提高系统相应查询的吞吐量。本文设计一种新颖的树形索引，在外存中支持空间近似关键字查询，并高效地返回完整的查询结果。本文的全局索引将整个空间划分成多个划分，全局索引维护在各个计算节点的内存中，用于加速查询处理过程。本文给出了全局索引选择方法，用于全局索引的初始化和周期性维护。在查询处理方面，本文给出了基于编辑距离的范围近似关键字查询和最近邻近似关键字查询的算法。在分布式集群中的实验结果验证了本文提出的索引结构的有效性。
     第四，本文提出了云计算系统中多维聚集查询处理算法。现有云计算中的研究工作MapReduce计算框架缺乏对多维数据中聚集操作的有效支持。另一方面，使用MapReduce计算框架需要启动大量计算节点，造成巨大的系统功耗。针对以上问题，本文提出了云计算系统中的多维聚集操作方案，通过两层索引结构减少参与查询的计算节点以及单个计算节点中聚集操作计算量。本文给出了使用两层索引结构处理多维聚集查询的算法框架，并在该框架中提出了性能优先模式和低功耗模式下的多维聚集查询算法。在两种模式下的多维聚集算法中提出了查询分配问题，并证明了两种模式中查询分配问题都是NP完全问题。本文给出解决两个NP完全问题的近似算法，并证明了两个近似算法的近似比。理论证明分析和模拟实验结果验证了本文提出的多维聚集查询方案的有效性。
Cloud computing is a recently developping computational framework. Clients ac-cess cloud resources and submit computing tasks through web browsers, with nothingtaken into accounts on how to build computing hardwares. As more and more dataset-s and services are delivered into cloud, building efcient data management systems incloud becomes an essential task for research community. Cloud systems provide scala-bility, which supports large scale data analysis and highly cocurrent transactions. As thatof existing data management systems, query processing is significant since it has large ef-fects on service level agreements. Query processing is also important for various servicesprovided in cloud, including IaaS(Infrastructure as a Service) and so on. Indexes are wellstudied for reducing CPU time, disk access operations for data management systems toimprove query performance. It is also expected to play the same role in cloud systems.As illustrated above, query processing and indexes are essential topics in cloud systems,however, existing works focus on key-value data in MapReduce framework, while otherare still in lack of attention.
     This thesis focuses on indexes and query processing in cloud systems, and designsindexes together with query processing algorithms based on data management techniques,computational theory and algorithmic technology. A wide range of data types and querytypes are considered in this thesis, and the following achievements are obtained.
     First, we propose a multi-dimensional index in cloud systems. Existing works main-ly focus on indexes in a single server or server-client structure. Unfortunately, such worksfail to provide efciency since performance bottleneck is introduced. This paper designsa two-layered index structure to prune search space among computing nodes for queryprocessing. The index reduces the number of involved computing nodes while query pro-cessing, and improves I/O efciency inside a single server. The initiation method andmaintenance methods are proposed for the index, together with optimization strategiesfor improving query throuphput. This thesis designs the point query algorithm, the rangequery algorithm and the kNN query algorithm for cloud systems, including distributed al-gorithms among computing nodes, and optimization strategies inside a single computingnode.
     Second, we target at string similarity search in cloud systems. Existing works fo-cus on query processing within a single server, and it incurs main memory overflow andexternal memory overflow while dealing with big data. For the above problems, we pro-poses a distributed index to support string similarity search in cloud environments. Toprovide efcient searching in a single node, an external memory index is designed, whichadopts multiple filtering techniques and optimizing strategies. The external memory res-ident index supports length filter, positional filter in disks. This paper proposes the indexconstruction method. During query processing, asymmetric q-gram is used to reduce thenumber of inverted lists read from disks. An adaptive algorithm is given to choose in-verted lists, and seek the tradeof between two aspects of query cost. The global indexpartitions the entire string dataset according the content of strings, and a char vector spacepartition method is proposed. In char vector space partition method, similar strings arepartitioned into the same computing nodes, thus the number of computing nodes involvedin a single query is reduced. The partition method is also adopted to determine necessarycomputing node set for a query to access. Simulation results validate the efciency andefectiveness of our proposed index.
     Third, we propose spatial approximate keyword query algorithms for cloud systems.Existing work targets on single server solutions, and an exact algorithm is given in mem-ory while another approximate algorithm is given for disk resident datasets. However, asingle server fails to provide reasonable throughput due to the limited CPU time and diskbandwidth. Facing the above challenges, this paper gives a two-layered index consistingof global index and local index, which works in a shared nothing cluster for larger querythroughput. This paper designs a novel external memory index as local index, which re-turns exact answer within disks efciently. It is equiped with keyword set signature andmultiple optimizing strategies to reduce I/O cost. The global index partitions the entirespatial space, and each computing node in system maintains a partition. A global indexselection algorithm is given. This paper also provides spaital approximate keyword queryalgorithms based edit distance, including range and the nearest neighbor spatial condi-tions. Experiments in a shared nothing cluster illustrates the efciency and efectivenessof our proposed index and query algorithms.
     Fourth, we propose multi-dimensional aggregation algorithms for cloud systems.Existing works focus on key-value data in MapReduce framework, and multi-dimensionaldata is not well considered. What is more, MapReduce framework launches a large num- ber of computing nodes for a single query and costs huge amounts of energy. For theabove problems, this paper designs a solution for answering aggregation queries in cloudthrough a two-layerd index, which reduces the number of computing nodes involved in asingle query and eliminates unnecessary computing time in a single server. The construc-tion and maintenance methods are given, together with a query framework. Followingthe given query framework, this paper proposes two aggregation algorithms under perfor-mance priori mode and low-power mode. Two sub-query scheduling problems are definedfor each of aggregation algorithms given above, and they are proved to be NP-Complete.Then, approximate algorithms with reasonable lower bounds are designed. Theoreticalanalysis and simulations validate the efciency and efectiveness of aggregation algo-rithms proposed in this paper.

引文

[1]李伟,虎嵩林,刘冬梅,等.云计算环境下基于社区聚集的绿色消息系统[J].计算机学报,2012,35(6):1321–1336.
    [2]王元卓,靳小龙,程学旗.网络大数据：现状与展望[J].计算机学报,2013,36(6):1–15.
    [3]史英杰,孟小峰.云数据管理系统中查询技术研究综[J].计算机学报,2013,36(2):209–225.
    [4] Ghemawat S, Gobiof H, Leung S T. The Google file system[C]//Proceedings ofthe nineteenth ACM symposium on Operating systems principles. New York, NY,USA: ACM,2003:29–43.
    [5] Chang F, Dean J, Ghemawat S, et al. Bigtable: A Distributed Storage System forStructured Data[J]. ACM Trans. Comput. Syst.,2008,26(2):1–26.
    [6] Dean J, Ghemawat S. MapReduce: Simplified Data Processing on Large Cluster-s[J]. Commun. ACM,2008,51(1):107–113.
    [7] DeCandia G, Hastorun D, Jampani M, et al. Dynamo: Amazon’s Highly Avail-able Key-Value Store[C]//Proceedings of twenty-first ACM SIGOPS symposiumon Operating systems principles. New York, NY, USA: ACM,2007:205–220.
    [8] Brantner M, Florescu D, Graf D, et al. Building a Database on S3[C]//Proceedingsof the2008ACM SIGMOD international conference on Management of Data. NewYork, NY, USA: ACM,2008:251–264.
    [9] Cooper B F, Ramakrishnan R, Srivastava U, et al. PNUTS: Yahoo!’s Hosted DataServing Platform[J]. Proc. VLDB Endow.,2008,1(2):1277–1288.
    [10] Group C S. Community Systems Research at Yahoo![J]. SIGMOD Rec.,2007,36(3):47–54.
    [11] Olston C, Reed B, Srivastava U, et al. Pig Latin: A Not-So-Foreign Language forData Processing[C]//Proceedings of the2008ACM SIGMOD international confer-ence on Management of Data. New York, NY, USA: ACM,2008:1099–1110.
    [12] Isard M, Budiu M, Yu Y, et al. Dryad: Distributed Data-Parallel Programs fromSequential Building Blocks[C]//Proceedings of the2nd ACM SIGOPS/EuroSysEuropean Conference on Computer Systems2007. New York, NY, USA: ACM,2007:59–72.
    [13] Chaiken R, Jenkins B, Larson P A, et al. SCOPE: Easy and Efcient ParallelProcessing of Massive Data Sets[J]. Proc. VLDB Endow.,2008,1(2):1265–1276.
    [14] DeWitt D J, Paulson E, Robinson E, et al. Clustera: An Integrated Computationand Data Management System[J]. Proc. VLDB Endow.,2008,1(1):28–41.
    [15] Weil S A, Brandt S A, Miller E L, et al. Ceph: A Scalable, High-Performance Dis-tributed File System[C]//Proceedings of the7th symposium on Operating systemsdesign and implementation. Berkeley, CA, USA: USENIX Association,2006:307–320.
    [16] Yang H c, Dasdan A, Hsiao R L, et al. Map-Reduce-Merge: Simplified RelationalData Processing on Large Clusters[C]//Proceedings of the2007ACM SIGMODinternational conference on Management of Data. New York, NY, USA: ACM,2007:1029–1040.
    [17] Gates A F, Natkovich O, Chopra S, et al. Building a High-Level Dataflow Sys-tem on Top of Map-Reduce: The Pig Experience[J]. Proc. VLDB Endow.,2009,2(2):1414–1425.
    [18] Panda B, Herbach J S, Basu S, et al. PLANET: Massively Parallel Learning of TreeEnsembles with MapReduce[J]. Proc. VLDB Endow.,2009,2(2):1426–1437.
    [19] Friedman E, Pawlowski P, Cieslewicz J. SQL/MapReduce: A Practical Approachto Self-Describing, Polymorphic, and Parallelizable User-Defined Functions[J].Proc. VLDB Endow.,2009,2(2):1402–1413.
    [20] Ranger C, Raghuraman R, Penmetsa A, et al. Evaluating MapReduce for Multi-core and Multiprocessor Systems[C]//Proceedings of the2007IEEE13th Interna-tional Symposium on High Performance Computer Architecture. Washington, DC,USA: IEEE Computer Society,2007:13–24.
    [21] Stonebraker M, Abadi D, DeWitt D J, et al. MapReduce and Parallel DBMSs:Friends or Foes?[J]. Commun. ACM,2010,53(1):64–71.
    [22] Babu S. Towards Automatic Optimization of MapReduce Program-s[C]//Proceedings of the1st ACM symposium on Cloud computing. New York,NY, USA: ACM,2010:137–142.
    [23] Jiang D, Ooi B C, Shi L, et al. The Performance of MapReduce: An In-depthStudy[J]. Proc. VLDB Endow.,2010,3(1-2):472–483.
    [24] Vernica R, Carey M J, Li C. Efcient Parallel Set-Similarity Joins Using MapRe-duce[C]//Proceedings of the2010ACM SIGMOD International Conference onManagement of Data. New York, NY, USA: ACM,2010:495–506.
    [25] Abadi D J. Data Management in the Cloud: Limitations and Opportunities[J].IEEE Data Eng. Bull.,2009,32(1):3–12.
    [26] Pavlo A, Paulson E, Rasin A, et al. A Comparison of Approaches to Large-ScaleData Analysis[C]//Proceedings of the2009ACM SIGMOD International Confer-ence on Management of Data. New York, NY, USA: ACM,2009:165–178.
    [27] Abouzeid A, Bajda-Pawlikowski K, Abadi D, et al. HadoopDB: An ArchitecturalHybrid of MapReduce and DBMS Technologies for Analytical Workloads[J]. Proc.VLDB Endow.,2009,2(1):922–933.
    [28] Silberstein A, Cooper B F, Srivastava U, et al. Efcient Bulk Insertion into a Dis-tributed Ordered Table[C]//Proceedings of the2008ACM SIGMOD internationalconference on Management of Data. New York, NY, USA: ACM,2008:765–778.
    [29] Vigfusson Y, Silberstein A, Cooper B F, et al. Adaptively Parallelizing DistributedRange Queries[J]. Proc. VLDB Endow.,2009,2(1):682–693.
    [30] Logothetis D, Yocum K. Ad-hoc Data Processing in the Cloud[J]. Proc. VLDBEndow.,2008,1(2):1472–1475.
    [31] Agrawal P, Silberstein A, Cooper B F, et al. Asynchronous View Maintenancefor VLSD Databases[C]//Proceedings of the2009ACM SIGMOD InternationalConference on Management of Data. New York, NY, USA: ACM,2009:179–192.
    [32] Unterbrunner P, Giannikis G, Alonso G, et al. Predictable Performance for Unpre-dictable Workloads[J]. Proc. VLDB Endow.,2009,2(1):706–717.
    [33] Rao J, Shekita E J, Tata S. Using Paxos to Build a Scalable, Consistent, and HighlyAvailable Datastore[J]. Proc. VLDB Endow.,2011,4(4):243–254.
    [34] Calheiros R N, Ranjan R, Beloglazov A, et al. CloudSim: A Toolkit for Modelingand Simulation of Cloud Computing Environments and Evaluation of ResourceProvisioning Algorithms[J]. Softw., Pract. Exper.,2011,41(1):23–50.
    [35] Wu S, Jiang D, Ooi B C, et al. Efcient B-tree Based Indexing for Cloud DataProcessing[J]. Proc. VLDB Endow.,2010,3(1-2):1207–1218.
    [36] Wu S, Wu K L. An Indexing Framework for Efcient Retrieval on the Cloud[J].IEEE Data Eng. Bull.,2009,32(1):75–82.
    [37] Vo H T, Chen C, Ooi B C. Towards Elastic Transactional Cloud Storage with RangeQuery Support[J]. Proc. VLDB Endow.,2010,3(1-2):506–514.
    [38] Kossmann D, Kraska T, Loesing S. An Evaluation of Alternative Architectures forTransaction Processing in the Cloud[C]//Proceedings of the2010ACM SIGMODInternational Conference on Management of Data. New York, NY, USA: ACM,2010:579–590.
    [39]于戈,谷峪,鲍玉斌.云计算环境下大规模图数据处理技术[J].计算机学报,2011,34(10):1753–1767.
    [40] Aguilera M K, Golab W, Shah M A. A Practical Scalable Distributed B-tree[J].Proc. VLDB Endow.,2008,1(1):598–609.
    [41] Zhang X, Ai J, Wang Z, et al. An Efcient Multi-Dimensional Index for CloudData Management[C]//Proceedings of the first international workshop on Clouddata management. New York, NY, USA: ACM,2009:17–24.
    [42] Chen G, Vo H T, Wu S, et al. A Framework for Supporting DBMS-like Indexes inthe Cloud[J]. PVLDB,2011,4(11):702–713.
    [43] Hellerstein J M, Haas P J, Wang H J. Online Aggregation[C]//Proceedings ofthe1997ACM SIGMOD international conference on Management of Data. NewYork, NY, USA: ACM,1997:171–182.
    [44] Haas P J. Large-Sample and Deterministic Confidence Intervals for Online Ag-gregation[C]//Proceedings of the Ninth International Conference on Scientific andStatistical Database Management. Washington, DC, USA: IEEE Computer Soci-ety,1997:51–63.
    [45] Haas P J, Hellerstein J M. Ripple Joins for Online Aggregation[C]//Proceedings ofthe1999ACM SIGMOD international conference on Management of Data. NewYork, NY, USA: ACM,1999:287–298.
    [46] Luo G, Ellmann C J, Haas P J, et al. A Scalable Hash Ripple Join Algorith-m[C]//Proceedings of the2002ACM SIGMOD international conference on Man-agement of Data. New York, NY, USA: ACM,2002:252–262.
    [47] Jermaine C, Dobra A, Arumugam S, et al. A Disk-Based Join with ProbabilisticGuarantees[C]//Proceedings of the2005ACM SIGMOD international conferenceon Management of Data. New York, NY, USA: ACM,2005:563–574.
    [48] Wu S, Jiang S, Ooi B C, et al. Distributed Online Aggregations[J]. Proc. VLDBEndow.,2009,2(1):443–454.
    [49] Condie T, Conway N, Alvaro P, et al. Online Aggregation and Continuous QuerySupport in MapReduce[C]//Proceedings of the2010ACM SIGMOD InternationalConference on Management of Data. New York, NY, USA: ACM,2010:1115–1118.
    [50] Pansare N, Borkar V R, Jermaine C, et al. Online Aggregation for Large MapRe-duce Jobs[J]. PVLDB,2011,4(11):1135–1145.
    [51] Shi Y, Meng X, Wang F, et al. You Can Stop Early with COLA: Online Processingof Aggregate Queries in the Cloud[C]//Proceedings of the21st ACM internationalconference on Information and knowledge management. New York, NY, USA:ACM,2012:1223–1232.
    [52] Wagner R A, Fischer M J. The String-to-String Correction Problem[J]. J. ACM,1974,21(1):168–173.
    [53] Gravano L, Ipeirotis P G, Jagadish H V, et al. Approximate String Joins in aDatabase (Almost) for Free[C]//Proceedings of the27th International Conferenceon Very Large Data Bases. San Francisco, CA, USA: Morgan Kaufmann PublishersInc.,2001:491–500.
    [54] Chaudhuri S, Ganti V, Kaushik R. A Primitive Operator for Similarity Joins inData Cleaning[C]//Proceedings of the22nd International Conference on Data En-gineering. Washington, DC, USA: IEEE Computer Society,2006:5–16.
    [55] Xiao C, Wang W, Lin X. Ed-Join: An Efcient Algorithm for Similarity Joins withEdit Distance Constraints[J]. Proc. VLDB Endow.,2008,1(1):933–944.
    [56] Li C, Lu J, Lu Y. Efcient Merging and Filtering Algorithms for ApproximateString Searches[C]//Proceedings of the2008IEEE24th International Conferenceon Data Engineering. Washington, DC, USA: IEEE Computer Society,2008:257–266.
    [57] Sarawagi S, Kirpal A. Efcient Set Joins on Similarity Predicates[C]//Proceedingsof the2004ACM SIGMOD international conference on Management of Data. NewYork, NY, USA: ACM,2004:743–754.
    [58] Behm A, Ji S, Li C, et al. Space-Constrained Gram-Based Indexing for EfcientApproximate String Search[C]//Proceedings of the2009IEEE International Con-ference on Data Engineering. Washington, DC, USA: IEEE Computer Society,2009:604–615.
    [59] Hadjieleftheriou M, Koudas N, Srivastava D. Incremental Maintenance of LengthNormalized Indexes for Approximate String Matching[C]//Proceedings of the2009ACM SIGMOD International Conference on Management of Data. New York, NY,USA: ACM,2009:429–440.
    [60] Qin J, Wang W, Lu Y, et al. Efcient Exact Edit Similarity Query Processing withthe Asymmetric Signature Scheme[C]//Proceedings of the2011ACM SIGMODInternational Conference on Management of Data. New York, NY, USA: ACM,2011:1033–1044.
    [61] Zhang Z, Hadjieleftheriou M, Ooi B C, et al. Bed-tree: An All-Purpose IndexStructure for String Similarity Search Based on Edit Distance[C]//Proceedings ofthe2010ACM SIGMOD International Conference on Management of Data. NewYork, NY, USA: ACM,2010:915–926.
    [62] Behm A, Li C, Carey M J. Answering Approximate String Queries on Large DataSets Using External Memory[C]//Proceedings of the2011IEEE27th InternationalConference on Data Engineering. Washington, DC, USA: IEEE Computer Society,2011:888–899.
    [63] De Felipe I, Hristidis V, Rishe N. Keyword Search on Spatial Databas-es[C]//Proceedings of the2008IEEE24th International Conference on Data En-gineering. Washington, DC, USA: IEEE Computer Society,2008:656–665.
    [64] Cong G, Jensen C S, Wu D. Efcient Retrieval of the Top-k Most Relevant SpatialWeb Objects[J]. Proc. VLDB Endow.,2009,2(1):337–348.
    [65] Chen Y Y, Suel T, Markowetz A. Efcient Query Processing in Geographic WebSearch Engines[C]//Proceedings of the2006ACM SIGMOD International Confer-ence on Management of Data. New York, NY, USA: ACM,2006:277–288.
    [66] Hariharan R, Hore B, Li C, et al. Processing Spatial-Keyword (SK) Queries inGeographic Information Retrieval (GIR) Systems[C]//Proceedings of the19th In-ternational Conference on Scientific and Statistical Database Management. Wash-ington, DC, USA: IEEE Computer Society,2007:16–27.
    [67] Vaid S, Jones C B, Joho H, et al. Spatio-textual indexing for geographical search onthe web[C]//Proceedings of the9th international conference on Advances in Spatialand Temporal Databases. Berlin, Heidelberg: Springer-Verlag,2005:218–235.
    [68] Zhang D, Ooi B C, Tung A K H. Locating mapped resources in Web2.0[C]//Proceedings of the22nd International Conference on Data Engineering.Washington, DC, USA: IEEE Computer Society,2010:521–532.
    [69] Zhou Y, Xie X, Wang C, et al. Hybrid Index Structures for Location-based WebSearch[C]//Proceedings of the14th ACM International Conference on Informationand Knowledge Management. New York, NY, USA: ACM,2005:155–162.
    [70] Yao B, Li F, Hadjieleftheriou M, et al. Approximate String Search in SpatialDatabases[C]//Proceedings of the22nd International Conference on Data Engi-neering. Washington, DC, USA: IEEE Computer Society,2010:545–556.
    [71] Alsubaiee S, Behm A, Li C. Supporting Location-Based Approximate-KeywordQueries[C]//Proceedings of the18th SIGSPATIAL International Conference onAdvances in Geographic Information Systems. New York, NY, USA: ACM,2010:61–70.
    [72] Guttman A. R-trees: A Dynamic Index Structure for Spatial Search-ing[C]//Proceedings of the1984ACM SIGMOD International Conference on Man-agement of Data. New York, NY, USA: ACM,1984:47–57.
    [73] Fan J, Li G, Zhou L, et al. Seal: Spatio-Textual Similarity Search[J]. Proc. VLDBEndow.,2012,5(9):824–835.
    [74] Rivoire S, Shah M A, Ranganathan P, et al. JouleSort: A Balanced Energy-Efciency Benchmark[C]//Proceedings of the2007ACM SIGMOD InternationalConference on Management of Data. New York, NY, USA: ACM,2007:365–376.
    [75] Beckmann A, Meyer U, Sanders P, et al. Energy-Efcient Sorting Using SolidState Disks[C]//Green Computing Conference,2010International. New York, NY,USA: ACM,2010:191–202.
    [76] Fan X, Weber W D, Barroso L A. Power Provisioning for a Warehouse-SizedComputer[C]//Proceedings of the34th annual International Symposium on Com-puter Architecture. New York, NY, USA: ACM,2007:13–23.
    [77] Graefe G. Database Servers Tailored to Improve Energy Efcien-cy[C]//Proceedings of the2008EDBT Workshop on Software Engineering forTailor-Made Data Management. New York, NY, USA: ACM,2008:24–28.
    [78] Agrawal R, Ailamaki A, Bernstein P A, et al. The Claremont Report on DatabaseResearch[J]. SIGMOD Rec.,2008,37(3):9–19.
    [79] Rivoire S, Ranganathan P, Kozyrakis C. A Comparison of High-Level Full-SystemPower Models[C]//Proceedings of the2008Conference on Power Aware Comput-ing and Systems. Berkeley, CA, USA: USENIX Association,2008:3–13.
    [80] Harizopoulos S, Shah M, Ranganathan P. Energy Efciency: The New Holy Grailof Data Management Systems Research[C]//Proceedings of the Biennial Confer-ence on Innovative Data Systems Research2009. New York, NY, USA: ACM,2009.
    [81] Lang W, Patel J M. Towards Eco-friendly Database Management System-s[C]//Proceedings of the Biennial Conference on Innovative Data Systems Re-search2009. New York, NY, USA: ACM,2009.
    [82] Tsirogiannis D, Harizopoulos S, Shah M A. Analyzing the Energy Efciencyof a Database Server[C]//Proceedings of the2010ACM SIGMOD InternationalConference on Management of Data. New York, NY, USA: ACM,2010:231–242.
    [83] Xu Z, Tu Y C, Wang X. Exploring Power-Performance Tradeofs in Database Sys-tems[C]//Proceedings of the International Conference on Data Engineering. LongBeach, California: IEEE,2010:485–496.
    [84] Xu Z. Building a Power-Aware Database Management System[C]//Proceedings ofthe Fourth SIGMOD PhD Workshop on Innovative Database Research. New York,NY, USA: ACM,2010:1–6.
    [85] Rivoire S, Shah M A, Ranganathan P, et al. Models and Metrics to Enable Energy-Efciency Optimizations[J]. Journal of Computer,2007,40(12):39–48.
    [86] Meisner D, Gold B T, Wenisch T F. PowerNap: Eliminating Server Idle Pow-er[C]//Proceedings of the14th International Conference on Architectural Supportfor Programming Languages and Operating Systems. New York, NY, USA: ACM,2009:205–216.
    [87] Hamilton J. Cooperative Expendable Micro-slice Servers (CEMS): Low Cost, LowPower Servers for Internet-Scale Services[C]//Proceedings of the Biennial Confer-ence on Innovative Data Systems Research2009. Washington, DC, USA: IEEEComputer Society,2009.
    [88] Keys L, Rivoire S, Davis J D. The Search for Energy-Efcient Building Blocks forthe Data Center[C]//Proceedings of the2010International Conference on Comput-er Architecture. Berlin, Heidelberg: Springer-Verlag,2012:172–182.
    [89] Poess M, Nambiar R. Tuning Servers, Storage and Database for Energy EfcientData Warehouses[C]//Proceedings of the International Conference on Data Engi-neering. Long Beach, California: IEEE,2010:1006–1017.
    [90] Pinheiro E, Bianchini R. Energy Conservation Techniques for Disk Array-BasedServers[C]//Proceedings of the18th Annual International Conference on Super-computing. New York, NY, USA: ACM,2004:68–78.
    [91] Lang W, Harizopoulos S, Patel J M, et al. Towards Energy-Efcient DatabaseCluster Design[J]. Proc. VLDB Endow.,2012,5(11):1684–1695.
    [92] Lang W, Patel J M, Naughton J F. On Energy Management, Load Balancing andReplication[J]. SIGMOD Rec.,2010,38(4):35–42.
    [93] Leverich J, Kozyrakis C. On the Energy (in)Efciency of Hadoop Clusters[J].SIGOPS Oper. Syst. Rev.,2010,44(1):61–65.
    [94] Lang W, Patel J M. Energy Management for MapReduce Clusters[J]. Proc. VLDBEndow.,2010,3(1-2):129–139.
    [95] Chen Y, Alspaugh S, Borthakur D, et al. Energy Efciency for Large-Scale MapRe-duce Workloads With significant interactive analysis[C]//Proceedings of the7thACM European Conference on Computer Systems. New York, NY, USA: ACM,2012:43–56.
    [96] Chen Y, Keys L, Katz R H. Towards Energy Efcient MapReduce[R]. California,USA: EECS Department, University of California, Berkeley,2009.
    [97] Chen Y, Ganapathi A, Katz R H. To Compress or not to Compress-Computevs. IO Tradeofs for MapReduce Energy Efciency[C]//Proceedings of the FirstACM SIGCOMM Workshop on Green Networking. New York, NY, USA: ACM,2010:23–28.
    [98] Lin M, Wierman A, Andrew L L H, et al. Dynamic Right-Sizing for Power-Proportional Data Centers[M]. Vol. PP. Los Alamitos, CA, USA: IEEE ComputerSociety Press,2012:1–9.
    [99] Chen G, He W, Liu J, et al. Energy-Aware Server Provisioning and Load Dispatch-ing for Connection-Intensive Internet Services[C]//Proceedings of the5th USENIXSymposium on Networked Systems Design and Implementation. Berkeley, CA,USA: USENIX Association,2008:337–350.
    [100] Thereska E, Donnelly A, Narayanan D. Sierra: Practical Power-Proportionalityfor Data Center Storage[C]//Proceedings of the Sixth Conference on ComputerSystems. New York, NY, USA: ACM,2011:169–182.
    [101] Chou J, Kim J, Rotem D. Energy-Aware Scheduling in Disk Storage Sys-tems[C]//Distributed Computing Systems (ICDCS). Washington DC: IEEE,2011:423–433.
    [102] Guenter B, Jain N, Williams C. Managing Cost, Performance, and ReliabilityTradeofs for Energy-Aware Server Provisioning.[C]//INFOCOM. Washington D-C: IEEE,2011:1332–1340.
    [103]邓维,刘方明,金海.云计算数据中心的新能源应用：研究现状与走势[J].计算机学报,2013,36(3):582–598.
    [104]叶可江,吴朝晖,姜晓红,等.虚拟化云计算平台的能耗管理[J].计算机学报,2012,35(6):1262–1285.
    [105] Beckmann N, Kriegel H P, Schneider R, et al. The R*-tree: An Efcient and Ro-bust Access Method for Points and Rectangles[C]//Proceedings of the1990ACMSIGMOD International Conference on Management of Data. New York, NY, USA:ACM,1990:322–331.
    [106] Ciaccia P, Patella M, Zezula P. M-tree: An Efcient Access Method for SimilaritySearch in Metric Spaces[C]//Proceedings of the23rd International Conference onVery Large Data Bases. San Francisco, CA, USA: Morgan Kaufmann PublishersInc.,1997:426–435.
    [107] Bentley J L. Multidimensional Binary Search Trees Used for Associative Search-ing[J]. Commun. ACM,1975,18(9):509–517.
    [108] Berchtold S, Keim D A, Kriegel H P. The X-tree: An Index Structure for High-Dimensional Data[C]//Proceedings of the22th International Conference on VeryLarge Data Bases. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc.,1996:28–39.
    [109] DeWitt D, Gray J. Parallel Database Systems: the Future of High PerformanceDatabase Systems[J]. Commun. ACM,1992,35(6):85–98.
    [110] Li J, Srivastava J, Rotem D. CMD: A Multidimensional Declustering Methodfor Parallel Data Systems[C]//Proceedings of the18th International Conference onVery Large Data Bases. San Francisco, CA, USA: Morgan Kaufmann PublishersInc.,1992:3–14.
    [111] Ghandeharizadeh S, DeWitt D J, Qureshi W. A Performance Analysis of Alter-native Multi-Attribute Declustering Strategies[C]//Proceedings of the1992ACMSIGMOD International Conference on Management of Data. New York, NY, USA:ACM,1992:29–38.
    [112] Bitton D, Boral H, DeWitt D J, et al. Parallel Algorithms for the Execution ofRelational Database Operations[J]. ACM Trans. Database Syst.,1983,8(3):324–353.
    [113] Zobel J, Mofat A. Inverted Files for Text Search Engines[J]. ACM Comput. Surv.,2006,38(2).
    [114] Lazaridis I, Mehrotra S. Progressive Approximate Aggregate Queries with aMulti-Resolution Tree Structure[C]//Proceedings of the2001ACM SIGMOD In-ternational Conference on Management of Data. New York, NY, USA: ACM,2001:401–412.
    [115] Garey M R, Johnson D S. A Guide to the Theory of NP-Completeness[M]. NewYork: W. H. Freeman and Company,1979:222–223.

地址：北京市海淀区学院路29号邮编：100083

电话：办公室：(+86 10)66554848；文献借阅、咨询服务、科技查新：66554700