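Research on Some Key Technologies of Big Data-as-a-Service

Original title: 大数据服务若干关键技术研究
Author: Han Jing (韩晶)
Thesis level: Doctoral
Discipline: Computer Science and Technology
Chinese keywords: 大数据服务 ; 非结构化数据 ; 数据模型 ; 服务模型 ; 检索排名算法
English keywords: Big Data-as-a-Service ; unstructured data ; data model ; service model ; search ranking algorithm
Degree year: 2013
Supervisor: Song Meina (宋美娜)
Discipline code: 0812
Degree-granting institution: Beijing University of Posts and Telecommunications (北京邮电大学)
Thesis submission date: 2013-04-25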

Abstract

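Big data is one of the major directions of modern information technology. Sharing and analyzing big data will bring immeasurable economic value and give a strong impetus to society. In the big data era, representing big data in a unified way and enabling its processing, querying, analysis, and visualization are key problems that urgently need solving. Big Data-as-a-Service (BDaaS) is a new model of data resource usage and a new service-economy model: it encapsulates the various big data operations and delivers ubiquitous, standardized, on-demand retrieval, analysis, and visualization services to service consumers. Research on BDaaS is still at the conceptual discussion stage, so it faces four challenges: 1) there is no standardized, user experience-oriented BDaaS architecture that hides the complexity of data resources and operations; 2) there is no general unstructured data model that reflects user behavior characteristics, which makes BDaaS for unstructured data difficult to build; 3) existing data service models describe only service interface specifications, and no BDaaS model that covers the characteristics of big data has yet appeared; 4) there are no adequate solutions for providing big data retrieval, analysis, and visualization services or for optimizing service capability.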
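     To address these problems, the theoretical model, service model, and implementation methods of BDaaS need to be studied systematically. This thesis therefore studies four key technologies: the BDaaS architecture, the BDaaS data model, the BDaaS service model, and BDaaS applications. To provide a standardized architectural scheme for building BDaaS platforms, the thesis first designs a User Experience-oriented Big Data-as-a-Service Architecture (UE-BDaaSA). Second, on the data model side, it designs an unstructured data model based on subject (user) behavior to support BDaaS for unstructured data. On the service model side, it uses process algebra to build an algebraic model of big data services and their composition, and designs big data services based on an extended OWL-S semantic ontology. On the application side, it describes in detail the processing flows of the retrieval, analysis, and visualization services, and optimizes BDaaS capability through two measures: improving retrieval accuracy and improving service efficiency.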
     The main contributions of this work are:
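     (1) To address the difficulty existing unstructured data models have in meeting the needs of BDaaS construction, a subject-behavior-based unstructured data model, the Galaxy Data Model (GDM), is proposed. By monitoring the behavior of data producers and the context in which data is produced, GDM provides a general unstructured data model that covers data characteristics such as user behavior and semantic context, laying the data model foundation for unstructured BDaaS. Case-based validation shows that GDM is general and comprehensive, has a lightweight implementation, and offers a mature, easy-to-use operation language. Besides traditional file systems, GDM also supports modeling and retrieval of unstructured data in HDFS. GDM has also been deployed in the National Free Pre-pregnancy Eugenic Health Check Management Information System, which confirms its feasibility and practicality. (Chapter 3)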
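     (2) To address the lack of a service model that covers the characteristics of big data, a BDaaS model based on an extended OWL-S ontology (Extended OWL-S based Big Data-as-a-Service, EO-BDaaS) is proposed. Extending OWL-S with attributes such as data source, data service type, and data service operation enables the construction and dynamic composition of multiple types of big data services, including retrieval, analysis, and visualization. Case-based validation shows that, compared with existing data services, EO-BDaaS describes attributes and operations more completely, has stronger semantic understanding and automatic service composition capabilities, and seamlessly integrates the composition operations specific to data services into the implementation of big data services. (Chapter 4)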
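     (3) To address the low accuracy of big data retrieval services, HotRank, a heat-sensitive ranking optimization algorithm for unstructured data retrieval, is proposed. It computes a heat score for each retrieval result from the degree of match between the attributes of the unstructured data and the task attributes of the service consumer, then sorts the results by heat score, so that the results better fit user preferences. Simulation experiments show that HotRank's precision-recall is better than that of the Windows Search ranking algorithm. HotRank thus improves the accuracy of BDaaS retrieval results and raises service capability by improving user experience. (Chapter 5)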
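     (4) To meet the requirement for fast service response in BDaaS, a Hybrid Prefetch Algorithm (HPA) based on data heat identification is proposed. It derives data heat determination rules from records of user data operations, obtains prefetch candidates according to dynamic and static prefetch rules, and finally places the prefetched data in the cache. Simulation results show an average prefetch hit rate of 55% and an average accuracy of 43%, indicating that the algorithm can predict the data users will operate on and that it optimizes BDaaS capability in terms of service efficiency. A distributed persistent cache storage architecture based on HPA has also been applied in the National Free Pre-pregnancy Eugenic Health Check Management Information System, which confirms its effectiveness. (Chapter 5)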
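The heat-based ranking described in contribution (3) can be illustrated with a minimal Python sketch. The abstract does not give HotRank's actual scoring formula, so the overlap ratio below is an assumed stand-in for the "degree of match", and the attribute names (`project`, `type`) and the helper names `heat_score` and `rank_by_heat` are hypothetical.

```python
def heat_score(item_attrs, task_attrs):
    """Fraction of the consumer's task attributes that the item matches exactly.

    Assumption: a simple overlap ratio stands in for the thesis's matching degree.
    """
    if not task_attrs:
        return 0.0
    matched = sum(1 for key, value in task_attrs.items()
                  if item_attrs.get(key) == value)
    return matched / len(task_attrs)


def rank_by_heat(results, task_attrs):
    """Sort retrieval results by descending heat score.

    Each result is a dict with a "name" and an "attrs" dict (hypothetical schema).
    Python's sort is stable, so tied results keep their original order.
    """
    return sorted(results,
                  key=lambda r: heat_score(r["attrs"], task_attrs),
                  reverse=True)


# Example: the consumer is working on reports for project "y".
task = {"project": "y", "type": "report"}
results = [
    {"name": "a", "attrs": {"project": "x"}},
    {"name": "b", "attrs": {"project": "y", "type": "report"}},
    {"name": "c", "attrs": {"project": "y"}},
]
ranked_names = [r["name"] for r in rank_by_heat(results, task)]
```

Re-ranking by a score computed per result keeps the underlying retrieval step unchanged, which is one plausible reason to treat ranking as a separate optimization layer.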
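     The research in this thesis is part of the results of the Eleventh Five-Year Plan National Key Technology R&D Program project "Research on Key Technologies of a Secure, Trusted, Carrier-grade Operation Support System for Reproductive Health Services" (No. 2008BAH24B04) and the Ministry of Education-China Mobile Research Fund project "Research on Key Technologies and Solutions for Internet-oriented Business Support Systems" (No. MCM20123031). It has been applied in the operational "National Free Pre-pregnancy Health Check Management Information System", helping it evolve from data collection in the population and family planning domain to service-based sharing and visual analysis of cross-domain population and family planning big data. It also provides an effective solution and engineering guidance for building the "E-Government Cloud Computing Data Service Platform" of the National Engineering Laboratory for E-Government Cloud Computing.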
Big data has become an important direction in the development of modern information technology. Sharing and analyzing big data would not only bring immeasurable economic value but also play a significant role in promoting the development of society. Big Data-as-a-Service (BDaaS) is a new data resource usage pattern and a new form of service economy: by encapsulating heterogeneous data, it provides service consumers with ubiquitous, standardized, on-demand services, including search, analysis, and visualization.
     Because research on BDaaS is still in the conceptual discussion stage, it faces four challenges: 1) there is no standardized, user experience-based BDaaS architecture that can shield the complexity of data sources and operations; 2) the lack of a generic unstructured data model that reflects user behavior characteristics makes BDaaS for unstructured data difficult to build; 3) existing data service models follow the Web services model, and a holistic BDaaS service model with the characteristics of big data has not yet appeared; 4) there is no appropriate solution for providing data retrieval, analysis, and visualization services and for optimizing service capacity.
     To solve these problems, this thesis studies four key technologies in depth: the BDaaS architecture, the data model, the BDaaS service model, and BDaaS applications. First, it designs a User Experience-oriented BDaaS Architecture to provide high-level, standardized guidance for building a platform. Second, for the data model, it designs a user behavior-based unstructured data model to describe unstructured data in a unified way. Third, for the service model, it establishes an algebraic model using process algebra, and then designs an extended OWL-S ontology-based BDaaS model and a service composition approach. Finally, it describes the service processes of retrieval, analysis, and visualization in detail, and uses two measures, improving retrieval accuracy and improving service efficiency, to optimize BDaaS capacity.
     The main innovations of this thesis are as follows:
     (1) Because existing unstructured data models can hardly meet the demands of BDaaS, the Galaxy Data Model (GDM), a user behavior-based unstructured data model, is proposed. By monitoring the behavior of data producers, a generic model with comprehensive attributes such as user behavior and semantic background is created, which is the basis for realizing BDaaS for unstructured data. The case study shows that GDM not only has good versatility and comprehensiveness, but also has a lightweight, easy-to-use description language and operation language. In addition to traditional file systems, GDM supports modeling and retrieval of unstructured data in HDFS. GDM has also been applied in the National Pre-pregnancy Check Management Information System (NPCMIS), which verifies its feasibility and practicality. (Chapter 3)
     (2) Because a holistic BDaaS service model with the characteristics of big data has not yet appeared, the Extended OWL-S based Big Data-as-a-Service model (EO-BDaaS) is proposed. Properties for data sources, data service types, and service operations are added to OWL-S in order to build many types of BDaaS, such as search, analysis, and visualization, and to compose services dynamically. The case study shows that, compared with existing data services, EO-BDaaS describes attributes and operations more comprehensively. It also has strong semantic comprehension and automatic service composition capabilities, and it seamlessly integrates the composition operations unique to data services into the implementation of BDaaS. (Chapter 4)
     (3) To solve the problem of low retrieval accuracy, this thesis presents HotRank, a heat-sensitive ranking algorithm for unstructured data retrieval. A heat score is first calculated as the degree of match between the attributes of each search result and the task attributes of the service consumer. The score is assigned to each search result, and the results are then sorted by heat score, which brings them more in line with the user's preferences. The simulation results show that the precision-recall of HotRank is better than that of the Windows Search ranking algorithm. By improving retrieval accuracy, HotRank therefore improves both the user experience and the service capacity. (Chapter 5)
     (4) A Hybrid Prefetch Algorithm (HPA) based on data heat recognition is proposed to meet the fast-response requirements of BDaaS. First, the log of user data operations is analyzed to develop data heat determination rules; then candidate data is obtained according to dynamic and static prefetch rules; finally, the prefetched data is placed in the cache. The simulation results show that the average hit rate of HPA is 55% and its average accuracy is 43%, which indicates that the algorithm can predict user data operations and optimize BDaaS capacity. In addition, an HPA-based distributed persistent caching architecture has been applied in the National Pre-pregnancy Check Management Information System (NPCMIS), which verifies its effectiveness. (Chapter 5)
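The prefetch pipeline in innovation (4) and its two reported metrics can be sketched in Python as follows. This is only an illustration under stated assumptions: the abstract does not specify HPA's rules, so the "static" rule here (access count threshold) and the "dynamic" rule (items that directly followed the current item in the log) are hypothetical, as are all function names. Hit rate and accuracy use the definitions common in the Web prefetching literature, which may differ from the thesis's own.

```python
from collections import Counter


def hot_items(access_log, min_count=2):
    """Static rule (assumed): an item is hot if it occurs at least min_count times."""
    counts = Counter(access_log)
    return {item for item, n in counts.items() if n >= min_count}


def successors(access_log, current):
    """Dynamic rule (assumed): items that directly followed `current` in the log."""
    return {nxt for prev, nxt in zip(access_log, access_log[1:]) if prev == current}


def prefetch_candidates(access_log, current, min_count=2):
    """Union of static and dynamic candidates, excluding the item being read now."""
    return (hot_items(access_log, min_count) | successors(access_log, current)) - {current}


def evaluate(prefetched, requests):
    """Return (hit_rate, accuracy) for a prefetched set against later requests.

    hit_rate: share of requests that were already in the prefetched set.
    accuracy: share of prefetched items that were requested at least once.
    """
    hits = sum(1 for r in requests if r in prefetched)
    used = len(prefetched & set(requests))
    hit_rate = hits / len(requests) if requests else 0.0
    accuracy = used / len(prefetched) if prefetched else 0.0
    return hit_rate, accuracy


log = ["a", "b", "a", "c", "b", "d"]
candidates = prefetch_candidates(log, current="a")
metrics = evaluate(candidates, ["b", "d", "b", "e"])
```

The two metrics pull in different directions: prefetching more candidates tends to raise the hit rate but lower accuracy, so they are usually reported together, as the abstract does.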
     The research in this thesis is an academic achievement of the National Key Project of Scientific and Technical Supporting Programs "Research on a safe, reliable, carrier-class operation support system of reproductive health services" (No. 2008BAH24B04) and the Science Foundation of Ministry of Education of China-China Mobile Program "Research on key technologies and solutions of internet-oriented business support system" (No. MCM20123031). It has been applied in NPCMIS and has helped it evolve from data acquisition to BDaaS. In addition, it has provided the "National Cloud Computing E-Government BDaaS Platform" of the National Engineering Lab of Cloud Computing E-Government with an effective solution and practical engineering guidance.
