JackHare: a framework for SQL to NoSQL translation using MapReduce
  • Authors: Wu-Chun Chung (1)
    Hung-Pin Lin (1)
    Shih-Chang Chen (1)
    Mon-Fong Jiang (2)
    Yeh-Ching Chung (1)
  • Keywords: Cloud computing; Unstructured data processing; MapReduce; NoSQL database; HBase; JDBC; Compiler
  • Journal: Automated Software Engineering
  • Published: December 2014
  • Volume: 21
  • Issue: 4
  • Pages: 489-508
  • Full-text size: 1,847 KB
  • Affiliations:
    1. Dept. of Computer Science, National Tsing Hua University, Hsinchu, 300, Taiwan
    2. is-land Systems Inc., Hsinchu Science Park, 3F, No.4, Prosperity Rd. 2, Hsinchu, 300, Taiwan
  • ISSN: 1573-7535
Abstract
As data exploration has increased rapidly in recent years, data storage and data processing have attracted growing attention as means of extracting important information. Finding a scalable solution for processing large-scale data is a critical issue for both relational database systems and emerging NoSQL databases. With the inherent scalability and fault tolerance of Hadoop, MapReduce is an attractive way to process massive data in parallel. Most previous research has focused on developing translators for SQL or SQL-like queries on top of the Hadoop distributed file system (HDFS). However, it can be difficult to update data frequently in such a file system. We therefore need a flexible datastore such as HBase, not only to place data on a scale-out storage system but also to manipulate changing data transparently. Yet the HBase interface is not friendly enough for most users, and a GUI composed of a SQL client application and a database connection to HBase would ease the learning curve. In this paper, we propose the JackHare framework, which comprises a SQL query compiler, a JDBC driver, and a systematic method that uses the MapReduce framework to process unstructured data in HBase. Once the JDBC driver is imported into a SQL client GUI, HBase can serve as the underlying datastore for executing ANSI-SQL queries. Experimental results show that our approach performs well in terms of both efficiency and scalability.
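
The abstract describes compiling SQL queries into MapReduce processing over HBase's sparse, column-family rows. The Java sketch below illustrates that general idea under heavy simplifying assumptions: an in-memory map stands in for an HBase table (row key, then `family:qualifier` cell name, then value), only one query shape (`SELECT g, COUNT(*) FROM t WHERE c > n GROUP BY g`) is accepted, and every name (`SqlToMapReduceSketch`, `Plan`, `compile`, `map`, `execute`, `row`) is hypothetical. It uses neither the real HBase nor the Hadoop APIs and is not JackHare's actual compiler.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical toy SQL-to-MapReduce translator over an in-memory, HBase-style sparse table. */
public class SqlToMapReduceSketch {

    /** Compiled plan for: SELECT g, COUNT(*) FROM t WHERE c > n GROUP BY g. */
    static final class Plan {
        final String table, filterColumn, groupColumn;  // table name is kept but not used below
        final long threshold;
        Plan(String table, String filterColumn, long threshold, String groupColumn) {
            this.table = table;
            this.filterColumn = filterColumn;
            this.threshold = threshold;
            this.groupColumn = groupColumn;
        }
    }

    // Assumption: only this single query shape is supported; columns are "family:qualifier" names.
    private static final Pattern GROUP_COUNT = Pattern.compile(
        "SELECT\\s+(\\S+)\\s*,\\s*COUNT\\(\\*\\)\\s+FROM\\s+(\\S+)\\s+"
        + "WHERE\\s+(\\S+)\\s*>\\s*(\\d+)\\s+GROUP\\s+BY\\s+(\\S+)",
        Pattern.CASE_INSENSITIVE);

    /** "Compile" step: turn the SQL text into a plan, rejecting any other query shape. */
    static Plan compile(String sql) {
        Matcher m = GROUP_COUNT.matcher(sql.trim());
        if (!m.matches() || !m.group(1).equals(m.group(5))) {
            throw new IllegalArgumentException("unsupported query: " + sql);
        }
        return new Plan(m.group(2), m.group(3), Long.parseLong(m.group(4)), m.group(1));
    }

    /** Map phase: one row in, zero or one (groupValue, 1) pair out. */
    static List<Map.Entry<String, Long>> map(Plan plan, Map<String, String> row) {
        List<Map.Entry<String, Long>> out = new ArrayList<>();
        String filterValue = row.get(plan.filterColumn);
        String groupValue = row.get(plan.groupColumn);
        // Sparse rows: a missing cell simply fails the predicate, much like SQL NULL.
        if (filterValue != null && groupValue != null
                && Long.parseLong(filterValue) > plan.threshold) {
            out.add(new AbstractMap.SimpleEntry<>(groupValue, 1L));
        }
        return out;
    }

    /** Runs the map phase, an in-memory shuffle (group by key), and a summing reduce. */
    static Map<String, Long> execute(Plan plan, Map<String, Map<String, String>> table) {
        Map<String, List<Long>> shuffled = new TreeMap<>();
        for (Map<String, String> row : table.values()) {
            for (Map.Entry<String, Long> kv : map(plan, row)) {
                shuffled.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
            }
        }
        Map<String, Long> result = new TreeMap<>();
        for (Map.Entry<String, List<Long>> group : shuffled.entrySet()) {
            long sum = 0;
            for (long v : group.getValue()) sum += v;  // reduce: COUNT(*) is the sum of the 1s
            result.put(group.getKey(), sum);
        }
        return result;
    }

    /** Builds one sparse row from alternating cell-name / value arguments. */
    static Map<String, String> row(String... cells) {
        Map<String, String> r = new HashMap<>();
        for (int i = 0; i < cells.length; i += 2) r.put(cells[i], cells[i + 1]);
        return r;
    }

    public static void main(String[] args) {
        Map<String, Map<String, String>> emp = new TreeMap<>();  // row key -> cells
        emp.put("r1", row("info:dept", "sales", "info:age", "35"));
        emp.put("r2", row("info:dept", "sales", "info:age", "28"));
        emp.put("r3", row("info:dept", "dev", "info:age", "41"));
        emp.put("r4", row("info:age", "50"));  // this row has no dept cell
        Plan plan = compile(
            "SELECT info:dept, COUNT(*) FROM emp WHERE info:age > 30 GROUP BY info:dept");
        System.out.println(execute(plan, emp));
    }
}
```

Running `main` prints `{dev=1, sales=1}`: row `r2` fails the predicate and row `r4` lacks the grouping cell, so neither is counted. In a real deployment the map step would typically run once per HBase region over a table scan, and the MapReduce framework would perform the shuffle; here both are simulated in a single process to keep the sketch self-contained.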
