Filter-embedding semiring fusion for programming with MapReduce
详细信息    查看全文
  • 作者:Kento Emoto (1) emoto@mist.i.u-tokyo.ac.jp
    Sebastian Fischer (2)
    Zhenjiang Hu (2)
  • 关键词:MapReduce – ; List homomorphisms – ; Functional programming – ; Program transformation – ; Program calculation – ; Semiring computation
  • 刊名:Formal Aspects of Computing
  • 出版年:2012
  • 出版时间:July 2012
  • 年:2012
  • 卷:24
  • 期:4-6
  • 页码:623-645
  • 全文大小:318.9 KB
  • 参考文献:1. Abdali SK (1993) Parallel computations in *-semirings. In: Fischer K, Loustaunau P, Shapiro J, Green E, Farkas D (eds) Computational algebra. Lecture notes in pure and applied mathematics, vol 151. CRC Press, Boca Raton, pp 1–13
    2. Anand S, Chin W-N, Khoo S-C (2001) Charting patterns on price history. In: Proceedings of the sixth ACM SIGPLAN international conference on functional programming, ICFP ’01. ACM, New York, pp 134–145
    3. Abdali SK, Saunders BD (1985) Transitive closure and related semiring properties via eliminants. Theor Comput Sci 40: 257–274
    4. Bird R, de Moor O (1996) Algebra of programming. Prentice Hall, Englewood Cliffs
    5. Bird R (1987) An introduction to the theory of lists. In: Broy M (ed.) Logic of programming and calculi of discrete design. Springer, Berlin, pp 5–42
    6. Bird R (1998) Introduction to functional programming using Haskell. Prentice Hall, Englewood Cliffs
    7. Bistarelli S, Montanari U, Rossi F (1997) Semiring-based constraint satisfaction and optimization. J ACM 44: 201–236
    8. Bistarelli S, Montanari U, Rossi F (2001) Semiring-based constraint logic programming: syntax and semantics. ACM Trans Program Lang Syst 23: 1–29
    9. Bistarelli S, Montanari U, Rossi F, Santini F (2010) Unicast and multicast QoS routing with soft-constraint logic programming. ACM Trans Comput Logic 12(5): 1–548
    10. Butkovič P (2010) Max-linear systems: theory and algorithms. Springer, Berlin
    11. Carre B (1979) Graphs and networks. Clarendon Press, Oxford
    12. Chin W-N, Khoo S-C, Hu Z, Takeichi M (2000) Deriving parallel codes via invariants. In: Static analysis. 7th international symposium, SAS 2000. LNCS, vol 1824. Springer, Berlin , pp 75–94
    13. Cole M (1995) Parallel programming with list homomorphisms. Parallel Process Lett 5(2): 191–203
    14. Conway JH (1971) Regular algebra and finite machines. Chapman & Hall, London
    15. Dean J, Ghemawat S (2008) MapReduce: simplified data processing on large clusters. Commun ACM 51: 107–113
    16. Emoto K (2011) An algebraic approach to efficient parallel algorithms for nested reductions. Technical report METR2011-01, Department of Mathematical Informatics, Graduate School of Information Science and Technology, University of Tokyo
    17. Fisher AL, Ghuloum AM (1994) Parallelizing complex scans and reductions. In: Proceedings of the ACM SIGPLAN 1994 conference on programming language design and implementation (PLDI ’94). ACM, New York, pp 135–146
    18. Green TJ, Karvounarakis G, Tannen V (2007) Provenance semirings. In: Proceedings of the twenty-sixth ACM SIGMOD-SIGACT-SIGART symposium on principles of database systems, PODS ’07. ACM, New York, pp 31–40
    19. Gill A, Launchbury J, Peyton~Jones SL (1993) A short cut to deforestation. In: Conference on functional programming languages and computer architecture, pp 223–232
    20. Gondran M, Minoux M, Vajda S (1984) Graphs and algorithms. Wiley, New York
    21. Goodman J (1999) Semiring parsing. Comput Linguist 25: 573–605
    22. Gorlatch S (1996) Systematic efficient parallelization of scan and other list homomorphisms. In: Euro-Par’96 parallel processing. LNCS, vol 1124. Springer, Berlin, pp 401–408
    23. Gorlatch S (1997) Optimizing compositions of scans and reductions in parallel program derivation. Technical report MPI-9711, Universit盲t Passau
    24. Hazewinkel, M (ed.) (2002) Encyclopaedia of mathematics. Springer, Berlin
    25. Ho T-J, Chen B-S (2006) Novel extended Viterbi-based multiple-model algorithms for state estimation of discrete-time systems with Markov jump parameters. IEEE Trans Signal Process 54(2): 393–404
    26. He Y (1988) Extended Viterbi algorithm for second order hidden markov process. In: 9th International conference on pattern recognition, vol 2. IEEE Press, pp 718–720
    27. Henriksen JG, Jensen JL, Jorgensen ME, Klarlund N, Paige R, Rauhe T, Sandholm A (1995) Mona: monadic second-order logic in practice. In: Tools and algorithms for the construction and analysis of systems. LNCS, vol 1019. Springer, Berlin, pp 89–110
    28. Hu Z, Takeichi M, Chin W-N (1998) Parallelization in calculational forms. In: 25th ACM symposium on principles of programming languages (POPL’98), San Diego, California, USA. ACM Press, pp 316–328
    29. Hu Z, Yokoyama T, Takeichi M (2005) Program optimization and transformation in calculational form. In: Summer School on generative and transformational techniques in software engineering (GTTSE 2005)
    30. Karloff H, Suri S, Vassilvitskii S (2010) A model of computation for mapreduce. In: Proceedings of the twenty-first annual ACM-SIAM symposium on discrete algorithms, SODA ’10. SIAM, pp 938–948
    31. Kohlas J, Wilson N (2008) Semiring induced valuation algebras: exact and approximate local computation algorithms. Artif Intell 172: 1360–1399
    32. L盲mmel R (2008) Google’s mapreduce programming model: revisited. Sci Comput Program 70(1): 1–30
    33. Lin J, Dyer C (2010) Data-intensive text processing with MapReduce. Morgan and Claypool Publishers
    34. Liu Y, Hu Z, Matsuzaki K (2011) Towards systematic parallel programming over mapreduce. In: Euro-Par 2011 parallel processing. LNCS, vol 6853. Springer, Berlin
    35. MapReduce Application Paper List (2011) http://www.mendeley.com/groups/1058401/mapreduce-applications/papers/
    36. Larrosa J, Oliveras A, Rodr铆guez-Carbonell E (2010) Semiring-induced propositional logic: definition and basic algorithms. In: Proceedings of the 16th international conference on logic for programming, artificial intelligence, and reasoning, LPAR’10. Springer, Berlin, pp 332–347
    37. Matsuzaki K (2007) Parallel programming with tree skeletons. PhD thesis, Graduate School of Information Science and Technology, University of Tokyo
    38. Morihata A, Matsuzaki K, Hu Z, Takeichi M (2009) The third homomorphism theorem on trees: downward & upward lead to divide-and-conquer. In: Proceedings of the 36th annual ACM SIGPLAN-SIGACT symposium on principles of programming languages, POPL ’09. ACM, New York, pp 177–185
    39. Morita K, Morihata A, Matsuzaki K, Hu Z, Takeichi M (2007) Automatic inversion generates divide-and-conquer parallel programs. In: ACM SIGPLAN 2007 conference on programming language design and implementation (PLDI ’07). ACM, New York, pp 146–155
    40. Sato S, Iwasaki H (2011) Automatic parallelization via matrix multiplication. In: Proceedings of the 32nd ACM SIGPLAN conference on programming language design and implementation (PLDI ’11). ACM, New York, pp 470–479
    41. Skillicorn DB (1992) The Bird-Meertens formalism as a parallel model. In: NATO ARW “Software for Parallel Computation”, 92
    42. Tarjan RE (1981) A unified approach to path problems. J ACM 28: 577–593
    43. Thomas W (1990) Automata on infinite objects. In: Handbook of theoretical computer science. Formal models and sematics, vol B. Elsevier/MIT Press, pp 133–192
    44. Takano A, Hu Z, Takeichi M (1998) Program transformation in calculational form. ACM Comput Surv 30(3)
    45. Takano A, Meijer E (1995) Shortcut deforestation in calculational form. In: Proceedings of conference on functional programming languages and computer architecture, La Jolla, California, pp 306–313
    46. Wadler P (1989) Theorems for free! In: Proceedings of the fourth international conference on functional programming languages and computer architecture, FPCA ’89. ACM, New York, pp 347–359
    47. White T (2009) Hadoop: the definitive guide. O’Reilly Media
    48. Zantema H (1992) Longest segment problems. Sci Comput Program 18: 39–66
    49. Harrison PG, Grant-Duff ZN (1996) Parallelism via homomorphism. Parallel Process Lett 6(2): 279–295
  • 作者单位:1. The University of Tokyo, Hongo 7-3-1, Bunkyo, Tokyo, Japan2. National Institute of Informatics, Tokyo, Japan
  • ISSN:1433-299X
文摘
We show that MapReduce, the de facto standard for large scale data-intensive parallel programming, can be equipped with a programming theory in calculational form. By integrating the generate-and-test programming paradigm and semirings for aggregation of results, we propose a novel parallel programming framework for MapReduce. The framework consists of two important calculation theorems: the shortcut fusion theorem of semiring homomorphisms bridges the gap between specifications and efficient implementations, and the filter-embedding theorem helps to develop parallel programs in a systematic and incremental way.

© 2004-2018 中国地质图书馆版权所有 京ICP备05064691号 京公网安备11010802017129号

地址:北京市海淀区学院路29号 邮编:100083

电话:办公室:(+86 10)66554848;文献借阅、咨询服务、科技查新:66554700