Fencing Programs with Self-Invalidation and Self-Downgrade
详细信息    查看全文
  • 刊名:Lecture Notes in Computer Science
  • 出版年:2016
  • 出版时间:2016
  • 年:2016
  • 卷:9688
  • 期:1
  • 页码:19-35
  • 全文大小:562 KB
  • 参考文献:1.Abdulla, P.A., Atig, M.F., Chen, Y.-F., Leonardsson, C., Rezine, A.: Counter-example guided fence insertion under TSO. In: Flanagan, C., König, B. (eds.) TACAS 2012. LNCS, vol. 7214, pp. 204–219. Springer, Heidelberg (2012)CrossRef
    2.Adve, S.V., Hill, M.D.: Weak ordering - a new definition. In: ISCA, pp. 2–14 (1990)
    3.Alglave, J., Kroening, D., Nimal, V., Poetzl, D.: Don’t sit on the fence. In: Biere, A., Bloem, R. (eds.) CAV 2014. LNCS, vol. 8559, pp. 508–524. Springer, Heidelberg (2014)
    4.Bouajjani, A., Derevenetc, E., Meyer, R.: Checking and enforcing robustness against TSO. In: Felleisen, M., Gardner, P. (eds.) ESOP 2013. LNCS, vol. 7792, pp. 533–553. Springer, Heidelberg (2013)CrossRef
    5.Bouajjani, A., Meyer, R., Möhlmann, E.: Deciding robustness against total store ordering. In: Aceto, L., Henzinger, M., Sgall, J. (eds.) ICALP 2011, Part II. LNCS, vol. 6756, pp. 428–440. Springer, Heidelberg (2011)CrossRef
    6.Chase, D., Lev, Y.: Dynamic circular work-stealing deque. In: SPAA, pp. 21–28 (2005)
    7.Choi, B., Komuravelli, R., Sung, H., Smolinski, R., Honarmand, N., Adve, S.V., Adve, V.S., Carter, N.P., Chou, C.T.: DeNovo: rethinking the memory hierarchy for disciplined parallelism. In: PACT, pp. 155–166 (2011)
    8.Davari, M., Ros, A., Hagersten, E., Kaxiras, S.: An efficient, self-contained, on-chip, directory: DIR\(_1\) -SISD. In: PACT, pp. 317–330 (2015)
    9.Dice, D., Shalev, O., Shavit, N.N.: Transactional locking II. In: Dolev, S. (ed.) DISC 2006. LNCS, vol. 4167, pp. 194–208. Springer, Heidelberg (2006)CrossRef
    10.Dijkstra, E.W.: Cooperating sequential processes (2002)
    11.Herlihy, M., Shavit, N.: The Art of Multiprocessor Programming. Morgan Kaufmann Publishers Inc., San Francisco (2008)
    12.Hower, D.R., Hechtman, B.A., Beckmann, B.M., Gaster, B.R., Hill, M.D., Reinhardt, S.K., Wood, D.A.: Heterogeneous-race-free memory models. In: ASPLOS, pp. 427–440 (2014)
    13.Kaxiras, S., Keramidas, G.: SARC coherence: scaling directory cache coherence in performance and power. IEEE Micro 30(5), 54–65 (2011)CrossRef
    14.Kaxiras, S., Ros, A.: A new perspective for efficient virtual-cache coherence. In: ISCA, pp. 535–547 (2013)
    15.Koukos, K., Ros, A., Hagersten, E., Kaxiras, S.: Building heterogeneous unified virtual memories (UVMS) without the overhead. ACM TACO 13(1), 1:1–1:22 (2016)
    16.Kuperstein, M., Vechev, M., Yahav, E.: Automatic inference of memory fences. In: FMCAD, pp. 111–119. IEEE (2010)
    17.Lamport, L.: A new solution of dijkstra’s concurrent programming problem. Commun. ACM 17(8), 453–455 (1974)MathSciNet CrossRef MATH
    18.Lebeck, A.R., Wood, D.A.: Dynamic self-invalidation: reducing coherence overhead in shared-memory multiprocessors. In: ISCA, pp. 48–59 (1995)
    19.Liu, F., Nedev, N., Prisadnikov, N., Vechev, M.T., Yahav, E.: Dynamic synthesis for relaxed memory models. In: PLDI, pp. 429–440 (2012)
    20.Magnusson, P., Landin, A., Hagersten, E.: Queue locks on cache coherent multiprocessors. In: Proceedings of Eighth International Parallel Processing Symposium, pp. 165–171. IEEE (1994)
    21.Martin, M.M., Sorin, D.J., Beckmann, B.M., Marty, M.R., Xu, M., Alameldeen, A.R., Moore, K.E., Hill, M.D., Wood, D.A.: Multifacet’s general execution-driven multiprocessor simulator (GEMS) toolset. Comput. Archit. News 33(4), 92–99 (2005)CrossRef
    22.Mellor-Crummey, J.M., Scott, M.L.: Algorithms for scalable synchronization on shared-memory multiprocessors. ACM Trans. Comput. Syst. (TOCS) 9(1), 21–65 (1991)CrossRef
    23.Ros, A., Davari, M., Kaxiras, S.: Hierarchical private/shared classification: the key to simple and efficient coherence for clustered cache hierarchies. In: HPCA, pp. 186–197 (2015)
    24.Ros, A., Kaxiras, S.: Complexity-effective multicore coherence. In: PACT, pp. 241–252 (2012)
    25.Ros, A., Kaxiras, S.: Callback: efficient synchronization without invalidation with a directory just for spin-waiting. In: ISCA, pp. 427–438 (2015)
    26.Ros, A., Kaxiras, S.: Fast & furious: a tool for detecting covert racing. In: PARMA and DITAM, pp. 1–6 (2015)
    27.Sakalis, C., Leonardsson, C., Kaxiras, S., Ros, A.: Splash-3: a properly synchronized benchmark suite for contemporary research. In: ISPASS (2016)
    28.Schmidt, D.C., Harrison, T.: Double-checked locking - an optimization pattern for efficiently initializing and accessing thread-safe objects. In: PLoP (1996)
    29.Scott, M.L.: Shared-Memory Synchronization. Morgan & Claypool, San Rafael (2013)
    30.Shasha, D., Snir, M.: Efficient and correct execution of parallel programs that share memory. ACM Trans. Program. Lang. Syst. (TOPLAS) 10(2), 282–312 (1988)CrossRef
    31.Sung, H., Adve, S.V.: DeNovoSync: efficient support for arbitrary synchronization without writer-initiated invalidations. In: ASPLOS, pp. 545–559 (2015)
    32.Sung, H., Komuravelli, R., Adve, S.V.: DeNovoND: efficient hardware support for disciplined non-determinism. In: ASPLOS, pp. 13–26 (2013)
    33.Woo, S.C., Ohara, M., Torrie, E., Singh, J.P., Gupta, A.: The SPLASH-2 programs: characterization and methodological considerations. In: ISCA, pp. 24–36 (1995)
  • 作者单位:Parosh Aziz Abdulla (15)
    Mohamed Faouzi Atig (15)
    Stefanos Kaxiras (15)
    Carl Leonardsson (15)
    Alberto Ros (16)
    Yunyun Zhu (15)

    15. Uppsala University, Uppsala, Sweden
    16. Universidad de Murcia, Murcia, Spain
  • 丛书名:Formal Techniques for Distributed Objects, Components, and Systems
  • ISBN:978-3-319-39570-8
  • 刊物类别:Computer Science
  • 刊物主题:Artificial Intelligence and Robotics
    Computer Communication Networks
    Software Engineering
    Data Encryption
    Database Management
    Computation by Abstract Devices
    Algorithm Analysis and Problem Complexity
  • 出版者:Springer Berlin / Heidelberg
  • ISSN:1611-3349
  • 卷排序:9688
文摘
Cache coherence protocols using self-invalidation and self-downgrade have recently seen increased popularity due to their simplicity, potential performance efficiency, and low energy consumption. However, such protocols result in memory instruction reordering, thus causing extra program behaviors that are often not intended by the programmer. We propose a novel formal model that captures the semantics of programs running under such protocols, and employs a set of fences that interact with the coherence layer. Using the model, we perfform a reachability analysis that can check whether a program satisfies a given safety property with the current set of fences. Based on an algorithm in [19], we describe a method for insertion of optimal sets of fences that ensure correctness of the program under such protocols. The method relies on a counter-example guided fence insertion procedure. One feature of our method is that it can handle a variety of fences (with different costs). This diversity makes optimization more difficult since one has to optimize the total cost of the inserted fences, rather than just their number. To demonstrate the strength of our approach, we have implemented a prototype and run it on a wide range of examples and benchmarks. We have also, using simulation, evaluated the performance of the resulting fenced programs.

© 2004-2018 中国地质图书馆版权所有 京ICP备05064691号 京公网安备11010802017129号

地址:北京市海淀区学院路29号 邮编:100083

电话:办公室:(+86 10)66554848;文献借阅、咨询服务、科技查新:66554700