Fencing Programs with Self-Invalidation and Self-Downgrade

详细信息查看全文

刊名：Lecture Notes in Computer Science
出版年：2016
出版时间：2016
年：2016
卷：9688
期：1
页码：19-35
全文大小：562 KB
参考文献：1.Abdulla, P.A., Atig, M.F., Chen, Y.-F., Leonardsson, C., Rezine, A.: Counter-example guided fence insertion under TSO. In: Flanagan, C., König, B. (eds.) TACAS 2012. LNCS, vol. 7214, pp. 204–219. Springer, Heidelberg (2012)CrossRef
2.Adve, S.V., Hill, M.D.: Weak ordering - a new definition. In: ISCA, pp. 2–14 (1990)
3.Alglave, J., Kroening, D., Nimal, V., Poetzl, D.: Don’t sit on the fence. In: Biere, A., Bloem, R. (eds.) CAV 2014. LNCS, vol. 8559, pp. 508–524. Springer, Heidelberg (2014)
4.Bouajjani, A., Derevenetc, E., Meyer, R.: Checking and enforcing robustness against TSO. In: Felleisen, M., Gardner, P. (eds.) ESOP 2013. LNCS, vol. 7792, pp. 533–553. Springer, Heidelberg (2013)CrossRef
5.Bouajjani, A., Meyer, R., Möhlmann, E.: Deciding robustness against total store ordering. In: Aceto, L., Henzinger, M., Sgall, J. (eds.) ICALP 2011, Part II. LNCS, vol. 6756, pp. 428–440. Springer, Heidelberg (2011)CrossRef
6.Chase, D., Lev, Y.: Dynamic circular work-stealing deque. In: SPAA, pp. 21–28 (2005)
7.Choi, B., Komuravelli, R., Sung, H., Smolinski, R., Honarmand, N., Adve, S.V., Adve, V.S., Carter, N.P., Chou, C.T.: DeNovo: rethinking the memory hierarchy for disciplined parallelism. In: PACT, pp. 155–166 (2011)
8.Davari, M., Ros, A., Hagersten, E., Kaxiras, S.: An efficient, self-contained, on-chip, directory: DIR\(_1\) -SISD. In: PACT, pp. 317–330 (2015)
9.Dice, D., Shalev, O., Shavit, N.N.: Transactional locking II. In: Dolev, S. (ed.) DISC 2006. LNCS, vol. 4167, pp. 194–208. Springer, Heidelberg (2006)CrossRef
10.Dijkstra, E.W.: Cooperating sequential processes (2002)
11.Herlihy, M., Shavit, N.: The Art of Multiprocessor Programming. Morgan Kaufmann Publishers Inc., San Francisco (2008)
12.Hower, D.R., Hechtman, B.A., Beckmann, B.M., Gaster, B.R., Hill, M.D., Reinhardt, S.K., Wood, D.A.: Heterogeneous-race-free memory models. In: ASPLOS, pp. 427–440 (2014)
13.Kaxiras, S., Keramidas, G.: SARC coherence: scaling directory cache coherence in performance and power. IEEE Micro 30(5), 54–65 (2011)CrossRef
14.Kaxiras, S., Ros, A.: A new perspective for efficient virtual-cache coherence. In: ISCA, pp. 535–547 (2013)
15.Koukos, K., Ros, A., Hagersten, E., Kaxiras, S.: Building heterogeneous unified virtual memories (UVMS) without the overhead. ACM TACO 13(1), 1:1–1:22 (2016)
16.Kuperstein, M., Vechev, M., Yahav, E.: Automatic inference of memory fences. In: FMCAD, pp. 111–119. IEEE (2010)
17.Lamport, L.: A new solution of dijkstra’s concurrent programming problem. Commun. ACM 17(8), 453–455 (1974)MathSciNet CrossRef MATH
18.Lebeck, A.R., Wood, D.A.: Dynamic self-invalidation: reducing coherence overhead in shared-memory multiprocessors. In: ISCA, pp. 48–59 (1995)
19.Liu, F., Nedev, N., Prisadnikov, N., Vechev, M.T., Yahav, E.: Dynamic synthesis for relaxed memory models. In: PLDI, pp. 429–440 (2012)
20.Magnusson, P., Landin, A., Hagersten, E.: Queue locks on cache coherent multiprocessors. In: Proceedings of Eighth International Parallel Processing Symposium, pp. 165–171. IEEE (1994)
21.Martin, M.M., Sorin, D.J., Beckmann, B.M., Marty, M.R., Xu, M., Alameldeen, A.R., Moore, K.E., Hill, M.D., Wood, D.A.: Multifacet’s general execution-driven multiprocessor simulator (GEMS) toolset. Comput. Archit. News 33(4), 92–99 (2005)CrossRef
22.Mellor-Crummey, J.M., Scott, M.L.: Algorithms for scalable synchronization on shared-memory multiprocessors. ACM Trans. Comput. Syst. (TOCS) 9(1), 21–65 (1991)CrossRef
23.Ros, A., Davari, M., Kaxiras, S.: Hierarchical private/shared classification: the key to simple and efficient coherence for clustered cache hierarchies. In: HPCA, pp. 186–197 (2015)
24.Ros, A., Kaxiras, S.: Complexity-effective multicore coherence. In: PACT, pp. 241–252 (2012)
25.Ros, A., Kaxiras, S.: Callback: efficient synchronization without invalidation with a directory just for spin-waiting. In: ISCA, pp. 427–438 (2015)
26.Ros, A., Kaxiras, S.: Fast & furious: a tool for detecting covert racing. In: PARMA and DITAM, pp. 1–6 (2015)
27.Sakalis, C., Leonardsson, C., Kaxiras, S., Ros, A.: Splash-3: a properly synchronized benchmark suite for contemporary research. In: ISPASS (2016)
28.Schmidt, D.C., Harrison, T.: Double-checked locking - an optimization pattern for efficiently initializing and accessing thread-safe objects. In: PLoP (1996)
29.Scott, M.L.: Shared-Memory Synchronization. Morgan & Claypool, San Rafael (2013)
30.Shasha, D., Snir, M.: Efficient and correct execution of parallel programs that share memory. ACM Trans. Program. Lang. Syst. (TOPLAS) 10(2), 282–312 (1988)CrossRef
31.Sung, H., Adve, S.V.: DeNovoSync: efficient support for arbitrary synchronization without writer-initiated invalidations. In: ASPLOS, pp. 545–559 (2015)
32.Sung, H., Komuravelli, R., Adve, S.V.: DeNovoND: efficient hardware support for disciplined non-determinism. In: ASPLOS, pp. 13–26 (2013)
33.Woo, S.C., Ohara, M., Torrie, E., Singh, J.P., Gupta, A.: The SPLASH-2 programs: characterization and methodological considerations. In: ISCA, pp. 24–36 (1995)
作者单位：Parosh Aziz Abdulla (15)
Mohamed Faouzi Atig (15)
Stefanos Kaxiras (15)
Carl Leonardsson (15)
Alberto Ros (16)
Yunyun Zhu (15)

15. Uppsala University, Uppsala, Sweden
16. Universidad de Murcia, Murcia, Spain
丛书名：Formal Techniques for Distributed Objects, Components, and Systems
ISBN：978-3-319-39570-8
刊物类别：Computer Science
刊物主题：Artificial Intelligence and Robotics
Computer Communication Networks
Software Engineering
Data Encryption
Database Management
Computation by Abstract Devices
Algorithm Analysis and Problem Complexity
出版者：Springer Berlin / Heidelberg
ISSN：1611-3349
卷排序：9688

文摘

Cache coherence protocols using self-invalidation and self-downgrade have recently seen increased popularity due to their simplicity, potential performance efficiency, and low energy consumption. However, such protocols result in memory instruction reordering, thus causing extra program behaviors that are often not intended by the programmer. We propose a novel formal model that captures the semantics of programs running under such protocols, and employs a set of fences that interact with the coherence layer. Using the model, we perfform a reachability analysis that can check whether a program satisfies a given safety property with the current set of fences. Based on an algorithm in [19], we describe a method for insertion of optimal sets of fences that ensure correctness of the program under such protocols. The method relies on a counter-example guided fence insertion procedure. One feature of our method is that it can handle a variety of fences (with different costs). This diversity makes optimization more difficult since one has to optimize the total cost of the inserted fences, rather than just their number. To demonstrate the strength of our approach, we have implemented a prototype and run it on a wide range of examples and benchmarks. We have also, using simulation, evaluated the performance of the resulting fenced programs.

地址：北京市海淀区学院路29号邮编：100083

电话：办公室：(+86 10)66554848；文献借阅、咨询服务、科技查新：66554700