A Phase Behavior Aware Dynamic Cache Partitioning Scheme for CMPs


Authors: Xiaofei Liao; Rentong Guo; Danping Yu…
Keywords: Cache contention; Dynamic cache partitioning; Phase behavior; Multi-program
Journal: International Journal of Parallel Programming
Publication year: 2016
Publication date: February 2016
Volume: 44
Issue: 1
Pages: 68-86
Full-text size: 915 KB
Author affiliations: Xiaofei Liao (1)
Rentong Guo (1)
Danping Yu (1)
Hai Jin (1)
Li Lin (1)

1. Cluster and Grid Computing Lab, Services Computing Technology and System Lab, School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan, 430074, China
Journal category: Computer Science
Journal subjects: Theory of Computation
Processor Architectures
Software Engineering, Programming and Operating Systems
Publisher: Springer Netherlands
ISSN: 1573-7640

Abstract

In a multi-program environment, cache contention among processors can significantly degrade system performance. Cache partitioning, and dynamic cache partitioning in particular, has been widely studied as an effective countermeasure. In a dynamic scheme, however, it is difficult to decide how much cache quota to allocate to each co-scheduled program and when to adjust it. This paper presents a novel dynamic cache partitioning mechanism based on the phase behavior of programs. It uses the performance monitoring units of modern processors to detect the phase behavior of programs and to guide cache partitioning at run time. Because programs exhibit recurring phase behavior throughout their execution, the cache quota can be adjusted whenever a phase change occurs, and classifying phases allows partitioning decisions to be made with higher accuracy and lower overhead. The method is validated with measurements of applications from the SPEC CPU 2006 benchmark suite. Compared with a shared cache scheme, it achieves a speedup of up to 1.214 for co-scheduled applications.
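The abstract describes the mechanism only at a high level, so the following is a minimal illustrative sketch rather than the authors' algorithm. It assumes each sampling interval is summarized by a single cache miss rate (the abstract says only that performance monitoring units are used), that phases are matched by a fixed distance threshold, and that quotas are expressed in cache ways. The names `PhaseTracker`, `run`, and `choose_quota` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class PhaseTracker:
    """Hypothetical helper: groups sampling intervals into recurring phases."""
    threshold: float = 0.05  # assumed tolerance on miss-rate distance
    signatures: list = field(default_factory=list)  # one miss rate per known phase
    current: int = -1  # phase id seen in the previous interval

    def observe(self, miss_rate):
        """Return (phase_id, changed) for one interval's miss rate."""
        phase = None
        for pid, sig in enumerate(self.signatures):
            if abs(sig - miss_rate) <= self.threshold:
                phase = pid  # matches a phase seen before
                break
        if phase is None:
            # No known phase is close enough: register a new one.
            self.signatures.append(miss_rate)
            phase = len(self.signatures) - 1
        changed = phase != self.current
        self.current = phase
        return phase, changed


def run(samples, choose_quota):
    """Repartition only on phase changes; reuse the quota of a known phase."""
    tracker = PhaseTracker()
    quota_by_phase = {}
    decisions = []  # (interval index, phase id, quota) at each repartition
    for i, miss_rate in enumerate(samples):
        phase, changed = tracker.observe(miss_rate)
        if changed:
            if phase not in quota_by_phase:
                # The costly policy decision runs once per distinct phase.
                quota_by_phase[phase] = choose_quota(miss_rate)
            decisions.append((i, phase, quota_by_phase[phase]))
    return decisions
```

In this sketch the quota is revisited only when the detected phase changes, and a recurring phase reuses the quota chosen the first time it appeared, which is one way the classification step could reduce overhead.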
