Analytically Modeling the Memory Hierarchy Performance of Modern Processor Systems.
Details
  • Author: Liu, Fang
  • Degree: Doctorate
  • Year: 2011
  • Advisor: Solihin, Yan
  • Institution: The University of North Carolina
  • ISBN: 9781124753522
  • CBH: 3463793
  • Country: USA
  • Language: English
  • File size: 2167226
  • Pages: 120
Abstract
The first goal of this dissertation is to understand how cache parameters and application behavior influence the number of context switch misses an application suffers. We characterize a previously unreported type of context switch miss that arises from the interaction of the cache replacement policy with an application's temporal reuse behavior. We characterize the behavior of these "reordered misses" for various applications, cache sizes, and amounts of cache perturbation. As a second contribution, we develop an analytical model that reveals the mathematical relationship between cache design parameters, an application's temporal reuse pattern, and the number of context switch misses the application suffers. We validate the model against simulation studies and find it sufficiently accurate in predicting the trend of context switch misses with respect to the amount of cache perturbation. The mathematical relationship provided by the model allows us to derive insights into precisely why some applications are more vulnerable to context switch misses than others. Through a case study on prefetching, we find that prefetching tends to aggravate context switch misses, and that a less aggressive prefetching technique can reduce their number. We also investigate how cache size affects context switch misses. Our study shows that under relatively heavy system workloads, the worst-case number of context switch misses for an application tends to increase proportionally with cache size, to an extent that may completely negate the reduction in other types of cache misses. The second goal of this dissertation is to propose a simple yet powerful analytical model that gives us the ability to answer several important questions: 1) How does off-chip bandwidth partitioning improve system performance? 2) In what situations is the performance improvement high or low, and what factors determine that?
3) In what way do cache and bandwidth partitioning interact, and is the interaction negative or positive? 4) Can a theoretically optimum bandwidth partition be derived, and if so, what factors affect it? We believe understanding the answers to these questions is very valuable to CMP system designers devising strategies to deal with the scarcity of off-chip bandwidth in future CMPs. Modern high-performance processors widely employ hardware prefetching techniques to hide long memory access latency. While very useful, hardware prefetching tends to aggravate the bandwidth wall, where system performance is increasingly limited by the availability of off-chip pin bandwidth in CMPs, because of its high bandwidth consumption. Prior studies have proposed improving the prefetching policy or partitioning bandwidth among cores to improve bandwidth usage. However, they either study these techniques in isolation, leaving their significant interaction unexplored, or apply them in an ad hoc manner, missing important insights. The third goal of this dissertation is to investigate how hardware prefetching and memory bandwidth partitioning impact CMP system performance and how they interact. To reach this goal, we propose an analytical model-based study. The model includes a composite prefetching metric that can help determine under which conditions prefetching improves system performance, a bandwidth partitioning model that takes prefetching effects into account, and a derivation of the weighted-speedup-optimum bandwidth partition sizes for different cores. Through model-driven case studies, we make several observations that can be valuable for future CMP system design and optimization. We also perform a simulation-based empirical evaluation to validate these observations and show that maximum system performance can be achieved by selective prefetching, guided by the composite prefetching metric, coupled with dynamic bandwidth partitioning. (Abstract shortened by UMI.)
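The interaction between LRU replacement and cyclic temporal reuse that produces "reordered misses" can be illustrated with a minimal sketch, not taken from the dissertation: a toy fully-associative LRU cache whose contents are partially displaced by a context switch. The addresses, cache size, and perturbation amount are all hypothetical; the point is that displacing only a few blocks can trigger misses on blocks that were never displaced.

```python
from collections import OrderedDict

class LRUCache:
    """Toy fully-associative cache with LRU replacement, counting misses."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()
        self.misses = 0

    def access(self, addr):
        if addr in self.blocks:
            self.blocks.move_to_end(addr)      # hit: refresh recency
        else:
            self.misses += 1
            if len(self.blocks) >= self.capacity:
                self.blocks.popitem(last=False)  # evict the LRU block
            self.blocks[addr] = True

def run(trace, cache):
    for addr in trace:
        cache.access(addr)

# A cyclic working set that exactly fits the cache: after the cold pass,
# steady state has no misses at all.
trace = list(range(8)) * 4

baseline = LRUCache(capacity=8)
run(trace, baseline)
warm_misses = baseline.misses              # 8 cold misses only

# Same run, but a context switch perturbs the cache midway: a hypothetical
# intervening process displaces just 4 of the 8 resident blocks.
perturbed = LRUCache(capacity=8)
run(trace[:16], perturbed)
for intruder in range(100, 104):           # other-process addresses
    perturbed.access(intruder)
run(trace[16:], perturbed)

# Extra misses beyond the cold misses and the intruder's own misses.
cs_misses = perturbed.misses - warm_misses - 4
```

In this toy run, displacing 4 blocks causes 8 context switch misses: each miss on a displaced block evicts a still-resident block just before the cyclic pattern reuses it, so the perturbation cascades through the whole working set, echoing the replacement-policy/reuse-pattern interaction the abstract describes.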
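The idea that an optimum off-chip bandwidth partition exists and can beat an equal split can be sketched with a deliberately simplified model, not the dissertation's: assume each core's speedup relative to running alone is capped at 1 and otherwise proportional to its bandwidth allocation divided by its standalone bandwidth demand. All numbers (demands, total bandwidth) are hypothetical.

```python
def weighted_speedup(shares, demands, B):
    """Toy model: core i's speedup vs running alone is
    min(1, share_i * B / demand_i); weighted speedup is their sum."""
    return sum(min(1.0, share * B / demand)
               for share, demand in zip(shares, demands))

def best_two_core_split(demands, B, steps=1000):
    """Brute-force the bandwidth fraction for core 0 that maximizes
    weighted speedup in a two-core system."""
    best_ws, best_share = 0.0, 0.0
    for k in range(steps + 1):
        s = k / steps
        ws = weighted_speedup((s, 1 - s), demands, B)
        if ws > best_ws:
            best_ws, best_share = ws, s
    return best_ws, best_share

demands = (4.0, 1.0)   # core 0 is bandwidth-hungry, core 1 is light
B = 3.0                # total off-chip bandwidth is scarce (< 4 + 1)
equal = weighted_speedup((0.5, 0.5), demands, B)
opt_ws, opt_share = best_two_core_split(demands, B)
```

Under these numbers the equal split wastes bandwidth on the light core, which is already at full speed; shifting the surplus to the hungry core (about a 2/3 share) raises weighted speedup from 1.375 to roughly 1.5. A real model would also have to account for prefetch traffic, which is the interaction the dissertation's third goal analyzes.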
