A Simple and Efficient Cache Coherence Protocol Based on Self-Updating (一种基于自更新的简单高效Cache一致性协议)
  • English title: A Simple and Efficient Cache Coherence Protocol Based on Self-Updating
  • Authors: 何锡明; 马胜; 黄立波; 陈微; 王志英
  • Authors (English): He Ximing; Ma Sheng; Huang Libo; Chen Wei; Wang Zhiying (College of Computer, National University of Defense Technology)
  • Keywords: shared memory; chip multiprocessors; cache coherence protocol; self-updating; VISU protocol
  • Journal code (Chinese): JFYZ
  • Journal (English): Journal of Computer Research and Development
  • Affiliation: College of Computer, National University of Defense Technology
  • Publication date: 2019-04-15
  • Publisher: 计算机研究与发展 (Journal of Computer Research and Development)
  • Year: 2019
  • Issue: v.56
  • Funding: National Natural Science Foundation of China (61572508, 61672526, 61472435, 61472432, 61202121); National University of Defense Technology Research Program (ZK-03-06)
  • Language: Chinese
  • Page: JFYZ201904004
  • Page count: 11
  • CN: 04
  • ISSN: 11-1777/TP
  • Classification code: 45-55
Abstract
As the number of cores in a chip multiprocessor grows, cache coherence protocols have become a performance bottleneck of shared-memory systems, and their overhead and complexity make those systems complicated and inefficient. Directory protocols need substantial storage to track sharer lists, while snooping protocols consume significant network bandwidth broadcasting messages. Some protocols, such as MESI (modified, exclusive, shared, invalid), are extremely complex, with numerous transient states and races, and they also generate additional invalidation traffic. This paper implements a simple and efficient cache coherence protocol, VISU (valid/invalid states based on self-updating), for data-race-free programs. VISU is built on a self-updating mechanism and has only two stable states, valid and invalid; it eliminates the directory and indirection transactions. First, relying on the data-race-free (DRF) model of parallel programming, shared blocks are self-updated at synchronization points to guarantee correctness. Second, using techniques that dynamically classify data as private (accessed by only one processor) or shared, the protocol applies write-back to private data and write-through to shared data. For private data, a simple write-back policy avoids unnecessary on-chip network traffic. For shared data, write-through from the L1 cache keeps the copy in the LLC (last-level cache) up to date, which removes almost all coherence states. The resulting two-state protocol has low overhead, needs no directory, involves no indirect transfers, and is easier to verify, while achieving performance comparable to, and in some cases better than, a MESI directory protocol.
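The mechanism summarized in the abstract can be illustrated with a small Python model. This is only a sketch under stated assumptions, not the paper's implementation: the class `ToySystem`, the access-set classifier, the write-back triggered when a private block first becomes shared, and the reading of "self-updating" as "refresh valid shared blocks from the LLC at a synchronization point" are all hypothetical choices made for illustration. Evictions, timing, and the on-chip network are not modeled.

```python
from dataclasses import dataclass
from enum import Enum


class State(Enum):
    VALID = "V"
    INVALID = "I"


@dataclass
class Line:
    value: int = 0
    state: State = State.INVALID
    dirty: bool = False  # used only for private (write-back) blocks


class ToySystem:
    """Toy model: per-core L1 caches over one shared LLC, no directory."""

    def __init__(self, n_cores):
        self.llc = {}        # addr -> value held in the shared LLC
        self.accessors = {}  # addr -> set of core ids (assumed classifier)
        self.l1 = [{} for _ in range(n_cores)]

    def _touch(self, core, addr):
        """Record an access and report whether the block is now shared."""
        seen = self.accessors.setdefault(addr, set())
        if len(seen) == 1 and core not in seen:
            # Assumed private -> shared transition: the former sole owner
            # writes its dirty copy back so the LLC holds the newest data.
            owner = next(iter(seen))
            line = self.l1[owner].get(addr)
            if line is not None and line.dirty:
                self.llc[addr] = line.value
                line.dirty = False
        seen.add(core)
        return len(seen) > 1

    def read(self, core, addr):
        self._touch(core, addr)
        line = self.l1[core].setdefault(addr, Line())
        if line.state is State.INVALID:
            line.value = self.llc.get(addr, 0)  # fetch directly from the LLC
            line.state = State.VALID
        return line.value

    def write(self, core, addr, value):
        shared = self._touch(core, addr)
        line = self.l1[core].setdefault(addr, Line())
        line.value, line.state = value, State.VALID
        if shared:
            self.llc[addr] = value  # write-through keeps the LLC current
            line.dirty = False
        else:
            line.dirty = True       # write-back: the LLC is updated later

    def sync(self, core):
        """Synchronization point: self-update every valid shared block."""
        for addr, line in self.l1[core].items():
            if len(self.accessors[addr]) > 1 and line.state is State.VALID:
                line.value = self.llc.get(addr, 0)


system = ToySystem(n_cores=2)
system.write(0, "x", 1)      # private to core 0, stays in its L1
system.read(1, "x")          # block becomes shared; core 0 writes back
system.write(0, "x", 2)      # shared now, so the write goes through
system.sync(1)               # core 1 self-updates its copy of x
print(system.read(1, "x"))   # prints 2
```

In this model a core may still hold an old value of a shared block between synchronization points. For data-race-free programs that is acceptable, because conflicting accesses are always separated by synchronization, which is where the self-update happens.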
