Abstract
Hardware data prefetching can effectively improve a processor's memory access performance, but the traditional stream prefetching strategy suffers from untimely prefetches. To address this, a double-stride stream prefetching strategy is proposed, and a corresponding prefetch unit is designed. The prefetch unit automatically detects the fixed stride of a data stream and doubles that stride when computing the prefetch address. Experimental results show that, with the prefetch unit added, processor performance improves by up to 45% on the integer applications and up to 57% on the floating-point applications of the SPEC2006 benchmark suite. For applications with a high cache miss rate, the prefetch unit effectively hides memory access latency.
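The address calculation described above can be illustrated with a small software model. This is only a sketch under stated assumptions, not the hardware structure designed in the paper: the class name `DoubleStrideStreamPrefetcher`, the `confirmations` threshold, and the single-stream tracking are all hypothetical choices made for illustration. It shows the one idea the abstract states, namely that once a fixed stride is detected, the prefetch address is computed with twice that stride.

```python
from typing import List, Optional


class DoubleStrideStreamPrefetcher:
    """Toy single-stream model: detect a fixed stride, prefetch at 2x stride."""

    def __init__(self, confirmations: int = 2) -> None:
        # Assumption: the stride is trusted after this many consecutive
        # identical, non-zero address deltas. The abstract gives no threshold.
        self.confirmations = confirmations
        self.last_addr: Optional[int] = None
        self.stride = 0
        self.matches = 0

    def access(self, addr: int) -> Optional[int]:
        """Record a demand access; return a prefetch address or None."""
        if self.last_addr is not None:
            delta = addr - self.last_addr
            if delta != 0 and delta == self.stride:
                # Same stride as before: one more confirmation.
                self.matches += 1
            else:
                # Stride changed: restart detection with the new delta.
                self.stride = delta
                self.matches = 1 if delta != 0 else 0
        self.last_addr = addr
        if self.matches >= self.confirmations:
            # Double-stride step: look two strides ahead instead of one.
            return addr + 2 * self.stride
        return None


def run(addresses: List[int]) -> List[Optional[int]]:
    """Feed a sequence of demand addresses and collect prefetch addresses."""
    prefetcher = DoubleStrideStreamPrefetcher()
    return [prefetcher.access(a) for a in addresses]


if __name__ == "__main__":
    # A stream with a fixed 64-byte stride.
    print(run([0, 64, 128, 192]))
```

In this model, a conventional single-stride stream prefetcher would return `addr + stride`, the very next element of the stream. Looking two strides ahead gives each prefetch roughly one extra access interval to complete, which is one plausible reading of how the strategy addresses untimely prefetches.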