摘要
详细分析了已有针对链表结构(LDS)的预取方法,并分析了预取深度对预取性能的影响,同时分析了链表结构中单个生产者访存指令对应多个消费者访存指令的情况,并指出了现有链表结构预取器的不足。提出了针对链表结构的反馈预取机制,在原来预取器的基础上把预取命令查询处理器Cache的结果反馈给预取引擎,预取引擎根据反馈结果决定进一步预取操作。如果预取命令查询Cache发现已经命中,则反馈查询结果给预取器,预取器再针对同一个生产者指令产生预取命令。反馈预取机制可以和其他链式结构预取机制协同工作。实验结果表明,相比于无反馈的预取机制,针对链表结构的反馈预取机制当预取深度为1时,每周期执行指令数(IPC)平均提升8. 14%,L1-D Cache缺失率平均降低11. 18%,新增硬件开销几乎可以忽略。
The pre-existing prefetch technique for linked data structure( LDS) is analyzed in detail,and the influence of prefetch depth to the performance of application is reveal with detail experiments. Besides,the pattern of single producer instruction of memory access corresponding multiple consumer instructions of memory access is analyzed,and the shortness of existing prefetch technique is pointed out. To further improve the prefetch performance for LDS,a feedback prefetch technique is proposed,which can give the feedback of cache query outcome to the prefetch engine if cache hit occurs,then another prefetch command is created for that producer instruction of memory access. The feedback prefetch technique can cooperate with other prefetch techniques for LDS. Experiment results show that,compared with prefetch technique without feedback the feedback prefetch technique can increase the IPC by 8. 14%,and reduce the L1-D Cache miss rate by 11. 18% with negligible hardware area increase.
引文
[1] Karlsson M,Dahlgren F,Stenstrom P. A prefetchingtechnique for irregular accesses to linked data structures.In:Proceedings of the 6th International Symposium onHigh-Performance Computer Architecture[C]. Touluse,France,2000. 206-217
[2] Luk C,Mowry T. Compiler-based prefetching for recur-sive data structures[J]. ACM SIGOPS Operating SystemsReview,1996,31(9):222-33
[3] Roth A,Moshovos A,Sohi G. Dependence based pre-fetching for linked data structures[J]. ACM SIGOPSOperating Systems Review,1998,32(5):115-126
[4] Roth A,Sohi G. Effective jump-pointer prefetching forlinked data structures[C]. In:Proceedings of the 26thInternational Symposium on Computer Architecture,At-lanta,USA,1999,111-121
[5] Zhang Z,Torrellas J. Speeding up irregular applicationsin shared-memory multiprocessors:memory binding andgroup prefetching[C]. In:Proceedings of the 22nd In-ternational Symposium on Computer Architecture,Mar-gherita Ligure,Italy,1995. 188-99
[6] Yang C,Lebeck A. Push vs. pull:data movement forlinked data structures[C]. In:Proceedings of the 14thInternational Conference on Supercomputing, Dallas,USA,2000,176-186
[7] Yang C,Lebeck A,Tseng H,et al. Tolerating memorylatency through push prefetching for pointer-intensive ap-plications[J]. ACM Transactions on Architecture andCode Optimization(TACO),2004,1(4):445-475
[8] Chilimbi T,Larus J,Hill M. Improving pointer-basedcodes through cache-conscious data placement[R].Technical report CS-TR-98-1365. Madison:Universityof Wisconsin,1998
[9] Hughes C,Adve S. Memory-side prefetching for linkeddata structures[J]. Journal of Parallel and DistributedComputing,2005,65(4):448-463
[10] Jimenez V,Cazorla F,Gioiosa R,et al. Adaptive pre-fetching on POWER7:improving performance and powerconsumption[J]. ACM Transactions on Parallel Compu-ting,2014,1:1-25
[11] Srinath S,Mutlu O,Kim H,et al. Feedback directedprefetching:improving the performance and bandwidth-efficiency of hardware prefetchers[C]. In:Proceedingsof the IEEE 13th International Symposium on High Per-formance Computer Architecture, Scottsdale, USA2007,63-74
[12] Carlson T,Heirman,Eeckhout L. Sniper:exploring thelevel of abstraction for scalable and accurate parallelmulti-core simulation[C]. In:Proceedings of the 2011International Conference for High Performance Compu-ting,Networking,Storage and Analysis(SC),Seattle,USA,2011,1-12