链表结构反馈预取机制

英文篇名：Feedback prefetch for linked data structure
作者：张乾龙 ; 侯锐 ; 杨思博 ; 张立新
英文作者：Zhang Qianlong;Hou Rui;Yang Sibo;Zhang Lixin;Institute of Computing Technology,Chinese Academy of Sciences;University of Chinese Academy of Sciences;Institute of Information Engineering,Chinese Academy of Sciences;School of Software & Microelectronics,Peking University;
关键词：链表结构(LDS) ; 指针追逐 ; 数据预取 ; 反馈预取 ; 硬件预取
英文关键词：linked data structure(LDS);;pointer chasing;;data prefetch;;feedback prefetch;;hardware prefetch
中文刊名：GJSX
英文刊名：Chinese High Technology Letters
机构：中国科学院计算技术研究所;中国科学院大学;中国科学院信息工程研究所;北京大学软件与微电子学院;
出版日期：2019-01-15
出版单位：高技术通讯
年：2019
期：v.29;No.337
基金：国家自然科学基金优秀青年科学基金(61522212);; 中科院前沿科学重点研究(QYZDB-SSW-JSC010)资助项目
语种：中文;
页：GJSX201901005
页数：10
CN：01
ISSN：11-2770/N
分类号：31-40

摘要

详细分析了已有针对链表结构(LDS)的预取方法,并分析了预取深度对预取性能的影响,同时分析了链表结构中单个生产者访存指令对应多个消费者访存指令的情况,并指出了现有链表结构预取器的不足。提出了针对链表结构的反馈预取机制,在原来预取器的基础上把预取命令查询处理器Cache的结果反馈给预取引擎,预取引擎根据反馈结果决定进一步预取操作。如果预取命令查询Cache发现已经命中,则反馈查询结果给预取器,预取器再针对同一个生产者指令产生预取命令。反馈预取机制可以和其他链式结构预取机制协同工作。实验结果表明,相比于无反馈的预取机制,针对链表结构的反馈预取机制当预取深度为1时,每周期执行指令数(IPC)平均提升8. 14%,L1-D Cache缺失率平均降低11. 18%,新增硬件开销几乎可以忽略。
The pre-existing prefetch technique for linked data structure( LDS) is analyzed in detail,and the influence of prefetch depth to the performance of application is reveal with detail experiments. Besides,the pattern of single producer instruction of memory access corresponding multiple consumer instructions of memory access is analyzed,and the shortness of existing prefetch technique is pointed out. To further improve the prefetch performance for LDS,a feedback prefetch technique is proposed,which can give the feedback of cache query outcome to the prefetch engine if cache hit occurs,then another prefetch command is created for that producer instruction of memory access. The feedback prefetch technique can cooperate with other prefetch techniques for LDS. Experiment results show that,compared with prefetch technique without feedback the feedback prefetch technique can increase the IPC by 8. 14%,and reduce the L1-D Cache miss rate by 11. 18% with negligible hardware area increase.

引文

[1] Karlsson M,Dahlgren F,Stenstrom P. A prefetchingtechnique for irregular accesses to linked data structures.In:Proceedings of the 6th International Symposium onHigh-Performance Computer Architecture[C]. Touluse,France,2000. 206-217
    [2] Luk C,Mowry T. Compiler-based prefetching for recur-sive data structures[J]. ACM SIGOPS Operating SystemsReview,1996,31(9):222-33
    [3] Roth A,Moshovos A,Sohi G. Dependence based pre-fetching for linked data structures[J]. ACM SIGOPSOperating Systems Review,1998,32(5):115-126
    [4] Roth A,Sohi G. Effective jump-pointer prefetching forlinked data structures[C]. In:Proceedings of the 26thInternational Symposium on Computer Architecture,At-lanta,USA,1999,111-121
    [5] Zhang Z,Torrellas J. Speeding up irregular applicationsin shared-memory multiprocessors:memory binding andgroup prefetching[C]. In:Proceedings of the 22nd In-ternational Symposium on Computer Architecture,Mar-gherita Ligure,Italy,1995. 188-99
    [6] Yang C,Lebeck A. Push vs. pull:data movement forlinked data structures[C]. In:Proceedings of the 14thInternational Conference on Supercomputing, Dallas,USA,2000,176-186
    [7] Yang C,Lebeck A,Tseng H,et al. Tolerating memorylatency through push prefetching for pointer-intensive ap-plications[J]. ACM Transactions on Architecture andCode Optimization(TACO),2004,1(4):445-475
    [8] Chilimbi T,Larus J,Hill M. Improving pointer-basedcodes through cache-conscious data placement[R].Technical report CS-TR-98-1365. Madison:Universityof Wisconsin,1998
    [9] Hughes C,Adve S. Memory-side prefetching for linkeddata structures[J]. Journal of Parallel and DistributedComputing,2005,65(4):448-463
    [10] Jimenez V,Cazorla F,Gioiosa R,et al. Adaptive pre-fetching on POWER7:improving performance and powerconsumption[J]. ACM Transactions on Parallel Compu-ting,2014,1:1-25
    [11] Srinath S,Mutlu O,Kim H,et al. Feedback directedprefetching:improving the performance and bandwidth-efficiency of hardware prefetchers[C]. In:Proceedingsof the IEEE 13th International Symposium on High Per-formance Computer Architecture, Scottsdale, USA2007,63-74
    [12] Carlson T,Heirman,Eeckhout L. Sniper:exploring thelevel of abstraction for scalable and accurate parallelmulti-core simulation[C]. In:Proceedings of the 2011International Conference for High Performance Compu-ting,Networking,Storage and Analysis(SC),Seattle,USA,2011,1-12

地址：北京市海淀区学院路29号邮编：100083

电话：办公室：(+86 10)66554848；文献借阅、咨询服务、科技查新：66554700