Improving effective bandwidth through compiler enhancement of global cache reuse


Authors: Ding, Chen; Kennedy, Ken
Keywords: Reference affinity; Data locality; Program analysis; Loop fusion; Data transformation; Global cache reuse
Journal: Journal of Parallel and Distributed Computing
Year: 2004
Journal code: 188_07437315
Category: et
Publication date: January 2004
Volume: 64
Issue: 1
Pages: 108-134
File size: 676 K

Abstract

The performance of modern machines is increasingly limited by insufficient memory bandwidth. One way to alleviate this bandwidth limitation for a given program is to minimize the aggregate data volume the program transfers from memory. In this article we present compiler strategies for accomplishing this minimization. Following a discussion of the underlying causes of bandwidth limitations, we present a two-step strategy to exploit global cache reuse, that is, the temporal reuse across the whole program and the spatial reuse across the entire data set used in that program. In the first step, we fuse computation on the same data using a technique called reuse-based loop fusion to integrate loops with different control structures. We prove that optimal fusion for bandwidth is NP-hard and we explore the limitations of computation fusion using perfect program information. In the second step, we group data used by the same computation through the technique of affinity-based data regrouping, which intermixes the storage assignments of program data elements at different granularities. We show that the method is compile-time optimal and can be used on array and structure data. We prove that two extensions, partial and dynamic data regrouping, are NP-hard problems. Finally, we describe our compiler implementation and experiments demonstrating that the new global strategy, on average, reduces memory traffic by over 40% and improves execution speed by over 60% on two high-end workstations.
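
The two steps named in the abstract can be illustrated with a minimal sketch. It is a hand-written toy, not the authors' compiler algorithm: the function names (`unfused`, `fused`, `regroup`, `fused_regrouped`) and the arrays `a`, `b`, `c`, `d` are hypothetical, and Python only shows the shape of the transformations, not their cache behaviour. Fusion places the two uses of `a[i]` next to each other in time; regrouping places `a[i]` and `b[i]`, which are always used together, next to each other in storage.

```python
# Toy illustration only; all names are hypothetical and not taken from the paper.

def unfused(a, b):
    """Two separate loops: each element of `a` is read in both loops,
    so in a compiled setting it may be fetched from memory twice."""
    n = len(a)
    c = [0.0] * n
    for i in range(n):            # loop 1 reads a
        c[i] = a[i] + 1.0
    d = [0.0] * n
    for i in range(n):            # loop 2 reads a again, plus b
        d[i] = a[i] * b[i]
    return c, d

def fused(a, b):
    """Loop fusion: both computations on a[i] happen in one iteration,
    so the second use follows the first immediately (temporal reuse)."""
    n = len(a)
    c = [0.0] * n
    d = [0.0] * n
    for i in range(n):
        x = a[i]                  # single read of a[i]
        c[i] = x + 1.0
        d[i] = x * b[i]
    return c, d

def regroup(a, b):
    """Data regrouping: interleave two arrays that are always accessed
    together into one array laid out as [a0, b0, a1, b1, ...]."""
    ab = [0.0] * (2 * len(a))
    ab[0::2] = a                  # even slots hold a
    ab[1::2] = b                  # odd slots hold b
    return ab

def fused_regrouped(ab):
    """Same computation as `fused`, reading the interleaved layout so that
    a[i] and b[i] sit side by side in storage (spatial reuse)."""
    n = len(ab) // 2
    c = [0.0] * n
    d = [0.0] * n
    for i in range(n):
        x = ab[2 * i]
        y = ab[2 * i + 1]
        c[i] = x + 1.0
        d[i] = x * y
    return c, d
```

All three variants compute the same `c` and `d`; only the order of accesses and the storage layout differ, which is the property a compiler must preserve when applying such transformations.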
