Abstract
Data prefetching is a technique for hiding memory access latency, introduced to mitigate the speed gap between the microprocessor and DRAM. Current Intel processor families employ several prefetching mechanisms to accelerate the movement of data and code into the cache and thereby improve program performance. Starting from an analysis of the memory hierarchy of the Intel 64 architecture, this paper examines the data prefetching mechanisms of the x86/x64 architecture, covering both hardware prefetching and software prefetching, and analyzes compiler support for software prefetching. It then measures the effect of Intel 64 data prefetching on the performance of tightly nested loops in scientific computing programs and summarizes several factors that influence the effectiveness of data prefetching. The work offers guidance for loop array prefetching optimization on Intel platforms.
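To make the idea of software prefetching in a nested loop concrete, the following is a minimal sketch and not the paper's test code. It assumes a GCC-compatible compiler that provides `__builtin_prefetch`; the function name `sum_matrix_prefetch` and the constant `PREFETCH_DISTANCE` are hypothetical names chosen for illustration, and the distance of 16 elements is an arbitrary placeholder rather than a tuned value.

```c
#include <stddef.h>

/* Hypothetical tuning knob: how many elements ahead of the current
 * access the prefetch hint is issued. Not a measured or recommended value. */
#define PREFETCH_DISTANCE 16

/* Sum a rows x cols matrix stored in row-major order, issuing a
 * software prefetch hint for an element further along the traversal. */
double sum_matrix_prefetch(const double *a, size_t rows, size_t cols)
{
    double sum = 0.0;
    size_t n = rows * cols;

    for (size_t i = 0; i < rows; ++i) {
        for (size_t j = 0; j < cols; ++j) {
            size_t idx = i * cols + j;

            /* Hint only; the guard keeps the address inside the array.
             * Arguments: address, 0 = prefetch for read,
             * 3 = high temporal locality (keep in all cache levels). */
            if (idx + PREFETCH_DISTANCE < n)
                __builtin_prefetch(&a[idx + PREFETCH_DISTANCE], 0, 3);

            sum += a[idx];
        }
    }
    return sum;
}
```

For a purely sequential traversal like this one, the hardware prefetcher is likely to cover the access pattern already, so the explicit hint mainly shows where such a call would sit in the loop body. Whether it helps in practice depends on factors such as the prefetch distance and the access stride, which have to be measured on the target machine.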