Scientific application acceleration utilizing heterogeneous architectures.

详细信息

作者：Weill ; Edwin.
学历：Master
年：2014
毕业院校：Clemson University
Department：Electrical and Computer Engineering.
ISBN：9781321531039
CBH：1582965
Country：USA
语种：English
FileSize：1352161
Pages：115

文摘

Within the past decade,there have been substantial leaps in computer architectures to exploit the parallelism that is inherently present in many applications. The scientific community has benefited from the emergence of not only multi-core processors,but also other,less traditional architectures including general purpose graphical processing units GPGPUs),field programmable gate arrays FPGAs),and Intels many integrated cores MICs) architecture i.e. Xeon Phi). The popularity of the GPGPU has increased rapidly because of their ability to perform massive amounts of parallel computation quickly and at low cost with an ease of programmability. Also,with the addition of high-level programming interfaces for these devices,technical and non-technical individuals can interface with the device and rapidly obtain improved performance for many algorithms. Many applications can take advantage of the parallelism present in distributed computing and multithreading to achieve higher levels of performance for the computationally intensive parts of the application. The work presented in this thesis implements three applications for use in a performance study of the GPGPU architecture and multi-GPGPU systems. The first application study in this research is a K-Means clustering algorithm that categorizes each data point into the closest cluster. The second algorithm implemented is a spiking neural network algorithm that is used as a computational model for machine learning. The third,and final,study is the longest common subsequences problem,which attempts to enumerate comparisons between sequences namely,DNA sequences). The results for the aforementioned applications with varying problem sizes and architectural configurations are presented and discussed in this thesis. The K-Means clustering algorithm achieved approximately 97x speedup when utilizing an architecture consisting of 32 CPU/GPGPU pairs. To achieve this substantial speedup,up to 750,000 data points were used with up 30,000 centroids means). The spiking neural network algorithm resulted in speedups of about 33x for the entire algorithm and 160x for each iteration with a two-level network with 1000 total neurons 800 excitatory and 200 inhibitory neurons). The longest common subsequences problem achieved speedup of greater than 10x with 100 random sequences up to 500 characters in length. The maximum speedup values for each application were achieved by utilizing the GPGPU as well as multi-core devices simultaneously. The computations were scattered over multiple CPU/GPGPU pairs with the computationally intensive pieces of the algorithms offloaded onto the GPGPU device. The research in this thesis illustrates the ability to scale a heterogeneous cluster i.e. CPUs and GPUs working collaboratively) for large-scale scientific application performance improvements. Each algorithm demonstrates slightly different types of computations and communications,which can be compared to other algorithms to predict how they would perform on an accelerator. The results show that substantial speedups can be achieved for scientific applications when utilizing the GPGPU and multi-core architectures.

地址：北京市海淀区学院路29号邮编：100083

电话：办公室：(+86 10)66554848；文献借阅、咨询服务、科技查新：66554700