Conferences in Research and Practice in Information Technology
  

Model-driven optimisation of memory hierarchy and multithreading on GPUs

Haigh, A.A. and McCreath, E.C.

    Due to their potentially high peak performance and energy efficiency, GPUs are increasingly popular for scientific computations. However, the complexity of the architecture makes it difficult to write code that achieves high performance. Two of the most important factors in achieving high performance are the usage of the GPU memory hierarchy and the way in which work is mapped to threads and blocks. The dominant frameworks for GPU computing, CUDA and OpenCL, leave these decisions largely to the programmer. In this work, we address this in part by proposing a technique that simultaneously manages use of the GPU low-latency shared memory and chooses the granularity with which to divide the work (block size). We show that a relatively simple heuristic based on an abstraction of the GPU architecture is able to make these decisions and achieve average performance within 17% of an optimal configuration on an NVIDIA Tesla K20.
Cite as: Haigh, A.A. and McCreath, E.C. (2015). Model-driven optimisation of memory hierarchy and multithreading on GPUs. In Proc. 13th Australasian Symposium on Parallel and Distributed Computing (AusPDC 2015), Sydney, Australia. CRPIT, 163. Javadi, B. and Garg, S.K., Eds. ACS. 71-74.
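
The abstract refers to two tuning decisions that CUDA and OpenCL normally leave to the programmer: whether data is staged in low-latency shared memory, and the block size used to partition the work. The sketch below is not the authors' implementation; it is a minimal, hypothetical CUDA example of a 3-point stencil with both a global-memory and a shared-memory variant, launched with a block size chosen on the host, to illustrate the configuration space that a model-driven heuristic such as the one proposed here would search automatically.

// Minimal CUDA sketch (illustrative only, not the paper's code) of the two
// decisions the heuristic targets: shared-memory staging and block size.
#include <cstdio>
#include <cuda_runtime.h>

// 3-point averaging stencil reading directly from global memory.
__global__ void stencil_global(const float* in, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i > 0 && i < n - 1)
        out[i] = (in[i - 1] + in[i] + in[i + 1]) / 3.0f;
}

// Same stencil, but each block first stages its tile (plus one halo
// element on each side) into shared memory before computing.
__global__ void stencil_shared(const float* in, float* out, int n) {
    extern __shared__ float tile[];           // blockDim.x + 2 floats
    int i  = blockIdx.x * blockDim.x + threadIdx.x;
    int li = threadIdx.x + 1;                 // local index inside the tile
    if (i < n) {
        tile[li] = in[i];
        if (threadIdx.x == 0 && i > 0)                   tile[0] = in[i - 1];
        if (threadIdx.x == blockDim.x - 1 && i < n - 1)  tile[li + 1] = in[i + 1];
    }
    __syncthreads();
    if (i > 0 && i < n - 1)
        out[i] = (tile[li - 1] + tile[li] + tile[li + 1]) / 3.0f;
}

int main() {
    const int n = 1 << 20;
    float *in, *out;
    cudaMalloc(&in,  n * sizeof(float));
    cudaMalloc(&out, n * sizeof(float));
    cudaMemset(in, 0, n * sizeof(float));

    // The two parameters a model-driven tuner would choose; the values
    // below are hypothetical placeholders, not the paper's choices.
    int  blockSize = 256;
    bool useShared = true;
    int  gridSize  = (n + blockSize - 1) / blockSize;

    if (useShared)
        stencil_shared<<<gridSize, blockSize,
                         (blockSize + 2) * sizeof(float)>>>(in, out, n);
    else
        stencil_global<<<gridSize, blockSize>>>(in, out, n);

    cudaDeviceSynchronize();
    printf("launched with blockSize=%d useShared=%d\n", blockSize, useShared);
    cudaFree(in);
    cudaFree(out);
    return 0;
}

On a device such as the Tesla K20 used in the paper, the best combination of these two parameters varies by kernel and problem size; the reported result is that the proposed heuristic selects configurations averaging within 17% of the optimum.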