Go Back   nV News Forums > General Forums > Archived News Items

Newegg Daily Deals

Reply
 
Thread Tools
Old 07-11-12, 06:30 AM   #1
News
Registered User
 
Join Date: Jun 2009
Posts: 51,437
Post Implementing a code generator for fast matrix multiplication in OpenCL on the GPU

Abstract:
This paper presents results of an implementation of code generator for fast general matrix multiply (GEMM) kernels. When a set of parameters is given, the code generator produces the corresponding GEMM kernel written in OpenCL. The produced kernels are optimized for high-performance implementation on GPUs from AMD. Access latencies to GPU global memory is the main drawback for high performance. This study shows that storing matrix data in a block-major layout increases the performance and stability of GEMM kernels. On the Tahiti GPU (Radeon HD 7970), our DGEMM (double-precision GEMM) and SGEMM (single-precision GEMM) kernels achieve the performance up to 848 GFlop/s (90% of the peak) and 2646 GFlop/s (70%), respectively.

(K. Matsumoto, N. Nakasato, S. G. Sedukhin:'Implementing a code generator for fast matrix multiplication in OpenCL on the GPU', accepted for Special Session: Auto-Tuning for Multicore and GPU (ATMG), IEEE 6th International Symposium on Embedded Multicore SoCs (MCSoC-12), Sep. 2012. [PDF])



More...
News is offline   Reply With Quote
Reply


Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Forum Jump


All times are GMT -5. The time now is 03:29 PM.


Powered by vBulletin® Version 3.7.1
Copyright ©2000 - 2014, Jelsoft Enterprises Ltd.
Copyright 1998 - 2014, nV News.