Speaker
Mr
Patrick Steinbrecher
(Fakultät für Physik, Universität Bielefeld)
Description
The runtime of a Lattice QCD simulation is dominated by a small kernel that computes the product of a sparse matrix, known as the “Dslash” operator, with a vector. This kernel is therefore frequently optimized for various HPC architectures. In this contribution we compare the performance of the Intel Xeon Phi with that of current Kepler-based NVIDIA Tesla GPUs running a conjugate gradient solver. By inverting multiple vectors at the same time, and thereby exposing more parallelism to the accelerator, we obtain a performance of more than 250 GFLOP/s on both architectures. This is more than twice the performance achieved by naively inverting each vector separately. A detailed comparison of the accelerators' performance in different scenarios will be presented in the talk. We also discuss some details of the implementation and the effort required to achieve this performance.
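The idea of inverting several vectors at once can be illustrated with a small sketch. It is not the authors' implementation. It runs one independent conjugate gradient per right-hand side in lockstep and shares a single sparse matrix sweep between all of them, so each matrix element is read once and reused for every vector. Everything here is an assumption made for illustration: the names `CSRMatrix`, `make_toy_operator`, `multi_matvec` and `multi_rhs_cg` are hypothetical, and the toy operator (a 1D Laplacian plus a mass term, chosen because it is symmetric positive definite) only stands in for the real four-dimensional lattice operator.

```python
from dataclasses import dataclass
from typing import List

Vector = List[float]


@dataclass
class CSRMatrix:
    """Sparse matrix in compressed sparse row (CSR) format."""
    n: int
    rowptr: List[int]
    colidx: List[int]
    values: List[float]


def make_toy_operator(n: int, mass: float) -> CSRMatrix:
    """Hypothetical stand-in for the lattice operator: a 1D Laplacian plus a
    mass term, symmetric positive definite so that plain CG applies."""
    rowptr, colidx, values = [0], [], []
    for i in range(n):
        for j, v in ((i - 1, -1.0), (i, 2.0 + mass), (i + 1, -1.0)):
            if 0 <= j < n:  # open boundaries: drop out-of-range neighbours
                colidx.append(j)
                values.append(v)
        rowptr.append(len(colidx))
    return CSRMatrix(n, rowptr, colidx, values)


def multi_matvec(a: CSRMatrix, xs: List[Vector]) -> List[Vector]:
    """Compute y_m = A x_m for all vectors; each matrix element is read once
    per sweep and reused for every right-hand side."""
    ys = [[0.0] * a.n for _ in xs]
    for i in range(a.n):
        for p in range(a.rowptr[i], a.rowptr[i + 1]):
            j, v = a.colidx[p], a.values[p]
            for x, y in zip(xs, ys):
                y[i] += v * x[j]
    return ys


def dot(u: Vector, v: Vector) -> float:
    return sum(s * t for s, t in zip(u, v))


def multi_rhs_cg(a: CSRMatrix, bs: List[Vector], tol: float = 1e-10,
                 max_iter: int = 1000) -> List[Vector]:
    """Run one independent CG per right-hand side in lockstep."""
    xs = [[0.0] * a.n for _ in bs]
    rs = [list(b) for b in bs]  # initial residual r = b - A*0 = b
    ps = [list(b) for b in bs]
    rr = [dot(r, r) for r in rs]
    targets = [tol * tol * dot(b, b) for b in bs]  # stop when |r|^2 <= tol^2 |b|^2
    active = [rr[m] > targets[m] for m in range(len(bs))]
    for _ in range(max_iter):
        if not any(active):
            break
        # One sweep over A serves all systems (converged ones are still
        # swept here for simplicity, but their results are ignored).
        aps = multi_matvec(a, ps)
        for m in range(len(bs)):
            if not active[m]:
                continue
            alpha = rr[m] / dot(ps[m], aps[m])
            xs[m] = [x + alpha * p for x, p in zip(xs[m], ps[m])]
            rs[m] = [r - alpha * q for r, q in zip(rs[m], aps[m])]
            rr_new = dot(rs[m], rs[m])
            ps[m] = [r + (rr_new / rr[m]) * p for r, p in zip(rs[m], ps[m])]
            rr[m] = rr_new
            active[m] = rr_new > targets[m]
    return xs


# Usage: four right-hand sides (unit sources on sites 0..3, an arbitrary choice).
op = make_toy_operator(n=16, mass=0.1)
bs = [[1.0 if i == s else 0.0 for i in range(16)] for s in range(4)]
solutions = multi_rhs_cg(op, bs)
```

In this sketch the batching only changes how often the matrix is traversed; the per-vector CG recurrences are untouched, so each system converges exactly as it would on its own. A production Dslash kernel would instead apply a stencil with SU(3) link matrices on a four-dimensional lattice, which this toy model does not attempt to reproduce.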
Primary authors
Dr
Christian Schmidt
(Fakultät für Physik, Universität Bielefeld)
Dr
Mathias Wagner
(Department of Physics, Indiana University)
Swagato Mukherjee
(Brookhaven National Laboratory)
Dr
Olaf Kaczmarek
(Fakultät für Physik, Universität Bielefeld)
Mr
Patrick Steinbrecher
(Fakultät für Physik, Universität Bielefeld)