Speaker
Steven Gottlieb
(Indiana University)
Description
We have been extending the QUDA GPU code developed at Boston University
to include the case of improved staggered quarks. Improved staggered
quarks such as asqtad and HISQ require both first and third nearest
neighbor terms in the Dirac operator. We call the corresponding
links fatlinks and longlinks. The fatlinks are not unitary and staggered
phases are included in the links, so link reconstruction techniques may
either be inapplicable or require modification. A single precision
inverter using compressed storage for the longlinks achieves a speed
of 100 GF/s on an NVIDIA GTX280 GPU on a $24^3\times 32$ lattice.
In addition to the inverter code, we have code for the fatlink computation,
the gauge force and the fermion force. These run at 170, 186 and 107 GF/s,
respectively, under conditions similar to those of the inverter benchmark above.
The single-GPU code is currently in production on NCSA's AC cluster
for the study of electromagnetic effects. In production, the double precision
multimass solver runs at 20 GF/s, about 80\% of the speed of an 8-node
(64-core) job on Fermilab's jpsi cluster. The AC cluster has C1060
Tesla boards, which have lower memory bandwidth than the GTX280; on the GTX280
the DP inverter runs at 33 GF/s. Multi-GPU code is in development.
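
The remark on link reconstruction can be illustrated with a small sketch. A common compression for unitary links stores only the first two rows of an SU(3) matrix (12 real numbers instead of 18) and rebuilds the third row as the complex conjugate of their cross product. The Python below is an illustration only and is not taken from QUDA: the helper names (`example_su3`, `compress`, `reconstruct`), the test matrices, and the 0.6/0.4 weights used to mimic a non-unitary fatlink are all assumptions made for this example.

```python
import cmath
import math


def matmul(a, b):
    """Multiply two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def example_su3():
    """Hypothetical test link: two rotations times a diagonal phase matrix.

    The three phase angles sum to zero, so the determinant is 1.
    """
    c1, s1 = math.cos(0.3), math.sin(0.3)
    c2, s2 = math.cos(1.1), math.sin(1.1)
    r12 = [[c1, s1, 0], [-s1, c1, 0], [0, 0, 1]]
    r23 = [[1, 0, 0], [0, c2, s2], [0, -s2, c2]]
    phases = [cmath.exp(0.7j), cmath.exp(-0.2j), cmath.exp(-0.5j)]
    diag = [[phases[i] if i == j else 0 for j in range(3)] for i in range(3)]
    return matmul(matmul(r12, diag), r23)


def compress(u):
    """Keep only the first two rows of the link."""
    return [list(u[0]), list(u[1])]


def reconstruct(rows, phase=1):
    """Rebuild the third row as phase * conj(row0 x row1).

    phase=1 is the plain SU(3) rule; the phase argument is an assumed
    modification for links that carry a staggered sign.
    """
    a, b = rows
    c = [
        phase * (a[1] * b[2] - a[2] * b[1]).conjugate(),
        phase * (a[2] * b[0] - a[0] * b[2]).conjugate(),
        phase * (a[0] * b[1] - a[1] * b[0]).conjugate(),
    ]
    return [list(a), list(b), c]


def max_diff(x, y):
    """Largest element-wise absolute difference between two 3x3 matrices."""
    return max(abs(x[i][j] - y[i][j]) for i in range(3) for j in range(3))


u = example_su3()

# Case 1: a unitary link is recovered up to rounding error.
err_unitary = max_diff(reconstruct(compress(u)), u)

# Case 2: a staggered phase of -1 folded into the link flips the first two
# rows, but their cross product is unchanged, so the third row gets the
# wrong sign unless the phase is reapplied.
u_phase = [[-x for x in row] for row in u]
err_phase = max_diff(reconstruct(compress(u_phase)), u_phase)
err_phase_fixed = max_diff(reconstruct(compress(u_phase), phase=-1), u_phase)

# Case 3: a weighted sum of SU(3) matrices (a stand-in for a fatlink) is not
# unitary, so the two-row rule does not reproduce its third row.
identity = [[1 if i == j else 0 for j in range(3)] for i in range(3)]
fat = [[0.6 * u[i][j] + 0.4 * identity[i][j] for j in range(3)]
       for i in range(3)]
err_fat = max_diff(reconstruct(compress(fat)), fat)

print("unitary link error:       ", err_unitary)
print("phased link, plain rule:  ", err_phase)
print("phased link, with phase:  ", err_phase_fixed)
print("fatlink-like sum error:   ", err_fat)
```

In this sketch the plain rule is adequate only in the first case. The phased link needs the extra sign, and the fatlink-like matrix has to be stored in full, which is consistent with compression being applied to the longlinks in the inverter described above.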
Presentation type
talk
Primary authors
Mr
Guochun Shi
(National Center for Supercomputing Applications)
Steven Gottlieb
(Indiana University)
Co-authors
Dr
Aaron Torok
(Indiana University)
Dr
Volodymyr Kindratenko
(National Center for Supercomputing Applications)