Shinji Motoki
(Graduate School of BioSphere Science, Hiroshima University)
We report an implementation of SU(3) matrix-matrix and matrix-vector multiplication on the Cell/B.E. with efficient DMA transfer, as part of our project, Lattice Tool Kit on the Cell/B.E. Last year we reported results on the QS20. We have since found that the execution times measured then were wrong, because the value in a register is corrupted at the first measurement. The actual speed of the matrix multiplication on the SPEs, including DMA data transfer from main memory, is 20 GFLOPS, which is 23% of the theoretical peak speed of this calculation. The performance of our code on the Cell/B.E. is limited by the bandwidth between main memory and the SPEs. We discuss the cause of this low value and a possible remedy.
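As a rough illustration of why main-memory bandwidth, rather than SPE arithmetic, can bound these kernels, the sketch below computes a roofline-style upper limit. It assumes single-precision complex entries, that every operand is read from and every result written to main memory exactly once, and the nominal 25.6 GB/s memory bandwidth of the Cell/B.E. The function names are hypothetical, and this is not the code used in this work.

```python
# Roofline-style upper bound for SU(3) kernels streamed from main memory.
# Assumptions (not stated in the abstract): single-precision complex
# entries (8 bytes each), each operand read once and each result written
# once per multiplication, and a 25.6 GB/s memory bandwidth.

BYTES_PER_COMPLEX = 8   # two 4-byte floats (assumed single precision)
FLOPS_CMUL = 6          # (a+bi)(c+di): 4 real multiplies + 2 real adds
FLOPS_CADD = 2          # 2 real adds


def su3_matmat_flops() -> int:
    # Each of the 9 output entries is a length-3 complex dot product:
    # 3 complex multiplies and 2 complex adds.
    return 9 * (3 * FLOPS_CMUL + 2 * FLOPS_CADD)


def su3_matvec_flops() -> int:
    # Each of the 3 output entries is a length-3 complex dot product.
    return 3 * (3 * FLOPS_CMUL + 2 * FLOPS_CADD)


def su3_matmat_bytes() -> int:
    # Two 3x3 input matrices read, one 3x3 result written.
    return 3 * 9 * BYTES_PER_COMPLEX


def su3_matvec_bytes() -> int:
    # One 3x3 matrix and one 3-vector read, one 3-vector written.
    return (9 + 3 + 3) * BYTES_PER_COMPLEX


def bandwidth_bound_gflops(flops: int, nbytes: int, bandwidth_gbs: float) -> float:
    # Attainable rate (GFLOPS) if the kernel is limited only by memory
    # traffic: arithmetic intensity (flop/byte) times bandwidth (GB/s).
    return flops / nbytes * bandwidth_gbs


if __name__ == "__main__":
    bw = 25.6  # GB/s, assumed nominal Cell/B.E. memory bandwidth
    print("mat-mat bound:", bandwidth_bound_gflops(su3_matmat_flops(), su3_matmat_bytes(), bw))
    print("mat-vec bound:", bandwidth_bound_gflops(su3_matvec_flops(), su3_matvec_bytes(), bw))
```

Under these assumptions the matrix-matrix kernel moves 216 bytes for 198 flops, so streaming from memory caps it near 23.5 GFLOPS however fast the SPEs compute; the matrix-vector kernel, at 66 flops per 120 bytes, is capped near 14 GFLOPS.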
Presentation type: talk
Primary author
Shinji Motoki
(Graduate School of BioSphere Science, Hiroshima University)
Atsushi Nakamura
(Research Institute for Information Science and Education, Hiroshima University)
Yoshiyuki Nakagawa
(Graduate School of Science and Technology, Niigata University)
Keitaro Nagata
(Department of Physics, University of Tokyo)