14–19 Jun 2010
Villasimius, Sardinia
Europe/Rome timezone

Implementation and performance optimization of Lattice QCD Tool Kit on the Cell/B.E.

17 Jun 2010, 17:40
20m
Room3 (Villasimius, Sardinia)

Tanka Village
Algorithms and machines
Parallel 42: Algorithms and machines

Speaker

Shinji Motoki (Graduate School of BioSphere Science Hiroshima University)

Description

We report an implementation of SU(3) matrix-matrix and matrix-vector multiplication with efficient DMA transfer on the Cell/B.E., which is part of our project, Lattice Tool Kit on the Cell/B.E. Last year we reported results on the QS20. We have since found that the measured execution time was wrong, because values on a resistor were distorted at the first measurement. The actual speed of the matrix multiplication on the SPEs, including data transfer from main memory by DMA, is 20 GFLOPS, which is 23% of the theoretical peak speed of this calculation. The performance of our code on the Cell/B.E. is limited by the bandwidth between main memory and the SPEs. We discuss the cause of this low value and a possible remedy.
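The bandwidth limit mentioned above can be illustrated with a rough roofline-style estimate. The sketch below counts the real floating-point operations in one 3x3 complex matrix product and the bytes moved if both operands are loaded and the result stored with no reuse. The precision (single), the no-reuse traffic model, and the 25.6 GB/s main-memory bandwidth are assumptions made for illustration; none of them is stated in the abstract, and the function names are hypothetical.

```python
# Rough bandwidth-bound estimate for streaming SU(3) matrix-matrix products.
# ASSUMPTIONS (not from the abstract): single precision, no operand reuse,
# and an illustrative main-memory bandwidth of 25.6 GB/s.

COMPLEX_MUL_FLOPS = 6  # 4 real multiplies + 2 real adds
COMPLEX_ADD_FLOPS = 2  # 2 real adds


def su3_matmul_flops(n=3):
    """Real floating-point operations in one n x n complex matrix product."""
    complex_muls = n * n * n          # one per (row, column, inner index)
    complex_adds = n * n * (n - 1)    # accumulate n products per element
    return complex_muls * COMPLEX_MUL_FLOPS + complex_adds * COMPLEX_ADD_FLOPS


def su3_matmul_bytes(bytes_per_real=4, n=3):
    """Bytes moved when two matrices are loaded and one is stored."""
    reals_per_matrix = 2 * n * n      # real and imaginary parts
    return 3 * reals_per_matrix * bytes_per_real


def bandwidth_bound_gflops(bandwidth_gb_per_s, bytes_per_real=4):
    """Upper bound on GFLOPS when memory traffic is the only limit."""
    intensity = su3_matmul_flops() / su3_matmul_bytes(bytes_per_real)
    return intensity * bandwidth_gb_per_s


def implied_peak_gflops(measured_gflops, fraction_of_peak):
    """Peak implied by a measured speed and its stated fraction of peak."""
    return measured_gflops / fraction_of_peak


if __name__ == "__main__":
    print("flops per product:", su3_matmul_flops())   # 198
    print("bytes per product:", su3_matmul_bytes())   # 216
    print("bound at assumed 25.6 GB/s: %.1f GFLOPS"
          % bandwidth_bound_gflops(25.6))
    # 20 GFLOPS and 23% are the figures quoted in the abstract.
    print("implied peak: %.0f GFLOPS" % implied_peak_gflops(20.0, 0.23))
```

Under these assumptions the arithmetic intensity is below one flop per byte, so the attainable speed is set by memory bandwidth rather than by SPE arithmetic. Reusing operands held in the SPE local store would raise the intensity, which is one generic way such a bound can be relaxed; whether this is the remedy the authors discuss is not stated here.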
Presentation type: talk

Primary author

Shinji Motoki (Graduate School of BioSphere Science Hiroshima University)

Co-authors

Prof. Atsushi Nakamura (Research Institute for Information Science and Education Hiroshima University)
Dr Yoshiyuki Nakagawa (Graduate School of Science and Technology Niigata University)
Dr Keitaro Nagata (Department of Physics, University of Tokyo)
