Speaker
Mr Hannes Vogt (Universitaet Tuebingen)
    Description
We use CUDA-capable Graphics Processing Units (GPUs) for Landau, Coulomb and maximally Abelian gauge fixing in 3+1-dimensional SU(3) and SU(2) lattice gauge field theories. A combination of simulated annealing and overrelaxation is used to approach the global maximum of the gauge functional. We use fine-grained parallelism to achieve maximum performance: instead of the common strategy of one thread per lattice site, we use 4 or 8 threads per site.
Here, we report on an improved version of our publicly available code (www.culgt.com), which again increases performance and is much easier to include in existing code. On the GTX580 we achieve up to 450 GFlops (utilizing 80% of the theoretical peak bandwidth) for the Landau overrelaxation code.
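To make the local update concrete, the following is a minimal sketch of a single-site overrelaxation step for SU(2) Landau gauge fixing, written in plain Python rather than CUDA. It is not taken from the cuLGT code. All function names are hypothetical, SU(2) matrices are stored as unit quaternions, and the assumption that each of the 8 threads of a site handles one of the 8 links touching that site is ours; the list `lanes` only mimics that division of work.

```python
import math
import random


def random_su2(rng):
    """Random unit quaternion (a0, a1, a2, a3) standing for a0*1 + i*(a . sigma)."""
    a = [rng.gauss(0.0, 1.0) for _ in range(4)]
    norm = math.sqrt(sum(x * x for x in a))
    return tuple(x / norm for x in a)


def dagger(a):
    """Hermitian conjugate in the quaternion parametrisation."""
    return (a[0], -a[1], -a[2], -a[3])


def re_tr_product(a, b):
    """Re tr(A B) for two quaternion-parametrised 2x2 matrices."""
    return 2.0 * (a[0] * b[0] - a[1] * b[1] - a[2] * b[2] - a[3] * b[3])


def local_sum(out_links, in_links):
    """K(x) = sum_mu [U_mu(x) + U_mu(x - mu)^dagger].

    Assumption: with 8 threads per site, each thread would load one of the
    8 entries of `lanes`; the component-wise sum plays the role of the
    reduction across those threads.
    """
    lanes = list(out_links) + [dagger(u) for u in in_links]
    return tuple(sum(lane[i] for lane in lanes) for i in range(4))


def overrelaxation_update(k, omega):
    """Return g^omega, where g = K^dagger / |K| maximises Re tr(g K)."""
    norm = math.sqrt(sum(x * x for x in k))
    g = dagger(tuple(x / norm for x in k))
    theta = math.acos(max(-1.0, min(1.0, g[0])))
    s = math.sin(theta)
    if s < 1e-12:
        # g is (numerically) +1 or -1: the rotation axis is undefined,
        # so fall back to the plain relaxation step.
        return g
    axis = (g[1] / s, g[2] / s, g[3] / s)
    c, t = math.cos(omega * theta), math.sin(omega * theta)
    return (c, axis[0] * t, axis[1] * t, axis[2] * t)


if __name__ == "__main__":
    rng = random.Random(0)
    out_links = [random_su2(rng) for _ in range(4)]  # U_mu(x)
    in_links = [random_su2(rng) for _ in range(4)]   # U_mu(x - mu)
    k = local_sum(out_links, in_links)
    g = overrelaxation_update(k, omega=1.7)
    print("local functional before:", 2.0 * k[0])
    print("local functional after: ", re_tr_product(g, k))
```

In this sketch the local part of the gauge functional is Re tr(g K), so the value before the update (g = 1) is 2 K_0. For any overrelaxation parameter between 1 and 2 the exact matrix power used here does not decrease that value. A production kernel would more likely use a truncated expansion of g^omega and a shared-memory or warp-level reduction, which this example does not attempt to reproduce.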
    Authors
Mr Hannes Vogt (Universitaet Tuebingen)
Mario Schrock (R)