Description
Particle-in-cell (PIC) codes have been a cornerstone of plasma-based accelerator development. Because they model the plasma at the most fundamental, microscopic level and make few physics approximations, they are among the most computationally expensive models in plasma physics and require efficient use of even the most modern HPC systems.
In this paper, we present a generalized parallelization algorithm for PIC simulations that works across all the main architectures available today, including CPUs (x86, Arm) and GPUs (NVIDIA, AMD, Intel). The algorithm is based on a micro-spatial domain decomposition, with a high-performance particle manager that moves particles between domains. Each domain is assigned to a different thread (CPU) or thread block (GPU), which achieves good parallel load balancing even for realistic simulation scenarios. The implementation uses a different programming model for each architecture: OpenMP (CPU), CUDA and ROCm (GPU), and SYCL (CPU/GPU/FPGA). Although the implementations are effectively separate code bases, they share the same overall algorithm and are therefore very similar, which makes porting between them relatively straightforward. We present a performance comparison between the different architectures and programming models for a 2D test problem, demonstrating very high performance on the architectures explored.
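To make the decomposition idea concrete, the following is a minimal serial sketch in Python/NumPy, not the implementation described in the abstract. It assumes a periodic 2D box split into square tiles, free-streaming particles (no fields), and a particle manager that simply re-sorts particles by tile after each push. The names `tile_index`, `redistribute`, and `advance` are hypothetical and chosen for illustration only. In a parallel code, each iteration of the loop over tiles would map to one CPU thread or one GPU thread block.

```python
import numpy as np


def tile_index(x, y, tile_size, ntx, nty):
    """Flat index of the tile (micro-domain) that holds each particle."""
    # Clamp so that a particle sitting exactly on the upper box edge
    # still lands in the last tile.
    tx = np.minimum((x // tile_size).astype(np.int64), ntx - 1)
    ty = np.minimum((y // tile_size).astype(np.int64), nty - 1)
    return ty * ntx + tx


def redistribute(x, y, vx, vy, tile_size, ntx, nty):
    """Illustrative particle manager: wrap positions periodically and
    regroup particles so that each tile owns one contiguous slice."""
    x = np.mod(x, ntx * tile_size)
    y = np.mod(y, nty * tile_size)
    tile = tile_index(x, y, tile_size, ntx, nty)
    order = np.argsort(tile, kind="stable")
    counts = np.bincount(tile, minlength=ntx * nty)
    # offsets[t]:offsets[t + 1] is the slice of particles owned by tile t.
    offsets = np.concatenate(([0], np.cumsum(counts)))
    return x[order], y[order], vx[order], vy[order], offsets


def advance(x, y, vx, vy, offsets, dt):
    """Push particles tile by tile. Each loop iteration touches only its
    own slice, so iterations are independent and could run in parallel."""
    for t in range(len(offsets) - 1):
        s = slice(offsets[t], offsets[t + 1])
        x[s] += vx[s] * dt
        y[s] += vy[s] * dt


# Small usage example: an 8 x 8 grid of unit tiles, 10,000 particles.
rng = np.random.default_rng(0)
ntx = nty = 8
tile_size = 1.0
n = 10_000
x = rng.uniform(0.0, ntx * tile_size, n)
y = rng.uniform(0.0, nty * tile_size, n)
vx = rng.normal(size=n)
vy = rng.normal(size=n)

x, y, vx, vy, offsets = redistribute(x, y, vx, vy, tile_size, ntx, nty)
for _ in range(5):
    advance(x, y, vx, vy, offsets, dt=0.1)
    x, y, vx, vy, offsets = redistribute(x, y, vx, vy, tile_size, ntx, nty)
```

The global sort is used here only for brevity. A production particle manager would more plausibly exchange just the particles that left their tile, since most particles stay in place over a single time step.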