Description
Over the past two years, COKA, the HPC cluster for code prototyping and benchmarking installed at INFN Ferrara, has been upgraded with newer heterogeneous compute nodes and a refactored software ecosystem. Having evolved from its original homogeneous five-node architecture, the system now incorporates NVIDIA Grace Hopper Superchips, IBM POWER9 systems, and multiple FPGAs.
To manage this architectural complexity efficiently and keep the configuration reproducible, cluster administration has moved to an automated Infrastructure-as-Code model. The cluster is now available to users and supports a diverse range of workloads, including technology tracking, theoretical physics simulations, CMB data analysis, quantum simulations, and AI training.
Following last year's presentation of the cluster's design and deployment plan, this talk gives an update on the current configuration of the cluster. We will focus on the challenges of managing a highly heterogeneous environment, with particular emphasis on the integration and orchestration of the ARM and POWER9 architectures. Preliminary performance results and user feedback from the first production workloads will also be discussed.