AI_INFN Technical Meeting

Virtual meeting room (zoom): https://l.infn.it/ai-infn-meeting

AI_INFN Technical meeting – Minutes and actions

Date: 2025-01-13

Operations

  • We have restarted the MinIO cluster providing resources to JuiceFS and moved the metadata engine from Redis to PostgreSQL. No performance degradation was observed. Dashboards for PostgreSQL and MinIO have been drafted.
  • On January 11, failures of CVMFS FUSE mounts were reported. The problem has been identified and (hopefully) fixed: the Squid proxy was configured without specifying its port (3128).
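
As a guard against this kind of misconfiguration, a small check on the proxy setting could flag entries with no explicit port. The following is only a sketch, not what was deployed: the function name and the hostname `squid.example.org` are hypothetical, and it assumes the proxy list follows the `CVMFS_HTTP_PROXY` syntax (`|` between load-balanced proxies, `;` between failover groups, `DIRECT` for no proxy).

```python
from urllib.parse import urlsplit


def proxies_missing_port(cvmfs_http_proxy: str) -> list[str]:
    """Return the proxy entries that do not specify an explicit port.

    Hypothetical helper: the input is assumed to be the value of
    CVMFS_HTTP_PROXY, e.g. "http://a:3128|http://b:3128;DIRECT".
    """
    missing = []
    for group in cvmfs_http_proxy.split(";"):      # failover groups
        for entry in group.split("|"):             # load-balanced proxies
            entry = entry.strip()
            if not entry or entry.upper() == "DIRECT":
                continue                           # DIRECT means "no proxy"
            # Accept entries written without a scheme, e.g. "host:3128".
            url = entry if "://" in entry else "http://" + entry
            if urlsplit(url).port is None:
                missing.append(entry)
    return missing


if __name__ == "__main__":
    # Hypothetical hostname, used only for illustration.
    print(proxies_missing_port("http://squid.example.org;DIRECT"))
    print(proxies_missing_port("http://squid.example.org:3128;DIRECT"))
```

An empty list means every proxy entry carries a port; Squid's conventional port is 3128, as in the fix above.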

Tracked developments:

:arrow_forward: Automation of RKE2 deployments in INFN Cloud

  • The button in the INFN Cloud dashboard redirecting to the AI_INFN platform has been added.
    For the moment it is only visible in the development version of the PaaS and to users in the admin/ai-infn group.
  • Gioacchino pushed the JupyterLab and JupyterHub Docker images to the DataCloud repository, with automated builds.
    • An unexpectedly old version of snakemake was observed in conda; in principle it is not installed by the Docker image.
    • Lucio will try to use Gioacchino’s image as the default image in the AI_INFN Platform.

:arrow_forward: Develop monitoring and accounting infrastructure (R. Petrini)

  • The Grafana dashboard has been extended to:
    • monitor the hackathon platform
    • monitor the PostgreSQL instance used by the new JuiceFS
    • monitor the MinIO instance used by the new JuiceFS (this works only partially: 1 of 3 dashboards)
  • Improving the monitoring of PostgreSQL and MinIO requires some “invasive” actions, at least for trial and error. It was decided to create a test environment with the same configuration but much less disk space, to serve as a playground for setting up the monitoring. This will also make it possible to document the construction procedure of the storage cluster.

:arrow_forward: Environment setup (S. Giagu, S. Bordoni, L. Cappelli)

  • Luca Clissa reported a problem cloning an environment. Lucio is following up.

:arrow_forward: Offloading tests with virtual kubelets (G. Bianchini, D. Ciangottini)

  • Giulio (offline): has (almost) solved two issues raised by Lucio in the interLink API server.

:arrow_forward: FPGA purchase

  • The ball is in Diego Michelotto’s court: the most promising strategy is to install the Xilinx software on the bare metal and then reintroduce the higher levels in the virtualization.
  • Unfortunately, AlmaLinux 9 is not officially supported by Xilinx, so it is hard to guarantee success.
  • Enrico Calore reported on a discussion he had with Xilinx engineers at the SuperComputing conference. VitisAI will not be supported on new hardware boards, and VHDL/HLS will be the most important toolchain for the upcoming FPGAs. Unfortunately, V70 FPGAs can only be programmed using VitisAI.
  • Andrea Rigoni supports the need for FPGAs in the Cloud, possibly close to GPU devices, for quantization studies: https://indico.cern.ch/event/1405026/contributions/5910214/attachments/2933286/5151597/cern_edge_ml_fpga_ai.pdf

Status legend

:arrow_forward: Active
:fast_forward: Priority
:bangbang: Problems
:parking: Postponed or Blocked by others
:white_check_mark: Completed
