AI_INFN Technical Meeting

Timezone: Europe/Rome

Virtual meeting room (Zoom): https://l.infn.it/ai-infn-meeting

AI_INFN Technical meeting – Minutes and actions

Date: 2025-03-24

News

  • Final-report (consuntivi) database: deadline April 9th.
  • "XXII Seminar on Software for Nuclear, Subnuclear and Applied Physics, Alghero": registration deadline May 10th, 2025.
  • "Introductory Course to VHDL and HLS FPGA Programming, Milano", organized by ICSC Spoke 2, WP4: registration deadline April 30th.

Operations

  • NTR

Tracked developments:

:arrow_forward: Automation of RKE2 deployments in INFN Cloud

  • March 3
    • Gioacchino tagged the image with a new versioning scheme. Snakemake is now available.
    • Gioacchino is working on Jupyter for INFN Cloud, trying to remain aligned with AI_INFN.
    • The migration to Jupyter 5 is to be planned together with Gioacchino.
    • The renamed image will soon become the default one in the platform.
  • March 10
    • Updates are on hold because re-authentication is being forced every minute; the cause is still to be understood.
      Development activity is also paused because a course is ongoing on the same infrastructure.
      The course should end tomorrow.

:arrow_forward: Develop monitoring and accounting infrastructure (R. Petrini)

  • March 10
    • We are running without monitoring for the file system, which also implies no accounting. Will try to mitigate this on Wednesday afternoon while restarting the services.
  • March 17
    • Meeting tomorrow to kick off the monitoring of the storage.
  • March 24
    • Monitoring of the file system has been reconfigured. A GitHub repository (landerlini/ai-infn-fs-monitoring, private) has been set up to collect the configuration, based on docker-compose. More network metrics would be useful; Rosa is investigating.

:arrow_forward: Environment setup (S. Giagu, S. Bordoni, L. Cappelli)

  • March 17
    • Giuliano Panico (owner of an A30 GPU in the AI_INFN tenancy) is migrating his activity to the platform. Issues with the environment setup are being followed up by Francesca.
  • March 24
    • Giuliano is happy with the new configuration and is working steadily.
    • Debugging of the traces used for monitoring, which carry potentially useful information, for example the logs of failing pods.

:arrow_forward: Offloading tests with virtual kubelets (G. Bianchini, D. Ciangottini)

  • March 3
    • Three buckets made available; not yet tested.
    • Work on GPU continues.
    • Offloading towards FPGA: the SSH tunnel no longer works and it is not clear why.
  • March 10
    • Offloading towards GPU: the NATS plugin has been updated to use "SlurmFlavor" to support GPU usage. The flavor is selected by scanning the available flavors from the cheapest to the most expensive.
    • Offloading towards FPGA: the interLink VK supports FPGA provisioning with the docker plugin. A pod requesting an FPGA can be scheduled in the same way as one requesting a GPU. All the Xilinx tools can already be used from the Jupyter notebook. Tests were done with a U55c in Perugia, and the first "simple" tests all seem to work correctly. The system could also be made to work with the V70.
    • Stefano Dal Pra is organizing a call to plan the tests.
  • March 24
    • Nodes tested by Stefano Stalio: there are problems with the boto3 client. We will consider adding a WebDAV layer in front of S3 so that a different client can be used.
    • Presigned URLs also fail for PUTs (Access Denied). An application relying on presigned URLs would need a complete rewrite of its authorization pattern to avoid them.
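The flavor-selection rule noted for the NATS plugin on March 10 (scan the flavors from the cheapest to the most expensive) amounts to a greedy first-fit over a cost-sorted list. The following is a minimal sketch under stated assumptions: the `Flavor` class, its `cost`/`cpus`/`gpus` fields, the `select_flavor` function and the example flavor names are all hypothetical and do not reproduce the actual interLink/NATS plugin code.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Flavor:
    """Hypothetical description of a SLURM flavor offered to offloaded pods."""
    name: str
    cost: float  # relative cost, used only to order the flavors
    cpus: int
    gpus: int


def select_flavor(flavors: Sequence[Flavor], cpus: int, gpus: int) -> Optional[Flavor]:
    """Return the cheapest flavor that satisfies the request, or None if none fits."""
    # Scan from the cheapest to the most expensive flavor and stop at the first match.
    for flavor in sorted(flavors, key=lambda f: f.cost):
        if flavor.cpus >= cpus and flavor.gpus >= gpus:
            return flavor
    return None


# Example flavors (hypothetical names and numbers, for illustration only).
EXAMPLE_FLAVORS = [
    Flavor("gpu-large", cost=10.0, cpus=64, gpus=4),
    Flavor("cpu-only", cost=1.0, cpus=16, gpus=0),
    Flavor("gpu-small", cost=3.0, cpus=8, gpus=1),
]

if __name__ == "__main__":
    # A pod asking for 4 CPUs and 1 GPU lands on the cheapest GPU flavor.
    print(select_flavor(EXAMPLE_FLAVORS, cpus=4, gpus=1).name)  # prints gpu-small
```

Because the list is sorted by cost before the scan, the first flavor that fits is also the cheapest one that fits: every cheaper flavor has already been checked and rejected.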

:arrow_forward: FPGA purchase

  • March 10
    • Stefano G. is sending the U55c FPGA to CNAF.
    • Lucio asks to remove one of the V70s and send it to Ferrara, to continue the tests with two different hypervisors and collect additional information.
  • March 24
    • We are starting to gather information on the bureaucratic procedure required to obtain a refund for the V70.
    • (Diego M., offline) E4 is available to manage the refund.
    • Giulio's deployment is dead.

Status legend

:arrow_forward: Active
:fast_forward: Priority
:bangbang: Problems
:parking: Postponed or Blocked by others
:white_check_mark: Completed

Timetable

  • 16:00-16:15 News and setup (15m)
    Speaker: Lucio Anderlini (Istituto Nazionale di Fisica Nucleare)
  • 16:15-16:50 Updates on development activities (35m)
  • 16:50-17:00 Any other business (10m)