AI_INFN Technical Meeting

Europe/Rome
Description

Virtual meeting room (zoom): https://l.infn.it/ai-infn-meeting

AI_INFN Technical meeting – Minutes and actions

Date: 2025-03-10

News

  • “XXII Seminar on Software for Nuclear, Subnuclear and Applied Physics, Alghero”, registration deadline May 10th, 2025.
  • “Introductory Course to VHDL and HLS FPGA Programming, Milano”, organized by ICSC Spoke 4 (see the agenda), registration deadline April 30th.

Operations

  • A user reported losing data during last week's migration. The data were stored in /home, which is ephemeral. Such incidents are frequent. In the next version of the platform, we will make it harder to use ephemeral space by mistake.
  • Tomorrow afternoon, we are shutting down all instances of AI_INFN, including all services connected to the platform. All notebooks will be stopped, and it will be impossible to spawn new services until the end of the intervention, hopefully on Wednesday evening. The intervention is needed to update the underlying operating system and the OpenStack IaaS.

Tracked developments:

:arrow_forward: Automation of RKE2 deployments in INFN Cloud

  • March 3
    • Gioacchino tagged the image with a new versioning scheme. Snakemake is now available.
    • Gioacchino is working on Jupyter for INFN Cloud, trying to remain aligned with AI_INFN.
    • Planning the migration to Jupyter 5 together with Gioacchino.
    • The image with the new naming scheme will soon become the default in the platform.
  • March 10
    • The update is stalled because authentication is being forced every minute; the cause is still to be understood.
      Development activity is also paused because a course is ongoing on the same infrastructure.
      The course should be concluded by tomorrow.

:arrow_forward: Develop monitoring and accounting infrastructure (R. Petrini)

  • March 10
    • We are currently running without file-system monitoring, which also means no accounting. We will try to mitigate this on Wednesday afternoon while restarting the services.

:arrow_forward: Environment setup (S. Giagu, S. Bordoni, L. Cappelli)

  • March 10
    • Deployed an instance of OwnCloud connected to the platform to test WebDAV-based persistency with Snakemake. The use case requiring this is ICSC - Spoke2 - ENI-PIML.

:arrow_forward: Offloading tests with virtual kubelets (G. Bianchini, D. Ciangottini)

  • March 3
    • Three buckets made available; not yet tested.
    • Work on GPU continues.
    • Offloading towards FPGA: the SSH tunnel no longer works, and it is not clear why.
  • March 10
    • Offloading towards GPU: the NATS plugin was updated to use “SlurmFlavor” in order to support GPU usage. The flavor is selected by scanning the available flavors from the cheapest to the most expensive.
    • Offloading towards FPGA: the interLink VK supports FPGA provisioning with the docker plugin. A pod requesting an FPGA can be scheduled in the same way as one requesting a GPU. All Xilinx tools can already be used from the Jupyter notebook. Tests were done with a U55c in Perugia, and the first “simple” tests all appear to work correctly. The system could also be made to work with the V70.
    • Stefano Dal Pra is setting up a call to organize the tests.
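The cheapest-first flavor selection mentioned for the NATS plugin can be illustrated with a minimal sketch. It assumes each flavor exposes a cost and its CPU, memory and GPU capacities, and that the first (cheapest) flavor covering the pod's request is chosen. The field names, the `PodRequest` type, `select_flavor` and the example flavors are hypothetical and are not taken from the actual plugin code.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class SlurmFlavor:
    # Hypothetical fields: the real plugin's flavor schema may differ.
    name: str
    cost: float
    cpus: int
    memory_gb: int
    gpus: int


@dataclass(frozen=True)
class PodRequest:
    # Hypothetical summary of the resources requested by a pod.
    cpus: int
    memory_gb: int
    gpus: int


def select_flavor(flavors: List[SlurmFlavor], req: PodRequest) -> Optional[SlurmFlavor]:
    """Return the cheapest flavor that satisfies the request, or None."""
    # Scan the available flavors from the cheapest to the most expensive.
    for flavor in sorted(flavors, key=lambda f: f.cost):
        if (flavor.cpus >= req.cpus
                and flavor.memory_gb >= req.memory_gb
                and flavor.gpus >= req.gpus):
            return flavor
    # No flavor is large enough for the request.
    return None


# Illustrative (invented) flavors, listed in no particular order.
EXAMPLE_FLAVORS = [
    SlurmFlavor("gpu-large", 10.0, 32, 256, 4),
    SlurmFlavor("cpu-small", 1.0, 4, 16, 0),
    SlurmFlavor("gpu-small", 4.0, 8, 64, 1),
]

if __name__ == "__main__":
    chosen = select_flavor(EXAMPLE_FLAVORS, PodRequest(cpus=4, memory_gb=16, gpus=1))
    print(chosen.name if chosen else "no suitable flavor")
```

With these example flavors, a request for one GPU skips the cheaper CPU-only flavor and lands on `gpu-small`. A greedy cheapest-first scan keeps the choice simple and predictable; a real scheduler might also weigh queue occupancy or site availability.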

:arrow_forward: FPGA purchase

  • March 10
    • Stefano G. is sending the U55c FPGA to CNAF.
    • Lucio asks to remove one of the V70 boards and send it to Ferrara, to continue testing with two different hypervisors and collect additional information.

Status legend

:arrow_forward: Active
:fast_forward: Priority
:bangbang: Problems
:parking: Postponed or Blocked by others
:white_check_mark: Completed

    • 16:00 16:15
      News and setup 15m
      Speaker: Lucio Anderlini (Istituto Nazionale di Fisica Nucleare)
    • 16:15 16:50
      Updates on development activities 35m
    • 16:50 17:00
      Any other business 10m