Workshop sul Calcolo nell'INFN - Palau (Sassari) | 20 - 24 maggio 2024

Name: Workshop sul Calcolo nell'INFN - Palau (Sassari) | 20 - 24 maggio 2024
Start: 2024-05-20T09:00:00+02:00
End: 2024-05-24T23:30:00+02:00
Location: Park Hotel Cala di Lepre

Europe/Rome timezone

Support: ccr_ws_secretariat@lists.infn.it

Monitoring resources of a computing infrastructure with Redfish and SNMP

24 May 2024, 11:35

25m

Sala Meeting "Le Saline" (Park Hotel Cala di Lepre)

Via Cala di Lepre 07020 Palau (SS) Italia

Oral presentation, ICT Services, session "Servizi ICT"

Speaker: Nicola Mosco (Istituto Nazionale di Fisica Nucleare)

Efficient and secure operation of computing centre machines is essential in today's digital landscape. The complexity and scale of computing centres call for a framework that monitors resources and supports a responsive system able to prevent potential failures. This presentation illustrates the current status and future goals of the monitoring solution being developed at the computing centre of INFN Torino.
The new challenges posed by PNRR projects such as ETIC and TeRABIT require a fresh approach to monitoring hardware and software components.
The primary motivation behind this framework is to improve operational efficiency by giving administrators timely insight into resource utilisation, workload distribution, and system health. With such monitoring tools, administrators can proactively identify bottlenecks, optimise resource allocation, and mitigate performance degradation, helping to keep services running without interruption.
The traditional approach to monitoring relies on SNMP. Although it has been the standard for decades, it has several drawbacks that call for a new approach.
SNMP remains widely used for network management and monitoring, especially on legacy systems, but Redfish offers advantages in design, scalability, feature set, security, and vendor support. Because it is based on RESTful APIs and a JSON data model, it can also be more efficient and easier to use.
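To make the point about the JSON data model concrete, here is a minimal sketch (not the implementation presented in the talk) of reading a power figure from a Redfish response. It assumes the chassis exposes the DMTF `Power` resource with a `PowerControl` array and a `PowerConsumedWatts` property; newer firmware may expose `PowerSubsystem` instead. The chassis id, the sample payload, and the function name are made up for illustration. In practice the document would be fetched with an authenticated HTTPS GET.

```python
import json

# Hypothetical sample of a Redfish Power resource, trimmed to the fields used here.
SAMPLE_POWER_JSON = """
{
  "@odata.id": "/redfish/v1/Chassis/1/Power",
  "PowerControl": [
    {"MemberId": "0", "Name": "Server Power Control", "PowerConsumedWatts": 312}
  ]
}
"""


def power_consumed_watts(payload: dict) -> list[float]:
    """Return the PowerConsumedWatts reading of every PowerControl entry."""
    readings = []
    for control in payload.get("PowerControl", []):
        watts = control.get("PowerConsumedWatts")
        if watts is not None:  # the property may be null or absent
            readings.append(float(watts))
    return readings


payload = json.loads(SAMPLE_POWER_JSON)
print(power_consumed_watts(payload))
```

Plain dictionary access is all that is needed here, whereas SNMP requires OIDs to be resolved against vendor MIB files.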
We show a possible implementation that collects performance metrics from physical machines using Redfish and integrates this information with an SNMP exporter for Prometheus, with Grafana dashboards for visualisation. The current setup shows the correlation between power consumption and the workload of HTCondor jobs.
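As a rough illustration of how Redfish readings could be exposed alongside SNMP exporter metrics, the sketch below renders per-host power values in the Prometheus text exposition format. The metric name, the `host` label, the host names, and the values are all hypothetical and not taken from the presented setup. The sketch also assumes host names contain no characters that need escaping in a label value.

```python
def render_gauge(name: str, help_text: str, samples: dict[str, float]) -> str:
    """Render one gauge in the Prometheus text exposition format, one sample per host."""
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} gauge"]
    for host, value in sorted(samples.items()):
        # Doubled braces produce the literal braces around the label set.
        lines.append(f'{name}{{host="{host}"}} {value}')
    return "\n".join(lines) + "\n"


# Made-up readings, e.g. as extracted from each machine's Redfish endpoint.
readings = {"wn-01": 312.0, "wn-02": 287.5}
text = render_gauge(
    "redfish_power_consumed_watts",
    "Power drawn by the chassis in watts.",
    readings,
)
print(text, end="")
```

Once Prometheus scrapes text of this shape, a Grafana panel can plot the power series next to job counts from the batch system to inspect their correlation.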

Author: Nicola Mosco (Istituto Nazionale di Fisica Nucleare)

Co-authors: Lia Lavezzi (Istituto Nazionale di Fisica Nucleare), Luca Tabasso (Istituto Nazionale di Fisica Nucleare), Marco Sadocco (Istituto Nazionale di Fisica Nucleare)

Presentation materials: CCR_Palau_2024-05-24.pdf
