26–30 May 2025
Hotel Hermitage - Isola d'Elba
Europe/Rome timezone

Advancements in Monitoring Infrastructure at INFN Torino Computing Centre

27 May 2025, 12:15
15m
Sala Maria Luisa (Hotel Hermitage - Isola d'Elba)

La Biodola 57037 Portoferraio (Li) Tel. +39.0565 9740 http://www.hotelhermitage.it/
Oral presentation - Infrastructure and sustainability

Speaker

Nicola Mosco (Istituto Nazionale di Fisica Nucleare)

Description

Computing centres require sophisticated management and monitoring infrastructure to maintain operational efficiency. The increasing complexity of resource management necessitates robust frameworks capable of promptly responding to potential failures. This contribution provides an update on the monitoring framework at the INFN Torino computing centre, highlighting significant advancements implemented over the past year in response to challenges posed by the PNRR projects ETIC and TeRABIT.

The monitoring solution has been improved in four areas.

Advanced Database Architecture: Our PostgreSQL-based monitoring system now incorporates dedicated configuration tables for critical parameters. Thresholds, color schemes, and significant attribute values are defined in the database itself, which keeps them consistent and allows them to be managed centrally.
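As a minimal sketch of this idea, thresholds and colors can be stored in a table and joined against readings at query time. The schema here is hypothetical: the table names `threshold` and `reading`, the metric `cpu_temp_c`, and all values are illustrative and not taken from the Torino deployment. The example uses Python's built-in sqlite3 module in place of PostgreSQL so that it runs without a server; the query itself is plain SQL and should carry over with little change.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical threshold table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE threshold (
            metric      TEXT NOT NULL,
            lower_bound REAL NOT NULL,  -- status applies from this value upwards
            status      TEXT NOT NULL,
            color       TEXT NOT NULL
        );
        CREATE TABLE reading (
            host   TEXT NOT NULL,
            metric TEXT NOT NULL,
            value  REAL NOT NULL
        );
        """
    )
    # Illustrative configuration rows: changing a threshold or a color is an
    # UPDATE on this table, not an edit of every dashboard.
    conn.executemany(
        "INSERT INTO threshold VALUES (?, ?, ?, ?)",
        [
            ("cpu_temp_c", 0.0, "ok", "green"),
            ("cpu_temp_c", 70.0, "warning", "orange"),
            ("cpu_temp_c", 85.0, "critical", "red"),
        ],
    )
    # Illustrative readings, not real measurements.
    conn.executemany(
        "INSERT INTO reading VALUES (?, ?, ?)",
        [
            ("node01", "cpu_temp_c", 55.0),
            ("node02", "cpu_temp_c", 78.5),
            ("node03", "cpu_temp_c", 91.0),
        ],
    )
    return conn


def classify(conn):
    """Map each host to the (status, color) of the highest threshold it reaches."""
    rows = conn.execute(
        """
        SELECT r.host, t.status, t.color
        FROM reading AS r
        JOIN threshold AS t ON t.metric = r.metric
        WHERE t.lower_bound = (
            SELECT MAX(t2.lower_bound)
            FROM threshold AS t2
            WHERE t2.metric = r.metric AND t2.lower_bound <= r.value
        )
        ORDER BY r.host
        """
    ).fetchall()
    return {host: (status, color) for host, status, color in rows}
```

A dashboard query written this way never hard-codes a limit or a color, so every panel that reads the same metric stays consistent with the others.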

Streamlined Dashboard Deployment: By leveraging the Grafana provisioning interface, we have automated the deployment of dashboards and data sources. We also employed Jsonnet and Grafonnet to build the structure of the main dashboard to overcome some limitations of Grafana itself. This approach ensures configuration consistency across the monitoring ecosystem while simplifying version control and reducing maintenance overhead.
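The contribution uses Jsonnet and Grafonnet for this step; the sketch below only illustrates the same principle in Python, namely generating the dashboard JSON from a host list instead of editing panels by hand. The dashboard `uid`, the table name `machine_status`, the panel sizes, and the query are assumptions made for illustration. The generated file could then be placed in a directory that Grafana's file-based dashboard provisioning is configured to read.

```python
import json

GRID_COLUMNS = 24  # Grafana lays panels out on a grid 24 units wide


def stat_panel(panel_id, host, x, y, width, height):
    """Return one stat panel for a host (query and table name are hypothetical)."""
    return {
        "id": panel_id,
        "type": "stat",
        "title": host,
        "gridPos": {"x": x, "y": y, "w": width, "h": height},
        "targets": [
            {
                "refId": "A",
                "format": "table",
                # Host names are assumed to come from a trusted inventory.
                "rawSql": f"SELECT status FROM machine_status WHERE host = '{host}'",
            }
        ],
    }


def build_dashboard(hosts, width=4, height=3):
    """Arrange one panel per host, left to right, wrapping at the grid edge."""
    per_row = GRID_COLUMNS // width
    panels = []
    for index, host in enumerate(hosts):
        row, col = divmod(index, per_row)
        panels.append(
            stat_panel(index + 1, host, col * width, row * height, width, height)
        )
    return {"uid": "machine-status", "title": "Machine status", "panels": panels}


def dashboard_json(hosts):
    """Serialize the dashboard so it can be written to a provisioning directory."""
    return json.dumps(build_dashboard(hosts), indent=2, sort_keys=True)
```

Because the output is deterministic for a given host list, the generated JSON can be kept under version control and regenerated whenever the inventory changes.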

Enhanced Visualization Systems: We have developed two visualization tools: a machine status dashboard with color-coded indicators representing the function and health condition of each machine; and a dedicated power supply dashboard that shows the power load distribution per rack and per PDU bank, supporting better energy management.
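The aggregation behind a view of power load per rack and PDU bank can be sketched as follows. The record layout (`rack`, `pdu`, `bank`, `watts`), the sample identifiers, and the per-bank limit are assumptions for illustration; the actual data model of the dashboard is not described in this abstract.

```python
from collections import defaultdict
from typing import NamedTuple


class OutletReading(NamedTuple):
    """One power measurement; the field names are hypothetical."""
    rack: str
    pdu: str
    bank: str
    watts: float


def load_per_bank(readings):
    """Sum the measured power for each (rack, PDU, bank) combination."""
    totals = defaultdict(float)
    for r in readings:
        totals[(r.rack, r.pdu, r.bank)] += r.watts
    return dict(totals)


def load_per_rack(readings):
    """Sum the measured power for each rack."""
    totals = defaultdict(float)
    for r in readings:
        totals[r.rack] += r.watts
    return dict(totals)


def banks_over_limit(readings, limit_watts):
    """Return the banks whose total load exceeds an assumed per-bank limit."""
    return sorted(
        key for key, watts in load_per_bank(readings).items() if watts > limit_watts
    )


# Illustrative values only, not real measurements.
SAMPLE_READINGS = [
    OutletReading("R01", "PDU-A", "B1", 400.0),
    OutletReading("R01", "PDU-A", "B1", 350.0),
    OutletReading("R01", "PDU-A", "B2", 500.0),
    OutletReading("R02", "PDU-A", "B1", 250.0),
]
```

Grouping at the bank level as well as the rack level matters because a rack can be within its total budget while one bank carries most of the load.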

Integrated Protocol Framework: Our monitoring approach combines multiple protocols: Redfish for modern performance metrics, accessed through its RESTful API; IPMI for hardware status retrieval where needed; and SNMP for legacy device support.
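One way to picture how the three protocols fit together is a per-device selection rule plus a parser for the Redfish side. The sketch below is an assumption, not the centre's actual logic: the `Device` fields and the preference order are hypothetical, and the sample payload is invented. Only the property names `Temperatures`, `Name`, `ReadingCelsius`, and `Status`/`Health` follow the Redfish Thermal resource as published by DMTF.

```python
from dataclasses import dataclass


@dataclass
class Device:
    """Hypothetical inventory entry describing what a device can speak."""
    name: str
    supports_redfish: bool = False
    has_bmc: bool = False


def choose_protocol(device):
    """Prefer Redfish, fall back to IPMI on a BMC, and use SNMP otherwise."""
    if device.supports_redfish:
        return "redfish"
    if device.has_bmc:
        return "ipmi"
    return "snmp"


def parse_thermal(payload):
    """Extract {sensor name: (celsius, health)} from a Redfish Thermal payload."""
    sensors = {}
    for entry in payload.get("Temperatures", []):
        reading = entry.get("ReadingCelsius")
        if reading is None:
            # Absent or disabled sensors report a null reading; skip them.
            continue
        health = entry.get("Status", {}).get("Health")
        sensors[entry["Name"]] = (float(reading), health)
    return sensors


# Invented sample resembling a reply from /redfish/v1/Chassis/<id>/Thermal.
SAMPLE_THERMAL = {
    "Temperatures": [
        {"Name": "CPU1 Temp", "ReadingCelsius": 54, "Status": {"Health": "OK"}},
        {"Name": "Inlet Temp", "ReadingCelsius": 23, "Status": {"Health": "OK"}},
        {"Name": "CPU2 Temp", "ReadingCelsius": None, "Status": {"State": "Absent"}},
    ]
}
```

Keeping the selection rule separate from the parsers means a device can move from IPMI or SNMP to Redfish by changing its inventory entry, without touching the collection code.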

Primary authors

Lia Lavezzi (Istituto Nazionale di Fisica Nucleare)
Luca Tabasso (Istituto Nazionale di Fisica Nucleare)
Nicola Mosco (Istituto Nazionale di Fisica Nucleare)
