Seminars

CERN IT Monitoring: metrics, logs and beyond

by Luca Magnoni (CERN)

Sala Venturi (CNAF)

Viale Berti Pichat 6/2, Bologna

Description

In recent years the CERN IT monitoring infrastructure, which provides monitoring facilities to the CERN Data Centre and to WLCG services, has gone through a major redesign phase.

A number of legacy tools (e.g. Lemon, Experiment Dashboards) have been decommissioned in favor of a modern technology stack based on open-source products. Today more than 40 thousand hosts and more than 150 IT services are monitored using the new data pipeline approach, with Apache Kafka as the core transport layer, a variety of data-gathering agents (e.g. Collectd, Prometheus, Logstash, HTTP) and several storage backends (e.g. Elasticsearch, InfluxDB, HDFS). Grafana is the main visualization tool, with more than two thousand dashboards and more than one million queries per day.

The new architecture has not only proved to scale well beyond its design targets (3.5 TB/day of compressed metrics and logs), but has also been adopted by users for on-the-fly data processing and analytics with streaming technologies such as Apache Spark and Kafka Streams.

In this seminar we will explore the architecture of the infrastructure, the technical challenges and the lessons learned.