1. General Seminars

The CernVM ecosystem: a versatile environment for high-energy physics applications in the cloud

by Gerardo Ganis (CERN)

Europe/Rome
Aula Seminari (LNF INFN)

Via Enrico Fermi, 40 00044 Frascati
Description
Recent years have shown a steady increase in the use of virtualization for high-energy physics computing. This results from advancements in virtualization technology, which have made possible, on the one hand, the efficient running of virtual machines on personal resources such as laptops and desktops and, on the other hand, the shift towards ‘cloud’ models of resource provisioning, private or public, which rely on virtualization to separate the service layer exposed to users from the hardware layer providing the required resources. In either case, users need to prepare a virtual machine image providing the execution environment for the physics application at hand. Since its start, the CernVM project has aimed to exploit virtualization to provide a complete, portable and easy-to-configure user environment for developing and running LHC data analysis independently of the operating system software and hardware platform. The project has evolved into an ecosystem of products centered around a small and versatile virtual appliance and an HTTP-based file system, CernVM-FS, which is broadly used for software distribution in HEP, including in non-virtualized environments. The virtual appliance runs on a variety of cloud infrastructures and can be easily adapted to support typical physics workflows. It is used, for instance, to run LHC applications in the cloud, to tune event generators using a network of volunteer computers, and as a container for the historic software environment of the decommissioned ALEPH experiment. The seminar provides an overview of the CernVM ecosystem: the virtual appliance, the contextualization portal, and CernVM-FS, the file system used for software distribution, which also provides operating system binaries to virtual machines on demand.
The use case of the Virtual Analysis Facility, an example of a setup that uses the ecosystem as a whole, will be discussed, as well as the current development plans. The latest development efforts aim at streamlining the maintenance and administration of a CernVM/CernVM-FS service and at adapting the system to a wider set of resource providers, including tapping so far unused resources such as supercomputers.
Cloud-based setups nowadays contribute an essential and steadily increasing share of the computing resources available in high-energy physics. Such resources are provided either by "private clouds", academic infrastructures that allow spawning virtual machines instead of batch jobs, or by public clouds such as Amazon EC2 or Google Compute Engine. CernVM-FS takes care of the on-demand distribution of experiment software and operating system binaries to computing resources around the world.