Description
Machine Learning (ML) has proven to be of great value in a variety of Software Engineering tasks, such as software defect prediction and estimation, and test code generation. To accomplish these tasks, datasets (e.g. features represented by software metrics) have to be collected for the various modules (such as files, classes, and functions) and properly preprocessed before applying machine learning techniques. These preprocessing activities are essential for handling missing values and removing inconsistencies in the data.
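As a concrete illustration of this preprocessing step, the sketch below cleans a software-metrics dataset in Python with pandas and scikit-learn. It is only a minimal example: the file name metrics.csv and the column names (module, loc) are hypothetical assumptions, not artifacts of this work.

    # Minimal preprocessing sketch for a software-metrics dataset.
    # File and column names below are hypothetical examples.
    import pandas as pd
    from sklearn.impute import SimpleImputer
    from sklearn.preprocessing import StandardScaler

    # Each row is a module (file, class, or function); columns are metrics.
    df = pd.read_csv("metrics.csv").drop_duplicates()

    # Remove inconsistent rows, e.g. a negative lines-of-code count.
    df = df[df["loc"] >= 0]

    # Impute missing metric values with the per-column median ...
    X = SimpleImputer(strategy="median").fit_transform(df.drop(columns=["module"]))

    # ... and standardize the metrics so they are comparable across scales.
    X = StandardScaler().fit_transform(X)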
Typically, new projects, or projects with only partial historical data, lack data for some features; for example, defect data may not be available. Such datasets are called unlabelled datasets, and they constitute the vast majority of software datasets. Extracting the complete set of features (defectiveness included) and labelling the various instances demand significant effort and time. The approaches proposed in the literature for building prediction models on unlabelled datasets entail a high number of permutations, which makes them extremely time-consuming. Cloud computing infrastructure, GPU-equipped resources, and an adequate ML framework offer a way to overcome this problem.
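One family of such approaches clusters the unlabelled instances and then derives pseudo-labels from the clusters (in the spirit of CLA/CLAMI-style techniques from the defect-prediction literature). The sketch below illustrates that idea under a simple heuristic; it is an assumption-laden example, not the specific method evaluated in this work.

    # Cluster-then-label sketch for an unlabelled dataset (illustrative only).
    import numpy as np
    from sklearn.cluster import KMeans

    def pseudo_label(X):
        """Split modules into two clusters and mark as defect-prone (label 1)
        the cluster whose standardized metric values are higher on average,
        under the heuristic that more complex modules are more fault-prone."""
        km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
        cluster_means = [X[km.labels_ == c].mean() for c in (0, 1)]
        defect_cluster = int(np.argmax(cluster_means))
        return (km.labels_ == defect_cluster).astype(int)

    y_pseudo = pseudo_label(X)  # X: the preprocessed metrics from the step above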
In this study, we present an analysis of existing unlabelled software datasets, implementing models in different available frameworks, such as TensorFlow and Keras, running in Python and R. Recently, as a work in progress, we have started to explore the application to a large code base: the full software stack of the CMS experiment at the LHC collider at CERN. We have evaluated these frameworks by considering three aspects: extensibility, hardware utilization, and speed. By reporting the strengths and limitations of the considered frameworks, we intend to reduce the distance between theory and practice and enable users to assess their suitability according to their requirements.
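To give a flavour of what such a comparison involves, the snippet below defines a small Keras binary classifier, checks for visible GPUs, and times training. The architecture, hyperparameters, and metrics are illustrative assumptions rather than the models actually benchmarked in this work.

    # Minimal Keras baseline of the kind one might time across frameworks.
    # Layer sizes, epochs, and metrics are illustrative choices.
    import time
    import tensorflow as tf
    from tensorflow import keras

    # Hardware utilization: report which GPUs TensorFlow can see.
    print("GPUs visible:", tf.config.list_physical_devices("GPU"))

    model = keras.Sequential([
        keras.layers.Input(shape=(X.shape[1],)),
        keras.layers.Dense(64, activation="relu"),
        keras.layers.Dense(32, activation="relu"),
        keras.layers.Dense(1, activation="sigmoid"),  # defect-proneness probability
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["AUC"])

    # Speed: measure wall-clock training time on the pseudo-labelled data.
    start = time.perf_counter()
    model.fit(X, y_pseudo, epochs=20, batch_size=32, validation_split=0.2)
    print(f"training time: {time.perf_counter() - start:.1f}s")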