Description
Purpose: The World Health Organization (WHO) has established a positive RT-PCR test as the gold standard for COVID-19 diagnosis, whereas the use of Computed Tomography (CT) images and Artificial Intelligence (AI) for this purpose is still under evaluation, since the differential diagnosis with respect to other kinds of pneumonia is not reliable. However, AI and CT may be useful for following up patients and for evaluating the severity of the lung infection, thereby reducing the workload of radiology departments. Within the Artificial Intelligence in Medicine (AIM) project of INFN, we worked on this topic and developed a system, called LungQuant, based on a cascade of two U-Nets and built using only publicly available data.
Method: The system consists of two U-Nets: the first is trained to segment the lungs, while the second is trained to segment the lung infection due to COVID-19. The output of the first U-Net is refined using a connected-component labeling strategy, which removes small regions of the segmented mask that are not connected to the main objects identified as the lungs. The second U-Net, for COVID-19 lesion segmentation, is trained on a bounding box enclosing the morphologically refined segmented lungs. The LungQuant algorithm processes CT images and returns the lung segmentation, the lung infection segmentation and the CT Severity Score (CT-SS), which is derived from the ratio between the infected volume and the total lung volume. The system has been trained and tested on publicly available datasets to ensure reproducibility. It has also been evaluated in a cross-dataset scheme to study how well its performance transfers across datasets. A total of 551 lung CT scans were used to train and evaluate the first U-Net for lung segmentation; three different datasets, containing both COVID-19 and non-COVID-19 patients, were aggregated to obtain a substantial number of samples. To train the second U-Net, devoted to infection segmentation, 249 CT scans taken from two different datasets were used. We also applied data augmentation to increase the amount of training data, using rotations, elastic transformations, zooming and the addition of Gaussian noise to the original CT scans. Finally, the whole system was also tested on a completely independent public dataset (COVID-19-CT-Seg) to evaluate the generalization capability of LungQuant. Both the lung and the infection segmentation masks were evaluated with the Dice index, and the CT-SS assessment was evaluated with accuracy and Mean Absolute Error (MAE).
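The post-processing steps described above (connected-component refinement, bounding-box extraction and the infected-to-lung volume ratio) can be illustrated with a minimal sketch. This is not the LungQuant implementation: the function names, the 6-connectivity, the choice of keeping the two largest components and the assumption of equal voxel sizes are all hypothetical choices made for illustration. It uses only the Python standard library so that it runs anywhere; on real CT volumes one would normally rely on an array library instead.

```python
from collections import deque

# 6-connected neighbourhood in (z, y, x); the connectivity is an assumption of this sketch.
NEIGHBOURS = ((1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1))


def largest_components(mask, keep=2):
    """Return the voxels of the `keep` largest connected components of a binary mask.

    `mask` is a nested list indexed as mask[z][y][x] with 0/1 entries.
    Keeping two components (left and right lung) is an assumption of this sketch.
    """
    nz, ny, nx = len(mask), len(mask[0]), len(mask[0][0])
    seen = set()
    components = []
    for z in range(nz):
        for y in range(ny):
            for x in range(nx):
                if not mask[z][y][x] or (z, y, x) in seen:
                    continue
                # Breadth-first flood fill from an unvisited foreground voxel.
                component = []
                queue = deque([(z, y, x)])
                seen.add((z, y, x))
                while queue:
                    cz, cy, cx = queue.popleft()
                    component.append((cz, cy, cx))
                    for dz, dy, dx in NEIGHBOURS:
                        n = (cz + dz, cy + dy, cx + dx)
                        inside = 0 <= n[0] < nz and 0 <= n[1] < ny and 0 <= n[2] < nx
                        if inside and mask[n[0]][n[1]][n[2]] and n not in seen:
                            seen.add(n)
                            queue.append(n)
                components.append(component)
    components.sort(key=len, reverse=True)
    return {voxel for component in components[:keep] for voxel in component}


def bounding_box(voxels):
    """Return ((zmin, ymin, xmin), (zmax, ymax, xmax)), inclusive, for a non-empty voxel set."""
    zs, ys, xs = zip(*voxels)
    return (min(zs), min(ys), min(xs)), (max(zs), max(ys), max(xs))


def infected_fraction(lung_voxels, infection_voxels):
    """Ratio of infected volume to total lung volume, counted in voxels of equal size."""
    if not lung_voxels:
        raise ValueError("empty lung mask")
    return len(infection_voxels) / len(lung_voxels)
```

On a toy mask with two large blobs and one isolated voxel, `largest_components` drops the isolated voxel, `bounding_box` gives the crop that would be passed to the second network, and `infected_fraction` gives the ratio from which a severity score could be derived.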
Results: The LungQuant system has been evaluated on different test sets, and we found that its performance strongly depends on the quality of the labeling and on how similar the selection criteria and the labels of the test images are to those of the training set. The Dice indices obtained on the completely independent test set are 0.95 ± 0.01 for lung segmentation and 0.66 ± 0.13 for infection segmentation. The system classifies the CT scans in terms of CT Severity Score with an accuracy of 90% on the independent test set, with a MAE of 4.2% on COVID-19-CT-Seg.
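For reference, the two figures of merit quoted above can be computed as in the following minimal sketch. The function names are hypothetical, the masks are assumed to be flattened to equal-length binary sequences, and applying the MAE to predicted versus reference infected-lung percentages is an assumption of this example rather than a statement of how LungQuant computes it.

```python
def dice(pred, truth):
    """Dice index 2*|A & B| / (|A| + |B|) for two flat binary masks of equal length."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    # Convention chosen for this sketch: two empty masks are in perfect agreement.
    return 1.0 if total == 0 else 2.0 * intersection / total


def mean_absolute_error(pred, truth):
    """Mean absolute difference between predicted and reference values."""
    if len(pred) != len(truth) or not pred:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - t) for p, t in zip(pred, truth)) / len(pred)
```

For example, a predicted mask `[1, 1, 1, 0, 0, 0]` and a reference mask `[0, 1, 1, 1, 0, 0]` share two foreground voxels out of three each, so their Dice index is 2/3.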
Discussion and conclusions: We developed a fully automated quantification pipeline, the LungQuant system, for the identification and segmentation of the lungs and of the pulmonary lesions related to COVID-19 pneumonia in CT scans. The system returns the COVID-19 related lesions, the lung mask and the ratio between their volumes, which is converted into a CT-SS. The LungQuant system achieved a Dice index of 0.95 ± 0.01 in the lung segmentation task and of 0.66 ± 0.13 in segmenting the COVID-19 related lesions on the fully annotated, publicly available benchmark COVID-19-CT-Seg dataset of 10 CT scans. The system classifies the CT scans in terms of CT-SS with an accuracy of 90% on the independent test set, with a MAE of 4.2%. Although this result is encouraging, it was obtained on a rather small dataset, made up of COVID-19-CT-Seg and MosMed CT scans, in which most subjects have low disease severity. Therefore, a broader validation on a larger data sample, more heterogeneous in terms of disease severity, is required.