Data Augmentation and CNN Classification For Automatic COVID-19 Diagnosis From CT-Scan Images On Small Dataset
Weijun Tan, Hongwei Guo
2021-08-16

Abstract: We present an automatic COVID-19 diagnosis framework from lung CT images. The focus is on signal processing and classification on small datasets, with effort put into exploring data preparation and augmentation to improve the generalization capability of 2D CNN classification models. We propose a unique and effective data augmentation method using multiple Hounsfield Unit (HU) normalization windows. In addition, the original slice image is cropped to exclude the background, and a filter is applied to filter out closed-lung images. For the classification network, we choose 2D DenseNet and Xception with the feature pyramid network (FPN). To further improve the classification accuracy, an ensemble of multiple CNN models and HU windows is used. On the training/validation dataset, we achieve a patient classification accuracy of 93.39%.

The outbreak of the novel COVID-19 coronavirus in late 2019 has posed a tremendous threat to the whole world and become one of the worst disasters in human history. As of late April 2021, more than 142 million infections had been identified, more than 3 million lives had been lost, and more than 200 countries had been drastically overwhelmed [1]. Therefore, it is critical to stop the spread of the virus. After a person is confirmed to have COVID-19, safety measures and treatments can be taken accordingly. In [2], thermal imaging is used to detect fever patients, and face recognition is used to report and trace patients and their close contacts. Among the techniques to diagnose COVID-19, X-ray and CT-scan imaging have been studied extensively.

In this paper, we present an automatic diagnosis framework from chest CT-scan images. Our goal is to classify COVID-19, Community-Acquired Pneumonia (CAP), and normal cases from a volume of CT-scan images of a patient. We use the dataset provided in the Signal Processing Grand Challenge (SPGC) on COVID-19 of the IEEE ICASSP 2021 [3]. Our preliminary study shows that one major challenge is that the training/validation dataset is small. This challenge is common to many other datasets, and different approaches have been studied to address this problem, or the broader cross-domain dataset problem, through data augmentation, cross-domain adaptation [4], [5], [6], or capsule networks [7], [8].
In this paper, we propose a novel data augmentation technique using multiple Hounsfield Unit (HU) normalization windows. This data augmentation aims at improving a CNN model's generalization capability. In addition, it allows us to exploit large COVID-19 CT-scan datasets that are available online but published without preprocessing details. Other signal processing techniques we use include cropping the chest images to exclude the background, and filtering out closed-lung images. For the classification network, after exploring 3D CNN classification networks, lung mask segmentation networks, and quite a few 2D CNN classification networks, we choose the DenseNet [9] and Xception [10] 2D CNN classification networks with the feature pyramid network (FPN) [11]. To further improve the classification accuracy, an ensemble of multiple CNN models is used. On the provided training/validation dataset, we achieve a patient classification accuracy of 93.39%.

Due to the urgency of controlling the spread of the COVID-19 virus, much research has been done on diagnosing it with deep learning approaches, mostly CNNs on CT-scan or X-ray images. A few examples are [4], [5], [6], [12], [13], [7], [8]. For a complete review, please refer to [14] and [15]. These methods can be categorized into 2D, 2D+1D, and 3D methods, based on how information from multiple slice images is aggregated and how the final decision is made.

In the 2D method, a 2D CNN classification network is used to make a prediction on each slice image individually. Then, to make a decision for a patient, some voting method is typically used [16], [17], [8]. Others use a 2D CNN network on each slice image to generate an embedding feature vector, then the feature vectors of selected or all slice images are pooled into a single global feature vector, and finally a small classification network (typically just a few fully-connected (FC) layers) makes the final decision. This is called the 2D+1D method [18], [19], [20]. In these two methods, slice-level annotation is needed. The third method is a pure 3D CNN network, where slice annotation is not needed; a selected set of or all the slice images are used as input, and the 3D network processes all these input images at once in a 3D channel space [21], [22], [23].

In the 2D CNN method, some works use lung mask segmentation, but most of them directly use the raw slice image. COVID-CT-Mask-Net [24] uses a segmentation network to localize the disease lesions, then uses a Faster R-CNN-based approach to classify the detected lesion regions. The COVID-Net initiative [25], [26] has done extensive studies of COVID-19 classification on both CT-scan and X-ray images. They also collect and publish the largest CT image dataset, the so-called COVIDx CT-2 dataset. In [17], ResNet50 with FPN is used. In [16], a combination of an infection/non-infection classifier and a COVID-19/CAP/normal classifier is used.

In the 2D+1D method, in [18], a pretrained 2D ResNet classification network extracts a feature vector from every slice image, then all the features are pooled using max-pooling. This feature is sent to a few FC layers to make the final classification prediction. In [19], a capsule network extracts a feature vector for every image, these feature vectors are max-pooled into a global feature vector, and a decision is made for the volume. It is claimed that this method works well on small training datasets.
In [20], a feature vector is extracted for every image, then multiple pooling methods are ensembled to generate a global feature vector before a final classification is made. In [27], an RNN is used to aggregate the 2D features, but the performance is poor.

In the 3D CNN method, in [21], a 3D CNN network is used with both the slice images and segmented lung masks as input. They discard a fixed percentage of slice images at the beginning and end of a CT-scan volume. In [22], the authors first segment the lung mask from a slice image using traditional morphological transforms, then use this mask to select good slice images and generate lung-only images. To fix the number of images, they use 3D cubic interpolation to regenerate slice images. In [23], a 3D CNN network using a fixed number of slice images as input is used; instead of a fixed 3D CNN architecture, an AutoML method searches for the best 3D CNN architecture in a network space built on MobileNetV2 [28] blocks. In [29], a 3D CNN with BERT is used on selected slice images of a patient, with sampling used to fix the number of slice images.

In this section, we discuss all the data preparation and augmentation techniques we explore. The HU data augmentation is a unique one, crucial to our final performance. It applies only to CT images where a HU normalization is needed.

The dataset is given in DICOM format, and the CT images are in Hounsfield units (HU). So the first step is to convert the DICOM format to an image format. Most public datasets are in PNG or JPG format, except for [17], which uses TIFF. We use the PNG format in order to leverage other datasets in PNG format. We follow the Kaggle tutorial [30] to convert DICOM to PNG. After reading the DICOM file, we extract the slice thickness, slope, and intercept. The HU value is

$HU = \text{pixel} \times \text{slope} + \text{intercept}. \quad (1)$

As explained in the tutorial, different HU values correspond to different materials in the human body, and to the background. The very important step is the HU normalization, from HU value to PNG value,

$PNG = \mathrm{clip}\left((HU - HU_{min}) / (HU_{max} - HU_{min})\right), \quad (2)$

where clip is a function that limits the value to the range [0,1], and $HU_{max}$ and $HU_{min}$ are the maximum and minimum HU values for normalization. The pair $(HU_{min}, HU_{max})$ is called a HU window.

Even though there are many public CT image datasets, unfortunately, we cannot find anywhere which HU window should be used in the DICOM-to-PNG conversion. COVIDx-CT-2 [26] uses [-1350, 150] as the default window, and [21] uses [-1200, 600]. We test different HU windows in our study and find that they give very similar classification results. However, when we want to use another public dataset, our HU window has to match that of the dataset in order to get a reasonable result. We analyze the image intensity histograms of COVIDx-CT-2 [26] and find that with a HU window of [-1200, 0], the SPGC image intensity histogram closely matches that of COVIDx-CT-2. Shown in Fig. 1 are sample images using these three HU windows and a sample from COVIDx-CT-2. We notice that Fig. 1(a) looks more natural to human eyes, but it may not be the best for CNN classification. Shown in Fig. 2 are the histograms of these four sample images. We see that the one using the HU window [-1200, 0] matches that of the COVIDx-CT-2 sample well.

To solve this problem, we propose the so-called HU augmentation: using multiple HU windows in the HU normalization,

$PNG_i = \mathrm{clip}\left((HU - HU_{min,i}) / (HU_{max,i} - HU_{min,i})\right), \quad i = 1, 2, 3, \ldots \quad (3)$

So the PNG dataset consists of all PNG images from $PNG_1$, $PNG_2$, $PNG_3$, and so on.
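To make this concrete, here is a minimal Python sketch of the DICOM-to-PNG conversion with multi-window HU augmentation, following equations (1)-(3). It assumes pydicom, NumPy, and Pillow are available; the window list matches the values discussed in the text, while the function names and output file naming are illustrative.

```python
# Minimal sketch: DICOM -> HU -> one PNG per HU window (eqs. (1)-(3)).
# Assumes pydicom, numpy, and Pillow are installed.
import numpy as np
import pydicom
from PIL import Image

# (HU_min, HU_max) windows; values follow those discussed in the text.
HU_WINDOWS = [(-1000, 400), (-1200, 0), (-1200, 600)]

def dicom_to_hu(path):
    """Read one DICOM slice and rescale raw pixels to Hounsfield units, eq. (1)."""
    ds = pydicom.dcmread(path)
    slope = float(ds.RescaleSlope)
    intercept = float(ds.RescaleIntercept)
    return ds.pixel_array.astype(np.float32) * slope + intercept

def hu_to_png(hu, hu_min, hu_max):
    """Window-normalize HU values into [0, 1], eq. (2), then scale to 8 bits."""
    norm = np.clip((hu - hu_min) / (hu_max - hu_min), 0.0, 1.0)
    return (norm * 255).astype(np.uint8)

def augment_slice(path):
    """HU augmentation, eq. (3): emit PNG_1, PNG_2, PNG_3 for one slice."""
    hu = dicom_to_hu(path)
    for i, (hu_min, hu_max) in enumerate(HU_WINDOWS, start=1):
        Image.fromarray(hu_to_png(hu, hu_min, hu_max)).save(f"slice_win{i}.png")
```

Each original slice thus yields one PNG per window, which is exactly the $PNG_1, PNG_2, PNG_3, \ldots$ expansion above.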
This augmentation not only helps overcome the HU-window mismatch problem, but also provides more data for training the classification network. We find in the final benchmarking that this is a crucial contribution to our results due to the improved generalization. Furthermore, it gives us another ensemble: the ensemble of test data for prediction. If we use three HU windows to prepare the data, we have three images for every original DICOM slice image, and therefore three predictions, which we can post-process to get the best performance. We will show these effects in our experiments.

We find that the useless background in the CT image interferes with the training and prediction of the classification network. Therefore, we use a lung segmentation mask to cut off the background and keep only the useful portion. This lung segmentation uses traditional image morphological transforms, similar to the Kaggle tutorial [30]. We also train a simple object detection network, whose performance is about the same as the morphological transforms. This process is shown in Fig. 3(b).

We use an idea similar to [17] to filter out closed-lung images, with two metrics: the percentage of segmented lung in the total image, and the metric in [17] (see the sketch of both metrics below). Please note that in [17] the images are not cropped, so the filter does not work consistently. We apply the idea to the cropped images, so the region of interest (ROI) is aligned more consistently across different CT images. In the first metric, we keep slice images whose segmented lung covers more than 10% of the total image. Shown in Fig. 3(c) is an extracted lung mask; when the percentage of white pixels is less than 10% of the total pixels, the lung is closed. In the second metric, we count the number of black pixels (intensity value < 100) in the ROI = ([120,240], [370,340]) of a slice image, as shown by the red rectangle in Fig. 3(d). Then, among all the slice images of a patient, we find the maximum and minimum counts and use a threshold $= (max - min)/factor$, where the factor is typically 1.5-3. Slice images whose number of black pixels in the ROI is less than the threshold are filtered out [17].

Even with the HU augmentation, the training dataset is still very small. We explore the CT image datasets available in the literature. There are quite a few of them; we name some here: COVIDx-CT-2 [26], CNCB [31], and COVID-CTset [17]. Of these, COVIDx-CT-2 [26] is the largest, with nearly 200K images, so we decide to use it in our study. However, we do not want this third-party dataset to overwhelm the SPGC dataset, so we only use a small portion of it, such that the total number of images from it is not more than that of the SPGC dataset. We use this dataset in both the training and validation datasets, but not in the test dataset; we only use SPGC images in the test dataset.

Our 2D CNN classification network is shown in Fig. 4 (CNN-FPN classifier for the COVID-19, CAP, and normal cases). We use a network similar to [17]. Three feature maps at different scales are generated in the FPN, and a three-class classifier is applied to every feature. At the end, these three classifiers are merged into the final three-class classifier. Some details of our implementation are listed here. We use a three-way classification, regardless of whether the test dataset poses a two-way or three-way classification task. We use rotation, shift, and scale transforms. We find that the rotation angle has a big impact on the classification performance, so we limit it to 15 degrees.
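Returning to the closed-lung filter described above, here is a minimal sketch of the two metrics. The 10% lung-area cutoff, the intensity cutoff of 100, the ROI box, and the (max - min)/factor threshold follow the text; the interpretation of the ROI corners as (x, y) coordinates and the helper names are our assumptions.

```python
# Minimal sketch of the two closed-lung metrics on cropped slices.
import numpy as np

def keep_by_lung_area(mask, min_fraction=0.10):
    """Metric 1: keep a slice if the segmented lung (white pixels in a
    binary lung mask) covers more than min_fraction of the image."""
    return (mask > 0).mean() > min_fraction

def black_pixel_count(img, roi=((120, 240), (370, 340)), dark=100):
    """Metric 2, per slice: count dark pixels (intensity < 100) inside the
    fixed ROI. The corners are assumed to be (x, y) coordinates of the
    ([120,240], [370,340]) box in the text."""
    (x0, y0), (x1, y1) = roi
    patch = img[y0:y1, x0:x1]
    return int((patch < dark).sum())

def filter_closed_lung(images, factor=1.5):
    """Metric 2, per patient: threshold = (max - min) / factor over the
    black-pixel counts; slices whose count falls below it are dropped."""
    counts = np.array([black_pixel_count(img) for img in images])
    threshold = (counts.max() - counts.min()) / factor
    return [img for img, c in zip(images, counts) if c >= threshold]
```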
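Similarly, here is a minimal tf.keras sketch in the spirit of the Fig. 4 classifier: a DenseNet201 backbone with a small top-down FPN over three scales and one three-class head per scale, merged into a final prediction. The FPN channel width, the merge-by-averaging, and the specific backbone layer names are illustrative assumptions rather than the paper's exact design.

```python
# Sketch of a DenseNet201 + FPN classifier with three per-scale heads.
import tensorflow as tf
from tensorflow.keras import layers

def build_densenet_fpn(input_shape=(224, 224, 3), num_classes=3):
    backbone = tf.keras.applications.DenseNet201(
        include_top=False, weights="imagenet", input_shape=input_shape)
    # Three feature maps at decreasing resolution; these layer names follow
    # the tf.keras DenseNet naming and may need adjusting per version.
    c3 = backbone.get_layer("pool3_conv").output   # stride 8
    c4 = backbone.get_layer("pool4_conv").output   # stride 16
    c5 = backbone.output                           # stride 32
    # Top-down FPN: lateral 1x1 convs plus upsample-and-add.
    p5 = layers.Conv2D(256, 1)(c5)
    p4 = layers.Add()([layers.UpSampling2D()(p5), layers.Conv2D(256, 1)(c4)])
    p3 = layers.Add()([layers.UpSampling2D()(p4), layers.Conv2D(256, 1)(c3)])
    # One three-class classifier per scale, merged by averaging.
    heads = []
    for p in (p3, p4, p5):
        x = layers.GlobalAveragePooling2D()(p)
        heads.append(layers.Dense(num_classes, activation="softmax")(x))
    out = layers.Average()(heads)
    return tf.keras.Model(backbone.input, out)
```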
We use batch normalization after all convolutional layers. We use an image size of 224x224 for ResNet50 [32] and DenseNet121/201 [9], and 299x299 for Xception [10]. We test different networks, including ResNet50 [32], MobileNetV2 [28], Xception [10], DenseNet121, and DenseNet201 [9]. We find that the DenseNets train faster and achieve good performance, so in our ablation study we use DenseNet201. In the final classification benchmarking, we use an ensemble of Xception, DenseNet121, and DenseNet201. In order not to use too much CPU or GPU memory at a time, we run the three models one by one and post-process the results. We also test regular 2D CNN classification networks without FPN, such as ResNet50 and a small private CNN, and find that they can achieve good accuracy with all our other data augmentation and training techniques. However, the CNN with FPN achieves better accuracy, usually 1-2% higher than the network without FPN.

We use Keras-2.3.0 and Tensorflow-GPU-2.2 in our implementation. We use a learning rate of 1E-4 at the beginning, then lower it to 1E-5. We typically train the CNN network for 50 epochs, except for fine-tuning, where a smaller number of epochs is used. Class weights are used when the numbers of images for the three classes (COVID-19, CAP, normal) are unbalanced.

The dataset provided by SPGC [3] includes 307 patients, diagnosed by medical experts. Of these patients, 171 have COVID-19, 60 have CAP, and the remaining 76 are normal cases. This dataset is not small; however, only a small portion, including 55 COVID-19 patients and 25 CAP patients, has slice annotations. This limited slice annotation turns out to be a big challenge for slice-based classification. Since we use a slice-image-based classification model, we use all the annotated slice images as the training and validation datasets. For the COVID-19 and normal cases, we use a 7:3 split ratio between training and validation, and a 9:1 split ratio for the CAP cases, since there are far fewer of them. COVID-19/CAP patients in the validation dataset are not used in training; only a small portion of the normal patients' slice images is used in training. For patient-wise classification, we use all the patients without slice annotations as the validation dataset. Furthermore, we leverage the CT image dataset we can find online, the COVIDx-CT-2 dataset [26]. The combined dataset makes our trained model generalize much better than using one single HU window alone.

The slice-based classification network predicts a result for every slice image. Our goal, however, is a classification result per patient. Given the number of slice images per patient, we use two metrics to make the final decision: in the first method, a slice threshold $th_s$ on the number of slice images predicted as a given class; in the second, a patient-level threshold $th_p$. Please note that, in our final decision, we do not do majority voting over the multiple model predictions on every image. Instead, we mix all the image predictions from all three models and make a final decision based on the number of predictions belonging to each of the three classes. The performance of this rule is noticeably better than per-image majority voting. At validation/test time, we first do slice-image classification, then use the above rule to make a decision for each patient. We use the un-annotated patients as the validation patient dataset. We optimize the parameters of our classifier on the validation dataset, then use the optimized parameters on the final test datasets for the challenge submission.
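A minimal sketch of the patient-level decision described above: slice predictions from all three models are pooled into one list per patient, and the class counts are compared against the slice threshold $th_s$. The exact decision rule and the default threshold value here are illustrative assumptions.

```python
# Sketch of the pooled patient-level decision; the rule is an assumption:
# a patient is called COVID-19 (or CAP) when the fraction of pooled slice
# predictions for that class exceeds th_s, otherwise normal.
from collections import Counter

def patient_decision(slice_preds, th_s=0.2):
    """slice_preds: per-slice labels ('covid' | 'cap' | 'normal') pooled
    from all three models (and HU windows) for one patient."""
    counts = Counter(slice_preds)
    total = len(slice_preds)
    if counts["covid"] / total > th_s:
        return "covid"
    if counts["cap"] / total > th_s:
        return "cap"
    return "normal"
```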
The HU augmentation is our unique novelty, so we perform an ablation study to demonstrate its effects using the DenseNet201-FPN network. In this experiment, we use factor = 1.5 in the closed-lung filtering. We use three HU windows: SPGC3 = [-1000, 400], SPGC4 = [-1200, 0], SPGC6 = [-1200, 600]. In Table I, we list the results of the individual HU windows in the first panel, and the results of the HU augmentation (three HU windows combined) in the second panel. From the results in the first panel, we notice that SPGC3 and SPGC4 are two good HU windows, while SPGC6 is not. From the results in the second panel, the test results of the three HU windows are about the same, even though their sensitivities (not shown) are not. Even the individually bad HU window (SPGC6) now performs well.

In addition to the HU augmentation, we use the COVIDx-CT-2 dataset [26] in our training. As mentioned before, we only use a small portion of it, so it does not overwhelm the SPGC dataset. So our training dataset includes both the three-HU-window-augmented SPGC dataset and the selected COVIDx-CT-2 dataset. The patient classification results are listed in the third panel of Table I. From the results, we notice that the extra training data gives a big boost to the patient classification accuracy, from about 82% to nearly 91%. On the other hand, SPGC6 is a lot worse than SPGC3 and SPGC4, even though its accuracy improves over not using the COVIDx-CT-2 dataset. Based on the poor performance of SPGC6, we do not use it in the ensemble of HU windows in the final tests.

We test a few classification networks, including ResNet50, MobileNetV2, and others. Based on the patient classification accuracy results, we choose Xception, DenseNet121, and DenseNet201. The individual model results are listed in the first panel of Table II. The three networks give similar patient classification accuracy on the two HU-window datasets SPGC3 and SPGC4. DenseNet201 has the best accuracy on SPGC3, and Xception has the best accuracy on SPGC4. Adding all three networks into an ensemble, on top of the ensemble of SPGC3 and SPGC4, our final accuracy on the SPGC training/validation dataset is 93.39%.

On a new test dataset that may come from a different domain, we can tune the thresholds $th_s$ and $th_p$ on a small portion of the dataset to achieve good performance, then use the two thresholds on the rest of the data. The results are listed in Table III. Since the ground truth of this dataset has not been released, we cannot optimize the parameters $th_s$ and $th_p$ other than using the values optimized on the validation dataset.

We provide a solution for automatic COVID-19 diagnosis on the SPGC dataset. The key novelty is a data augmentation using multiple HU normalization windows. With all techniques put together, we achieve a patient classification accuracy of 93.39% on the provided training/validation dataset, and an accuracy of at least 81.11% on the test dataset.
[1] Coronavirus disease (COVID-19) pandemic.
[2] Application of face recognition in tracing COVID-19 patients and close contacts.
[3] 2021 IEEE ICASSP Signal Processing Grand Challenge (SPGC) COVID-19 radiomics.
[4] Hybrid-COVID: A novel hybrid 2D/3D CNN based on cross-domain adaptation approach for COVID-19 screening from chest X-ray images.
[5] SODA: Detecting COVID-19 in chest X-rays with semi-supervised open set domain adaptation.
[6] Contrastive cross-site learning with redesigned net for COVID-19 CT classification.
[7] COVID-CAPS: A capsule network-based framework for identification of COVID-19 cases from X-ray images.
[8] COVID-FACT: A fully-automated capsule network-based framework for identification of COVID-19 cases from chest CT scans.
[9] Densely connected convolutional networks.
[10] Xception: Deep learning with depthwise separable convolutions.
[11] Feature pyramid networks for object detection.
[12] Deep learning approaches for COVID-19 detection based on chest X-ray images.
[13] Multi-task deep learning based CT imaging analysis for COVID-19 pneumonia: Classification and segmentation.
[14] Review on diagnosis of COVID-19 from chest CT images using artificial intelligence.
[15] Systematic review of artificial intelligence techniques in the detection and classification of COVID-19 medical images in terms of evaluation and benchmarking: Taxonomy analysis, challenges, future solutions and methodological aspects.
[16] Detecting COVID-19 and community acquired pneumonia using chest CT scan images with deep learning.
[17] A fully automated deep learning-based network for detecting COVID-19 from a new and large lung CT scan dataset.
[18] Artificial intelligence distinguishes COVID-19 from community acquired pneumonia on chest CT.
[19] CT-CAPS: Feature extraction-based automated framework for COVID-19 disease identification from chest CT scans using capsule networks.
[20] Multi-scale residual network for COVID-19 diagnosis using CT-scans.
[21] A weakly-supervised framework for COVID-19 classification and lesion localization from chest CT.
[22] COVID-19 diagnostic using 3D deep transfer learning for classification of volumetric computerised tomography chest scans.
[23] Automated model design and benchmarking of 3D deep learning models for COVID-19 detection with chest CT scans.
[24] COVID-CT-Mask-Net: Prediction of COVID-19 from CT scans using regional features.
[25] COVIDNet-CT: A tailored deep convolutional neural network design for detection of COVID-19 cases from chest CT images.
[26] COVID-Net CT-2: Enhanced deep neural networks for detection of COVID-19 from chest CT images through bigger, more diverse learning.
[27] MIA-COV19D: COVID-19 detection through 3-D chest CT image analysis.
[28] MobileNetV2: Inverted residuals and linear bottlenecks.
[29] A 3D CNN network with BERT for automatic COVID-19 diagnosis from CT-scan images.
[30] Full preprocessing tutorial, Kaggle.
[31] Clinically applicable AI system for accurate diagnosis, quantitative measurements and prognosis of COVID-19 pneumonia using computed tomography.
[32] Deep residual learning for image recognition.