title: Identification of Images of COVID-19 from Chest X-rays Using Deep Learning: Comparing COGNEX VisionPro Deep Learning 1.0™ Software with Open Source Convolutional Neural Networks
authors: Sarkar, Arjun; Vandenhirtz, Joerg; Nagy, Jozsef; Bacsa, David; Riley, Mitchell
date: 2021-03-10
journal: SN Comput Sci
DOI: 10.1007/s42979-021-00496-w

The novel coronavirus (COVID-19) pandemic is considered the most crucial health calamity of the century. Many organizations have come together during this crisis and created various Deep Learning models for the effective diagnosis of COVID-19 from chest radiography images. For example, the University of Waterloo, along with Darwin AI, a start-up spin-off of this department, has designed the Deep Learning model 'COVID-Net' and created a dataset called 'COVIDx' consisting of 13,975 images across 13,870 patient cases. In this study, COGNEX's Deep Learning software, VisionPro Deep Learning™, is used to classify these chest X-rays from the COVIDx dataset. The results are compared with the results of COVID-Net and various other state-of-the-art Deep Learning models from the open-source community. Deep Learning tools are often referred to as black boxes because humans cannot interpret how or why a model classifies an image into a particular class. This problem is addressed by testing VisionPro Deep Learning with two settings: first, by selecting the entire image as the Region of Interest (ROI); and second, by segmenting the lungs in a first step and then running the classification step on the segmented lungs only, instead of on the entire image. VisionPro Deep Learning results: with the entire image as the ROI it achieves an overall F score of 94.0%, and on the segmented lungs it achieves an F score of 95.3%, which is better than COVID-Net and the other state-of-the-art open-source Deep Learning models.

The novel coronavirus disease, named COVID-19 by the World Health Organization, is caused by a new class of coronavirus known as SARS-CoV-2 (Severe Acute Respiratory Syndrome Coronavirus 2). It is a single-stranded RNA (ribonucleic acid) virus that causes severe respiratory infections. The first COVID-19 cases were reported in December 2019 in Wuhan, Hubei province, China [1]. As the virus has since spread worldwide, the World Health Organization has given it the status of a pandemic. As of 16 February 2021, 12:00 GMT, 110 million people had been infected. Radiological imaging of the chest, such as X-rays and CT scans, plays an important role in diagnosing and monitoring the disease. Hence, a significant number of expert radiologists who can interpret these images are needed. Due to the ever-increasing number of cases of COVID-19 infection, it is becoming more difficult for radiologists to keep up with this demand. In this scenario, Deep Learning techniques have proven beneficial both in classifying abnormalities in lung X-ray images and in aiding radiologists to accurately predict COVID-19 cases in a reduced time frame. While many studies have demonstrated success in detecting images of COVID-19 using Deep Learning with both CT scans and X-rays, most of the Deep Learning architectures require extensive programming. Moreover, most of the architectures fail to show whether the Deep Learning model is being triggered by abnormalities in the lungs or by artefacts not related to COVID-19.
Due to the absence of a GUI (Graphical User Interface) in most of these Deep Learning models, it is difficult for radiologists who lack knowledge of Deep Learning or programming to use these models, let alone train them. Therefore, we showcase an existing Deep Learning software package with a very intuitive GUI, which can be used pre-trained or can even be trained on new data from particular hospitals or research centres.

COGNEX VisionPro Deep Learning™ is a Deep Learning vision software from COGNEX Corporation (Headquarters: Natick, MA, United States). It is a field-tested, optimised, and reliable software solution based on a state-of-the-art set of machine learning algorithms. VisionPro Deep Learning combines a comprehensive machine vision tool library with advanced Deep Learning tools. In this study, we used the latest version, VisionPro Deep Learning 1.0, to aid in the classification of chest X-rays as normal, non-COVID-19 (pneumonia), or COVID-19. The results are compared with various state-of-the-art open-source neural networks.

The VisionPro Deep Learning GUI, called COGNEX Deep Learning Studio, has three tools for image classification, segmentation, and location. It contains various Deep Learning architectures built into the GUI to carry out specific tasks:

1. Green Tool: the Classify tool. It is used to classify objects or complete scenes; for example, defects, cell types, images with different labels, or different types of test tubes used in laboratories. The Green tool learns from a collection of labelled images of different classes and can then classify images that it has not previously seen. This tool is similar to classification neural networks such as VGG [13], ResNet [14] and DenseNet [15].

2. Red Tool: the Analyse tool. It is used for segmentation and defect/anomaly detection; for example, to aid in the detection of anomalies in blood samples (clots), incomplete or improper centrifugation, or sample quality management. The Red tool is also used to segment specific regions, such as defects or areas of interest, and offers either Supervised or Unsupervised Learning for segmentation and detection. This is similar to segmentation neural networks such as U-Net [16].

3. Blue Tool: (a) the Feature Localisation and Identification tool. The Blue tool finds complex features and objects by learning from labelled images. It has self-learning algorithms that can locate, classify, and count the objects in an image. It can be used for locating organs in X-ray images or cells on a microscopic slide. (b) The Blue tool also has a Read feature: a pre-trained model that helps to decipher severely deformed and poorly etched words and codes using optical character recognition (OCR). This is the only pre-trained tool; all other tools must first be trained on images.

For the classification of COVID-19 images, two settings are used:

1. the Green tool for classification of the entire chest X-ray images;
2. the Red tool for segmentation of the lungs, followed by a Green tool classifier run only on the segmented lungs, to make sure the Deep Learning software bases its predictions on the lungs alone.

The open-access benchmark dataset called COVIDx was used for training the various models [17]. The dataset contains a total of 13,975 chest X-ray images from 13,870 patients and is a combination of five different publicly available datasets.
According to the authors [17] of COVID-Net, COVIDx is one of the largest open-source benchmark datasets in terms of the number of COVID-19 positive patient cases. These five datasets were used by the authors of COVID-Net to generate the final COVIDx dataset: (a) non-COVID-19 pneumonia patient cases and COVID-19 cases from the COVID-19 Image Data Collection [18], (b) COVID-19 patient cases from the Figure 1 COVID-19 Chest X-ray Dataset [19], (c) COVID-19 patient cases from the ActualMed COVID-19 Chest X-ray Dataset [20], (d) patient cases with no pneumonia (that is, normal) and non-COVID-19 pneumonia patient cases from the RSNA Pneumonia Detection Challenge dataset [21], and (e) COVID-19 patient cases from the COVID-19 radiography dataset [22]. The idea behind using these five datasets was that they are all open-source COVID-19/pneumonia chest X-ray datasets, so they can be accessed by everyone in the research community and by the general public, and they also add variety to the dataset.

However, the lack of COVID-19 chest X-ray images made the dataset highly imbalanced. Of the 13,975 images, the data were split into 13,675 training images and just 300 test images. The data were divided across three classes: (1) normal (X-rays containing neither pneumonia nor COVID-19), (2) non-COVID-19/pneumonia (X-rays with some form of bacterial or viral pneumonia, but not COVID-19), and (3) COVID-19. The training set was highly imbalanced across the normal, Non-COVID-19/Pneumonia and COVID-19 classes, with only 258 images in the COVID-19 class. The test set was balanced, with each of the three classes containing 100 images [17]. (Figure 1 shows the class distribution: the horizontal axis represents the different categories or classes, and the vertical axis the number of images in each category.) Figure 2 shows three X-ray images from each class in the dataset.

The authors of COVID-Net have shared the dataset-generating scripts for the construction of the COVIDx dataset, publicly available at https://github.com/lindawangg/COVID-Net [17]. The python notebook 'create_COVIDx_v3.ipynb' was used to generate the dataset. The text files 'train_COVIDx3.txt' and 'test_COVIDx3.txt' contain the file names used in the training and test sets, respectively. The dataset was then tested with VisionPro Deep Learning, and the results were compared with the COVID-Net results and with other open-source Convolutional Neural Network (CNN) architectures such as VGG [13] and DenseNet [15]. The TensorFlow [33] library (developed by the Google Brain Team) was used to generate and train the open-source CNN architectures.

The scripts for generating the COVIDx dataset were used to merge the five datasets together and separate the images into training and test folders. Along with the images, the script also generated two text files containing the names of the images belonging to the training and test folders, and their class labels [17]. To simplify classification, a python script was used to convert the '.txt' files into 'pandas' data frames and finally into '.csv' files for better readability. Next, another python script was created to rename all of the X-ray images in the training and test folders according to their class labels and store them in new training and test directories. Since the goal was classification of the X-ray images, renaming the images made it easier to interpret them directly from their file names, rather than consulting a '.csv' file every time.
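The conversion and renaming steps described above could look roughly like the following sketch. It assumes the COVIDx label files use whitespace-separated columns (patient id, filename, class, source), which matches the published COVIDx convention at the time; the function names and directory paths are illustrative, not the authors' actual script.

```python
# Minimal sketch of the dataset-preparation step: convert a COVIDx label
# file to CSV, then copy each X-ray with its class label prefixed to the
# file name. Paths and helper names are assumptions for illustration.
import shutil
from pathlib import Path

import pandas as pd

def labels_to_csv(txt_path: str, csv_path: str) -> pd.DataFrame:
    """Convert a COVIDx label file into a pandas data frame and a CSV."""
    df = pd.read_csv(
        txt_path,
        sep=" ",
        header=None,
        names=["patient_id", "filename", "label", "source"],
    )
    df.to_csv(csv_path, index=False)
    return df

def rename_by_label(df: pd.DataFrame, src_dir: str, dst_dir: str) -> None:
    """Copy every image into dst_dir with its class label in the name,
    e.g. 'COVID-19_xyz.png', so the label is readable from the file name."""
    Path(dst_dir).mkdir(parents=True, exist_ok=True)
    for row in df.itertuples():
        src = Path(src_dir) / row.filename
        dst = Path(dst_dir) / f"{row.label}_{row.filename}"
        shutil.copy(src, dst)

train_df = labels_to_csv("train_COVIDx3.txt", "train_COVIDx3.csv")
rename_by_label(train_df, "data/train", "data/train_renamed")
```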
Finally, we have all 13,975 images in the train and test folders, with file names containing the class labels.

(a) VisionPro Deep Learning. Unlike most other Deep Learning architectures, VisionPro Deep Learning does not require any pre-processing of the images. The images can be fed directly into the GUI, and the software pre-processes them automatically before training begins. Since the COVIDx dataset is a combination of various datasets, the images have different colour depths, and the VisionPro Deep Learning GUI found 326 anomalous images. Training could have been done with the anomalous images left in the dataset, but this might have reduced the overall F score of the model. Therefore, we normalised all COVIDx images to 24-bit colour depth using external software, IrfanView (freeware: irfanview.com), and then added the images to the VisionPro Deep Learning GUI. No other pre-processing steps, such as image augmentation, setting class weights, or oversampling the imbalanced classes, are necessary with VisionPro Deep Learning, although they are necessary for training the other open-source CNN models. Once the images are fed into the VisionPro Deep Learning GUI, they are ready to be trained. (Fig. 2: X-ray images from each class; all images belong to the training set of the COVIDx dataset [17].)

(b) Open Source Convolutional Neural Network (CNN) Models. Before training the CNN models, such as VGG [13] or DenseNet [15], it was necessary to execute some pre-processing steps: resizing, artificial oversampling of the classes with fewer images, image standardisation, and finally data augmentation. First, the images were resized to 256 × 256 pixels; the entire training was done on an Nvidia 2080 GPU, and this was the largest image size that did not run into GPU memory errors. Once the images were resized and the images and labels loaded together, it was necessary to oversample the classes with fewer images, that is, the Non-COVID-19 and COVID-19 classes. For oversampling, random artificial augmentations were carried out, such as rotation (−20° to +20°), translation, horizontal flipping, Gaussian blur, and added noise, all applied randomly using the 'random' library in python. Then, all of the X-ray images were standardised to a mean of zero and a standard deviation of one, since standardisation helps a Deep Learning network learn much faster. Finally, data augmentation was applied to all classes, irrespective of the number of images in each class; the augmentations include rescaling, height and width shifting, rotation, shearing and zooming. After these pre-processing steps, the images were ready to be fed into the deep neural networks.

The goal of the study was the classification of normal, non-COVID-19 (pneumonia) and COVID-19 X-ray images. For classification, VisionPro Deep Learning uses the Green tool. Once the images were loaded and labelled, they were ready for training. In VisionPro Deep Learning, the Region of Interest (ROI) of the images can be selected; it is therefore possible to crop the edges by 10-20% to remove artefacts such as letters or borders, which usually sit at the edges of the images. In this case, the entire images were used without cropping, because many images have the lungs close to the edge and we did not want to remove essential information. The images did not need to be resized before being fed into VisionPro Deep Learning.
Images of all resolutions and aspect ratios can be fed into the GUI, and the GUI performs the pre-processing automatically before starting the training.

In VisionPro Deep Learning, the Green tool has two subcategories, High-detail and Focussed. Under High-detail there are several model architecture sizes (small, normal, large and extra-large) that can be selected for training. We train the network using the High-detail subcategory with the 'Normal' size model. Of the 13,675 images, 80% are used for training; the VisionPro Deep Learning suite automatically selects the remaining 20% for validation. Both the training and validation sets are selected randomly by the suite; the user only needs to specify the train-validation split. The maximum epoch count was set to 100. There are options for selecting the minimum number of epochs and the patience for which the model will train, but these were not used. Once these are set, training is started by clicking the 'brain' icon on the Green tool, as seen in Fig. 3.

The Green tool is used to classify entire X-ray images, but for the identification of images of COVID-19, the Deep Learning model needs to focus on the lungs, not the peripheral bones, organs and soft tissues. The model must make its predictions exclusively based on the lungs and not on differences in the spinous processes, clavicles, soft tissues, ornaments worn around the patient's neck, or even the background. This way we can be sure that the model classifies based entirely on the normal and infected lungs. Therefore, segmenting the lungs from each image ensures that the model trains only on these segmented lungs, and not on the entire image. To implement this, the VisionPro Deep Learning Red tool is used. The Red tool segments the images such that only the lungs are visible to the Deep Learning model during training. To achieve this, 100 images of the training set are manually masked using the 'Region selection' option in the Red tool. The training set consists of 13,675 images, but manually masking 100 of them is enough to train the model. Once the manual masking is done on the 100 images, the Red tool is trained. After training is completed, the VisionPro Deep Learning GUI has all of the training and test images properly masked, as seen in Figs. 4 and 5, such that only the lungs are visible. Anything outside the lungs is treated as outside the ROI and is not used in classification. The Red tool is added in the same environment as the previous Green tool; there is no need to create a new instance for segmentation. Once all of the images are segmented, a Green classification tool is implemented after the Red tool. The Green tool is then used to start the classification (similar to Step 3 of "Method"), but this time exclusively on the segmented lungs rather than on the entire images.

The VGG [13] network is a deep neural network and is still one of the state-of-the-art Deep Learning models used in image classification. We used the 19-layer VGG19 model, trained with transfer learning on the COVIDx dataset. VGG takes an input image of size 224 × 224 pixels. Pre-processing of the images was performed automatically by calling 'preprocess_input' from the VGG19 model in TensorFlow, which is passed to the 'ImageDataGenerator' from TensorFlow (Keras). 'ImageNet' weights are used for training.
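As a rough illustration of this input pipeline, a minimal sketch follows. It hands the model's own 'preprocess_input' to an ImageDataGenerator and applies the augmentations listed earlier (shifts, rotation, shear, zoom); the directory layout and parameter values are assumptions, not the authors' exact configuration.

```python
# Sketch of the VGG19 input pipeline: model-specific pre-processing plus
# the augmentations described in the text. Paths/values are illustrative.
from tensorflow.keras.applications.vgg19 import preprocess_input
from tensorflow.keras.preprocessing.image import ImageDataGenerator

train_datagen = ImageDataGenerator(
    preprocessing_function=preprocess_input,  # VGG19's own pre-processing
    rotation_range=20,
    width_shift_range=0.1,
    height_shift_range=0.1,
    shear_range=0.1,
    zoom_range=0.1,
)

train_generator = train_datagen.flow_from_directory(
    "data/train",            # assumed layout: one sub-folder per class
    target_size=(224, 224),  # VGG19's expected input size
    batch_size=32,
    class_mode="sparse",     # integer labels, matching the sparse loss below
)
```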
The COVIDx dataset was also resampled as stated in 2(b) of the "Method". This ensures that all classes have a similar number of images, to avoid the model favouring a particular class during training. The VGG19 architecture uses 3 × 3 convolutional filters, which perform much better than the older AlexNet [23] models. All of the activation functions used in the hidden layers are ReLUs (Rectified Linear Units) [24]. After the VGG backbone, we add four fully connected layers with 1024 nodes each; all four use the ReLU activation function and L2 regularisation [25, 27]. For better regularisation, a Dropout layer follows each of these layers. The final layer is a fully connected layer of three nodes, one for each of the three classes, with the 'SoftMax' activation function [27]. In the pre-processing steps, the labels of the images are not one-hot encoded but kept as three distinct digits; therefore, instead of 'categorical cross entropy' [28], which is commonly used when the labels are one-hot encoded, 'sparse categorical cross entropy' is used as the loss function. The 'Adam' [29] optimiser is used with learning rate scheduling, such that the learning rate decreases after every thirty epochs. During training, several callbacks are set, such as saving the model each time the validation loss decreases, and early stopping, which halts training when the validation loss has not improved for several epochs. The epoch count is set to 100, and batches of 32 images are fed to the model at a time. Once all of these hyperparameters are set, training of the model is started. After training completes, the programme plots the confusion matrix and reports the evaluation metrics on which the various models are compared.

One bottleneck of the VGG network is that it cannot go very deep, as it starts losing generalisation capability with increasing depth. To overcome this problem, ResNet, the Residual Network [14], is chosen. The ResNet architecture consists of several residual blocks, each containing several convolutional operations. The implementation of skip connections makes ResNet better than VGG: the skip connections add the outputs of previous layers to the outputs of the stacked layers, which allows the training of deeper networks. One of the problems that ResNet solves is the vanishing gradient problem [30]. For the COVIDx dataset, we use the 50-layer ResNet50V2 (version 2) architecture. We use transfer learning to train the model, and then add eight fully connected layers with L2 regularisation, each followed by Dropout for better regularisation. All of the other settings and hyperparameters are kept the same as for the VGG19 network ("Method", part 5).

DenseNet (Dense Convolutional Network) [15] is an architecture which focuses on making Deep Learning networks even deeper, while at the same time making them more efficient to train by using shorter connections between the layers. DenseNet is a convolutional neural network in which each layer is connected to all layers deeper in the network: the first layer is connected to the 2nd, 3rd, 4th and so on, and the second layer to the 3rd, 4th, 5th and so on. Unlike ResNet [14], it does not combine features through summation but by concatenating them. The i-th layer thus has i inputs, consisting of the feature maps of all preceding convolutional blocks, and therefore requires fewer parameters than a traditional convolutional neural network.
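Across all four open-source models the classification head follows the same recipe: a frozen ImageNet backbone, fully connected layers with L2 regularisation and Dropout, and a 3-node SoftMax output. A minimal sketch is below; the exact dropout rates, L2 factor and learning rate are assumptions, since the paper does not state them.

```python
# Sketch of the shared transfer-learning recipe. VGG19 is shown;
# ResNet50V2, DenseNet121 or InceptionV3 slot in the same way.
import tensorflow as tf
from tensorflow.keras import layers, regularizers

def build_classifier(backbone_fn, n_dense: int, n_classes: int = 3) -> tf.keras.Model:
    backbone = backbone_fn(include_top=False, weights="imagenet",
                           input_shape=(224, 224, 3), pooling="avg")
    backbone.trainable = False  # transfer learning: keep ImageNet features
    x = backbone.output
    for _ in range(n_dense):  # 4 for VGG19, 8 for the other backbones
        x = layers.Dense(1024, activation="relu",
                         kernel_regularizer=regularizers.l2(1e-4))(x)
        x = layers.Dropout(0.3)(x)
    out = layers.Dense(n_classes, activation="softmax")(x)
    model = tf.keras.Model(backbone.input, out)
    # Labels are kept as integers, hence the sparse loss described above.
    model.compile(optimizer=tf.keras.optimizers.Adam(1e-4),
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

vgg_model = build_classifier(tf.keras.applications.VGG19, n_dense=4)
```

The checkpointing and early-stopping behaviour described above would correspond to the standard Keras ModelCheckpoint and EarlyStopping callbacks monitoring the validation loss.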
To train on the COVIDx dataset, we use the 121-layer DenseNet121 architecture. We use transfer learning to train the model, and then add eight fully connected layers with L2 regularisation, each followed by Dropout for better regularisation. All of the other settings and hyperparameters are kept the same as for the VGG19 network ("Method", part 5).

The Inception network [31] was developed with the idea of going even deeper with convolutional blocks. Very deep networks are prone to overfitting, and it is hard to propagate gradient updates through the entire network. Also, because images may show huge variations, choosing the right kernel size for the convolution layers is hard. The Inception network is one of the best networks for addressing these problems. Inception version 1 uses multiple filter sizes at the same level: a single inception module connects filters of three different sizes, 1 × 1, 3 × 3 and 5 × 5, together with max pooling. All of the outputs are concatenated and then sent to the next inception module. To train on the COVIDx dataset, we use the 48-layer InceptionV3 [32] architecture, which adds 7 × 7 convolutions, Batch Normalisation and label smoothing to the Inception version 1 modules. We use transfer learning to train the model, and then add eight fully connected layers with L2 regularisation, each followed by Dropout for better regularisation. All of the other settings and hyperparameters are kept the same as for the VGG19 network ("Method", part 5).

In medical imaging, the decisions are of high impact, so it is very important to understand which evaluation metrics should decide whether a model is fit for use on patients. Accuracy alone is not the best such metric; rather, it is important to look into other evaluation metrics such as sensitivity, predictive values and overall F scores. First, the confusion matrix is plotted on the 300 test images for all of the models used in the comparison. Figure 6 shows the confusion matrix of VisionPro Deep Learning with the entire image as the ROI. The VisionPro Deep Learning GUI does not display the numbers of correctly classified or misclassified images directly on the confusion matrix, but if any cell of the confusion matrix is clicked, it displays not only the number of images in that category but also all of the images belonging to it, with the prediction percentage and whether the prediction was correct. (Fig. 4: lungs masked using the Red tool. 100 such images are manually masked and the Red tool is then trained, which masks all of the images in the training set for later use in classification.) Below the confusion matrix, the GUI displays the evaluation metrics: recall (sensitivity), precision (positive predictive value) and F scores. The accompanying table contains the number of labelled images, which shows the number of images in each class of the test set; the 'Found' column shows the number of images that VisionPro Deep Learning assigns to each class. A report can also be generated on all of the test images, as seen in Figs. 7 and 8.
Figure 8 shows a small snippet of six COVID-19 positive images from the test set. The report contains details of the 300 test images, including the filename, the image, the original label as 'Labelled', and the predictions made by VisionPro Deep Learning as 'Marked', with the percentage confidence of prediction for each class. If the prediction differs from the label, it is marked in red. Of the 300 test images, VisionPro Deep Learning classified 18 images incorrectly with the entire ROI selected, and 16 images incorrectly with the segmented lungs. COVID-Net had classified 20 images incorrectly [17]. The VGG19 [13], ResNet50V2 [14], DenseNet121 [15] and InceptionV3 [32] networks made 47, 37, 41 and 26 misclassifications, respectively. VisionPro Deep Learning has fewer misclassifications than all of the open-source models in both settings. Compared to COVID-Net [17], the performance of VisionPro Deep Learning is similar with the entire image as the ROI and much better with the segmented lungs.

Heatmaps are a great way to visualise the predictions of a Deep Learning algorithm: they highlight exactly which parts of the image trigger the model to generate its predictions. Figure 9 shows the heatmaps generated by VisionPro Deep Learning on six COVID-19 images.

Confidence interval. A confidence interval is a range of values that we are fairly sure the true value lies in. Since the number of images in the test set was so small, with only 100 images in each class, we saw wide confidence intervals in most cases, both with the open-source models and with VisionPro Deep Learning. The best way to narrow the confidence interval is to increase the number of images in the test set into the thousands rather than the hundreds. Since the number of COVID-19 images was very small, and we wanted a one-to-one comparison with the results of COVID-Net [17], we used the same test set of the COVIDx dataset. We calculated a 95% confidence interval on the predicted sensitivity and the positive predictive values, to determine the range by which the actual results may vary on the given test data. The confidence interval of the accuracy rates is calculated using the formula

CI = z × √(accuracy × (1 − accuracy) / N),

where z is the significance level of the confidence interval (the number of standard deviations of the Gaussian distribution), accuracy is the estimated metric (in our case sensitivity, positive predictive value, or F score), and N (100 for each class) denotes the number of samples for that class. Here, we used the 95% confidence interval, for which the corresponding value of z is 1.96.

For the normal and COVID-19 classes, VisionPro Deep Learning significantly outperforms all other models, as seen in Table 1. For the Non-COVID-19 class, COVID-Net [17] has the best results. VisionPro Deep Learning has very good sensitivity to COVID-19 images: 95% with the entire ROI selected, and 97% with the lungs segmented.
Also, both settings of VisionPro Deep Learning showed 98% sensitivity for images belonging to the normal class.

Positive predictive value (PPV), or precision, shows the percentage of the predictions made by the model that are relevant. As seen in Table 2, DenseNet121 [15] has the best PPV for normal images, VisionPro Deep Learning has the best PPV for Non-COVID-19 images, and COVID-Net [17] has the best PPV for COVID-19 images. Although it is not the best in this comparison, VisionPro Deep Learning still has a high PPV for COVID-19 images: 96.9% with the entire ROI selected, and 97.0% with the lungs segmented.

The F score takes into consideration both the sensitivity and the PPV of a model and can be considered an overall score of the model's performance. As seen in Table 3, of the open-source architectures, InceptionV3 [32] has the best F score for all three classes. When compared to InceptionV3, COVID-Net has a higher F score in the Non-COVID-19 and COVID-19 classes but is slightly lower in the normal class. With the entire image as the ROI, the results of COVID-Net and VisionPro Deep Learning in the Non-COVID-19 class are very close, with COVID-Net [17] having an F score of 92.6% and VisionPro Deep Learning an F score of 92.2%. In the setting with segmented lungs, VisionPro Deep Learning outperforms all of the open-source models, COVID-Net [17], and even itself with the entire ROI selected: it has the highest F score in every class, with 95.6% for normal images, 93.3% for non-COVID-19/pneumonia images, and 97.0% for COVID-19 images. In this setting, VisionPro Deep Learning classifies based on the lungs only, so there are no artefacts and the results are highly focussed; this helps to overcome the black-box criticism of Deep Learning results. VisionPro Deep Learning has the best F score on COVID-19 images in both settings: 96.0% on the entire ROI and 97.0% on the segmented lungs. Overall, across all three classes, VisionPro Deep Learning achieves an F score of 94.0% with the entire image as the ROI and 95.3% with the segmented lungs. The similarity of the results in the two settings, together with the heatmaps, shows that even without the lungs being segmented, VisionPro Deep Learning bases its predictions on the actual abnormalities.

Figures 10 and 11 show the confusion matrices of the various open-source models and of COVID-Net [17], respectively. As expected, when comparing the confidence intervals, none of the models performs well, due to the significantly lower number of images in each class of the test set.

We also tested VisionPro Deep Learning on a previous version of the COVIDx dataset, which has a total of 15,374 images, with 7965 images in the normal class and 5459 images in the Non-COVID-19/Pneumonia class in the training set. (Fig. 7: a snippet of the report generated on the 300 test images by VisionPro Deep Learning with the entire image selected as the ROI. The report contains the confusion matrix with the evaluation metrics, sensitivity (recall), positive predictive value (precision) and F score, for each class. The test images are shown with the correct labels, the predicted labels and the confidence percentage for each class; in this snippet, five images are classified correctly and one is misclassified, marked in red.) As seen in Table 4, due to the significantly higher number of images in the normal and Non-COVID-19 classes, the confidence interval improves significantly, from the previous 3-5% down to just 1.0-2.4%.
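The per-class metrics and confidence intervals used throughout this comparison can be computed directly from a confusion matrix, as in the following sketch; the example matrix is hypothetical and only illustrates the balanced 300-image test set.

```python
# Sketch: per-class sensitivity (recall), PPV (precision) and F score from
# a confusion matrix, each sensitivity with its 95% confidence interval
# CI = z * sqrt(metric * (1 - metric) / N), z = 1.96.
import numpy as np

def per_class_metrics(cm: np.ndarray, z: float = 1.96) -> None:
    """cm[i, j] = number of images of true class i predicted as class j."""
    for i in range(cm.shape[0]):
        tp = cm[i, i]
        sensitivity = tp / cm[i, :].sum()   # recall over true class i
        ppv = tp / cm[:, i].sum()           # precision over predictions of i
        f_score = 2 * sensitivity * ppv / (sensitivity + ppv)
        n = cm[i, :].sum()                  # test images of class i
        ci = z * np.sqrt(sensitivity * (1 - sensitivity) / n)
        print(f"class {i}: sensitivity {sensitivity:.3f} ± {ci:.3f}, "
              f"PPV {ppv:.3f}, F score {f_score:.3f}")

# Hypothetical confusion matrix for a balanced test set (100 per class).
cm = np.array([[98, 1, 1],
               [3, 93, 4],
               [1, 2, 97]])
per_class_metrics(cm)
```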
Figure 12 shows the results on this dataset with COGNEX VisionPro Deep Learning. These results clearly indicate that as the number of test images increased, the confidence interval improved significantly. As this dataset had only 91 images in the COVID-19 class, the corresponding confidence interval remained similar to the previous results. Table 4 also indicates that even when the number of images in the test set is significantly increased, the performance of VisionPro Deep Learning does not decrease: it still produces sensitivity, PPV and F scores above 90% in all classes. Comparing Table 4 with the previous results shows that the results of VisionPro Deep Learning are very consistent, even with a change in the number of images in the training and test sets. The sensitivity, PPV and F scores are also very similar with the entire image as the ROI and with the segmented lungs. (Fig. 9: four COVID-19 X-ray images from the test set of the COVIDx dataset along with the predicted heatmaps generated by VisionPro Deep Learning. Heatmaps can be a great indicator for radiologists of whether the predictions made by the Deep Learning algorithm are based on actual infection or on artefacts.)

Various other studies have been undertaken for the detection of COVID-19 from radiological images. One such study uses Active Learning (AL) with Incremental Learning (IL), allowing the algorithm to self-learn over time in the presence of experts [34]; the aim was to create a model that iteratively learns and adapts to new data without forgetting what it has previously learnt. Another study showed that its network performed equally well on both X-ray and CT images [35]; the authors designed their own deep learning architecture, trained it on 336 chest X-ray and 336 CT scan images, and reached a sensitivity of 97% and a precision of 94% on their dataset. A truncated form of the Inception network [36] achieved an accuracy of 99.96% when classifying COVID-19 positive cases against combined pneumonia and healthy cases, and an accuracy of 99.92% when classifying COVID-19 cases against combined pneumonia, tuberculosis and healthy chest X-rays. CoroNet [37], which is based on the Xception [38] architecture, was trained on another X-ray dataset and compared with COVID-Net [17]; CoroNet achieved an accuracy of 89.6% on that dataset, while COVID-Net achieved 83.5%. COVID_MTNet [39] is another architecture that classifies and segments both chest X-rays and CT scan images, obtaining an accuracy of 94.67% on chest X-rays and 98.78% on chest CT scans. In some cases, generative adversarial networks (GANs) [40], such as CycleGAN [41], were used to augment the minority class of COVID-19 images [42]. Several networks have also been designed to forecast the growth and spread of COVID-19 [43]. In fact, several books have been published which showcase systems and methods to prevent the further spread of COVID-19 using artificial intelligence, computer vision and robotics [44, 45].

In this study, we used COGNEX's Deep Learning software VisionPro Deep Learning (version 1.0) and compared its performance with other state-of-the-art Deep Learning architectures. VisionPro Deep Learning has an intuitive GUI, making the software very easy to use. Building applications requires no coding skills in any programming language, and little to no pre-processing is required, which also decreases the development time.
(Fig. 10: confusion matrices on the 300 test images for the open-source architectures: (a) VGG19 [13], (b) ResNet50V2 [14], (c) DenseNet121 [15], (d) InceptionV3 [32]. InceptionV3 has the best results, with the lowest number of false predictions, followed by ResNet50V2, DenseNet121 and VGG19, respectively. Fig. 11: confusion matrix on the 300 test images for COVID-Net, from the original COVID-Net paper [17]; COVID-Net's results are better than those of all of the open-source models we trained.)

Imbalanced data are automatically balanced within the software. Once the images are loaded into VisionPro Deep Learning and the correct tool is selected, Deep Learning training can start. After training completes, the software outputs a confusion matrix along with various important metrics, such as precision, recall and F score. Additionally, a report can be generated that identifies all misclassified images. This makes it particularly suitable for radiologists, hospitals and research workers who want to harness the power of Deep Learning without advanced coding knowledge. Moreover, as the results of this study indicate, the Deep Learning algorithms in VisionPro Deep Learning are robust and comparable to, or even better than, the various state-of-the-art algorithms available today.

The problem of Deep Learning algorithms being a "black box" can be overcome using a pipeline of tools, stacked sequentially to first segment the lungs and then classify only on the segmented lungs; it is like combining a U-Net [16] and an Inception [31] model (a sketch of this idea follows below). This ensures that the algorithm does not focus on any artefacts when generating its classification results. A heatmap can be generated to show exactly where the model is focussing when making its predictions. With both settings, the entire image as the ROI and classification on the segmented lungs, VisionPro Deep Learning achieves the highest overall F scores, surpassing the results of the various open-source architectures.

In the future, more testing will be performed to understand how changing the number of training images, or using augmentations in the training set, affects the performance of VisionPro Deep Learning compared to the other open-source models. The software also gave F scores of 99% on the identification of COVID-19 from CT images [46]. This software is by no means a stand-alone solution for the identification of images of COVID-19 from chest X-rays, but it can help radiologists and clinicians achieve a faster and more understandable diagnosis using the full potential of Deep Learning, without the prerequisite of having to code in any programming language.
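To make the segment-then-classify pipeline concrete, a minimal sketch follows. It is analogous to, not an implementation of, the Red tool followed by the Green tool: a segmentation model predicts a lung mask, everything outside the mask is zeroed out, and only the masked image is passed to the classifier. Both models are assumed to be already-trained Keras models; all names are illustrative.

```python
# Sketch of a two-stage segment-then-classify pipeline in Keras.
import numpy as np
import tensorflow as tf

def classify_segmented(xray: np.ndarray,
                       seg_model: tf.keras.Model,
                       cls_model: tf.keras.Model) -> np.ndarray:
    """xray: single image, shape (H, W, 3), values in [0, 1]."""
    batch = xray[np.newaxis, ...]
    mask = seg_model.predict(batch)          # (1, H, W, 1) lung probability map
    mask = (mask > 0.5).astype(np.float32)   # binarise: lung vs background
    lungs_only = batch * mask                # suppress everything outside lungs
    return cls_model.predict(lungs_only)[0]  # class probabilities
```

Because the classifier never sees pixels outside the predicted lung mask, artefacts such as text markers, jewellery or borders cannot influence its decision, which is exactly the motivation given above.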
References

[1] Clinical features of patients infected with 2019 novel coronavirus in Wuhan
[2] Clinical characteristics of 138 hospitalized patients with 2019 Novel Coronavirus-infected pneumonia in Wuhan
[3] Imaging profile of the COVID-19 infection: radiologic findings and literature review
[4] Clinical characteristics of coronavirus disease 2019 in China
[5] COVID-19 detection using chest X-ray
[6] Squire's fundamentals of radiology
[7] Pathogenesis of COVID-19 from a cell biology perspective
[8] SARS-CoV replicates in primary human alveolar type II cell cultures but not in type I-like cells
[9] Influenza A viruses target type II pneumocytes in the human lung
[10] Chest CT findings in patients with coronavirus disease 2019 and its relationship with clinical features
[11] Portable chest X-ray in coronavirus disease-19 (COVID-19): a pictorial review
[13] Very deep convolutional networks for large-scale image recognition
[14] Deep residual learning for image recognition
[15] Densely connected convolutional networks
[16] U-net: convolutional networks for biomedical image segmentation
[17] COVID-Net: a tailored deep convolutional neural network design for detection of COVID-19 cases from chest X-ray images
[18] COVID-19 image data collection
[19] Figure 1 COVID-19 chest X-ray data initiative
[20] ActualMed COVID-19 chest X-ray data initiative
[21] RSNA pneumonia detection challenge
[22] Can AI help in screening viral and COVID-19 pneumonia?
[23] ImageNet classification with deep convolutional neural networks
[24] Improving deep neural networks for LVCSR using rectified linear units and dropout
[25] L2 regularization for learning kernels
[26] L2 regularisation versus batch and weight normalisation
[27] Deep learning
[28] Generalized cross entropy loss for training deep neural networks with noisy labels
[29] Adam: a method for stochastic optimisation
[30] Residual networks behave like ensembles of relatively shallow networks
[31] Going deeper with convolutions
[32] Rethinking the inception architecture for computer vision
[33] TensorFlow: a system for large-scale machine learning
[34] AI-driven tools for coronavirus outbreak: need of active learning and cross-population train/test models on multitudinal/multimodal data
[35] Deep neural network to detect COVID-19: one architecture for both CT scans and chest X-rays
[36] Truncated inception net: COVID-19 outbreak screening using chest X-rays
[37] CoroNet: a deep neural network for detection and diagnosis of COVID-19 from chest X-ray images
[38] Xception: deep learning with depthwise separable convolutions
[39] COVID-19 detection with multi-task deep learning approaches
[40] Generative adversarial nets
[41] Unpaired image-to-image translation using cycle-consistent adversarial networks
[42] COVID-19 detection and disease progression visualization: deep learning on chest X-rays for classification and coarse localization
[43] Forecasting models for coronavirus disease (COVID-19): a survey of the state-of-the-art
[44] Intelligent systems and methods to combat COVID-19: Springer briefs in computational intelligence
[45] COVID-19: prediction, decision-making, and its impacts, book series in lecture notes on data engineering and communications technologies
[46] Identification of images of COVID-19 from chest computed tomography (CT) images using deep learning: comparing COGNEX VisionPro Deep Learning 1.0 software with open source convolutional neural networks

Acknowledgements. We would like to thank COGNEX for providing their latest Deep Learning software for testing, and the University of Waterloo, along with Darwin AI, for collecting and merging the X-ray images from various sources and for providing the python scripts for generating the COVIDx dataset.