key: cord-029898-2t9mif9k authors: Chaganti, Shikha; Grenier, Philippe; Balachandran, Abishek; Chabin, Guillaume; Cohen, Stuart; Flohr, Thomas; Georgescu, Bogdan; Grbic, Sasa; Liu, Siqi; Mellot, François; Murray, Nicolas; Nicolaou, Savvas; Parker, William; Re, Thomas; Sanelli, Pina; Sauter, Alexander W.; Xu, Zhoubing; Yoo, Youngjin; Ziebandt, Valentin; Comaniciu, Dorin title: Automated Quantification of CT Patterns Associated with COVID-19 from Chest CT date: 2020-07-29 journal: Radiol Artif Intell DOI: 10.1148/ryai.2020200048 sha: doc_id: 29898 cord_uid: 2t9mif9k PURPOSE: To present a method that automatically segments and quantifies abnormal CT patterns commonly present in coronavirus disease 2019 (COVID-19), namely ground glass opacities and consolidations. MATERIALS AND METHODS: In this retrospective study, the proposed method takes as input a non-contrasted chest CT and segments the lesions, lungs, and lobes in three dimensions, based on a dataset of 9749 chest CT volumes. The method outputs two combined measures of the severity of lung and lobe involvement, quantifying both the extent of COVID-19 abnormalities and presence of high opacities, based on deep learning and deep reinforcement learning. The first measure of (PO, PHO) is global, while the second of (LSS, LHOS) is lobe-wise. Evaluation of the algorithm is reported on CTs of 200 participants (100 COVID-19 confirmed patients and 100 healthy controls) from institutions from Canada, Europe and the United States collected between 2002-Present (April 2020). Ground truth is established by manual annotations of lesions, lungs, and lobes. Correlation and regression analyses were performed to compare the prediction to the ground truth. RESULTS: Pearson correlation coefficient between method prediction and ground truth for COVID-19 cases was calculated as 0.92 for PO (P < .001), 0.97 for PHO (P < .001), 0.91 for LSS (P < .001), 0.90 for LHOS (P < .001). 98 of 100 healthy controls had a predicted PO of less than 1%, 2 had between 1-2%. Automated processing time to compute the severity scores was 10 seconds per case compared to 30 minutes required for manual annotations. CONCLUSION: A new method segments regions of CT abnormalities associated with COVID-19 and computes (PO, PHO), as well as (LSS, LHOS) severity scores. Coronavirus disease is an infectious disease affecting more than 2 400 000 individuals worldwide as of April 20, 2020 (1). The disease is caused by the severe-acute respiratory syndrome coronavirus 2 (SARS-Cov2) and has a high case fatality rate of up to 4% (2) with more than 170 000 deaths reported currently around the globe. Due to a high reproduction number and infectious nature of the disease, tools for rapid testing and evaluation are vital to track and mitigate the spread. COVID-19 infection is confirmed by a reverse transcription polymerase chain reaction (RT-PCR) (3) . However, the sensitivity of this test may be as low as 60-70% (4) and can result in false negatives within first few days of infection. The role of CT for diagnosis is currently being debated, however two preliminary studies (5, 6) showed that chest CT imaging of the lung provides improved sensitivity when compared to RT-PCR for patients suspected to have COVID-19. These studies show that non-contrast chest CT has been useful to detect, quantify severity, and assess progression of the disease. The primary features observed on a lung affected by COVID-19 are peripheral focal or multi-focal ground glass opacities, consolidation, and crazy-paving patterns. In another study, CT scans were evaluated from patients that tested positive for COVID -19 across 552 hospitals in China (7) . They found that 84.4% of patients exhibiting mild symptoms showed CT changes, whereas 94.6% of patients exhibiting severe symptoms showed CT changes. In a study of 121 patients, Bernheim et al showed that CT changes correlated with disease severity, with increasing lung involvement and abnormalities as the disease progressed (8) . A recent radiologist performance study also points to preliminary evidence that radiologists can discriminate COVID-19 from other types of pneumonia (9) . The study showed that seven United States and Chinese radiologists were able to identify COVID-19 from other types of viral pneumonia with a median sensitivity of 73% and a median specificity of 93% (9) . The studies so far have been performed on patients either suspected or confirmed I n p r e s s to be tested positive of COVID-19, therefore it is too early to state that CT can be used as an effective diagnostic or screening tool for evaluation of COVID-19 in general population. However, most patients with COVID-19 do seem to exhibit CT abnormalities and the extent of these changes seem to correlate with disease severity. Recently, the Fleischner society provided recommendations for use of chest imaging in evaluation of COVID-19 in regions with resource constraints especially for patients with worsening respiratory status, suspected false negative PCR, and more severe infection. Elsewhere, they indicated that chest imaging can be used to establish a baseline for moderate to severe cases (10) . As a result, CT imaging is being adapted as a tool to evaluate COVID-19 in emergency rooms across the world, where other resources for testing are scarce. We propose an automated system that can quantify the abnormalities most often seen in COVID-19 (ie ground glass opacities and consolidations). The system outputs regions of abnormality demarcated in a three-dimensional CT scan and output two combined severity measures of the disease: (a) (percentage of opacity, percentage of high opacity) or (PO, PHO) and (b) (lung severity score, lung high opacity score) or (LSS, LHOS). The first measure (PO,PHO) indicates the overall disease spread relative to the volume of the lungs, while the second measure (LSS, LHOS) is a cumulative measure of the extent of involvement for each lung lobe ( Figure 1 ). Opacities here include GGO, consolidation, and crazy-paving patterns which are all commonly observed in COVID-19. Given that higher opacity abnormalities such as consolidation were shown to correlate with severe symptoms, the measures cover both the extent of disease and the presence of high opacities (7, 11) . In this study, we use the term high opacity to mean consolidation of sub-solid regions of high attenuation. We use this term since these regions were determined using an empirical threshold. The ability to determine the extent of abnormalities and quantify the presence of high opacities provides valuable insight for prognosis prediction, risk prioritization, and therapy response (12) . All the authors have either been employed or partially supported by Siemens Healthineers. The datasets were de-identified and retrospectively collected from multiple centers in USA, Canada, and Europe, between 2002 to present (April 2020) with the data usage being approved by the respective ethics committees, with waived consent. A subset of the datasets are from NLST (13) and COPDgene (14) . Non-contrasted chest CT datasets that completely show lungs in their field of view were used to train two models: one for lung and lobe segmentation and another for abnormality segmentation. Complete details of the datasets are provided in Tables 1 and 2, including age and sex distribution, CT scanner manufacturer, slice thickness, and kernel type. A total of 9749 CT volumes were used in the study. A total of 200 volumes were reserved for testing, and the remaining were used for lung and lesion segmentation training and model selection. A total of 901 volumes were used for training the abnormality segmentation model. Due to limited availability of COVID-19 data, we determined three groups of data that are useful for training the abnormality model: COVID-19 datasets (n = 431), pneumonia datasets (n = 174) and other interstitial lung diseases (ILDs, n = 274). Note that 575 datasets were common to the lung segmentation and lesion segmentation training sets. The distribution of the datasets for each task is visualized in Figure 2 . Annotation details for the two tasks are presented in Appendix E1 (supplement). A separate test set of 100 control volumes and 100 COVID-19 positive volumes were used to evaluate lung and lobe segmentation as well as abnormality segmentation. The control group consisted of volumes without CT abnormalities that were randomly sampled and visually inspected by a radiologist. There were 110 COVID-19 positive cases that were randomly selected from two European and two institutions from the United States. Of these, 10 I n p r e s s cases were excluded for having very low percentage of abnormalities based on ground truth annotations. The algorithm generates lung and lobe segmentation mask on a given CT data set. First, multi-scale deep reinforcement learning (15) is used to robustly detect anatomical landmarks in a CT volume. The carina bifurcation is used to identify the lung region of interest (ROI). When the lung ROI is not detected by the system, the sternum tip is used as a landmark. The size and the relative location of the lung ROI to the landmark are specified according to annotated lung datasets. Next, the lung ROI image is resampled to a 2 mm isotropic volume and fed into a Deep Image-to-Image Network (DI2IN) (16) to generate the lung segmentation. Finally, the ROI segmentation mask is remapped to have the same dimension and the resolution as the input data. The DI2IN has been first trained on CT scans from a large group of patients with various diseases (miscellaneous in Table 1 ), then fine-tuned with scans with abnormal patterns including ( ILDs, pneumonia, and COVID-19 in Table1), to improve the robustness of the lung segmentation over the infected areas. We formulated the segmentation of the COVID-19 related abnormality patterns as a semantic segmentation problem involving binary classes. A DenseUNet (17) with anisotropic kernels is trained to transfer a three-dimensional (3D) chest CT volume to a semantic segmentation mask of the same size. We use a single label to define all voxels within the lungs that fully or partially contain ground glass opacity or consolidations as the positive voxels (see Annotation Details in Appendix E1). The rest of the image areas within the lungs and the entire area outside the lungs are defined as negative. The network is trained as an end-to-end segmentation system. The 3D output segmentation is filtered by the lung I n p r e s s segmentation as described above. The technical details of the network architecture and the training strategy are summarized in Appendix E1 (supplement). The severity of airspace abnormalities in each subject is measured by the combined metrics (PO, PHO) and (LSS, LHOS) as defined below. Percentage of Opacity (PO): The PO is calculated as the total percent volume of the lung parenchyma that is affected by disease: The aim of the study is to show that abnormalities associated with COVID-19, namely GGO and consolidation, are segmented in a given CT scan, and to also demonstrate that there are no false positive segmentations when there are no abnormalities. In order to do this, we show that ground truth matches the predicted scores of severity using the following statistical tests. We compute the chi-squared contingency test (18) In order to further evaluate the linear relationship between the ground truth and predicted metrics, we compute the linear regression estimates for metrics which have continuous values (PO and PHO), as follows: We report the following: , the estimate of the intercept of the linear regression, along with the confidence interval. , the estimate of the slope for the linear regression, along with the confidence interval. Coefficient of determination or R-squared value. Mean squared error of the metric from the regression line. We used 8792 chest CT volumes for training the 3D segmentation of lung and lobes. The DI2IN was first trained on a data set from 8087 patients (8006 for training and 81 for validation) without the prevalence of viral pneumonia, then it was fine-tuned on another data set from 1136 patients (1076 for training and 60 for validation) with abnormality patterns including ground glass opacities, consolidation, effusions, masses, and others to improve the robustness of the lung and lobe segmentation over the infected areas. The DI2IN was finally tested on a data set from 200 patients, consisting of 100 COVID-19 and 100 controls (13) . The sample size and the data partition of the COVID-19 positive patients are determined by balancing the patients from the US and Europe (see Table 1 ). test between ground truth and predicted metrics shows that there is no significant difference between ground truth and predicted metrics for clinically relevant binning of the measures (see Table 3 ). 98 of the 100 healthy controls have a predicted PO less than 1%. 2 of them have PO detected between 1-2%. Figure 4 shows the scatter plot between ground truth and predicted metrics PO, PHO, LSS, and LHOS. Figure 5 shows the I n p r e s s combined measures of (PO, PHO) as well as (LSS, LHOS) in two-dimensional plots. The diagonal in Figure 5 (a) indicates that PO is equal to PHO. A subject's marker falling on the diagonal would indicate that 100% of the abnormalities present on their lung CT are high opacity or consolidation. Note that all the markers for abnormalities greater than 0 fall below the line, indicating various levels of consolidation present. The closer a subject's marker is to the diagonal, the more consolidation there is in their lungs. In Figure 5 (b), the diagonal indicates that LSS is equal to LHOS. A marker on the diagonal here indicates that all the lobes that are involved have high opacity. When a patient's marker falls below the diagonal, it means that either some lobes do not have high opacities or that they have less extent of high opacities. In addition to correlation, we also present estimates of the linearity of the relationship between ground truth and predicted metrics for PO and PHO for COVID-19 positive cases, as shown in Table 4 . The intercepts are 0.03 and 0.004, and the slopes are 0.839 and 0.805 for PO and PHO respectively. The mean error between ground truth and predicted PO and PHO are 3.5% and 1.9% respectively. Figure 6 shows the qualitative results for COVID-19 positive cases along with their scores. Figure 6 (a) shows six cases where the automated segmentation of the algorithm matches the ground truth closely. For example, the upper left corner case resulted in a (PO, PHO) = (55.9%, 7.6%). That means that 55.9% of the lung volume is affected by disease and 7.6% of the lung volume has high consolidation abnormalities. Figure 6 (b) shows one example of abnormality segmentation (left) along with high opacity regions identified through thresholding (right). Figure 6 (c) shows two cases which were identified as outliers due to their PO score differing by more than 20% from the ground truth. The algorithm misses fainter areas of opacity (left) and tends to over segment into normal areas between the abnormal opacities (right). In this work we proposed a method to automatically quantify regions with ground glass opacity and consolidation in chest CT scans using deep learning algorithms. The abnormalities segmented by our system are the most common chest CT findings in COVID-19. The system outputs the 3D contours of the CT abnormalities and computes two combined severity measures of (PO, PHO) and (LSS, LHOS). Automated characterization of disease can be extremely useful to facilitate rapid response in assessing the extent of lung infection and progression of the disease in patients exhibiting symptoms of COVID-19. Evaluation of a chest CT can be used to answer several questions in a clinical setting such as triage of a patient, diagnosis, assessment of severity and progression, and response to therapy. Recently, lung severity measures have been proposed for COVID-19 that demonstrate correlation to disease severity and progression (8, (22) (23) (24) (25) . Manual calculation of these scores produces a two-fold problem: either the affected regions are annotated precisely, which is a time-consuming process, or the affected regions are assessed subjectively, which translates into low reproducibility of the assessment. A precise automated measurement of lung severity scores will address both the speed and reproducibility problems. In addition to quantitative outputs, the abnormality segmentation produced by the system can also be used for visual interpretation and inspection. Progression assessment.-In several instances, the patient might be imaged multiple times to assess disease progression or to assess response to drugs. In order to make these decisions, an objective metric or a score that can assess the extent of disease is vital. The (PO, PHO) and (LSS, LHOS) scores can be useful when evaluating a patient's lung status over time. Such a tool could also be used in scientific research studies that are documenting the temporal radiological change of COVID-19 pneumonia and clinical trials studying drug responses. Early detection.-Although the evidence is preliminary, the system could be used as an effective tool in conjunction with RT-PCR tests to increase overall sensitivity of COVID-19 detection, especially in scenarios where there are limited resources and a crunch for time. An automated system that can perform this disease severity assessment can be of great use to hospitals that have a shortage of manpower combined with a high-volume of patients during the outbreak (12) . To our knowledge this is one of the first systems to provide an automatic, efficient, and detailed evaluation of tomographic CT patterns associated with COVID-19. We also believe that this is the first system to propose and show how to automatically compute severity measures such as (PO, PHO) and (LSS, LHOS) that characterize not only the extent of disease in the lungs, but also the extent of high opacity abnormalities, both globally and lobe-wise. This performance is facilitated by the experience of a multidisciplinary team of clinicians, machine learning experts, and engineers, whose combined clinical and technical knowledge helped to solve robust landmark detection in chest CT, 3D lung and lobe segmentation, and 3D segmentation of abnormalities associated with COVID-19. The system was trained with specific abnormalities that were relevant to COVID-19 such as ground glass capacities and consolidations. Presence of other abnormalities in or around the lung such as pleural I n p r e s s effusion could produce challenges to the algorithm. In the future, the system will be trained comprehensively to account for abnormalities other than GGO and consolidation. The algorithm will be improved using more training data in boundary conditions, which cause under segmentation with faint regions and over segmentation with highly abnormal regions as seen in Figure 6 (c). The slope and intercept in table 4 show that the AI system outputs a value that is slightly higher than ground truth value, indicating that the system needs to be calibrated further for accurate metric measurements. The system produces a quantification of abnormalities in a lung CT scan only. While we evaluate this quantification in COVID-19 and healthy cases, we do not evaluate it in other lung diseases such as other viral pneumonias or ILDs. It is difficult to ascertain the differences between diseases and diagnosis capability based on this information alone. In the future, we will evaluate the feasibility of distinguishing COVID-19 from other types of pneumonias and ILDs. We will conduct studies to correlate the imaging findings computed in this work to clinical factors and outcomes. A similar study can be performed with X-ray chest images as well, which will be a part of our future investigations. These additional studies can provide a comprehensive toolbox for clinicians to diagnose and treat COVID-19. Disclaimer: The concepts and information presented in this paper are based on research results that are not commercially available. We gratefully acknowledge the contributions of Clinica Universidad de Navarra, Health Time, Houston Methodist, and multiple other frontline hospitals to this collaboration. The authors also thank the COPDGene for providing the data. The COPDGene study (NCT00608764) was funded by NHLBI U01 HL089897 and U01 HL089856 and also supported by the COPD Foundation through contributions made to an Industry Advisory Committee comprised of AstraZeneca, BoehringerIngelheim, GlaxoSmithKline, Novartis, and Sunovion. We thank many colleagues who made this work possible in a short amount of time. Special recognition to Christian Eusemann, and Ahmed Halaweish, for the great effort to ensure proper data de-identification and compliance with local regulations, to Dr. Vishwanath RS and Dr. Eileen Krieg for their valuable input on CT data and their annotations, to Brian Teixeira and Sebastien Piat, who I n p r e s s were instrumental for implementing and managing the data infrastructure, and to Marlène Roblot, for her support with the manuscript editing. I n p r e s s COPDgene study (14) . For the ground truth of the abnormality, a single label mask included all areas of ground glass opacity and consolidation within both the lung fields. However, pleural effusion was excluded in the mask. For crazy paving pattern the mask excluded normal areas of the lung with precision contouring of the mask. The average time for annotation and review of the lung and lobe masks is 2 hours. First, the training images are resampled to the resolution 1 × 1 × 3mm. The image intensity is clipped using the standard lung window with the width 1500 HU and level −600 HU before being normalized to We use a UNet (17) resembling architecture with convolutional blocks containing either 1 × 3 × 3 or 3 × 3 × 3 CNN kernels to deal with the anisotropic image resolutions. The network architecture is illustrated in Figure 3 . The 3D input tensor is first fed into a 3D 1 × 3 × 3 convolutional layer followed by batch normalization and LeakyReLU. The features are then propagated to 5 DenseNet blocks (26) . For the first two DenseNet blocks, the features are downsampled by a 1 × 2 × 2 convolution with a stride of 1 × 2 × 2. These anisotropic downsampling kernels are designed to preserve the inter-slice resolution of the input tensors. The last 3 DenseNet blocks have isotropic downsampling kernels with the stride of 2 × 2 × 2. The input to each decoder block is obtained by concatenating the encoder output features with the same resolution and the features upsampled from the previous decoder. The upsampling kernels are built with transpose convolutional kernels with the sizes and strides same to the corresponding DenseNet blocks. The final network output is derived by projecting the feature maps to 2 feature channels and being activated by the softmax activation. The network parameters are randomly initialized. We use the Jaccard index as the training loss function. The loss function L(p, y) between the probability prediction tensor p and the ground truth tensor y is only computed within the precomputed lung segmentation as Where is the smoothing factor and · represents the tensor inner product operator. The loss function is optimized using Adabound (27) with an initial learning rate of 0.001. The final model used for testing is selected using a validation set with 10 % of the training set patients that is randomly sampled. The training and testing pipeline were implemented using Pytorch. JHU. Coronavirus COVID-19 Global Cases by the Center for Systems Science and Engineering Case-Fatality Risk Estimates for COVID-19 Calculated by Using a Lag Time for Fatality Essentials for radiologists on COVID-19: an update-radiology scientific expert panel Sensitivity of chest CT for COVID-19: comparison to RT-PCR Chest CT for Typical 2019-nCoV Pneumonia: Relationship to Negative RT-PCR Testing Clinical characteristics of coronavirus disease 2019 in China Chest CT findings in coronavirus disease-19 (COVID-19): relationship to duration of infection Performance of radiologists in differentiating COVID-19 from viral pneumonia on chest CT The Role of Chest Imaging in Patient Management during the COVID-19 Pandemic: A Multinational Consensus Statement from the Fleischner Society Chest CT findings in cases from the cruise ship "Diamond Princess Outbreak of novel coronavirus (COVID-19): What is the role of radiologists The national lung screening trial: overview and study design Multi-scale deep reinforcement learning for real-time 3D-landmark detection in CT scans Automatic liver segmentation using an adversarial image-to-image network U-net: Convolutional networks for biomedical image segmentation Multinomial goodness-of-fit tests SciPy 1.0: fundamental algorithms for scientific computing in Python A computer method for calculating Kendall's tau with ungrouped data Radiology Perspective of Coronavirus Disease 2019 (COVID-19): Lessons From Severe Acute Respiratory Syndrome and Middle East Respiratory Syndrome CT Imaging Features of 2019 Novel Coronavirus (2019-nCoV) Time Course of Lung Changes On Chest CT During Recovery From 2019 Novel Coronavirus (COVID-19) Pneumonia Relation Between Chest CT Findings and Clinical Conditions of Coronavirus Disease (COVID-19) Pneumonia: A Multicenter Study Chest CT Severity Score: An Imaging Tool for Assessing Severe COVID-19 Densely Connected Convolutional Networks. arXiv e-prints Adaptive Gradient Methods with Dynamic Bound of Learning Rate. arXiv e-prints