key: cord-0985403-l6eg0okd
authors: Yan, Chenggong; Wang, Lingfeng; Lin, Jie; Xu, Jun; Zhang, Tianjing; Qi, Jin; Li, Xiangying; Ni, Wei; Wu, Guangyao; Huang, Jianbin; Xu, Yikai; Woodruff, Henry C.; Lambin, Philippe
title: A fully automatic artificial intelligence–based CT image analysis system for accurate detection, diagnosis, and quantitative severity evaluation of pulmonary tuberculosis
date: 2021-11-29
journal: Eur Radiol
DOI: 10.1007/s00330-021-08365-z
sha: af0fca5d4a1b020f4de978af19c2641dc5c789c2
doc_id: 985403
cord_uid: l6eg0okd

OBJECTIVES: An accurate and rapid diagnosis is crucial for the appropriate treatment of pulmonary tuberculosis (TB). This study aims to develop an artificial intelligence (AI)–based fully automated CT image analysis system for detection, diagnosis, and burden quantification of pulmonary TB. METHODS: From December 2007 to September 2020, 892 chest CT scans from pathogen-confirmed TB patients were retrospectively included. A deep learning–based cascading framework was connected to create a processing pipeline. For training and validation of the model, 1921 lesions were manually labeled, classified according to six categories of critical imaging features, and visually scored regarding lesion involvement as the ground truth. A “TB score” was calculated based on a network-activation map to quantitatively assess the disease burden. Independent testing datasets from two additional hospitals (dataset 2, n = 99; dataset 3, n = 86) and the NIH TB Portals (n = 171) were used to externally validate the performance of the AI model. RESULTS: CT scans of 526 participants (mean age, 48.5 ± 16.5 years; 206 women) were analyzed. The lung lesion detection subsystem yielded a mean average precision of the validation cohort of 0.68. The overall classification accuracy of six pulmonary critical imaging findings indicative of TB of the independent datasets was 81.08–91.05%. A moderate to strong correlation was demonstrated between the AI model–quantified TB score and the radiologist-estimated CT score. CONCLUSIONS: The proposed end-to-end AI system based on chest CT can achieve human-level diagnostic performance for early detection and optimal clinical management of patients with pulmonary TB. KEY POINTS: • Deep learning allows automatic detection, diagnosis, and evaluation of pulmonary tuberculosis. • Artificial intelligence helps clinicians to assess patients with tuberculosis. • Pulmonary tuberculosis disease activity and treatment management can be improved.
SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1007/s00330-021-08365-z.

Tuberculosis (TB) is an airborne infectious disease caused by the bacillus Mycobacterium tuberculosis that spreads rapidly, resulting in significant public health concerns [1, 2]. Although morbidity and mortality associated with TB have continued to gradually decrease globally, the disease burden remains substantial in endemic countries [3]. Chest imaging plays a crucial role in the workup of patients with pulmonary TB [4]. In particular, computed tomography (CT) has been used to help diagnose pulmonary TB, monitor imaging changes, and evaluate disease severity [5]. In addition, the radiographic features can be suggestive of the type and activity of TB [6]. Active TB demonstrates radiologic findings that include cavitation, consolidation, centrilobular nodules and tree-in-bud, or clusters of nodules, while inactive TB is characterized by fibronodular scarring and calcified granulomas [5]. Hence, radiographic monitoring can support the decision making of clinicians for timely isolation and appropriate treatment. If the results of the workup are positive for active TB, initial concomitant drug therapy is required. However, interpreting chest CT images is challenging because of their complexity. In countries with suboptimal health and surveillance systems, underdiagnosis and missed reporting of TB are common problems.

Artificial intelligence (AI) has gained significant attention in recent years, and many applications have been proposed in medical image recognition and interpretation [7]. Deep learning (DL), the core technique behind the increasing application of AI, has made great progress in medical image analysis, including skin disease classification [8], diabetic retinopathy detection [9], and lung cancer screening by chest CT [10]. These DL algorithms can autonomously "learn" to predict features from manually classified initial data. Recent promising advances in CT-based DL systems have demonstrated the potential of AI-assisted radiological diagnostic technology [11-13]. For example, Zhang et al employed a ResNet-18-supervised network for the CT-based diagnosis and triage prediction of patients with coronavirus disease 2019 (COVID-19) during the global pandemic [14]. Therefore, we hypothesized that end-to-end DL networks can be designed and established by automatic and adaptive feature learning to achieve human expert-level performance for diagnosis and follow-up.

In this study, an AI-based fully automated CT image analysis model was developed and evaluated to provide support for the detection, diagnosis, and disease severity quantification of patients with pulmonary TB.

The study protocol was approved by the institutional review board of Nanfang Hospital of Southern Medical University. Written informed consent was waived due to the retrospective nature of the study. The chest CT scans of patients with suspected TB acquired from December 2017 to September 2020 were retrieved from the picture archiving and communication system and reviewed. A total of 1356 CT scans of 865 patients (maximum of two scans per patient) that met the following inclusion criteria were collected (Fig. 1): (1) age ≥ 18 years; (2) CT examinations for known or suspected primary or secondary TB; and (3) CT slice thickness ≤ 1.5 mm. Exclusion criteria were CT scans with inadequate image quality (n = 73), no typical imaging findings indicative of TB (n = 215), and negative Mycobacterium tuberculosis culture from sputum, bronchoalveolar lavage, or lung biopsy samples (n = 176).

Chest CT was acquired with different CT scanners from multiple centers. The acquisition and reconstruction parameters are summarized in Table 1. CT images were reconstructed with a 512 × 512 matrix and a slice thickness of 1-1.5 mm. Images were preprocessed by applying the lung window setting (window width, 1500 HU; window level, −700 HU) and resampling the voxels to 1 × 1 × 1 mm³. A three-dimensional reconstruction approach was used to visualize the severity of TB.
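As a rough illustration of the preprocessing described above, the following sketch applies the lung window (width 1500 HU, level −700 HU) to Hounsfield values and computes the grid size after resampling to 1 mm isotropic voxels. The rescaling of windowed values to [0, 1], the function names, and the example scan geometry are assumptions made for illustration; the text does not state how intensities were scaled or which interpolation method was used.

```python
def apply_lung_window(hu, width=1500.0, level=-700.0):
    """Clip a Hounsfield value to the window and rescale it to [0, 1].

    Rescaling to [0, 1] is an assumption; the text only gives width and level.
    """
    low = level - width / 2.0    # -1450 HU for the lung window
    high = level + width / 2.0   # +50 HU for the lung window
    clipped = min(max(hu, low), high)
    return (clipped - low) / (high - low)


def resampled_shape(shape, spacing_mm, target_mm=(1.0, 1.0, 1.0)):
    """Number of voxels per axis after resampling to the target spacing."""
    return tuple(
        max(1, round(n * s / t)) for n, s, t in zip(shape, spacing_mm, target_mm)
    )


# Hypothetical scan: 300 slices of 512 x 512 pixels,
# 1.25 mm slice spacing and 0.7 mm in-plane pixel spacing.
window_examples = [apply_lung_window(v) for v in (-2000, -700, 50, 400)]
new_shape = resampled_shape((300, 512, 512), (1.25, 0.7, 0.7))
print(window_examples)
print(new_shape)
```

Values below −1450 HU or above +50 HU saturate at the ends of the window, so most of the displayed dynamic range is spent on aerated lung and parenchymal lesions.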

For DL algorithm development (training) and optimization (validation) for lesion localization and classification, data annotation was performed by a thoracic radiologist (J.H.) with 3 years of experience and verified by an expert radiologist (C.Y.) with 11 years of experience. First, abnormal slices with typical pulmonary TB lesions and normal slices without pathological findings were manually marked and used as gold standards to train the DL network. Second, for each CT scan, the TB-related imaging features were judged according to the Fleischner Society glossary [15], and the center layers with the maximum areas of TB lesions were labeled. The bounding boxes of the above critical findings (average of 2.2 lesions per CT scan) were drawn with the open-source software ITK-SNAP (version 3.6.0) for lesion detection and segmentation.

Figure caption: Flowchart of the study process for the training and testing datasets

Convolutional neural network (CNN)-based cascading networks were automatically connected to create an end-to-end processing pipeline (Fig. 2). The AI model first identified the abnormal CT slices based on the Attention Branch ResNet [16]. The sigmoid function was used as the last activation function (Fig. 3). The top ten abnormal slices of each raw chest CT scan according to the sigmoid scores were selected by the algorithm for subsequent analysis. Then, the lung region of interest (ROI) was localized based on the CenterNet detection framework [6, 17] (Fig. S1). During the training phase, annotated ground truth ROIs were first converted into heatmaps as input. The trained CNN then outputs the locations of the TB lesions and the corresponding segmented ROIs. CenterNet generates a heatmap, and the error between the predicted and annotated heatmaps is minimized with the focal loss function. Mean average precision (mAP) was used to evaluate the lesion detection performance of the proposed system. To perform region-specific disease diagnosis, we used an 18-layer squeeze-and-excitation ResNet model (SeNet-ResNet-18) [18] pretrained on the ImageNet dataset. Images with bounding boxes served as input to the network, which produced a final prediction of the categories and activities of TB (Fig. S2). To further estimate TB activity, the output prediction probabilities were merged into a two-class classification of active and inactive TB [4]. Finally, a Grad-CAM algorithm based on the slice selection CNN described above [16] was used to generate an activation map to evaluate the disease severity. The computerized quantitative approach provided segmentation of the lung tissues based on thresholds and adaptive region growing. A lung "TB score" was calculated as the ratio of the summed lesion volume to the volume of the corresponding lung lobes. Additionally, each lobe was visually scored by consensus of two independent radiologists. According to the subjective score, the patient was defined as severe (≥ 2 for any lobe) or non-severe (< 2). The detailed system network structure for the AI model is summarized in the Supplementary Materials.
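The TB score and the severity rule can be sketched in a few lines. The example below assumes that a binary lesion mask (for instance, a thresholded activation map) and a lobe label map are already available as flat voxel sequences on an isotropic grid, so that voxel counts stand in for volumes. The function names, the label convention (0 for background, 1 to 5 for lobes), and the toy data are hypothetical; the text does not specify how the activation map was converted into lesion volumes.

```python
from collections import Counter


def tb_scores_per_lobe(lobe_labels, lesion_mask):
    """Lesion volume divided by lobe volume, for each lobe label.

    lobe_labels: flat sequence of ints, 0 = background, 1..5 = lung lobes.
    lesion_mask: flat sequence of 0/1 flags of the same length.
    Voxels are assumed isotropic, so voxel counts stand in for volumes.
    """
    if len(lobe_labels) != len(lesion_mask):
        raise ValueError("label map and lesion mask must have the same length")
    lobe_voxels = Counter()
    lesion_voxels = Counter()
    for lobe, lesion in zip(lobe_labels, lesion_mask):
        if lobe == 0:
            continue  # ignore anything outside the lungs
        lobe_voxels[lobe] += 1
        if lesion:
            lesion_voxels[lobe] += 1
    return {lobe: lesion_voxels[lobe] / n for lobe, n in sorted(lobe_voxels.items())}


def is_severe(visual_scores):
    """Rule from the text: severe if any lobe is visually scored >= 2."""
    return any(score >= 2 for score in visual_scores)


# Toy example with two lobes (labels 1 and 2) and ten voxels.
labels = [0, 1, 1, 1, 1, 2, 2, 2, 2, 0]
lesions = [0, 1, 0, 0, 0, 1, 1, 1, 0, 1]
scores = tb_scores_per_lobe(labels, lesions)
print(scores)
print(is_severe([0, 1, 2, 0, 1]))
```

Computing the ratio per lobe keeps the algorithmic score on the same anatomical unit as the radiologists' visual score, which is what makes the two directly comparable.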

The performance of the model was tested independently on three datasets: dataset 2 (Yanling Hospital, n = 99), dataset 3 (Haikou Hospital, n = 86), and dataset 4 (National Institutes of Health [NIH] TB Portals dataset at https://tbportals.niaid.nih.gov/, n = 171). The model outputs were probabilities for six critical imaging findings and activities, and the consensus categories determined by two radiologists (C.Y., J.H.) were considered as ground truths.

A confusion matrix was calculated to evaluate the multiclass classifier, whereas recall, precision, and the more balanced F1 score were used to measure the performance by class. Spearman correlation analysis was performed to assess the correlations between the radiologist-estimated CT score and the TB score determined by the algorithm. The correlation was defined as mild (r < 0.3), moderate (0.3 ≤ r < 0.5), good (0.5 ≤ r < 0.8), or strong (r ≥ 0.8). Also, the interobserver reliability of the subjective CT score rated from 30 randomly chosen cases was calculated as the intraclass correlation coefficient.
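For readers less familiar with these metrics, the sketch below derives per-class precision, recall, and F1 score, as well as overall accuracy, from a confusion matrix. The convention that rows are true classes and columns are predicted classes, the function names, and the three-class counts are assumptions for illustration; the counts are made up and are not taken from this study.

```python
def per_class_metrics(confusion):
    """Precision, recall, and F1 per class from a square confusion matrix.

    confusion[i][j] counts samples of true class i predicted as class j.
    """
    n = len(confusion)
    metrics = []
    for k in range(n):
        tp = confusion[k][k]
        predicted_k = sum(confusion[i][k] for i in range(n))  # column sum
        actual_k = sum(confusion[k])                          # row sum
        precision = tp / predicted_k if predicted_k else 0.0
        recall = tp / actual_k if actual_k else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        metrics.append({"precision": precision, "recall": recall, "f1": f1})
    return metrics


def overall_accuracy(confusion):
    """Fraction of samples on the diagonal of the confusion matrix."""
    correct = sum(confusion[k][k] for k in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return correct / total


# Made-up 3-class counts, not taken from the paper.
toy_confusion = [
    [8, 1, 1],
    [2, 6, 2],
    [0, 1, 9],
]
class_metrics = per_class_metrics(toy_confusion)
accuracy = overall_accuracy(toy_confusion)
for label, m in enumerate(class_metrics):
    print(label, round(m["precision"], 3), round(m["recall"], 3), round(m["f1"], 3))
print(round(accuracy, 3))
```

The F1 score is the harmonic mean of precision and recall, so it stays low whenever either of the two is low, which is why it is the more balanced summary for classes of unequal size.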

The development dataset included 892 scans of 526 patients (320 men and 206 women; mean age, 48.5 ± 16.5 years; age range, 18-92 years) with clinically diagnosed TB.

The AI cascading models consisted of four subsystems, which provide consistent visual descriptions: (1) screening to distinguish between normal and abnormal CT images, (2) object detection and localization of pulmonary infectious lesions, (3) diagnostic assessment of radiological features (6 types) and TB activity, and (4) evaluation of disease severity.

Slice-level analysis was performed to select the top 10 positive slices according to the predicted probability for each CT scan. The average accuracy of the training and validation cohort was 99.6% and 99.8%, respectively.
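The slice selection step amounts to ranking slices by predicted probability and keeping the ten highest. A minimal sketch follows, assuming the slice classifier emits one raw score (logit) per slice that is passed through a sigmoid; the function names and the example logits are hypothetical, and returning the chosen slices in anatomical order is a design choice made here, not something stated in the text.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def top_k_slices(logits, k=10):
    """Indices and probabilities of the k most abnormal slices.

    The indices are returned in ascending (anatomical) order.
    """
    probs = [sigmoid(z) for z in logits]
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    chosen = sorted(ranked[:k])
    return chosen, [probs[i] for i in chosen]


# Hypothetical per-slice logits for a six-slice scan; keep the top three.
example_logits = [-3.0, 0.5, 2.0, -1.0, 4.0, 1.5]
chosen_idx, chosen_probs = top_k_slices(example_logits, k=3)
print(chosen_idx)
print([round(p, 3) for p in chosen_probs])
```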

An advanced real-time object detection algorithm based on the CenterNet detection framework was used to localize lesions, which yielded a mAP of 0 
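To make the heatmap-based training of the detector more concrete, the sketch below renders annotated lesion centers as Gaussian peaks and scores a predicted heatmap with a penalty-reduced focal loss. This follows the formulation commonly associated with CenterNet, with exponents alpha = 2 and beta = 4; whether the authors used exactly this variant, these exponents, or a fixed Gaussian width is an assumption, since the text only states that a focal loss was applied to the heatmaps. All function names and the toy 5 × 5 grid are illustrative.

```python
import math


def gaussian_heatmap(height, width, centers, sigma=1.0):
    """Render one Gaussian peak per lesion center (row, col).

    Overlapping peaks are merged by taking the element-wise maximum.
    """
    heat = [[0.0] * width for _ in range(height)]
    for cy, cx in centers:
        for y in range(height):
            for x in range(width):
                g = math.exp(-((y - cy) ** 2 + (x - cx) ** 2) / (2.0 * sigma ** 2))
                heat[y][x] = max(heat[y][x], g)
    return heat


def centernet_focal_loss(pred, target, alpha=2.0, beta=4.0, eps=1e-6):
    """Penalty-reduced focal loss, normalized by the number of peaks.

    Pixels with target == 1 are positives; all others are negatives whose
    penalty shrinks as the target value approaches 1.
    """
    loss, num_pos = 0.0, 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            p = min(max(p, eps), 1.0 - eps)  # keep log() finite
            if t == 1.0:
                loss -= (1.0 - p) ** alpha * math.log(p)
                num_pos += 1
            else:
                loss -= (1.0 - t) ** beta * p ** alpha * math.log(1.0 - p)
    return loss / max(num_pos, 1)


# Toy 5 x 5 target with a single lesion center in the middle.
target_map = gaussian_heatmap(5, 5, [(2, 2)])
sharp_pred = [[1.0 if (y, x) == (2, 2) else 0.0 for x in range(5)] for y in range(5)]
flat_pred = [[0.5] * 5 for _ in range(5)]
loss_sharp = centernet_focal_loss(sharp_pred, target_map)
loss_flat = centernet_focal_loss(flat_pred, target_map)
print(loss_sharp, loss_flat)
```

A prediction that is confident at the annotated center and silent elsewhere yields a loss near zero, whereas an uninformative prediction of 0.5 everywhere is penalized at both the peak and the background.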

The classification CNN demonstrated a training accuracy of 99 The classification results per lesion are summarized as a confusion matrix for the critical imaging findings predicted by the AI model (Fig. 4a) . In the validation cohort, the DL model showed good discriminative Table S1 .

To validate the general applicability of the proposed AI system, CT images were obtained from the NIH open-source dataset and additional data from our collaborators. In the independent test cohorts (datasets 2, 3, and 4), the overall classification accuracy rates of the six pulmonary infiltrate types were 91.50% (474/518), 87.65% (440/502), and 86.08% (748/869), respectively. The confusion matrix for each testing dataset is shown in Fig. 4a-d. The predictive performances of the corresponding recall, precision, and F1 score by class are listed in Table 2. For binary decisions, the CNN model achieved an accuracy of 96.97%, 95.35%, and 98.25% for predicting participant-wise activity in the independent test cohorts (datasets 2, 3, and 4), respectively (Fig. 4e-g), while the corresponding recall rate was 100%, 94.87%, and 97.87%, and the precision was 95.08%, 94.87%, and 98.70%, respectively.

A Grad-CAM framework that automatically highlights pulmonary lesions was used to assess the extent of the disease. The intraclass correlation coefficient for agreement between the subjective scores of the two radiologists was strong (0.92, 95% confidence interval = 0.90-0.95). As displayed in the attention heatmap obtained by fusion (Fig. 5a), the AI-discovered suspicious infectious areas matched highly with the actual pulmonary TB lesions. Spearman correlation analyses demonstrated a moderate to good correlation between the AI model-quantified TB score and the radiologist-estimated CT score (r = 0.545-0.713) in the validation cohort. The correlation results are summarized in Table 3. The TB scores per lobe were significantly higher in patients with severe disease than in those with non-severe disease in the validation and testing sets (all p < 0.05; Fig. 5b). Some examples of TB and the corresponding prediction results are shown in Fig. 6.
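The correlation analysis can be reproduced in outline as follows. The sketch computes Spearman's r as the Pearson correlation of average ranks (so that tied visual scores are handled) and then applies the mild, moderate, good, and strong thresholds defined in the statistical methods. The function names and the six paired values are hypothetical and are not data from this study.

```python
def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_r(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (vx * vy) ** 0.5


def correlation_label(r):
    """Categories as defined in the statistical analysis section."""
    if r < 0.3:
        return "mild"
    if r < 0.5:
        return "moderate"
    if r < 0.8:
        return "good"
    return "strong"


# Hypothetical paired values: algorithmic TB score vs. visual CT score.
tb_score_values = [0.02, 0.10, 0.05, 0.30, 0.22, 0.15]
visual_ct_scores = [0, 1, 1, 3, 2, 1]
r_value = spearman_r(tb_score_values, visual_ct_scores)
print(round(r_value, 3), correlation_label(r_value))
```

A rank-based coefficient suits this comparison because the visual CT score is ordinal while the TB score is continuous, so only the ordering of cases is compared.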

In this retrospective, multicohort, diagnostic study, an AI cascading model for fully automated diagnosis and triage of pulmonary TB was developed and evaluated based on chest CT images. The results confirmed that the model was useful for detection and classification of critical imaging features and achieved an overall accuracy of 0.86-0.92 with the use of external datasets. Moreover, an attention heatmap highlighted infectious areas for evaluation of TB burden with human-level accuracy. The AI system succeeded in stratifying patients into severe and non-severe groups by TB scores quantified by the algorithm. Furthermore, the DL system allowed for accurate detection, diagnosis, and severity assessment of TB lesions. The proposed AI system can assist clinicians with the significant demands for pulmonary TB screening, diagnosis, and follow-up in daily clinical practice.

Fig. 5 a AI-identified suspicious infectious areas on images of severe and non-severe disease. Pseudocolor map represents the three-dimensional reconstruction of the lesion. b Boxplots comparing TB scores per lobe between severe and non-severe patients for the validation and test datasets

TB accounts for an estimated 1.4 million deaths annually worldwide [19]. Although the incidence has continued to gradually decrease over the past decade, TB remains an enormous burden globally [3, 20]. Prompt detection, timely treatment, and routine follow-up are priorities to prevent TB-related morbidity and spread to unaffected individuals. Although sputum and blood assays are standard for the diagnosis of TB, these tests are inadequate for assessing smear-negative TB and the results can take up to several days [2]. Lung biopsy provides a pathologically proven diagnosis, but the procedure is invasive with significant risks for comorbidities [21]. Chest radiography is a widely available imaging tool for screening and diagnosis of TB [22]. Experimental results have shown that CT scans can aid radiologists in the diagnosis of suspected TB cases when chest radiographs are inconclusive [23]. Furthermore, CT imaging features (including centrilobular nodules, tree-in-bud, consolidation, and cavitation) are reported to correlate strongly with the positivity and grading of sputum microbiology results [24].

AI has become state of the art for image analysis [25] and plays a role in supporting clinical decision making with respect to diagnosis and risk stratification [7, 26]. Lakhani et al [27] recently reported a CNN-based DL system that rapidly and accurately classified TB on chest radiographs with a sensitivity of 97.3% and specificity of 100.0%. When evaluating a three-dimensional CT scan, abnormal slices in the full series of images must first be identified. To reduce the computational burden of the proposed AI model, a pretrained network was used to select the 10 key slices with the highest confidence to represent a complete CT scan, which achieved an accuracy of 99.80%, similar to that of the BConvLSTM U-Net method for image selection [16]. However, this strategy involves a tradeoff, because diagnostic information from the unselected slices is not available for further analysis.

Several previous studies have explored the detection and classification of pulmonary infections, especially during the recent COVID-19 pandemic [16, 28-31]. Jaeger et al [32] utilized the simple, but effective, one-stage Retina U-Net model for lesion detection and localization, which achieved a mAP of 0.50. In the present study, a CNN based on the CenterNet detection framework automatically identified suspected regions that were strongly indicative of TB with a mAP of 0.68. More recently, Li and colleagues [33] reported a state-of-the-art three-dimensional DL model to annotate the spatial location of lesions and classify five critical CT imaging types of TB disease (miliary, infiltrative, caseous, tuberculoma, and cavitary), with a classification precision rate of 90.9%. The overall accuracy of the proposed model was similar (0.86-0.92 vs. 0.91) for six typical CT imaging findings. Another novel feature of the AI model is the ability to analyze imaging features and simultaneously predict disease activity. The higher accuracy of participant-wise prediction (98% compared with region-wise accuracy of 90%) is reasonable, because errors in region-wise predictions are minimized in the prediction of active TB, which could facilitate more effective identification, intervention, and isolation of active cases.

Fig. 6 Example of chest CT images of patients with pulmonary TB and performance of our AI model

In addition, various computer-aided CT image analysis tools were recently developed for the diagnosis and evaluation of the disease burden of coronavirus-positive patients, which can help to predict the progression to critical illness [14]. Similarly, Shan et al [30] reported a CT-based DL system focused on automatic segmentation and quantification of regions of infection, which can identify suspicious abnormal areas of the bilateral lungs as a slice-based "heat map." This system allows automatic delineation and prediction of disease severity consistently and quantitatively. The results of the present study demonstrated that the proposed TB score corresponds to disease severity. The lesion percentage determined by the AI model was moderately to well correlated with that of the radiologists (r = 0.453-0.761). Furthermore, the AI model demonstrated significant differences in the TB scores between the severe and non-severe groups in the testing datasets (all p < 0.05). Therefore, such a TB score provides a potential quantitative tool for patient follow-up and management to monitor progression and regression of findings. More importantly, unlike prior supervised algorithms based on slice-level analysis, the proposed approach can search an entire CT study without human guidance. The exported quantitative report, including overall TB infection probability, imaging features with spatial coordinates, activity, and severity prediction, may then serve as an effective reference for clinical decision making, which is well suited to real-life health services.

There were several limitations to this study that should be addressed. First, the sample size was relatively small and the insufficiency of data for training the AI networks may have limited the performance of the model. Second, the CT data from different centers were rather heterogeneous regarding the scanning parameters and slice thicknesses. However, such heterogeneous data may allow the results to be more generalizable. As demonstrated in the independent test cohorts, the proposed AI model is robust regarding slice thickness and lesion distribution. Third, this study only focused on the six typical types of pulmonary infiltrates, while ignoring other rare signs of primary TB, including pleural effusion and enlarged lymph nodes. The sample size of the rare imaging features was insufficient for training. In the future, the AI system must be trained with a larger dataset with nontypical imaging manifestations to be more suitable for clinical needs. Lastly, overfitting is problematic with complex deep learning models. The external validation datasets used in this study had inherent differences, as the cohorts consisted of patients with different disease burdens, which may explain the slight drop in performance in the testing phase of this study.

In conclusion, the DL cascading model based on chest CT images can be clinically applicable for accurate detection, diagnosis, and triage of pulmonary TB. This fully automated AI system has great potential in clinical practice for rapid assessment of disease activity and guidance of treatment and management of pulmonary TB.

Statistics and biometry One of the authors has significant statistical expertise.

Ethical approval Institutional review board approval was obtained.

• retrospective • diagnostic or prognostic study • multicenter study

The global burden of tuberculosis: results from the global burden of disease study 2015

Global control of tuberculosis: from extensively drug-resistant to untreatable tuberculosis

Imaging in tuberculosis

Pulmonary tuberculosis: role of radiology in diagnosis and management

Artificial intelligence applications for thoracic imaging

Artificial intelligence in radiology

A deep learning system for differential diagnosis of skin diseases

Development and validation of a deep learning system for diabetic retinopathy and related eye diseases using retinal images from multiethnic populations with diabetes

End-to-end lung cancer screening with three-dimensional deep learning on low-dose chest computed tomography

Deep learning: definition and perspectives for thoracic imaging

Convolutional neural networks for radiologic images: a radiologist's guide

Artificial intelligence, machine (deep) learning and radio(geno)mics: definitions and nuclear medicine imaging applications

Clinically applicable AI system for accurate diagnosis, quantitative measurements, and prognosis of COVID-19 pneumonia using computed tomography

Fleischner Society: glossary of terms for thoracic imaging

Artificial intelligence-enabled rapid diagnosis of patients with COVID-19

Dense anatomical annotation of slit-lamp images improves the performance of deep learning for the diagnosis of ophthalmic disorders

Deep learning with convolutional neural network for differentiation of liver masses at dynamic contrast-enhanced CT: a preliminary study

WHO (2020) Global tuberculosis report

Determining the optimal puncture site of CT-guided transthoracic needle aspiration biopsy for the diagnosis of tuberculosis

Deep learning-based automated detection algorithm for active pulmonary tuberculosis on chest radiographs: diagnostic performance in systematic screening of asymptomatic individuals

Pulmonary tuberculosis in infants: radiographic and CT findings

The relation between CT findings and sputum microbiology studies in active pulmonary tuberculosis

Automated bank cheque verification using image processing and deep learning methods

Demystification of AI-driven medical image interpretation: past, present and future

Deep learning at chest radiography: automated classification of pulmonary tuberculosis by using convolutional neural networks

Artificial intelligence for the detection of COVID-19 pneumonia on chest CT using multinational datasets

Open resource of clinical data from patients with pneumonia for the prediction of COVID-19 outcomes via deep learning

Abnormal lung quantification in chest CT images of COVID-19 patients with deep learning and its application to severity prediction

XCOVNet: chest X-ray image classification for COVID-19 early detection using convolutional neural networks

Retina U-Net: embarrassingly simple exploitation of segmentation supervision for medical object detection

A deep learning system that generates quantitative CT reports for diagnosing pulmonary Tuberculosis
