key: cord-0868005-3lddq034 authors: Choi, Hyunsu; Chang, Won; Kim, Jong Hyo; Ahn, Chulkyun; Lee, Heejin; Kim, Hae Young; Cho, Jungheum; Lee, Yoon Jin; Kim, Young Hoon title: Dose reduction potential of vendor-agnostic deep learning model in comparison with deep learning–based image reconstruction algorithm on CT: a phantom study date: 2021-08-14 journal: Eur Radiol DOI: 10.1007/s00330-021-08199-9 sha: 25697c596ae0ded890dc267d52cc314bf47aede8 doc_id: 868005 cord_uid: 3lddq034 OBJECTIVES: To compare the dose reduction potential (DRP) of a vendor-agnostic deep learning model (DLM, ClariCT.AI) with that of a vendor-specific deep learning–based image reconstruction algorithm (DLR, TrueFidelity™). METHODS: Computed tomography (CT) images of a multi-sized image quality phantom (Mercury v4.0) were acquired under six radiation dose levels (0.48/0.97/1.93/3.87/7.74/15.47 mGy) and were reconstructed using filtered back projection (FBP) and three strength levels of the DLR (low/medium/high). The FBP images were denoised using the DLM. For all DLM and DLR images, the detectability index (d′) (a task-based detection performance metric) was obtained, under various combinations of three target sizes (10/5/1 mm), five inlets (CT value difference with the background; −895/50/90/335/1000 HU), five phantom diameters (36/31/26/21/16 cm), and six radiation dose levels. Dose reduction potential (DRP) measures the dose reduction made by using DLM or DLR, while yielding d′ equivalent to that of FBP at full dose. RESULTS: The DRPs of the DLM, DLR-low, DLR-medium, and DLR-high were 86% (81–88%), 60% (46–67%), 76% (60–81%), and 87% (78–92%), respectively. For 10-mm targets, the DRP of the DLM (87%) was higher than that of all DLR algorithms (58–86%). However, for smaller targets (5 mm/1 mm), the DRPs of the DLR-high (89/88%) were greater than those of the DLM (87/84%). 
CONCLUSION: The dose reduction potential of the vendor-agnostic DLM was shown to be comparable to that of the vendor-specific DLR at high strength and superior to those of the DLRs at medium and low strengths. KEY POINTS: • DRP of the vendor-agnostic model was comparable to that of the high-strength vendor-specific model and superior to those of medium- and low-strength models. • Under various radiation dose levels, the deep learning model shows higher detectability indexes compared to FBP. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1007/s00330-021-08199-9.
Although computed tomography (CT) is a widely used imaging modality in clinical practice [1], it inherently involves the hazard of radiation exposure [2]. Iterative reconstruction (IR) has facilitated technical efforts to reduce the radiation dose without compromising diagnostic performance [3-5]. Recently, however, deep learning-based image reconstruction algorithms (DLRs) have emerged as an alternative technical approach. For example, a DLR introduced by GE Healthcare (TrueFidelity™) was reported to reduce the radiation dose by 36-50% compared with IR, while decreasing image noise and improving lesion detectability [6-10]. However, because it is a vendor-specific algorithm, it can only be mounted and operated on state-of-the-art GE Healthcare machines. The need for a dose reduction technique that is applicable to any existing CT scanner was strongly reinforced by the coronavirus disease 2019 pandemic, with the increased demand for chest CTs [11]. Therefore, the development and evaluation of vendor-agnostic deep learning-based algorithms that improve image quality with a lower radiation dose is a timely and necessary task. A few vendor-agnostic techniques that can reduce image noise with non-inferior detectability compared to existing vendor-specific IR and DLR have been introduced [8, 12-14].
ClariCT.AI (ClariPI) is a vendor-agnostic deep learning model (DLM) that works in the image post-processing domain and does not need projection data. According to a recent study by Nam et al, for ultralow-dose CT, the DLM achieved better subjective image quality with fewer artifacts than did the DLR, although the vendor-specific DLR showed less noise and a higher signal-to-noise ratio than did the vendor-agnostic DLM [15] . However, this study assessed the subjective image quality under a single radiation dose level, and the diagnostic performance was not evaluated. With the development of various techniques (e.g., IR), for CT, AAPM Task Group (TG) 233 suggested a guideline for the performance evaluation of CT images. This guideline includes a detectability index, which is a task-based detection performance metric that indicates the clinical performance in detecting lesions [16] . To the best of our knowledge, the dose reduction potentials of vendor-agnostic DLM and DLR have not yet been compared using the detectability index. Therefore, in this study, we compared the dose reduction potential (DRP) of the vendor-agnostic DLM and DLR using the detectability index. We used a multi-sized image quality phantom (Mercury v4.0 phantom, GAMMEX Sun Nuclear Corp.) suggested by AAPM TG 233 for assessing a CT system; it is a cylindrical polyethylene phantom with expected background CT values of −90 HU under 120 kVp. It consists of five sections with different diameters (36/31/26/21/16 cm) and a total length of 52 cm. Each section includes five inlets representing solid water, bone-mimicking material, polystyrene, 10-mg/ml iodine, and air ( Fig. 1) , each with expected CT difference with the background of 90, 1000, 50, 335, and −895 HU, respectively [16] . We measured the background CT value by drawing a concentric circular ROI around each inlet, excluding the area of the inlet (Supplementary Appendix Figure E1 ). 
Images were acquired using Revolution CT (GE Healthcare) equipped with the TrueFidelity algorithm. Scanning was done under six radiation dose levels (volume CT dose index (CTDI vol ): 15.47, 7.74, 3.87, 1.93, 0.97, and 0.48 mGy) with a tube voltage of 120 kV. We set the full-dose level in this study to 15.47 mGy by referencing the diagnostic reference level of abdominal pelvic CT, which is typically set at the 75th percentile of CTDI vol for patients with water equivalent diameters between 29 and 33 cm [17] . The applied CT parameters are listed in Table 1. CT images of the Mercury phantom were reconstructed using the FBP and a DLR (TrueFidelity-L/M/H). The inference engine of TrueFidelity generates three selectable reconstruction strength levels (i.e., low, medium, high (L/M/H)) to control the degree of noise reduction. Reconstruction strength levels are qualitatively divided according to the noise reduction and are selectable based on the clinical application. A body kernel was used for reconstruction with FBP. Scanning was performed 10 times at each radiation dose level [16, 18] . To achieve a reliable estimate of the task-based transfer function (TTF), which represents the spatial resolution of the CT image, the contrast-to-noise ratio (CNR) should be at least 15. However, for low-contrast inlets with high image noise, the minimum CNR may not be achievable in a single slice. In that case, multiple slices should be averaged to lower the noise and increase the CNR [16, 18] . We followed the averaging method specified by AAPM TG 233 to achieve a total effective CNR greater than 15 [16, 18] . We obtained TTF to calculate a metric representing the CT image quality (i.e., detectability index), which will be described in detail later. Deep learning-based denoising from the FBP images was performed using the DLM [19] . 
The DLM is an image-based CT denoising solution built on a U-net-based CNN, trained by taking a noise-added CT image as input to produce the original CT image as output. To generate the training dataset, a synthetic sinogram-based low-dose CT simulation technique was applied to create a paired set of low-dose and standard-dose CT images from a set of given standard-dose CT images [20]. To limit overfitting and to train a model that minimizes bias, we trained the DLM using various combinations of clinical and phantom images from four major vendors. Various vendor-specific datasets were used for the training, so that the DLM could acquire vendor-agnostic denoising capability. The training dataset comprised more than 1 million clinical CT images encompassing 2,100 different combinations of scan and reconstruction conditions, including various kVp, mAs, automatic exposure controls, slice thicknesses, contrast enhancements, and convolution kernels, with 24 scanner models from four different CT manufacturers (GE Healthcare, Siemens, Philips Healthcare, and Canon). In this study, we used DLM version 1.2.1, which was further trained using images of standard QA phantoms such as AAPM, ACR, and Catphan in consideration of phantom-based image quality assurance practices. The DLM performance has been previously evaluated in several clinical studies [12-15, 21].
We used a dedicated analysis software tool (imQuest, Duke University) to evaluate the detectability index (d′) [22]. This index is a metric of the image quality that is specifically designed to represent the detectability of target objects; an increase in d′ suggests that the target object is easier to detect, and the image quality is better [16].
[Fig. 1, panel c: five 3-cm inlets distributed radially and equally from the center in each section.]
The detectability index consists of three primary components: a task to be performed, an observer to perform the task, and the image to be evaluated. The task is to detect a circular signal defined by a task function (W_task). W_task encodes the properties of the target signal (i.e., size, shape, contrast, and contrast profile), and these encoded profiles are used to calculate the detectability index. We used target signals with the designer profile in three sizes (1, 5, and 10 mm) for each of the five inlets. For the observer model, we adopted a non-pre-whitening (NPW) matched filter, which correlates strongly with human performance in the low-contrast detection task [16].
To acquire the detectability index, the noise power spectrum (NPS) and TTF were first measured. The noise was measured in terms of the NPS to analyze the components of the Mercury phantom at each dose level. The TTF was measured as an indicator of the spatial resolution for each dose level and inlet. The spatial frequency at which the TTF reached 50% was indicated as TTF 50% and was used to represent the system resolution under the given condition [16]. The change in TTF 50% compared with the FBP was obtained with respect to the object and radiation dose.
The NPS and d′ were obtained for the DLM and DLR images according to the methods specified by AAPM TG 233, under various combinations of five inlets, five phantom diameters, and six radiation dose levels [16]. Because d′ varies widely across the five inlets, the normalized d′ value was used for convenience. The normalization of d′ at each dose level was achieved by dividing the value by that of the same inlet at full dose. The performances of the DLM and DLR-L/M/H were compared by obtaining the DRP as the final comparison value.
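As an illustration of how the task function, TTF, and NPS combine in an NPW observer, the following is a minimal sketch and not the imQuest implementation used in the study. It assumes the commonly cited NPW form, d′² = [∬|W_task|²·TTF² du dv]² / ∬|W_task|²·TTF²·NPS du dv. The uniform-disk task function (a simplification of the designer profile), the Gaussian-shaped TTF, the NPS shape, the 10-HU noise level, and all function names are hypothetical choices made only for this example.

```python
import numpy as np

N = 256          # grid size in pixels (assumed for illustration)
PIXEL_MM = 0.25  # pixel spacing in mm (assumed for illustration)


def frequency_grid(n=N, pixel_mm=PIXEL_MM):
    """Return the radial spatial-frequency map (1/mm) and the frequency step."""
    f = np.fft.fftshift(np.fft.fftfreq(n, d=pixel_mm))
    u, v = np.meshgrid(f, f)
    return np.hypot(u, v), f[1] - f[0]


def disk_task_function(contrast_hu, diameter_mm, n=N, pixel_mm=PIXEL_MM):
    """Fourier transform of a uniform disk (simplified stand-in for W_task)."""
    x = (np.arange(n) - n // 2) * pixel_mm
    xx, yy = np.meshgrid(x, x)
    disk = contrast_hu * (np.hypot(xx, yy) <= diameter_mm / 2.0)
    return np.fft.fftshift(np.fft.fft2(np.fft.ifftshift(disk))) * pixel_mm ** 2


def npw_dprime(w_task, ttf, nps, df):
    """NPW detectability index from 2D maps of W_task, TTF, and NPS."""
    signal = np.abs(w_task) ** 2 * ttf ** 2
    numerator = (signal.sum() * df ** 2) ** 2
    denominator = (signal * nps).sum() * df ** 2
    return float(np.sqrt(numerator / denominator))


rho, df = frequency_grid()

# Hypothetical TTF that falls to 50% at 0.4 cycles/mm.
ttf = np.exp(-np.log(2.0) * (rho / 0.4) ** 2)

# Hypothetical NPS shape, scaled so that its integral equals (10 HU)^2.
shape = rho * np.exp(-rho / 0.15)
nps = 10.0 ** 2 * shape / (shape.sum() * df ** 2)

w = disk_task_function(contrast_hu=50.0, diameter_mm=5.0)
print(f"d' = {npw_dprime(w, ttf, nps, df):.2f}")
```

In this form, d′ scales linearly with the target contrast and inversely with the noise standard deviation, which is why a normalized d′ is convenient when inlets with very different contrasts are compared.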
The DRP was defined as the amount of radiation dose that could be reduced by using either the DLM or DLR instead of the FBP. For example, if the radiation dose at which the d′ of polystyrene reaches 1 is 16 mGy for the FBP and 6 mGy for the DLM, the DRP of the DLM is (16 − 6) mGy / 16 mGy × 100% = 62.5% [23].
The detectability index was analyzed with respect to the inlet, phantom diameter, and radiation dose. For all inlets and phantom diameters, d′ of the DLM was higher than those of all DLR levels (Figs. 2 and 3). The difference in d′ between the DLM and the DLR-H was greater when the diameter of the phantom was larger. At higher dose levels (3.87/7.74/15.47 mGy), d′ of the DLM was higher than that of the DLR-H. For radiation doses below 1.93 mGy, d′ of the DLM was lower than that of the DLR-H but was comparable to that of the DLR-M (Fig. 4). Figure 5 illustrates the image of polystyrene (... Table 1-3).
The average DRPs of the DLM, DLR-L, DLR-M, and DLR-H were 86% (81-88%), 60% (46-67%), 76% (60-81%), and 87% (78-92%), respectively. For large target objects (10 mm), the DRP of the DLM was slightly higher than those of all DLR levels. However, for smaller targets (1 mm and 5 mm), the DRP of the DLR-H (89 and 88%) was greater than that of the DLM (87 and 84%) (Table 2).
The DLM tended to show higher spatial resolution than the DLRs for inlets having high contrast values (bone-mimicking material and iodine) (Table 3). On the other hand, for polystyrene, the inlet with the lowest contrast value among the five inlets, the spatial resolution of the DLM was degraded compared with the FBP and lay between those of the medium and high reconstruction levels of the DLR. For air, the spatial resolution of the DLR was higher than that of the DLM. Moreover, the spatial resolutions of the DLM and DLR were dependent on the level of radiation exposure (Table 4).
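The DRP definition given earlier in this section can be sketched in a few lines. The example below is a hedged illustration, not the study's analysis code: it assumes that d′ increases monotonically with dose and that the dose needed to reach a target d′ can be estimated by log-log interpolation between measured dose levels. The function names and the synthetic d′ curves (a square-root dependence on dose, with the denoised images at twice the FBP value) are hypothetical; only the six dose levels and the 16 mGy / 6 mGy worked example come from the text.

```python
import numpy as np


def drp_percent(dose_fbp_mgy, dose_alg_mgy):
    """Dose reduction potential (%) relative to the FBP dose."""
    return (dose_fbp_mgy - dose_alg_mgy) / dose_fbp_mgy * 100.0


def matching_dose(doses_mgy, dprime_alg, target_dprime):
    """Dose at which the algorithm reaches target_dprime (log-log interpolation)."""
    doses = np.asarray(doses_mgy, dtype=float)
    dprime = np.asarray(dprime_alg, dtype=float)
    order = np.argsort(doses)
    doses, dprime = doses[order], dprime[order]
    if np.any(np.diff(dprime) <= 0):
        raise ValueError("d' must increase with dose for interpolation")
    if not dprime[0] <= target_dprime <= dprime[-1]:
        raise ValueError("target d' lies outside the measured range")
    log_dose = np.interp(np.log(target_dprime), np.log(dprime), np.log(doses))
    return float(np.exp(log_dose))


# Worked example from the text: 16 mGy for the FBP versus 6 mGy for the DLM.
print(drp_percent(16.0, 6.0))  # 62.5

# Dose levels used in the study; the d' curves below are synthetic.
doses = [0.48, 0.97, 1.93, 3.87, 7.74, 15.47]
dprime_fbp = np.sqrt(doses)        # hypothetical FBP curve
dprime_alg = 2.0 * np.sqrt(doses)  # hypothetical denoised curve

full_dose = max(doses)
target = dprime_fbp[-1]            # d' of the FBP at full dose
dose_needed = matching_dose(doses, dprime_alg, target)
print(f"DRP = {drp_percent(full_dose, dose_needed):.1f}%")
```

With these synthetic curves, doubling d′ at every dose corresponds to reaching the full-dose FBP d′ at one quarter of the dose, that is, a DRP of about 75%. Real d′ curves need not follow a power law, so the interpolation scheme is a modeling choice rather than part of the DRP definition.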
The TTF 50% of the FBP, DLM, and DLR-L/M/H decreased as the radiation dose decreased. The spatial resolution of the DLR decreased as the reconstruction strength increased.
In this study, the performances of the vendor-agnostic DLM and a vendor-specific DLR were compared in terms of the DRP. We found that the DLM showed a DRP comparable to that of the DLR. Although several vendor-specific DLRs are now available, to our knowledge, this study is the first to compare a vendor-agnostic DLM with a commercially available DLR based on the method specified by AAPM TG 233. Analysis of CT image quality based on conventional metrics such as CNR or modulation transfer function is insufficient for an objective evaluation of DL algorithms, as those metrics cannot appropriately evaluate the non-linear and multifarious nature of the DL algorithms [24]. To reflect the object-dependent resolution and noise performance of DL algorithms, we employed metrics (e.g., TTF, NPS, and d′) by adopting the methodology specified by AAPM TG 233 to better represent the clinical performance.
It was unclear whether the vendor-agnostic DLM could exhibit DRP performance comparable to that of the DLR. The DLR was expected to have integrated into its neural networks much vendor-specific experience and knowledge regarding the physical and mathematical properties of the whole reconstruction process. In contrast, the DLM focuses on the single task of image noise reduction without using raw projection data. Therefore, our finding that the DLM showed a DRP comparable to that of the DLR is encouraging for attempts to reduce radiation doses. The algorithm can potentially be adopted by many institutions having CT scanners from different vendors. Indeed, compared to reported DRPs of the currently used IRs (range from 25 to 65%) [4, 5], the DLM in our study showed a higher DRP range (81-88%).
If this degree of dose reduction potential can be translated into clinical practice, CT images of the abdomen could be acquired at a radiation dose of approximately 2 mGy (15 mGy × 14% ≈ 2.1 mGy). When the effective dose (mSv) is calculated with a constant of 0.015 mSv·mGy⁻¹·cm⁻¹ and a scan length of 50 cm, the radiation exposure is approximately 1.6 mSv (2.1 mGy × 50 cm × 0.015 mSv·mGy⁻¹·cm⁻¹), which is equivalent to that of two abdomen x-rays [25]. Previous studies have shown that clinical outcome using 2-mSv CT is not inferior to that using conventional dose CT for acute appendicitis. There is a dearth of similar research in other disease groups [26, 27]. Our results may facilitate future research on reducing CT radiation used for various diseases.
In the subgroup analysis, the DLM showed d′ higher than that of the DLR as the diameter of the phantom increased. This strength of the DLM may translate to its application in obese patients. At low radiation dose levels, however, the DLM showed lower d′ than did the DLR-H, owing to its relatively low noise reduction. This performance change may be attributed to the difference in denoising principles between the two algorithms. Because the DLM is an image-based noise reduction algorithm, it was trained to reduce the noise level only to the extent where the image texture could be preserved. In contrast, in the DLR, which is a projection sinogram-based algorithm, the noise reduction power might override the preservation of image texture. In fact, a previous study reported that when applied at full strength to low radiation dose CT scans, the DLR caused image texture degradation and texture distortion artifacts that can be mistaken for a true lesion [15]. In previous DLR studies, the image texture was preserved, and the NPS peak remained unchanged compared with the FBP [28]. However, analysis of the noise and spatial resolution revealed that the DLM showed a slightly right-shifted NPS peak, reflecting a sharpening of the image texture.
This shift of the NPS peak in the DLM resulted in higher d′ values than those of the DLR-H for all object materials. Further evaluation is needed to ...
Previous studies have demonstrated that for DLR, spatial resolution depends on the contrast of the object material and the radiation dose [29-31]. Our results were consistent with those of the previous studies, showing a reduction in the spatial resolution (TTF) with a lower radiation dose and target contrast, in both DLR and DLM images. Owing to such dependency, McCollough et al found that a radiation dose reduction of more than 25% results in lower spatial resolution in IR images than in FBP images [30]. Therefore, the decrease in spatial resolution is likely to be worse for objects having contrast lower than that of polystyrene (50 HU). To overcome the degradation of the spatial resolution, the use of a generative adversarial network (GAN) was proposed in deep learning-based algorithms. The GAN focuses on migrating the data noise distribution from strong to weak statistically, to avoid over-smoothing or loss of structural detail [32]. However, verification of GAN in clinical CT remains insufficient. Therefore, further study is needed to determine whether a DLM incorporating a GAN can further preserve the spatial resolution for low-contrast object materials. Meanwhile, our results also showed the degradation of spatial resolution of the DLM below 1.97 mGy. There is no clear explanation for this phenomenon, and further research is needed regarding how the spatial resolution of the DLM is affected at a very low radiation dose level.
This study has several limitations. First, because the DLM and DLR were evaluated using a phantom, further research using clinical CT images is needed to validate their performance in clinical applications.
Second, although we tried to remove vendor-specific overfitting bias by training with various combinations of clinical and phantom images from four vendors, the DLM and DLR were evaluated using only one type of CT equipment. Therefore, to demonstrate the generalizability of the DLM, it should be evaluated on various other CT machines. Third, because we used FBP images reconstructed only with the body kernel, further research using other kernels may be needed. Finally, as mentioned, owing to the dependency of spatial resolution on contrast, further study is needed on low-contrast objects (below 50 HU) based on the AAPM TG 233 guideline to validate the clinical applicability of the DLM.
In conclusion, the dose reduction potential of the vendor-agnostic DLM was shown to be comparable to that of the vendor-specific DLR at high strength and superior to those of the DLRs at medium and low strengths.
Guarantor: The scientific guarantor of this publication is Won Chang.
Conflict of interest: ClariPI provided technical support in obtaining denoised images. Jong Hyo Kim is co-CEO and CTO of ClariPI but did not have control over any of the data or information submitted for publication.
No complex statistical methods were necessary for this paper.
Informed consent: Written informed consent was waived by the Institutional Review Board.
• Experimental • Performed at one institution
Computed tomography - an increasing source of radiation exposure
Radiation dose associated with common computed tomography examinations and the associated lifetime attributable risk of cancer
Iterative reconstruction technique for reducing body radiation dose at CT: feasibility study
Estimated radiation dose reduction using adaptive statistical iterative reconstruction in coronary CT angiography: the ERASIR study
Model-based iterative reconstruction technique for radiation dose reduction in chest CT: comparison with the adaptive statistical iterative reconstruction technique
Image quality and dose reduction opportunity of deep learning image reconstruction algorithm for CT: a phantom study
Deep learning-based image restoration algorithm for coronary CT angiography
Low-dose abdominal CT using a deep learning-based denoising algorithm: a comparison with CT reconstructed with filtered back projection or iterative reconstruction algorithm
Deep learning reconstruction at CT: phantom study of the image characteristics
Validation of deep-learning image reconstruction for low-dose chest computed tomography scan: emphasis on image quality and noise
Variations in CT utilization, protocols, and radiation doses in COVID-19 pneumonia: results from 28 countries in the IAEA study
Application of vendor-neutral iterative reconstruction technique to pediatric abdominal computed tomography
Incremental image noise reduction in coronary CT angiography using a deep learning-based technique with iterative reconstruction
Effect of a novel denoising technique on image quality and diagnostic accuracy in low-dose CT in patients with suspected appendicitis
Image quality of ultralow-dose chest CT using deep learning techniques: potential superiority of vendor-agnostic post-processing over vendor-specific techniques
Performance evaluation of computed tomography systems: summary of AAPM task group 233
US diagnostic reference levels and achievable doses for 10 adult CT examinations
Assessment of volumetric noise and resolution performance for linear and nonlinear CT reconstruction methods
Combined low-dose simulation and deep learning for CT denoising: application in ultra-low-dose chest CT
Noise reduction approach in pediatric abdominal CT combining deep learning and dual-energy technique
CT iterative reconstruction algorithms: a task-based image quality assessment
Assessment of the dose reduction potential of a model-based iterative reconstruction algorithm using a task-based performance metrology
An improved index of image quality for task-based performance of CT iterative reconstruction across three commercial implementations
Effective doses in radiology and diagnostic nuclear medicine: a catalog
Low-dose CT for the diagnosis of appendicitis in adolescents and young adults (LOCAT): a pragmatic, multicentre, randomised controlled noninferiority trial
Low-dose abdominal CT for evaluating suspected appendicitis
Noise and spatial resolution properties of a commercially available deep learning-based CT reconstruction algorithm
Measuring contrast- and noise-dependent spatial resolution of an iterative reconstruction method in CT using ensemble averaging
Degradation of CT low-contrast spatial resolution due to the use of iterative reconstruction and reduced dose levels
Statistical model based iterative reconstruction (MBIR) in clinical CT systems. Part II. Experimental assessment of spatial resolution performance
Low-dose CT image denoising using a generative adversarial network with Wasserstein distance and perceptual loss