key: cord-0023104-en1eefdv authors: Narang, Akhil; Bae, Richard; Hong, Ha; Thomas, Yngvil; Surette, Samuel; Cadieu, Charles; Chaudhry, Ali; Martin, Randolph P.; McCarthy, Patrick M.; Rubenson, David S.; Goldstein, Steven; Little, Stephen H.; Lang, Roberto M.; Weissman, Neil J.; Thomas, James D. title: Utility of a Deep-Learning Algorithm to Guide Novices to Acquire Echocardiograms for Limited Diagnostic Use date: 2021-02-18 journal: JAMA Cardiol DOI: 10.1001/jamacardio.2021.0185 sha: fce8e1d4db6688e80b09073ffed5b86984fabcde doc_id: 23104 cord_uid: en1eefdv IMPORTANCE: Artificial intelligence (AI) has been applied to analysis of medical imaging in recent years, but AI to guide the acquisition of ultrasonography images is a novel area of investigation. A novel deep-learning (DL) algorithm, trained on more than 5 million examples of the outcome of ultrasonographic probe movement on image quality, can provide real-time prescriptive guidance for novice operators to obtain limited diagnostic transthoracic echocardiographic images. OBJECTIVE: To test whether novice users could obtain 10-view transthoracic echocardiographic studies of diagnostic quality using this DL-based software. DESIGN, SETTING, AND PARTICIPANTS: This prospective, multicenter diagnostic study was conducted in 2 academic hospitals. A cohort of 8 nurses who had not previously conducted echocardiograms was recruited and trained with AI. Each nurse scanned 30 patients aged at least 18 years who were scheduled to undergo a clinically indicated echocardiogram at Northwestern Memorial Hospital or Minneapolis Heart Institute between March and May 2019. These scans were compared with those of sonographers using the same echocardiographic hardware but without AI guidance. INTERVENTIONS: Each patient underwent paired limited echocardiograms: one from a nurse without prior echocardiography experience using the DL algorithm and the other from a sonographer without the DL algorithm. Five level 3–trained echocardiographers independently and blindly evaluated each acquisition. MAIN OUTCOMES AND MEASURES: Four primary end points were sequentially assessed: qualitative judgment about left ventricular size and function, right ventricular size, and the presence of a pericardial effusion. Secondary end points included 6 other clinical parameters and comparison of scans by nurses vs sonographers. RESULTS: A total of 240 patients (mean [SD] age, 61 [16] years; 139 men [57.9%]; 79 [32.9%] with body mass indexes >30) completed the study. Eight nurses each scanned 30 patients using the DL algorithm, producing studies judged to be of diagnostic quality for left ventricular size, function, and pericardial effusion in 237 of 240 cases (98.8%) and right ventricular size in 222 of 240 cases (92.5%). For the secondary end points, nurse and sonographer scans were not significantly different for most parameters. CONCLUSIONS AND RELEVANCE: This DL algorithm allows novices without experience in ultrasonography to obtain diagnostic transthoracic echocardiographic studies for evaluation of left ventricular size and function, right ventricular size, and presence of a nontrivial pericardial effusion, expanding the reach of echocardiography to clinical settings in which immediate interrogation of anatomy and cardiac function is needed and settings with limited resources.
Detailed Description of Caption Guidance User Workflow: As shown in eFigure 1 (right), the user interface is designed to aid medical professionals without prior ultrasound experience in performing diagnostic imaging. The user interface guides users through a predefined and customizable imaging workflow to capture a specific set of ultrasound views; in the clinical study, a protocol of 10 views was utilized, with each view attempted sequentially according to the preset protocol. The user interface contains static guidance, which indicates the approximate starting location of the probe on the surface of the body, along with a canonical image of the desired view in the protocol. Other relevant user interface features include the "quality meter," "prescriptive guidance," and "save best clip."

Users are instructed to first observe the static guidance display to orient the probe and to familiarize themselves with the desired view. They are then instructed to begin scanning and observe the response of the quality meter. Because the quality meter provides an estimate of the distance from the desired view across all 6 degrees of freedom of probe motion, the user can observe how the quality meter responds to their probe movements and continue those movements that increase its response. For example, moving the probe more medially may reduce the quality meter response while moving more laterally may increase it, giving the user feedback to continue with a lateral probe movement. When the underlying algorithms detect a recognizable image appearance, users are presented with prescriptive guidance cues that direct a specific probe motion. For example, an under-rotation may be detected during a parasternal long-axis view acquisition, and the prescriptive guidance would then instruct the user to rotate slowly counterclockwise. When the user follows the recommended prescriptive guidance command, the quality meter response typically increases, and users are instructed to continue their motion until they maximize the response of the quality meter.

The quality meter also indicates a diagnostic quality threshold. If the user maintains the quality meter level above this threshold, the software automatically begins to capture a clip prospectively (called auto-capture) and stores the clip as long as the meter remains above the threshold for at least 2 seconds and up to 4 seconds; a clip shorter than 2 seconds is discarded. This auto-captured 2- to 4-second clip is then utilized as the resulting clip for that view. If, during scanning, the user does not cross the auto-record threshold within 2 minutes, the save best clip option appears, enabling the user to select the 2-second image sequence that produced the highest quality meter response over those 2 minutes. The user may either tap the save best clip option and proceed to the next view, or continue to scan and attempt to achieve an auto-captured image. The save best clip feature was utilized in a high proportion of patient examinations, most of which were deemed to be of diagnostic quality even though they did not cross the auto-capture threshold. Note that the diagnostic threshold has been tuned for high precision rather than high recall: a clip that crosses the threshold is very likely to be of diagnostic quality, even though some diagnostic-quality imagery will not trigger auto-capture. At completion of the 10-view protocol, users are presented with a summary page that enables them to review the imagery they have acquired for each view.
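The per-view capture logic described above can be summarized as a small state machine. The sketch below is an illustration under stated assumptions, not the vendor's implementation: only the 2-second minimum, 4-second maximum, and 2-minute timeout come from the text, while the threshold value, the [0, 1] quality scale, and the frame-stream interface are hypothetical. It also simplifies the save best clip step by returning immediately, whereas the real interface lets the user decline that option and keep scanning.

```python
from collections import deque

DIAGNOSTIC_THRESHOLD = 0.65    # hypothetical quality-meter level for auto-capture
MIN_CLIP_SEC = 2.0             # clips shorter than this are discarded
MAX_CLIP_SEC = 4.0             # auto-capture stops at this duration
SAVE_BEST_TIMEOUT_SEC = 120.0  # save best clip offered after 2 minutes

def acquire_view(frames):
    """Sketch of the per-view capture loop.

    `frames` yields (timestamp_sec, image, quality) tuples, where `quality`
    is the quality-meter response for the current frame (assumed in [0, 1]).
    Returns ("auto", clip) on auto-capture, or ("best", clip) once the
    timeout elapses.
    """
    above_since, current_clip = None, []   # state for prospective auto-capture
    window = deque()                       # rolling ~2-second frame window
    best_clip, best_score, t0 = None, float("-inf"), None

    for t, image, quality in frames:
        t0 = t if t0 is None else t0

        # Track the best ~2-second window seen so far, for save best clip.
        window.append((t, image, quality))
        while t - window[0][0] > MIN_CLIP_SEC:
            window.popleft()
        score = sum(q for _, _, q in window) / len(window)
        if t - window[0][0] >= 0.9 * MIN_CLIP_SEC and score > best_score:
            best_score = score
            best_clip = [img for _, img, _ in window]

        if quality >= DIAGNOSTIC_THRESHOLD:
            if above_since is None:
                above_since, current_clip = t, []     # begin prospective capture
            current_clip.append(image)
            if t - above_since >= MAX_CLIP_SEC:
                return "auto", current_clip           # full 4-second auto-capture
        else:
            if above_since is not None and t - above_since >= MIN_CLIP_SEC:
                return "auto", current_clip           # 2- to 4-second auto-capture
            above_since, current_clip = None, []      # <2 s above threshold: discard

        if t - t0 >= SAVE_BEST_TIMEOUT_SEC and best_clip is not None:
            return "best", best_clip                  # offer save best clip
```

In this sketch the quality value is treated as an opaque input; in the actual system it is the DL algorithm's assessment of the live ultrasound image alone, with no positioning information required.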
Upon review, users can elect to re-record a specific view, which returns them to the workflow for that specific view before bringing them back to the summary page. Once review is complete, the user chooses to end the study and save the results. Studies were then transferred for storage and clinical reading using standard methods. eFigure 2 shows in greater detail how an operator uses AI-guided feedback while acquiring each ultrasound view, including the process of following the quality meter and prescriptive guidance prompts to achieve auto-capture, as well as a situation in which save best clip is triggered. Through this process, operators obtain a video clip of the desired view by either achieving auto-capture or a save best clip capture, repeating the process for each of the 10 views (shown in eFigure 3).

Testing / Validation: The individual components of the AI guidance have been evaluated for the precision of their estimates vs sonographer judgments, estimation of diagnostic quality, performance of prescriptive guidance cues, and pilot testing of novice user performance. In preparation for the prospective study described in this paper, we performed a pilot study of 4 nurses with no prior ultrasound experience. The pilot study included 16 subjects with cardiac pathology and a range of body mass indexes (BMIs), and it produced the estimates utilized in the power analyses we performed to determine the size of this pivotal study. The results of the final study were consistent with those of the pilot study. Additional descriptions of the component testing activities of the neural network are beyond the scope of this paper.

In calculating the sample size for the current study, we recognized two sources of random variance in the data: the nurses and the patients. Accordingly, we approached this as a multi-reader, multi-case (MRMC) study (with the nurses serving as "readers" in this context): the MRMC approach assumes neither that all nurses have the same skill level nor that all patients present the same level of scanning "difficulty." Note that the variability of RN skill level can influence the precision of our estimate of the acquisition success rate. Because the success or failure of the clinical trial depends not just on the level of acquisition success but on its precision as well, we sized the study to demonstrate the generalizability of our conclusions in a statistically significant manner. In particular, the study was powered to detect the primary endpoint exceeding the performance goal of 80% (alpha = 0.05, beta = 0.2), based on the results of the pilot study, which used the same study design (with a minor difference in initial RN training duration). The statistical power was estimated using iMRMC 4.0 software developed at the FDA, which provides the 95% CI around the point estimate for a given parameter and was used for the primary endpoint analysis. The software treats the variability of RN performance as one source of variability and the variability of outcomes across patients as another (a random-effects model), performing a multi-reader, multi-case analysis based on the method of Gallas et al.6 It was assumed that the mean effect size, the variance of RN performance, and the variance of success rates across patients would remain the same between the pilot study and the main study.
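To make the random-effects structure concrete, the sketch below simulates an MRMC-style power calculation for a binary success endpoint. It is not the iMRMC software: iMRMC implements the U-statistic variance decomposition of Gallas et al., whereas this stand-in uses a crude reader-level variance estimate, and every numeric parameter (the logit-scale mean and the reader and case effect SDs) is an illustrative assumption rather than a value from the pilot study.

```python
import numpy as np

rng = np.random.default_rng(0)

N_READERS, N_CASES = 8, 30     # 8 nurses ("readers"), 30 patients ("cases") each
GOAL = 0.80                    # performance goal for the acquisition success rate
MEAN_LOGIT = 2.9               # assumed mean success (~0.95) on the logit scale
SD_READER, SD_CASE = 0.5, 0.8  # assumed reader- and case-level random-effect SDs
N_SIM = 2000                   # number of simulated trials

def one_trial():
    """Simulate one trial and test the success rate against the 80% goal."""
    reader_fx = rng.normal(0.0, SD_READER, N_READERS)
    case_fx = rng.normal(0.0, SD_CASE, (N_READERS, N_CASES))
    p = 1.0 / (1.0 + np.exp(-(MEAN_LOGIT + reader_fx[:, None] + case_fx)))
    success = rng.random((N_READERS, N_CASES)) < p
    rate = success.mean()
    # Crude standard error from reader-to-reader variation; iMRMC's variance
    # decomposition also accounts for case variability more carefully.
    reader_rates = success.mean(axis=1)
    se = reader_rates.std(ddof=1) / np.sqrt(N_READERS)
    return rate - 1.645 * se > GOAL   # one-sided test, alpha = 0.05

power = np.mean([one_trial() for _ in range(N_SIM)])
print(f"Simulated power vs {GOAL:.0%} performance goal: {power:.2f}")
```

Different assumed means and variances move the simulated power substantially, which is why the pilot study estimates of reader and case variability were essential inputs to the actual calculation.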
As detailed in the manuscript, a study design with eight RNs performing 30 cases each led to a power of 0.92 for the sequential testing of the four primary endpoints with the above performance requirement.

eFigure 1. Evidence base (left), neural network optimization (center), and user interface (right) of the AI guidance for echo acquisition. Schematic diagram illustrating the deep-learning algorithm training dataset, optimization, and runtime operation. The left panel shows how the impact of millions of probe movements on image appearance was captured, along with hundreds of thousands of expert sonographer and cardiologist judgments of quality as well as suggested manipulations to improve the image. This was then provided as the input training dataset to a multilayer convolutional neural network (center) to optimize the deep-learning algorithm parameters, using massive calculations on a 31-teraFLOPS (trillion floating point operations per second) GPU array running for two weeks (a total of 7.2×10^18 32-bit calculations). The right panel depicts the operation of the deep-learning algorithm at runtime (during operation by the nurses in the study). Note that during runtime the deep-learning algorithm's only input is the live ultrasound image; no positioning information or clinician input is necessary for the algorithm to judge quality and issue guidance commands. In the right panel, the guidance indicates that the user needs to "rotate [the probe] slowly counter-clockwise" in order to improve the parasternal long-axis image. Abbreviations: GPU, graphics processing unit; exaflops, 10^18 floating point operations.

eFigure 2. The typical workflow for user interaction with the AI guidance. Schematic diagram illustrating operation by the nurses during the study. Users begin (Step 1) by manipulating the probe position and watching for feedback from the quality meter. A guidance command may appear directing the user to make a specific probe manipulation to acquire a more diagnostic image. (Step 2) If the user follows the instruction, the quality meter response is likely to increase to a level appropriate for diagnostic purposes, as indicated by the quality meter. After the probe is held such that the quality meter remains in the diagnostic regime for sufficient time (Step 3a), the image is auto-recorded. If this threshold has not been reached after a prespecified time interval (Step 3b), the user may opt to capture the highest-scoring clip thus far or continue scanning in hopes of achieving auto-capture.

eFigure 3. Ten representative still images acquired by a study nurse from a single patient. Representative still images of the 10 standard TTE views acquired by a nurse using the DL algorithm, each judged to be of diagnostic quality. Moving images are provided in the online supplement (eVideos 2-11). Abbreviations: PLAX, parasternal long-axis view; PSAX-AV, -MV, -PM, parasternal short-axis view at the aortic valve, mitral valve, and papillary muscle levels; TTE, transthoracic echocardiography.

References:
1. Hinton G, Deng L, Yu D, et al. Deep neural networks for acoustic modeling in speech recognition. IEEE Signal Processing Magazine. 2012;29(6):82-97.
2. Krizhevsky A, Sutskever I, Hinton GE. ImageNet classification with deep convolutional neural networks. Advances in Neural Information Processing Systems. 2012.
3. Xiong HY, Alipanahi B, Lee LJ, et al. RNA splicing. The human splicing code reveals new insights into the genetic determinants of disease. Science. 2015;347(6218):1254806.
4. Cireşan DC, Giusti A, Gambardella LM, Schmidhuber J. Mitosis detection in breast cancer histology images with deep neural networks. Medical Image Computing and Computer-Assisted Intervention (MICCAI). 2013.
5. Cadieu CF, Hong H, Yamins DLK, et al. Deep neural networks rival the representation of primate IT cortex for core visual object recognition. PLoS Computational Biology. 2014;10(12):e1003963.
6. Gallas BD, Pennello GA, Myers KJ. Multireader multicase variance analysis for binary data. Journal of the Optical Society of America A. 2007;24(12):B70-B80.