key: cord-0016303-xbv45kv5
authors: Raghunath, Sushravya; Pfeifer, John M.; Ulloa-Cerna, Alvaro E.; Nemani, Arun; Carbonati, Tanner; Jing, Linyuan; vanMaanen, David P.; Hartzel, Dustin N.; Ruhl, Jeffery A.; Lagerman, Braxton F.; Rocha, Daniel B.; Stoudt, Nathan J.; Schneider, Gargi; Johnson, Kipp W.; Zimmerman, Noah; Leader, Joseph B.; Kirchner, H. Lester; Griessenauer, Christoph J.; Hafez, Ashraf; Good, Christopher W.; Fornwalt, Brandon K.; Haggerty, Christopher M.
title: Deep Neural Networks Can Predict New-Onset Atrial Fibrillation From the 12-Lead ECG and Help Identify Those at Risk of Atrial Fibrillation–Related Stroke
date: 2021-02-16
journal: Circulation
DOI: 10.1161/circulationaha.120.047829
sha: f8498c8a41b5c22e7e44dd0d76a03d7a278d2677
doc_id: 16303
cord_uid: xbv45kv5

Atrial fibrillation (AF) is associated with substantial morbidity, especially when it goes undetected. If new-onset AF could be predicted, targeted screening could be used to find it early. We hypothesized that a deep neural network could predict new-onset AF from the resting 12-lead ECG and that this prediction may help identify those at risk of AF-related stroke. METHODS: We used 1.6 M resting 12-lead digital ECG traces from 430 000 patients collected from 1984 to 2019. Deep neural networks were trained to predict new-onset AF (within 1 year) in patients without a history of AF. Performance was evaluated using areas under the receiver operating characteristic curve and precision-recall curve. We performed an incidence-free survival analysis for a period of 30 years following the ECG stratified by model predictions. To simulate real-world deployment, we trained a separate model using all ECGs before 2010 and evaluated model performance on a test set of ECGs from 2010 through 2014 that were linked to our stroke registry. We identified the patients at risk for AF-related stroke among those predicted to be high risk for AF by the model at different prediction thresholds. RESULTS: The area under the receiver operating characteristic curve and area under the precision-recall curve were 0.85 and 0.22, respectively, for predicting new-onset AF within 1 year of an ECG. The hazard ratio for the predicted high- versus low-risk groups over a 30-year span was 7.2 (95% CI, 6.9–7.6). In a simulated deployment scenario, the model predicted new-onset AF at 1 year with a sensitivity of 69% and specificity of 81%. The number needed to screen to find 1 new case of AF was 9. This model predicted patients at high risk for new-onset AF in 62% of all patients who experienced an AF-related stroke within 3 years of the index ECG. CONCLUSIONS: Deep learning can predict new-onset AF from the 12-lead ECG in patients with no previous history of AF. 
This prediction may help identify patients at risk for AF-related strokes.

Atrial fibrillation (AF) is a common cardiac rhythm disorder associated with several important adverse health outcomes, including stroke and heart failure. [1] [2] [3] [4] In patients with AF and risk factors for thromboembolism, early anticoagulation is effective at preventing strokes. [5] [6] [7] [8] Unfortunately, AF is often unrecognized and untreated because it is frequently asymptomatic or minimally symptomatic. [9] [10] [11] Thus, methods to screen for and identify undetected AF are of significant interest [12] [13] [14] to ultimately prevent strokes.

Population-based screening for AF is challenging for 2 primary reasons. First, the yearly incidence of AF in the general population is low, with reported incidence rates of <10 per 1000 person-years in people younger than 70 years of age. [15] [16] [17] Second, AF is often paroxysmal, with many episodes lasting <24 hours. [18] At present, the most common screening strategy is opportunistic pulse palpation, sometimes in conjunction with a 12-lead ECG, during routine medical visits. This has been shown to be cost-effective in certain populations and is recommended in some guidelines. [19] [20] [21] However, studies of implantable cardiac devices suggest that this strategy will miss many cases of AF. [10, 11] Many continuous monitoring devices are now available to detect paroxysmal and asymptomatic AF. [10, 12, 13] Patch monitors can be worn for up to 14 to 30 days, implantable loop recorders provide continuous monitoring for as long as 3 years, and wearable monitors such as the Apple Watch [13] can be worn indefinitely. Continuous monitoring devices overcome the problem of paroxysmal AF but must still contend with the overall low incidence of new-onset AF, and cost and convenience limit their use for widespread population screening.

If future AF could be accurately predicted from a widely used and inexpensive test, this could identify a high-risk population that could then be screened with a continuous monitoring device. Machine learning, in particular deep neural networks (DNNs), can likely assist with this task. A recent study by Attia et al demonstrated the ability of a DNN to identify the electrocardiographic signature of paroxysmal AF from 12-lead ECGs showing sinus rhythm in a short time window. [22] A similar signature may be present in the ECG of patients without AF who go on to develop AF in the future. The prediction of truly future clinical outcomes from the ECG using machine learning methods is a new area of research with great potential. For example, recent work has demonstrated how a DNN can predict 1-year all-cause mortality directly from the 12-lead ECG with good performance, even in patients with ECGs clinically interpreted as normal. [23] In the present study, we trained a DNN to use ECGs to predict new-onset AF in patients with no history of AF. We then simulated a deployment scenario of this model retrospectively to demonstrate the high potential to identify patients who later have an AF-related stroke.

Study data are available to researchers on reasonable request to the corresponding author. The methods can be reproduced based on details in the article; code will not be made available.

The Geisinger Institutional Review Board approved this retrospective study with a waiver of consent, in conjunction with our institutional patient privacy policies. We extracted 2.8 million standard 12-lead digital ECG traces from Geisinger's clinical MUSE (GE Healthcare, Milwaukee, WI) database, acquired between January 1984 and June 2019. Although 12-lead resting ECGs are acquired for 10 s, the ECG traces available for this study were in the standard clinical PDF format with 2.5-s traces for all 12 leads and 10-s rhythm strip traces for leads II, V1, and V5 (15 signal traces in total) at 500 Hz sampling frequency (42% of studies acquired at 250 Hz were resampled to 500 Hz by linear interpolation) and 1 µV resolution. We retained only ECGs (1) acquired in patients ≥18 years of age, and (2) with no significant artifacts as identified by the final ECG interpretation at the time of acquisition. This amounted to 1.6 million ECGs from 431 000 patients. The median (interquartile range) follow-up available after each ECG was 4.1 (1.5-8.5) years. Qualifying follow-up encounters were restricted to ECG, echocardiography, outpatient visit with internal medicine, family medicine, or cardiology, any inpatient encounter, or any surgical procedure. An ECG was classified as normal if the findings text included strings that matched "normal ECG" or "within normal limits" and no other abnormalities were identified. All other ECGs were considered abnormal.

We excluded patients with preexisting or concurrent documentation of AF. The AF phenotype was defined as a clinically reported finding of AF from a 12-lead ECG or a diagnosis of AF applied to 2 or more inpatient or outpatient encounters or

What Is New?

• A deep learning model can identify patients at high risk for new-onset atrial fibrillation (AF).
• In patients with no history of AF who have an AF-related stroke, nearly two thirds would have been predicted to be high-risk for AF before the stroke by the deep learning model.

What Are the Clinical Implications?

• AF is a leading cause of stroke, and AF-related strokes can occur in patients with no known history of AF.
• A deep learning model capable of predicting future AF could be used in conjunction with a systematic monitoring strategy to find AF early and potentially prevent AF-related stroke.

We chose to group atrial flutter with AF because the clinical consequences of the 2 rhythms are similar, including the risk of embolization, and because the 2 rhythms often coexist. AF was considered to be new onset if it occurred at least 1 day after a baseline ECG that did not show AF in a patient with no known previous history of AF. This included patients with newly identified paroxysmal AF as well as incident AF. Electronic health record data were used to identify the most recent qualifying encounter date for censorship.

We designed a deep convolutional neural network using only digital ECG traces as input in 3 temporally coherent branches. The data were restructured into 0- to 5-s signals for leads I, II, V1, and V5 in the first branch, 5- to 7.5-s signals for leads V1, V2, V3, II, and V5 in the second branch, and 7.5- to 10-s signals for leads II, V1, V4, V5, and V6 in the third branch (Figure I in the Data Supplement). The lead I signal in the 2.5- to 5-s interval was computed using the Goldberger equation [24] (-aVR=[I+II]/2) using signals from leads aVR and II.
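As a minimal sketch of the lead I reconstruction described above (not the authors' code, which was not released), the Goldberger relation can be rearranged to I = -2·aVR - II and applied sample by sample. The function name and the sample values are illustrative assumptions.

```python
def lead_i_from_avr_and_ii(avr, ii):
    """Recover lead I sample by sample from leads aVR and II.

    Rearranges the Goldberger relation -aVR = (I + II) / 2
    into I = -2 * aVR - II.
    """
    if len(avr) != len(ii):
        raise ValueError("aVR and II must have the same number of samples")
    return [-2.0 * a - b for a, b in zip(avr, ii)]


# Illustrative amplitudes in microvolts (made up, not study data).
lead_i = [100.0, 250.0, -50.0]
lead_ii = [300.0, 400.0, 20.0]

# Build aVR from the forward relation, then invert it.
avr = [-(i + ii) / 2.0 for i, ii in zip(lead_i, lead_ii)]
recovered = lead_i_from_avr_and_ii(avr, lead_ii)
```

Because the relation is linear, the round trip returns the original lead I values for this example.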

The DNN model was designed to analyze the ECG signals to yield a predicted risk score for new-onset AF within 1 year of the ECG. The model architecture is illustrated in Figure II in the Data Supplement. For all experiments, data were divided into training, internal validation, and test sets. The composition of the training and test sets varied by experiment, as described in Study Design; however, the internal validation set in all cases was defined as a 20% subset of the training data, used to track the validation area under the receiver operating characteristic curve (AUROC) during training to avoid overfitting (details in Methods in the Data Supplement).

The models were evaluated using the AUROC, which is a robust metric of model performance for binary classification. Higher AUROC suggests higher performance (with perfect discrimination represented by an AUROC of 1, and an AUROC of 0.5 equivalent to a random guess). We also computed a precision-recall curve, which summarizes the tradeoff between the true positive rate (sensitivity or recall) and the positive predictive value (precision) for the model at different thresholds. The area under the precision-recall curve (AUPRC) was calculated as the average precision score, that is, the weighted average of the precisions achieved at each threshold, with the increase in recall from the previous threshold used as the weight (perfect discrimination corresponds to an AUPRC of 1, and random chance to the proportion of the target class in the data, for example, 0.04 for the holdout set defined in Study Design [Figure 1]).
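The two metrics can be illustrated with a small dependency-free sketch. It assumes binary labels (1 = new-onset AF) and distinct risk scores; the function names and the toy data are assumptions for illustration, and production code would normally rely on a tested library implementation.

```python
def auroc(labels, scores):
    """AUROC as the probability that a random positive outranks a
    random negative (ties between the two count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Average precision: sum over positives of
    (increase in recall) * (precision at that rank).
    Assumes all scores are distinct."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    n_pos = sum(labels)
    true_pos = 0
    ap = 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            true_pos += 1
            # Recall rises by 1 / n_pos each time a positive is reached.
            ap += (1.0 / n_pos) * (true_pos / rank)
    return ap


# Toy example (not study data).
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
roc_area = auroc(labels, scores)
pr_area = average_precision(labels, scores)
```

For this toy example the AUROC is 0.75 (3 of the 4 positive-negative pairs are ranked correctly) and the average precision is 5/6.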

We performed 2 separate modeling experiments (Figure 1):

(1) Proof-of-concept model: Using all ECGs from January 1984 to June 2019, a holdout set (20%) was identified at the beginning of the study (Figure 1A). The model was trained with the remaining 80% of the data. There was no overlap of patients between the holdout set and the training set. All ECGs with known time-to-event or at least 1 year of follow-up were used during model training, and a single random ECG was selected for each patient in the holdout set for model evaluation (Figure IIIA in the Data Supplement), with results denoted as "M0". Two versions of the model architecture were compared: one with ECG traces alone as inputs (DNN-ECG), and a second with ECG traces, age, and sex (DNN-ECG-AS). For comparison, we implemented an extreme gradient boosting (XGBoost) [25] model using only age and sex as inputs. We also compared the DNN model with the published CHARGE-AF (Cohorts for Aging and Research in Genomic Epidemiology) 5-year risk prediction model [26] in a subset of patients who had all of the data necessary to calculate a CHARGE-AF score.

To establish model stability and generalizability, we performed 5-fold cross-validation within the M0 model training set to derive models M1 to M5 and evaluated each on the respective unique fold test set (cross-validation test sets). There was no overlap of patients between the training set and cross-validation test set in each fold. As earlier, all qualifying ECGs were used during model training, and a single random ECG for a patient was chosen from the cross-validation test sets so as not to overweight patients with multiple ECGs (Figure 1A).
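The two data-handling rules above (no patient overlap between training and test data, and one random ECG per test patient) can be sketched as follows. This assumes the split is drawn at the patient level, which the text implies but does not spell out; the record layout, function names, and seeds are hypothetical.

```python
import random


def split_by_patient(ecgs, holdout_fraction=0.2, seed=0):
    """Split (ecg_id, patient_id) records so that no patient appears
    in both the training set and the holdout set."""
    rng = random.Random(seed)
    patients = sorted({pid for _, pid in ecgs})
    rng.shuffle(patients)
    n_holdout = round(len(patients) * holdout_fraction)
    holdout_patients = set(patients[:n_holdout])
    train = [e for e in ecgs if e[1] not in holdout_patients]
    holdout = [e for e in ecgs if e[1] in holdout_patients]
    return train, holdout


def one_random_ecg_per_patient(ecgs, seed=0):
    """Pick a single random ECG per patient so that patients with many
    ECGs are not overweighted during evaluation."""
    rng = random.Random(seed)
    by_patient = {}
    for ecg_id, pid in ecgs:
        by_patient.setdefault(pid, []).append(ecg_id)
    return {pid: rng.choice(ids) for pid, ids in sorted(by_patient.items())}


# Toy cohort: 10 patients with 1 to 3 ECGs each (made-up identifiers).
ecgs = [(f"ecg{p}_{k}", f"p{p}") for p in range(10) for k in range(p % 3 + 1)]
train, holdout = split_by_patient(ecgs)
evaluation_ecgs = one_random_ecg_per_patient(holdout)
```

The same `split_by_patient` idea extends to 5-fold cross-validation by partitioning the shuffled patient list into 5 groups instead of 2.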

We also performed Kaplan-Meier (KM) incidence-free survival analysis [27] with the available follow-up data in the holdout set stratified by the model prediction for all 3 of the models (age and sex only, DNN-ECG, and DNN-ECG-AS), using an optimal operating point to stratify the population into low- and high-risk groups for new-onset AF. The optimal operating point was defined as the point on the ROC curve on the highest iso-performance line (equal cost to misclassification of positives and negatives) in the internal validation set. Patients who did not develop AF were censored at the most recent encounter. We fit a Cox proportional hazard model [28] regressing time to development of AF on the model-predicted classification of low-risk groups and high-risk groups. The hazard ratios (HR; adjusted for age and sex) were reported for the DNN model predictions, as well as for subpopulations defined by age groups (<50, 50-65, and ≥65 years), sex (men and women), and ECG type (normal and abnormal) for the holdout set.
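One common textbook reading of the "highest iso-performance line" is sketched below: iso-performance lines in ROC space have slope (cost of a false positive / cost of a false negative) × (negatives / positives), and the optimal point maximizes TPR - slope × FPR. Whether the class ratio entered the slope in the study is not stated, so this is an assumption, as are the function names and toy data.

```python
def roc_points(labels, scores):
    """Return (threshold, fpr, tpr) for every distinct score used as a
    cutoff, where score >= threshold is classified as high risk."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = []
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= t and y == 0)
        points.append((t, fp / n_neg, tp / n_pos))
    return points


def optimal_operating_point(labels, scores, cost_fp=1.0, cost_fn=1.0):
    """Pick the ROC point lying on the highest iso-performance line."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    slope = (cost_fp / cost_fn) * (n_neg / n_pos)
    return max(roc_points(labels, scores), key=lambda p: p[2] - slope * p[1])


# Toy validation data (not study data): 4 positives, 4 negatives.
labels = [1, 1, 0, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
threshold, fpr, tpr = optimal_operating_point(labels, scores)
```

With equal costs and balanced classes, as in this toy example, the criterion reduces to maximizing TPR - FPR; patients at or above the returned threshold would form the high-risk group.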

(2) Simulated deployment model: To simulate a real-world deployment scenario, in which we evaluated model performance in patients who later had an AF-related stroke, we used a second modeling approach (Figure 1B). Because a standard digital ECG contains information on age and sex, we used the DNN model that included age and sex for the deployment scenario. All ECGs from 1984 through 2009 were used as a training set. Next, we identified all patients with an ECG between January 1, 2010, and December 31, 2014. For each patient, we chose the ECG with the highest model prediction risk score, and those ECGs comprised the deployment test set. The dates were chosen to align with our institutional stroke registry, which began tracking patients in 2009 as described later.
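The construction of the deployment test set can be sketched as a filter on ECG date followed by a per-patient maximum over risk scores. The record fields (`patient_id`, `ecg_date`, `risk_score`) and the function name are hypothetical; only the date window and the selection rule come from the text.

```python
from datetime import date


def deployment_test_set(ecgs, start=date(2010, 1, 1), end=date(2014, 12, 31)):
    """Keep, for each patient, the ECG with the highest predicted risk
    score among ECGs acquired between start and end (inclusive)."""
    best = {}
    for ecg in ecgs:
        if not (start <= ecg["ecg_date"] <= end):
            continue  # outside the deployment window
        current = best.get(ecg["patient_id"])
        if current is None or ecg["risk_score"] > current["risk_score"]:
            best[ecg["patient_id"]] = ecg
    return best


# Toy records (not study data).
ecgs = [
    {"patient_id": "p1", "ecg_date": date(2009, 6, 1), "risk_score": 0.9},
    {"patient_id": "p1", "ecg_date": date(2011, 3, 1), "risk_score": 0.2},
    {"patient_id": "p1", "ecg_date": date(2013, 7, 1), "risk_score": 0.6},
    {"patient_id": "p2", "ecg_date": date(2015, 1, 1), "risk_score": 0.8},
]
selected = deployment_test_set(ecgs)
```

In the toy data, the 2009 ECG of patient p1 is ignored despite its higher score because it falls in the training period, and patient p2 has no ECG inside the window.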

To link deployment model predictions with potentially preventable stroke events, we leveraged an internal registry of patients diagnosed with acute ischemic stroke after 2009 at any of the 3 main Geisinger hospitals. From January 1, 2010, to December 31, 2017, representing the time interval included in this analysis, there were 6569 patients in the registry who were treated for an ischemic stroke. We used this registry to identify patients within the deployment model test set with an ischemic stroke subsequent to the test set ECG. A stroke was considered AF-related and potentially preventable if the following criteria were met: (1) the ECG in the test set was before the stroke, and the model predicted risk score was above the given operating point (ie, high risk for new-onset AF); and (2) previously undiagnosed AF was identified at the time of the stroke or up to 365 days after the stroke (Figure IIIB in the Data Supplement). To allow for time lag on emergency department and hospital admission notes, we included AF that was identified up to 2 days before the date of the qualifying stroke encounter. To allow for adequate follow-up, we included strokes that occurred within 3 years of the ECG (Figure 1B, Figure IIIB in the Data Supplement). A total of 96, 250, and 375 potentially preventable AF-related strokes were identified within 1, 2, and 3 years after ECG, respectively. We performed a chart review to determine whether those patients were on anticoagulation at the time of the stroke (details in Methods in the Data Supplement). We explored Fβ scores (for β = 0.5, 1, and 2) and the Youden index [29] for model operating points in the internal validation set (Figure
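The Fβ score and the Youden index used to explore operating points can be written directly from confusion-matrix counts: Fβ = (1 + β²)·TP / ((1 + β²)·TP + β²·FN + FP), and J = sensitivity + specificity - 1. The sketch below scans candidate thresholds on toy data; the function names and data are assumptions, not the authors' implementation.

```python
def confusion(labels, scores, threshold):
    """Counts (tp, fp, fn, tn) when score >= threshold means high risk."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    return tp, fp, fn, tn


def f_beta(tp, fp, fn, beta):
    """F-beta score; beta > 1 weights recall more heavily than precision."""
    b2 = beta * beta
    denom = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denom if denom else 0.0


def youden_index(tp, fp, fn, tn):
    """Youden's J = sensitivity + specificity - 1."""
    return tp / (tp + fn) + tn / (tn + fp) - 1.0


def best_threshold(labels, scores, metric):
    """Threshold (among observed scores) that maximizes metric(tp, fp, fn, tn)."""
    return max(sorted(set(scores)),
               key=lambda t: metric(*confusion(labels, scores, t)))


# Toy validation data (not study data).
labels = [1, 1, 0, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
f2_threshold = best_threshold(
    labels, scores, lambda tp, fp, fn, tn: f_beta(tp, fp, fn, beta=2))
youden_threshold = best_threshold(labels, scores, youden_index)
```

Because F2 penalizes missed cases more than false alarms, it tends to favor lower thresholds (higher sensitivity) than F0.5, which is the usual motivation for choosing it in a screening setting.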

Multiple AUROCs were compared by bootstrapping 1000 instances (using random and variable sampling with replacement). Differences between models were considered statistically significant if the absolute difference in the 95% CI was >0.
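One plausible reading of this comparison is a paired bootstrap: resample the test cases with replacement, recompute both AUROCs on each resample, and call the difference significant when the 95% percentile interval of the difference excludes 0. The sketch below follows that reading, which is an assumption about the authors' procedure; all names and data are illustrative.

```python
import random


def auroc(labels, scores):
    """Probability that a random positive outranks a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auroc_difference(labels, scores_a, scores_b, n_boot=1000, seed=0):
    """95% percentile interval for AUROC(a) - AUROC(b), resampling the
    same cases for both models in every bootstrap instance."""
    rng = random.Random(seed)
    n = len(labels)
    diffs = []
    while len(diffs) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        y = [labels[i] for i in idx]
        if len(set(y)) < 2:
            continue  # AUROC is undefined without both classes
        a = auroc(y, [scores_a[i] for i in idx])
        b = auroc(y, [scores_b[i] for i in idx])
        diffs.append(a - b)
    diffs.sort()
    lower = diffs[int(round(0.025 * n_boot))]
    upper = diffs[int(round(0.975 * n_boot)) - 1]
    return lower, upper


# Toy data: model A ranks every positive above every negative,
# model B ranks them in exactly the opposite order.
labels = [1] * 10 + [0] * 10
scores_a = [1.0 - 0.01 * i for i in range(20)]
scores_b = [0.01 * i for i in range(20)]
lower, upper = bootstrap_auroc_difference(labels, scores_a, scores_b, n_boot=200)
significant = lower > 0 or upper < 0
```

Pairing the resamples keeps the two AUROCs correlated, which usually gives a tighter interval for their difference than bootstrapping each model separately.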

The KM analysis and HR for proof-of-concept model were computed using the lifelines package (version 0.24.1) in Python (version 3.6.8) and R (version 4.0.0).
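For readers unfamiliar with the estimator behind these curves, the Kaplan-Meier (product-limit) estimate can be sketched without any dependency. This is not the lifelines implementation used in the study; the function name and toy follow-up data are assumptions.

```python
def kaplan_meier(durations, events):
    """Product-limit estimate of the AF-free proportion.

    durations[i] is the follow-up time of patient i; events[i] is 1 if
    AF was observed at that time and 0 if the patient was censored.
    Returns [(time, survival)] at each distinct event time.
    """
    event_times = sorted({t for t, e in zip(durations, events) if e == 1})
    survival = 1.0
    curve = []
    for t in event_times:
        at_risk = sum(1 for d in durations if d >= t)
        n_events = sum(1 for d, e in zip(durations, events) if d == t and e == 1)
        survival *= 1.0 - n_events / at_risk
        curve.append((t, survival))
    return curve


# Toy follow-up in years (not study data); 0 marks a censored patient.
durations = [1, 2, 2, 3, 4, 5]
events = [1, 1, 0, 1, 0, 1]
curve = kaplan_meier(durations, events)
```

Censored patients leave the at-risk set without lowering the curve, which is why patients without AF can be censored at their most recent encounter.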

The AUROC and AUPRC of the proof-of-concept DNN models for the prediction of new-onset AF within 1 year in the holdout set (M0) were 0.83 (95% CI, 0.83-0.84) and 0.21 (95% CI, 0.20-0.22), respectively, for DNN-ECG; and 0.85 (95% CI, 0.84-0.85) and 0.22 (95% CI, 0.21-0.24), respectively, for DNN-ECG-AS (Figure 2). We also computed an AUROC of 0.87 (95% CI, 0.86-0.88; DNN-ECG model) for AF presenting exclusively between 1 and 31 days after the baseline ECG, consistent with the findings of Attia et al for the identification of paroxysmal AF from sinus rhythm. [22] We recognize that the DNN model both detects paroxysmal AF and predicts truly incident AF, and this is covered in detail in the Discussion.

The KM curves and HR for the 3 AF-prediction models in Figure 2 are illustrated in Figure 3 with the operating points marked on the corresponding ROC curves (Figure 3A). The DNN models showed HRs of 6.7 (95% CI, 6.4-7.0) and 7.2 (95% CI, 6.9-7.6) in DNN-ECG and DNN-ECG-AS, respectively (Figure 3B). Adjusting for age (in increments of 10 years) and sex (interactions with sex and model were significant), the HR remained significant: 3.7 (95% CI, 3.6-4.1) and 3.1 (95% CI, 2.7-3.4) in women and men, respectively, for the DNN-ECG model and 3.8 (95% CI, 3.6-4.1) and 2.9 (95% CI, 2.5-3.4) in women and men, respectively, in the DNN-ECG-AS model (Figure 3C). For unadjusted comparisons, the DNN models had higher HR than the XGBoost model (age and sex) within all subsets defined by sex, age groups, and ECG type (normal or abnormal). Age alone is a powerful predictor of AF, so we further investigated the performance of the DNN-ECG-AS model by stratifying survival curves by age groups. Figure 4 (top row) shows the KM curves for age groups <50, 50 to 65, and ≥65 years in men and women. As expected, in both sexes, the survival curves are substantially different in each age group. However, Figure 4 (bottom row) shows that in each age group, the DNN model retains its ability to discriminate between high- and low-risk populations for the development of new-onset AF. The superiority of the DNN model over age and sex alone is most evident in younger age groups, and we note that no patient <58 years old was predicted as high-risk by the XGBoost model.

We observed 3497 patients out of 181 969 (1.9%) with an ischemic stroke following an ECG within the [...] were not on an anticoagulant at the time of the stroke, 32 were on anticoagulant medications for reasons other than AF, and 2 patients had insufficient records to determine whether they were being treated with anticoagulants at the time of the stroke. Hence, these 375 represent a cohort at risk of AF-related strokes at the time of ECG. To reemphasize, we hypothesized that the DNN would identify many of these ECGs as high-risk for AF. Applying the model (trained on data before 2010) to this deployment test set, we again observed good performance for the prediction of new-onset AF at 1 year (AUROC, 0.83; AUPRC, 0.17). Using an operating point determined by the F2 score, the sensitivity was 69%, the specificity was 81%, and the number needed to screen (NNS) to find 1 case of new-onset AF at 1 year was 9 (Table). In addition, 62% (231 of 375) of patients who had an AF-related stroke within 3 years of an ECG were predicted to be high-risk for new-onset AF (Figure 5). The NNS to identify AF in 1 patient who developed an AF-related stroke within 3 years of a high-risk prediction was 162. The Table also shows favorable test characteristics in subgroups defined by age, sex, race, comorbidities, clinical setting, and CHA2DS2-VASc score. [30] The model performance and test characteristics at other operating points in Figure 5 are summarized in Table II in the Data Supplement.
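The relationship between these test characteristics can be sketched under one assumption: that the NNS is read as the number of high-risk patients who must be followed to find 1 case, that is, the reciprocal of the positive predictive value (PPV). The 1-year incidence used below is a hypothetical value chosen for illustration; it is not reported in this text. The function names are likewise illustrative.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """PPV from Bayes' rule: true positives over all positive calls."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


def number_needed_to_screen(ppv):
    """High-risk patients followed per case found (assumed NNS = 1 / PPV)."""
    return 1.0 / ppv


# Sensitivity and specificity are the reported deployment values.
# The incidence of 3.3% is an ASSUMED value for illustration only.
ppv = positive_predictive_value(0.69, 0.81, prevalence=0.033)
nns = number_needed_to_screen(ppv)

# Share of AF-related strokes preceded by a high-risk prediction,
# using the counts reported in the text.
stroke_capture = 231 / 375
```

Under the assumed incidence, the NNS rounds to 9, in line with the reported value, and 231 of 375 corresponds to the reported 62% after rounding.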

We have shown that a DNN, trained on >1 million 12-lead resting ECGs, can predict new-onset AF within 1 year with good performance (AUROC, 0.85). We demonstrated that this DNN outperformed both a clinical model (CHARGE-AF) and an XGBoost model using age and sex within the same dataset. We similarly note the superiority of our performance compared with the reported performances of other models in previous studies: CHARGE-AF (AUROC, 0.77), ARIC (Atherosclerosis Risk in Communities) (AUROC, 0.78), and Framingham Heart Study (AUROC, 0.78). [26, 31, 32] Moreover, the shorter prediction interval of our model (1 year compared with 5-10 years) allows for a more actionable prediction, and this prediction retains significant prognostic potential over the next 3 decades. We have shown that a large proportion of patients who had an AF-related stroke were predicted to be high risk for new-onset AF before stroke by the DNN model, demonstrating an important proof of concept for potentially using this model to prevent strokes through enhanced AF screening. The DNN model is likely doing 2 different things: detecting paroxysmal AF and predicting incident AF. This is distinct from the study by Attia et al that focused solely on the identification of paroxysmal AF without claiming the ability to predict incident AF. As noted, the results indicate that our DNN model is doing both. Intuitively, the characteristics of the ECG that lead to a high-risk prediction by the DNN will be more prevalent in patients who already have AF but are currently in sinus rhythm. With this in mind, we expect a higher model performance for identification of paroxysmal AF compared with prediction of incident AF, and this is exactly what we see. We also expect a declining rate of new-onset AF over the course of 1 year. This is seen in Figure VII in the Data Supplement and is consistent with rapid identification of paroxysmal AF followed by a slower identification of cases that represent incident AF. The largest piece of evidence supporting our assertion that the DNN model can predict incident AF is the continued separation of the KM incidence-free survival curves up to 30 years after the index ECG, as noted in Figures 3 and 4. In a retrospective analysis such as this, it is impossible to quantify how much of the new AF found within 1 year was detection of preexisting paroxysmal AF and how much was prediction of truly incident AF. However, from the perspective of preventing AF-related stroke, any finding of newly discovered AF is important, as it allows the opportunity to initiate anticoagulation. More than 25% of all strokes are deemed a result of AF, and ≈20% of strokes caused by AF occur in individuals not previously diagnosed with AF. [33] [34] [35] Once AF is detected, anticoagulation is effective at preventing stroke, but screening for AF is difficult because of the paroxysmal nature of AF and the fact that it is often asymptomatic. Screening strategies involving patch monitors, wearables, and other devices can be used to detect AF but are most effective in populations with a high prevalence of AF. The underlying goal for developing this prediction model is to identify a high-risk population that can then be selected for additional monitoring with the goal of finding AF before a stroke.

Figure 4. The top row shows the KM curves for subpopulations in age groups: <50 years, 50 to 65 years, and ≥65 years for men (left) and women (right). The bottom row shows the KM curves for the model-predicted (model M0 trained with ECG traces, age, and sex; DNN-ECG-AS) low-risk groups and high-risk groups for new-onset atrial fibrillation for each age group for men and women. It also reflects relative hazards between age groups. The horizontal dotted gray line represents incidence-free proportion of 50%, and vertical lines represent the median survival time for the respective curves. DNN indicates deep neural network; and KM, Kaplan-Meier.

Table. Results are shown based on model predictions using the full test set, as well as specified population subsets with varying demographic, clinical setting, or comorbidity characteristics. AF indicates atrial fibrillation/flutter.

We simulated such a real-world scenario by applying our model to all ECGs acquired within our large regional health system (Geisinger) over a 5-year period by cross-referencing predicted high-risk ECGs with future ischemic stroke incidences that were deemed potentially preventable (concurrent/subsequent identification of AF). We found that a high proportion (62%) of patients who had an AF-related stroke were correctly predicted as high-risk for AF. The NNS to identify AF in 1 patient who later had an AF-related stroke was 162. This compares favorably with other well-accepted screening tests, including mammography (NNS 476 to prevent 1 breast cancer death at ages 60-69 years), [36] prostate specific antigen (NNS 1410 to prevent 1 death from prostate cancer), [37] and cholesterol (NNS 418 to prevent 1 death from cardiovascular disease). [38] Not all patients with AF are at high risk for stroke, and scoring systems such as CHA2DS2-VASc [30] are commonly used to determine the need for anticoagulation. A CHA2DS2-VASc score of 2 or greater is the cut point most commonly used to start an anticoagulant, and the Table shows that the model performs well within that subgroup, with an NNS of 8 to find 1 new case of AF. The Table also shows that 92% of patients predicted to be high-risk for AF who later had an AF-related stroke had a CHA2DS2-VASc score of 2 or greater and were potentially eligible for anticoagulation.

Three points are important to note in evaluating these findings. First, we have counted strokes occurring only at 3 Geisinger hospitals based on the exclusive use of an internal registry. Despite Geisinger's predominantly rural clinical population with low outmigration, some patients in the deployment test set likely had an incident stroke at another facility and were not captured in the registry. This leads to an underestimate of the number of patients at risk for stroke. Second, there was no systematic monitoring strategy to identify AF in the patients in our test set. Identification of new AF undoubtedly occurred in multiple ways, including fortuitous capture of asymptomatic AF as well as ECGs obtained in symptomatic patients. A systematic monitoring strategy implemented as part of the predictive model will capture more AF, as has been borne out in studies of continuous monitors. For example, in the mSTOPS trial (mHealth Screening to Prevent Stroke), monitoring with a patch monitor for up to 4 weeks identified new AF with an incidence of 6.7 per 100 person-years compared with 2.6 per 100 person-years without monitoring. [12] Third, our population of AF-related strokes was purposefully restricted by our definition that AF developed at the time of stroke or within 1 year after the stroke. We expect that some patients with an AF-related stroke would not have had their AF discovered in the 1 year after the stroke. For all of these reasons, we posit that the numbers we report for NNS with respect to both AF and stroke ascertainment represent worst-case scenarios of what would be prospectively realized. A prospective clinical trial is needed to confirm this speculation.

Once the ability to prevent strokes given this AF-prediction paradigm is demonstrated, this screening could be initiated in many different settings and performed through many different methods. With regard to setting, a promising opportunity, particularly for integrated care delivery systems, is the systematic screening of all ECGs in a health system. Specifically, the DNN could be incorporated into the existing workflow, such that every ECG is evaluated, and high-risk studies could be flagged for follow-up and surveillance. Such increased surveillance could take many different forms, including systematic pulse palpation, systematic ECG screening, continuous patch monitors worn once or multiple times, intermittent home screening with a device such as the Kardia mobile, or wearable monitors such as an Apple Watch. [12, 13, 39] Although these methods could be used in isolation to screen for AF, and many clinical trials are currently underway to that end, [40, 41] combination with a DNN predictive model could help to overcome the challenges associated with the overall low incidence of AF in the general population, especially in younger age groups. Age is generally the predominant risk factor in guiding AF screening strategies, yet in our study, 38% of all new AF (within 1 year of ECG) and 36% of all potentially preventable strokes (within 3 years of ECG) occurred among those younger than 70 years of age (Figure V in the Data Supplement). Our model can be used in all patients older than 18 years of age and outperformed a machine learning-based model that used age and sex alone. Our focus in this article has been on the potential to prevent AF-related stroke by early identification of new-onset AF, but there are other ways in which a model that predicts future AF could be useful. AF is a frequent cause of arrhythmia-induced cardiomyopathy, and a hospital presentation with decompensated heart failure can be the first clinical manifestation of new-onset AF. [42] Enhanced surveillance in those predicted to be high-risk for future AF may therefore lead to a reduction in arrhythmia-induced cardiomyopathy. In addition to allowing early treatment for new-onset AF, a clinical risk prediction tool such as this could be used for the prevention of AF. A high-risk prediction of future AF could bring increased attention to modifiable risk factors such as obesity and obstructive sleep apnea, with the goal of avoiding AF altogether.

Figure 5. Colored curves denote patients with strokes occurring within 1 (blue), 2 (orange), and 3 (green) years after ECG in the deployment test set. Gray dotted lines represent the corresponding optimal operating thresholds from Table II in the Data Supplement. AF indicates atrial fibrillation.

We acknowledge some limitations to our study. Although 10-s digital ECG traces are acquired during a resting 12-lead ECG, we had access to only 2.5 s of data for 9 of the leads and 10 s for the remaining 3 leads. A model using 10 s of data for all leads could be considered in the future to maximize model training capabilities. Our analysis was limited to a single health system with a predominantly White population, so generalizability to other organizations, particularly those with racially diverse populations, must be established. We refer to the strokes in this study as potentially preventable, but in reality, identification of AF alone will not prevent all AF-related strokes. Some patients will have a contraindication to, or will not be eligible for, anticoagulation, and some who are treated with anticoagulation will still have a stroke. A chart review of the patients identified as having a potentially preventable AF-related stroke revealed that 9% of them were already on anticoagulation for reasons other than AF at the time of the stroke. It is unknown whether a diagnosis of new-onset AF would have affected the treatment plan or outcome in this small subset of patients. A prospective clinical trial is needed to confirm how many strokes can be prevented by a screening strategy based on enhanced monitoring triggered by an AF risk prediction.

This DNN approach represents a black-box model, so we do not know the specific features forming the basis of model predictions. Structural changes occur in the atria of patients with AF, and it is possible that the DNN is using ECG manifestations of this atrial myopathy to guide the prediction.43 Although previous work has shown some initial results for model interpretability specific to ECG-based DNN models for mortality predictions, these methods are challenging to generalize on a population level.23 Acceptance of this limitation is warranted at present because more interpretable machine learning methods are not designed to directly leverage the digital ECG data as the DNN does, and no robust methods are currently available to provide this insight into DNNs, although this remains an active area of investigation.
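To put the lead-duration limitation noted above in concrete terms, the following minimal sketch computes how much of the acquired signal was available for analysis, using only the durations stated in the text (2.5 s for 9 leads, 10 s for 3 leads, out of 10 s acquired per lead). The function name `signal_coverage` and the "lead-seconds" unit are illustrative assumptions, not part of the study's pipeline.

```python
def signal_coverage(lead_durations, full_duration):
    """Return (available, full, fraction) in lead-seconds.

    lead_durations: list of (number_of_leads, seconds_available) pairs.
    full_duration:  seconds acquired per lead during the recording.
    """
    n_leads = sum(n for n, _ in lead_durations)
    available = sum(n * seconds for n, seconds in lead_durations)
    full = n_leads * full_duration
    return available, full, available / full


# Durations taken from the limitations paragraph:
# 9 leads with 2.5 s each and 3 leads with 10 s each.
available, full, fraction = signal_coverage([(9, 2.5), (3, 10.0)], 10.0)
print(f"{available} of {full} lead-seconds available ({fraction:.1%})")
```

Under these figures, fewer than half of the acquired lead-seconds were available, which is the margin a future model trained on 10 s of data for all leads could draw on.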

We have shown that a DNN can automatically analyze data from a resting 12-lead digital ECG to predict the risk of new-onset AF within 1 year with good performance. The model can both detect paroxysmal AF and predict incident AF. This predictive performance surpasses that of currently available clinical models, persists even within ECGs interpreted as normal, and is associated with significant hazard for AF development over the next 30 years. Preliminary data simulating a real-world deployment scenario demonstrate that using this tool identifies a high-risk population for new-onset AF that can be targeted for increased screening and may prove useful for helping to prevent AF-related strokes.
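The link between a screening tool's sensitivity and specificity and the number of flagged patients who must be monitored to find one new case can be sketched as follows. This is an illustration under stated assumptions, not the study's analysis code: it assumes the number needed to screen is taken as the reciprocal of the positive predictive value among flagged patients, and the 3% one-year incidence used in the example is a hypothetical value chosen for illustration. The sensitivity (69%) and specificity (81%) are the simulated-deployment figures reported in the abstract. The function name `screening_yield` is likewise hypothetical.

```python
def screening_yield(sensitivity, specificity, prevalence):
    """Return (ppv, flagged_per_case) for a binary screening rule.

    ppv is the positive predictive value; flagged_per_case = 1 / ppv is
    the number of flagged patients screened per true case found
    (an assumed definition of number needed to screen).
    """
    for name, value in (("sensitivity", sensitivity),
                        ("specificity", specificity)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")
    if not 0.0 < prevalence < 1.0:
        raise ValueError("prevalence must be strictly between 0 and 1")

    true_pos = sensitivity * prevalence                    # flagged, has AF
    false_pos = (1.0 - specificity) * (1.0 - prevalence)   # flagged, no AF
    if true_pos == 0.0:
        raise ValueError("no true positives: yield is undefined")

    ppv = true_pos / (true_pos + false_pos)
    return ppv, 1.0 / ppv


# Sensitivity and specificity from the abstract; the 3% incidence is a
# HYPOTHETICAL value for illustration only.
ppv, flagged_per_case = screening_yield(0.69, 0.81, 0.03)
print(f"PPV = {ppv:.3f}, flagged patients per case found = {flagged_per_case:.1f}")
```

Because the false-positive term scales with the disease-free fraction of the population, the yield of targeted screening depends strongly on the underlying incidence, which is why the same model can perform differently across age groups.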

Data sharing: All reasonable requests for raw and analyzed data and related materials, excluding programming code, will be reviewed by our legal department to verify whether the request is subject to any intellectual property or confidentiality constraints. Requests for patient-related data not included in the article will not be considered. Any data and materials that can be shared will be released through a Material Transfer Agreement.

Received April 28, 2020; accepted December 7, 2020. Continuing medical education (CME) credit is available for this article. Go to http://cme.ahajournals.org to take the quiz.

The Data Supplement, podcast, and transcript are available with this article at https://www.ahajournals.org/doi/suppl/10.1161/CIRCULATIONAHA.120.047829.

Non-rheumatic atrial fibrillation as a risk factor for stroke

Epidemiologic assessment of chronic atrial fibrillation and risk of stroke: the Framingham study

A population-based study of the long-term risks associated with atrial fibrillation: 20-year follow-up of the Renfrew/Paisley study

Arrhythmia-induced cardiomyopathies: mechanisms, recognition, and management

Meta-analysis: antithrombotic therapy to prevent stroke in patients who have nonvalvular atrial fibrillation

RE-LY Steering Committee and Investigators. Dabigatran versus warfarin in patients with atrial fibrillation

ROCKET AF Investigators. Rivaroxaban versus warfarin in nonvalvular atrial fibrillation

ARISTOTLE Committees and Investigators. Apixaban versus warfarin in patients with atrial fibrillation

Asymptomatic arrhythmias in patients with symptomatic paroxysmal atrial fibrillation and paroxysmal supraventricular tachycardia

Incidence of previously undiagnosed atrial fibrillation using insertable cardiac monitors in a high-risk population: the REVEAL AF Study

ASSERT Investigators. Subclinical atrial fibrillation and the risk of stroke

Effect of a home-based wearable continuous ECG monitoring patch on detection of undiagnosed atrial fibrillation: the mSToPS randomized clinical trial

Apple Heart Study Investigators. Large-scale assessment of a smartwatch to identify atrial fibrillation

Mobile photoplethysmographic technology to detect atrial fibrillation

Prevalence, incidence and lifetime risk of atrial fibrillation: the Rotterdam study

The natural history of atrial fibrillation: incidence, risk factors, and prognosis in the Manitoba Follow-Up Study

Lifetime risk for development of atrial fibrillation: the Framingham Heart Study

Atrial fibrillation burden and short-term risk of stroke: case-crossover analysis of continuously recorded heart rhythm from cardiac electronic implanted devices

American Heart Association Stroke Council; Council on Cardiovascular and Stroke Nursing; Council on Clinical Cardiology; Council on Functional Genomics and Translational Biology; Council on Hypertension. Guidelines for the primary prevention of stroke: a statement for healthcare professionals from the

A randomised controlled trial and cost-effectiveness study of systematic screening (targeted and total population screening) versus routine practice for the detection of atrial fibrillation in people aged 65 and over. The SAFE study

ESC Guidelines for the management of atrial fibrillation developed in collaboration with EACTS

An artificial intelligence-enabled ECG algorithm for the identification of patients with atrial fibrillation during sinus rhythm: a retrospective analysis of outcome prediction

Prediction of mortality from 12-lead electrocardiogram voltage data using a deep neural network

The aVl, aVr, and aVf leads. A simplification of standard lead electrocardiography

XGBoost: A scalable tree boosting system

Simple risk model predicts incidence of atrial fibrillation in a racially and geographically diverse population: the CHARGE-AF consortium

Nonparametric estimation from incomplete observations

Regression models and life-tables

Index for rating diagnostic tests

Refining clinical risk stratification for predicting stroke and thromboembolism in atrial fibrillation using a novel risk factor-based approach: the Euro Heart Survey on atrial fibrillation

A clinical risk score for atrial fibrillation in a biracial prospective cohort (from the Atherosclerosis Risk in Communities [ARIC] study)

Development of a risk score for atrial fibrillation (Framingham Heart Study): a community-based cohort study

Stroke associated with atrial fibrillation – incidence and early outcomes in the north Dublin population stroke study

High prevalence of atrial fibrillation among patients with ischemic stroke

Newly diagnosed atrial fibrillation and acute stroke. The Framingham Study

Preventive Services Task Force. Screening for breast cancer: U.S. Preventive Services Task Force recommendation statement

Screening and prostate-cancer mortality in a randomized European study

Number needed to screen: development of a statistic for disease screening

Kardia Mobile applicability in clinical practice: a comparison of Kardia Mobile and standard 12-lead electrocardiogram records in 100 consecutive patients of a tertiary cardiovascular care center

A Study to Determine If Identification of Undiagnosed Atrial Fibrillation in People at Least 70 Years of Age Reduces the Risk of Stroke (GUARD-AF)

The EAST study: redefining the role of rhythm-control therapy in atrial fibrillation: EAST, the Early treatment of Atrial fibrillation for Stroke prevention Trial

Arrhythmia-induced cardiomyopathy: JACC State-of-the-Art Review

Human atrial fibrillation substrate: towards a specific fibrotic atrial cardiomyopathy

Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift

Adaptive subgradient methods for online learning and stochastic optimization

Network in network

Early stopping – but when? In: Neural Networks: Tricks of the Trade

This work was supported in part by funding from the Geisinger Clinic and Tempus Labs.

Geisinger receives funding from Tempus for ongoing development of predictive modeling technology and commercialization. Tempus and Geisinger have jointly applied for a patent related to the work. None of the Geisinger authors have ownership interest in any of the intellectual property resulting from the partnership.

Tables I-II; References 44-48