key: cord-1042637-1nth9b9u authors: Dun, C.; Walsh, C.; Bae, S.; Adalja, A.; Toner, E.; Lash, T. A.; Hashim, F.; Paturzo, J.; Segev, D. L.; Makery, M. A. title: A Machine Learning Study of 534,023 Medicare Beneficiaries with COVID-19: Implications for Personalized Risk Prediction date: 2020-10-30 journal: nan DOI: 10.1101/2020.10.27.20220970 sha: 5e49cf6f928223f6157f1d5d984e5036f587b347 doc_id: 1042637 cord_uid: 1nth9b9u

Background: Global demand for a COVID-19 vaccine will exceed the initial limited supply. Identifying individuals at highest risk of COVID-19 death may help allocation prioritization efforts. Personalized risk prediction that uses a broad range of comorbidities requires a cohort size larger than that reported in prior studies.

Methods: Medicare claims data were used to identify patients age 65 years or older with a diagnosis of COVID-19 between April 1, 2020 and August 31, 2020. Demographic characteristics, chronic medical conditions, and other patient risk factors that existed before the advent of COVID-19 were identified. A random forest model was used to empirically explore factors associated with COVID-19 death. The independent impact of the factors identified was quantified using multivariate logistic regression with random effects.

Results: We identified 534,023 COVID-19 patients, of whom 38,066 had an inpatient death. Demographic characteristics associated with COVID-19 death included advanced age (85 years or older: aOR, 2.07; 95% CI, 1.99-2.16), male sex (aOR, 1.88; 95% CI, 1.82-1.94), and non-white race (Hispanic: aOR, 1.74; 95% CI, 1.66-1.83). Leading comorbidities associated with COVID-19 mortality included sickle cell disease (aOR, 1.73; 95% CI, 1.21-2.47), chronic kidney disease (aOR, 1.32; 95% CI, 1.29-1.36), leukemias and lymphomas (aOR, 1.22; 95% CI, 1.14-1.30), heart failure (aOR, 1.19; 95% CI, 1.16-1.22), and diabetes (aOR, 1.18; 95% CI, 1.15-1.22).
Conclusions: We created a personalized risk prediction calculator to identify candidates for early vaccine and therapeutics allocation (www.predictcovidrisk.com). These findings may be used to protect those at greatest risk of death from COVID-19.

Early results of new vaccines and therapeutics aimed at fighting the COVID-19 pandemic are promising; however, the supply will be constrained given worldwide demand. 1, 2 Initial vaccine administration will be limited to high-risk populations, and broader availability will likely be staged over time. Among therapeutics, convalescent plasma and remdesivir have already been supply constrained, placing clinicians in the difficult position of rationing the supply based on risk factors that have not yet been fully elucidated. 3 This dilemma will be further magnified with the market introduction of monoclonal and polyclonal antibodies, which have not been preproduced as a part of Operation Warp Speed. 4 Advanced age is a well-known risk factor for COVID-19 death; however, prioritization for vaccine and therapeutic medication administration would be better informed by more complete data. 1, 2 Using a machine learning approach, we explored a wide range of factors associated with COVID-19 death in a cohort of 534,023 COVID-19 patients aged 65 years or older. A prediction calculator was created to help identify the personal risk of COVID-19 mortality given an individual's age, demographic information, and comorbidity profile.

Through a special data arrangement with the U.S. Centers for Medicare and Medicaid Services (CMS), we accessed the Medicare data server, which contains 100% of Medicare fee-for-service claims in the U.S. We identified people aged 65 years or older who were diagnosed with COVID-19 from carrier, inpatient, and outpatient claims between April 1, 2020 and August 31, 2020.
A diagnosis of COVID-19 was identified using the International Statistical Classification of Diseases and Related Health Problems, Tenth Revision (ICD-10) code U07.1 (2019-nCoV acute respiratory disease). 5 The Medicare Master Beneficiary Summary File (MBSF), which includes beneficiary enrollment information, was used to identify a patient's demographic characteristics and geographic location. The primary outcome for this analysis was mortality among hospitalized patients who were diagnosed with COVID-19. Patient death was identified if either of the following two criteria was found in a patient's inpatient claims: 1) a discharge status was listed as "expired" or 2) a patient status indicator was listed as "died". Patients' sex, age, race, and geographical location were identified from the MBSF. Race was classified as White, Black, Hispanic, Asian (which included Pacific Islander), and Other/Unknown (which included American Indian/Alaska Native, other, and unknown). We categorized age groups as 65-69, 70-74, 75-79, 80-84, and 85 years or older. Residential zip codes were used to account for geographic differences. Comorbidities of patients were captured using the CMS Chronic Conditions Data Warehouse (CCW). The CMS CCW is a database that lists each of the 67 available comorbidity diagnoses that have been assigned to a Medicare beneficiary between January 1, 1999 and December 31, 2018. 6

medRxiv preprint doi: https://doi.org/10.1101/2020.10.27.20220970; this version posted October 30, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-ND 4.0 International license.

We fit a random forest model, a machine learning method that predicts an outcome by building numerous decorrelated decision trees, each incorporating randomness, and combining them as a "forest" of trees.
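The cohort, outcome, and age-group definitions given in this section can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Claim` record and its field names are hypothetical simplifications and do not reflect the actual layout of the CMS claims files or the authors' code.

```python
from dataclasses import dataclass
from datetime import date

# Study window and diagnosis code as stated in the text.
STUDY_START = date(2020, 4, 1)
STUDY_END = date(2020, 8, 31)
COVID_ICD10 = "U07.1"


@dataclass
class Claim:
    """Hypothetical, simplified claim record (field names are assumptions)."""
    age: int
    diagnosis_codes: tuple
    claim_date: date
    discharge_status: str = ""  # e.g. "expired"
    patient_status: str = ""    # e.g. "died"


def in_cohort(claim: Claim) -> bool:
    """Age 65 or older with a U07.1 diagnosis inside the study window."""
    return (
        claim.age >= 65
        and COVID_ICD10 in claim.diagnosis_codes
        and STUDY_START <= claim.claim_date <= STUDY_END
    )


def inpatient_death(claim: Claim) -> bool:
    """Death is flagged if either of the two criteria in the text is met."""
    return (
        claim.discharge_status.lower() == "expired"
        or claim.patient_status.lower() == "died"
    )


def age_group(age: int) -> str:
    """Map an age to the five-year bands used in the analysis."""
    if age < 65:
        raise ValueError("cohort is restricted to age 65 or older")
    if age >= 85:
        return "85+"
    lower = 65 + 5 * ((age - 65) // 5)
    return f"{lower}-{lower + 4}"
```

For example, `age_group(77)` falls in the `"75-79"` band, and a claim dated before April 1, 2020 is excluded by `in_cohort` regardless of its diagnosis code.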
During this procedure, the contribution of each predictor towards the predictive accuracy of the resulting model was evaluated; this quantity was referred to as the variable importance. We used the variable importance as an auxiliary means to identify the clinical factors with the highest contributions to the predictive accuracy of the model. Hence, by using a random forest model, we were able to capture conditions and comorbidities that were not emphasized in prior studies. Our adjusted multivariate model included comorbidities identified from the random forest model, comorbidities with a prevalence greater than 30%, and comorbidities that had been reported in previous studies. Geographic clustering was accounted for by using a logistic multilevel random-intercepts model of mortality with patients nested within counties. Mixed-effects multivariable logistic regression with a random intercept for the county was conducted to estimate the odds ratios of death. Odds ratios and 95% confidence intervals were reported for each risk factor. A personalized risk prediction nomogram was constructed based on the coefficients from the multivariable logistic model to calculate relative risk using the following formula: probability = exp ( ...

The median age of a COVID-19 diagnosed patient was 77 years (IQR 70-85), and the median age of patients who died was 80 years (IQR 73-87). Each of six comorbidities was present in
the majority of COVID-19 patients: 80% (n=423,808) had hypertension, 76% (n=402,979) had hyperlipidemia, 63% (n=335,413) had anemia, 62% (n=332,422) had a cataract, 61% (n=325,498) had rheumatoid arthritis/osteoarthritis, and 54% (n=286,025) had ischemic heart disease (Table 1). Over 65% of patients who died (n=24,927) had at least one of these comorbidities. Risk factors associated with COVID-19 death that had a low prevalence in the Medicare population over 65 years of age included pressure ulcers and chronic ulcers.

The random forest model achieved a classification accuracy of 92.9%. The risk factors it ranked as contributing more than the norm to predicting COVID-19 mortality were chronic kidney disease, prostate cancer, the patient's race, pressure ulcers and chronic ulcers, acute myocardial infarction, the patient's sex, and heart failure. We included the 20 comorbidities with the highest variable importance in the multivariate regression. In an adjusted multivariate model, COVID-19 death was associated with advanced age (Table 2). We used the coefficients of patients' age, sex, race, and all reported risk factors to create a personalized COVID-19 mortality risk prediction calculator. Based on this calculator, an 80-year-old Hispanic man with chronic kidney disease has a mortality risk 6 times higher than a 66-year-old white woman with no comorbidities. The risk calculator is available at www.predictcovidrisk.com.

Our study is the largest comorbidity analysis of COVID-19 patients in the U.S. to date.
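To make the arithmetic behind a coefficient-based risk calculator concrete, the following sketch combines adjusted odds ratios on the log-odds scale and converts the result to a probability with the standard logistic function. It is not the authors' calculator: the odds ratios are only those quoted in the abstract, the baseline probability is a hypothetical placeholder (the model intercept is not given in this text), the county random intercept is ignored, and no interactions between factors are assumed.

```python
import math

# Adjusted odds ratios quoted in the abstract of this paper.
REPORTED_AOR = {
    "age_85_plus": 2.07,
    "male": 1.88,
    "hispanic": 1.74,
    "sickle_cell_disease": 1.73,
    "chronic_kidney_disease": 1.32,
    "leukemias_lymphomas": 1.22,
    "heart_failure": 1.19,
    "diabetes": 1.18,
}


def combined_odds_ratio(factors):
    """Sum log odds ratios (i.e. multiply aORs), assuming no interactions."""
    log_odds = sum(math.log(REPORTED_AOR[f]) for f in factors)
    return math.exp(log_odds)


def predicted_probability(baseline_probability, factors):
    """Shift a reference-profile probability by the selected factors.

    baseline_probability is a HYPOTHETICAL value for the reference profile;
    the text does not report the intercept needed to derive it.
    """
    baseline_logit = math.log(baseline_probability / (1 - baseline_probability))
    logit = baseline_logit + math.log(combined_odds_ratio(factors))
    return 1 / (1 + math.exp(-logit))


def relative_risk(baseline_probability, factors):
    """Ratio of the predicted probability to the reference-profile probability."""
    return predicted_probability(baseline_probability, factors) / baseline_probability
```

A call such as `relative_risk(0.03, ["male", "hispanic", "chronic_kidney_disease"])` then compares a profile with the reference patient. Because the odds ratio and the relative risk diverge as the baseline probability grows, the result depends on the assumed baseline and should not be read as a reproduction of the published calculator.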
Using a large sample size, geographic clustering, and the encompassing pool of independent variables captured in this study, we identified that the comorbidities of sickle cell disease, chronic kidney disease, leukemias and lymphomas, heart failure, and diabetes are all associated with higher rates of COVID-19 death. These results revealed high-risk individuals who should be considered for prioritization of vaccine and therapeutic medication allocation. These findings were also used to develop a risk calculator to allow clinicians to identify patients who are at the highest risk of COVID-19 mortality.

Our results expanded on findings reported in prior studies describing COVID-19 mortality risk factors. 7, 8, 9 A large UK-based cohort study of 10,926 COVID-19 deaths found that increased age, male sex, and Black race were associated with a higher risk of mortality. 10 Our results showed that with every five-year increase over age 65 years, there was an approximately 20% increase in mortality. Consistent with prior reports, we also found that racial and ethnic minority groups had a higher risk of COVID-19 mortality compared with White patients. A study in New York of 4,312 COVID-19 patients observed that Black patients had the highest relative risk compared to other race groups. 11 Other analyses have found Black patients to have the greatest risk of COVID-19 death; however, we found that Hispanic patients had the highest mortality risk and that Black patients were a close second in this Medicare population. 11-13 The disproportionate impact of COVID-19 on minorities may be related to three possibilities.
First, there may be a disproportionate burden of undiagnosed comorbidities such as diabetes, HIV, liver disease, cardiovascular disease, asthma, and kidney disease in minority communities. 14 Second, minority patients may be acquiring the infection with a higher viral load given the denser settings in which minority populations live, commute, and work. A patient's viral load has been associated with the infection fatality rate. 15, 16 These populations may also be more likely to live in multigenerational households where ventilation may be poor and effective social distancing may not be feasible. 17 Finally, minorities may have poorer access to quality health care. 11-13

Another U.K. study of 20,133 patients hospitalized with COVID-19 observed that death was associated with the pre-existing comorbidities of chronic cardiac disease, chronic non-asthmatic pulmonary disease, chronic kidney disease, obesity, and liver disease. 18 Similarly, a U.S. cohort study of 11,721 hospitalized patients with COVID-19 in 38 states found the comorbidities of chronic kidney disease and cardiovascular disease to be associated with increased odds of mortality. 19 A U.S.-based study of 521 patients with chronic kidney disease who became critically ill with COVID-19 described a hazard ratio of 1.25. 20 Our study strongly reinforces the hypothesis that chronic kidney disease is a major risk factor for COVID-19 death. We observed that chronic kidney disease was the second leading risk factor among all comorbidities, with an adjusted odds ratio of 1.32, second only to sickle cell disease, which is rare in the Medicare population over age 65 years.
Using a large sample size, our study affirmed the association of hypertension, diabetes, chronic obstructive pulmonary disease, cardiovascular disease, obesity, and lung cancer with COVID-19 death, in line with previous large comorbidity studies. 9-11,21-26 However, unlike prior studies, our sample revealed sickle cell disease and leukemias and lymphomas to be major risk factors for COVID-19 death. Previous case studies have reported that sickle cell disease patients had favorable outcomes and suggested that mortality risk in this population was inconclusive. 27, 28 One explanation for our finding may be that sickle cell disease is associated with impaired oxygen exchange, which may be further impeded during the inflammatory phase of the infection.

Obesity has been well described as a risk factor for COVID-19 death. 10, 18, 19 One study showed that COVID-19 patients with a BMI of 30-34 kg/m2 and >35 kg/m2 were 1.8 and 3.6 times more likely to require critical care, respectively, when compared to individuals with a BMI of <30 kg/m2. 29 We observed a risk lower than that described in other studies. We believe the weaker association we report to be due to the well-known low sensitivity of obesity codes in claims data for detecting obese and overweight individuals. 30 Therefore, we believe the impact of obesity, overweight, and undocumented metabolic syndrome or pre-diabetes on poor COVID-19 outcomes to be under-reported in this claims-based study.

Some of the risk factors we identified may be collinear with institutionalization. Specifically, cerebral palsy, chronic ulcers, blindness, Alzheimer's disease and related dementias, and mobility impairments are likely more prevalent in persons living with assisted or nursing care. Higher risk in these populations may be due to the mode of transmission in residential facilities.
We reported the first COVID-19 risk factor analysis using machine learning algorithms. Various studies have established the random forest approach as a useful method for modelling risk prediction. 31-33 We incorporated a random forest approach into the COVID-19 risk factor exploration to identify variables that could be used to build a better predictive model. Our results revealed that 60% (n=12) of the comorbidities captured by the random forest model were also among the 20 comorbidities with the highest odds ratios of COVID-19 death. Overall, random forest as a machine learning approach was well suited to selecting important variables that could improve model fit. 34

This study had some important limitations. First, we were unable to draw conclusions about the infection fatality rate in the community because claims data are specific but not sensitive for infection with COVID-19. Second, we only included inpatient hospital deaths since there is a time lag in the availability of death data outside the hospital setting in the Medicare dataset.

As COVID-19 vaccines and therapeutics become available, the risk factors we report could inform the allocation of these limited resources until they become more widely available. The application of these results to the personalized risk calculator may help educate clinicians and the public on which patients aged 65 years or older are at highest risk of COVID-19 mortality.

We thank West Health Institute for their research support on this study. We also thank Dr. Brian W. Weir and Dominique Vervoort for their help in preparing this manuscript.
Safety and Immunogenicity of SARS-CoV-2 mRNA-1273 Vaccine in Older Adults
Safety and immunogenicity of the ChAdOx1 nCoV-19 vaccine against SARS-CoV-2: a preliminary report of a phase 1/2, single-blind, randomised controlled trial
Gilead's Remdesivir Supply Will Fall Short of U.S. Need This Summer, Analyst Says
Explaining Operation Warp Speed. US Department of Health and Human Services
WHO | Emergency use ICD codes for COVID-19 disease outbreak
Condition Categories - Chronic Conditions Data Warehouse
Clinical course and risk factors for mortality of adult inpatients with COVID-19 in Wuhan, China: a retrospective cohort study
Defining the Epidemiology of Covid-19 - Studies Needed
Factors associated with COVID-19-related death using OpenSAFELY
The association of race and COVID-19 mortality
Hospitalization and Mortality among Black Patients and White Patients with Covid-19
The Disproportionate Impact of COVID-19 on Racial and Ethnic Minorities in the United States
COVID-19 and Racial/Ethnic Disparities
Declining Trend in the Initial SARS-CoV-2 Viral Load final.mp4
Overall decrease of SARS-CoV-2 viral load and reduction of clinical burden: the experience of a Northern Italy hospital. Clinical Microbiology and Infection
Features of 20 133 UK patients in hospital with covid-19 using the ISARIC WHO Clinical Characterisation Protocol: prospective observational cohort study
Patient Characteristics and Outcomes of 11 721 Patients With Coronavirus Disease 2019 (COVID-19) Hospitalized Across the United States
Characteristics and Outcomes of Individuals With Pre-existing Kidney Disease and COVID-19 Admitted to Intensive Care Units in the United States
Comorbidity and its impact on 1590 patients with Covid-19 in China: A Nationwide Analysis
Prevalence of comorbidities and its effects in patients infected with SARS-CoV-2: a systematic review and meta-analysis
Prevalence and Associated Risk Factors of Mortality Among COVID-19 Patients: A Meta-Analysis
Epidemiological, comorbidity factors with severity and prognosis of COVID-19: a systematic review and meta-analysis
Risk factors of critical & mortal COVID-19 cases: A systematic literature review and meta-analysis
Factors associated with COVID-19 hospital deaths in Espírito Santo, Brazil, 2020
Sickle cell trait and the potential risk of severe coronavirus disease 2019 - A mini-review
COVID-19 infection and sickle cell disease: a UK centre experience
Obesity in Patients Younger Than 60 Years Is a Risk Factor for COVID-19 Hospital Admission
Coding of obesity in administrative hospital discharge abstract data: accuracy and impact for future research studies
Spatio-temporal estimation of the daily cases of COVID-19 in worldwide using random forest machine learning algorithm
Random Forests for Classification in Ecology
Evaluation of random forest and regression tree methods for estimation of mass first flush ratio in urban catchments
Can machine-learning improve cardiovascular risk prediction using routine clinical data?