key: cord-0776207-nqenxgn8
authors: Rozenfeld, Yelena; Beam, Jennifer; Maier, Haley; Haggerson, Whitney; Boudreau, Karen; Carlson, Jamie; Medows, Rhonda
title: A model of disparities: risk factors associated with COVID-19 infection
date: 2020-07-29
journal: Int J Equity Health
DOI: 10.1186/s12939-020-01242-z
sha: bb80c3f09b0d9b541214871e807784a738c73dca
doc_id: 776207
cord_uid: nqenxgn8

BACKGROUND: By mid-May 2020, there were over 1.5 million cases of SARS-CoV-2 infection (COVID-19) across the U.S., with new confirmed cases continuing to rise following the re-opening of most states. Prior studies have focused mainly on clinical risk factors associated with serious illness and mortality of COVID-19. Less analysis has been conducted on the clinical, sociodemographic, and environmental variables associated with initial infection of COVID-19.

METHODS: A multivariable statistical model was used to characterize risk factors in 34,503 cases of laboratory-confirmed positive or negative COVID-19 infection in the Providence Health System (U.S.) between February 28 and April 27, 2020. Publicly available data were utilized as approximations for social determinants of health, and patient-level clinical and sociodemographic factors were extracted from the electronic medical record.
RESULTS: Higher risk of COVID-19 infection was associated with older age (OR 1.69; 95% CI 1.41–2.02, p < 0.0001), male gender (OR 1.32; 95% CI 1.21–1.44, p < 0.0001), Asian race (OR 1.43; 95% CI 1.18–1.72, p = 0.0002), Black/African American race (OR 1.51; 95% CI 1.25–1.83, p < 0.0001), Latino ethnicity (OR 2.07; 95% CI 1.77–2.41, p < 0.0001), non-English language (OR 2.09; 95% CI 1.7–2.57, p < 0.0001), residing in a neighborhood with financial insecurity (OR 1.10; 95% CI 1.01–1.25, p = 0.04), low air quality (OR 1.01; 95% CI 1.0–1.04, p = 0.05), housing insecurity (OR 1.32; 95% CI 1.16–1.5, p < 0.0001) or transportation insecurity (OR 1.11; 95% CI 1.02–1.23, p = 0.03), and living in senior living communities (OR 1.69; 95% CI 1.23–2.32, p = 0.001).

CONCLUSION: Risk of COVID-19 infection is higher among groups already affected by health disparities across age, race, ethnicity, language, income, and living conditions. Health promotion and disease prevention strategies should prioritize groups most vulnerable to infection and address structural inequities that contribute to risk through social and economic policy.

severe illness, such as older adults living in long term care facilities, those with a BMI of forty or higher, and immunosuppressed individuals, including people with HIV/AIDS [8]. However, most risk models have not incorporated clinical, sociodemographic, and environmental variables, which may be predictive of community spread within the U.S. As with other infectious diseases, predictors of COVID-19 infection may include employment status, education level, income, and housing conditions [9], which could influence the ability to seek care, adhere to treatment, and practice physical distancing measures. Thus, effective strategies for predicting risk factors for community transmission should include both clinical and social factors [10]. The latter factors in particular remain understudied, especially among communities of lower socioeconomic status [10].
Emerging data already show that communities of color and/or low socioeconomic status are experiencing disproportionate rates of serious illness if infected, due to preexisting economic and health inequities [11, 12]. By performing large scale analyses, healthcare systems can play a role in investigating patient and population differences in disease susceptibility, distinct from mortality risk. The purpose of this study was to use collated data from an entire health system to identify the apparent sociodemographic and environmental, as well as clinical predictors of the risk of COVID-19 infection and their relevance to persistent health disparities across race, ethnicity, socioeconomic status, language, and age [13].

This study was conducted at Providence Health System, the third largest not-for-profit health system in the U.S., servicing more than five million people across seven states located in the Western and Southwestern portion of the U.S. Data were collected from the Providence enterprise data warehouse. The data elements that were collected were informed by a comprehensive review of prior scientific studies that documented mortality risk factors and the CDC list of groups at higher risk for severe illness [8]. Variables included patient demographic, social, and behavioral history information; chronic conditions documented in clinical history; current conditions; prescribed medications; laboratory testing results; and acute and ambulatory healthcare utilization. To study sociodemographic and environmental variables, electronic medical record (EMR) data were used to link patients' locations to the U.S. Census Bureau's 2018 American Community Survey and the CDC air quality data. To join these datasets to EMR data, patient addresses were geocoded and matched at the census block group or tract level. Glottolog, a repository for the world's languages, was used to assign language groups.
Geographic regions and clinical symptoms were also included as variables. Census data on educational attainment and financial insecurity were used to assess socioeconomic status. Patients residing in Alaska, Washington, Oregon, Montana, and California (Los Angeles and parts of Orange County) who were tested for severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infection between February 28, 2020 and April 27, 2020 were included in the data set. Testing mechanisms included swabs from respiratory specimens appropriate for viral RNA testing from eight testing platforms.

The principal dependent variable for our model was COVID-19 infection, as indicated by a positive lab test. Distributions of all continuous variables including age, BMI, number of medications, and neighborhood financial insecurity were examined for normality and transformed into categorical attributes. Comorbidities were determined by problem list documentation or clinical encounter diagnoses using standard International Classification of Diseases, Tenth Revision, Clinical Modification (ICD-10-CM) nomenclature and further summarized into a measure of disease severity using total number of chronic conditions. Substance, tobacco, and alcohol use were captured from social history assessments and clinician documentation. The following variables were used as indicators of physical proximity to other people (i.e., structural barriers to social distancing): transportation insecurity, relationship status, employment, housing insecurity, and age-stratified communal living.

Descriptive statistics were used to summarize study participants. Continuous variables were described by means and standard deviations, while categorical variables were described using frequencies and percentages. We conducted bivariate analyses to assess whether each factor had a significant effect on the outcome.
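As a rough illustration of bivariate screening for a single categorical factor, the sketch below computes a Pearson chi-square test on a 2x2 table and applies a p-value cutoff. The choice of test is an assumption (the text does not name the bivariate test used), the counts are hypothetical, and the function names are illustrative only.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for a 2x2 table.

    a, b: factor present  -> tested positive, tested negative
    c, d: factor absent   -> tested positive, tested negative
    Returns (statistic, p_value) with 1 degree of freedom.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column needs at least one patient")
    stat = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 degree of freedom, P(X > stat) equals erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


def passes_screen(p_value, threshold=0.25):
    """True if the factor would be carried forward to the multivariable model."""
    return p_value < threshold


# Hypothetical counts, not taken from the study.
stat, p = chi_square_2x2(30, 70, 20, 80)
print(f"chi-square = {stat:.3f}, p = {p:.3f}")
print("kept at 0.25:", passes_screen(p))
print("kept at 0.05:", passes_screen(p, threshold=0.05))
```

With these made-up counts the factor clears a 0.25 screen but not a 0.05 screen, which is the kind of variable a more liberal screening threshold is meant to retain.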
All covariates with p < 0.25 in the bivariate analysis were considered for model inclusion since use of a more traditional level of 0.05 often fails to identify variables whose association with the outcome could become stronger in the presence of other variables [14]. In addition, all variables of known clinical importance found in previous studies that could make an important contribution were included to improve upon previous models [1]. Beginning with all variables of interest, a stepwise selection with backward elimination was used to create a multivariable logistic regression model for predicting risk of infection. Data were randomly partitioned into two independent subsets: 80% for training and building the model and 20% for testing. Initial parameters for the model were identified in the training set and then tested at the subsequent step. Missing data were recoded as unknown and included in the analysis. Detailed covariate definitions and data sources are shown in the supplement. The model's ability to discriminate COVID-19 infection in the validation data set was evaluated using the area under the receiver operating characteristic curve and the Hosmer-Lemeshow goodness-of-fit statistic. The observed and expected frequencies within each decile of risk were compared [14]. All data manipulation and modeling were completed in SAS EG (SAS Institute, Cary, NC). For all independent predictor subgroups, the risk of COVID-19 infection was quantified with odds ratios (OR) and 95% confidence intervals. These risks were calculated using the entire data set.

A total of 34,503 COVID-19 tested patients were included in the study (Table 1). The average age was 50 years (SD 20), 59.6% (21,209) were female, 12% (4183) were identified as non-white race, and 66% (22,610) had at least one comorbidity. Within the study population, 7.5% (2578) of patients tested positive and 92.5% (31,925) tested negative for COVID-19.
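The relationship between a logistic regression coefficient and a reported odds ratio with its 95% confidence interval can be sketched as follows. This assumes a Wald-type interval, exp(beta ± z·SE), which the text does not state explicitly; the function names are hypothetical. The example back-calculates the coefficient and standard error implied by the reported male-gender estimate (OR 1.32; 95% CI 1.21–1.44) and then rebuilds the interval.

```python
import math

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def odds_ratio_ci(beta, se, z=Z_95):
    """Odds ratio and Wald confidence limits from a log-odds coefficient."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def coef_from_reported(odds_ratio, lower, upper, z=Z_95):
    """Recover the coefficient and standard error implied by a reported OR and CI.

    Assumes the interval is symmetric on the log-odds scale (Wald interval).
    """
    beta = math.log(odds_ratio)
    se = (math.log(upper) - math.log(lower)) / (2.0 * z)
    return beta, se


# Reported estimate for male gender: OR 1.32 (95% CI 1.21-1.44).
beta, se = coef_from_reported(1.32, 1.21, 1.44)
or_, lo, hi = odds_ratio_ci(beta, se)
print(f"beta = {beta:.4f}, SE = {se:.4f}")
print(f"OR = {or_:.2f}, 95% CI {lo:.2f}-{hi:.2f}")
```

Because the interval is symmetric on the log scale, the reported odds ratio sits at the geometric mean of its limits; the back-calculated standard error is only as precise as the two-decimal rounding of the published values.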
Of patients testing positive, 36% (924) were hospitalized and 9% (240) died during the study period. Table 2 shows the twenty-nine sociodemographic, clinical, and environmental covariates associated with odds of infection. Comparatively, individuals between 50 and 59 years of age ( Patients living in areas with low air quality (OR 1.01; 95% CI 1.0-1.04, p = 0.05), financial insecurity (OR 1.10; 95% CI 1.01-1.25, p = 0.04), transportation insecurity (OR 1.11; 95% CI 1.02-1.23, p = 0.03), or housing insecurity (OR 1.32; 95% CI 1.16-1.5, p < 0.0001) were at higher risk of infection. Living in senior living facilities was associated with greater infection risk (OR 1.69; 95% CI 1.23-2.32, p = 0.001). The model performed consistently across training and testing data sets, with a receiver operating characteristic area under the curve of 0.78 and a Hosmer-Lemeshow chi-square of 4.4 (p = 0.81). Partitioning the predicted probabilities of infection into "deciles of risk" (i.e., equal groups from smallest to largest) did not highlight any "underperforming" areas.

This retrospective study of the risk of COVID-19 infection identified several clinical risk factors also associated with serious illness in prior studies, including older age [3], male gender [15], diabetes [7], chronic kidney disease [16], high BMI [17], and immunosuppression [18]. However, some factors previously found to increase mortality risk, such as hypertension [3], and cardiovascular disease, liver disease, lung disease, or asthma [8], were not significant factors associated with initial COVID-19 infection. Surprisingly, being prescribed more than ten medications or having a greater number of chronic conditions was associated with lower infection risk, suggesting possible risk reduction behavior based on perceived risk.
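The two model checks reported in the results, discrimination (area under the ROC curve) and calibration across deciles of risk (Hosmer-Lemeshow), can be sketched in plain Python as below. This is a minimal illustration on toy data, not the SAS procedure used in the study; the function names and the grouping rule (equal-sized groups after sorting by predicted probability) are assumptions.

```python
def auc(labels, scores):
    """Area under the ROC curve as the probability that a randomly chosen
    positive case scores higher than a randomly chosen negative case
    (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def hosmer_lemeshow(labels, probs, groups=10):
    """Hosmer-Lemeshow chi-square statistic over equal-sized risk groups.

    Cases are sorted by predicted probability and split into `groups`
    bins; each bin contributes (O - E)^2 / (E * (1 - E / n)), where O is
    the observed number of events, E the sum of predicted probabilities,
    and n the bin size.
    """
    pairs = sorted(zip(probs, labels))
    total = len(pairs)
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * total // groups:(g + 1) * total // groups]
        if not chunk:
            continue
        size = len(chunk)
        observed = sum(y for _, y in chunk)
        expected = sum(p for p, _ in chunk)
        variance = expected * (1.0 - expected / size)
        if variance > 0:
            stat += (observed - expected) ** 2 / variance
    return stat


# Toy data, not from the study.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
print("AUC:", auc(toy_labels, toy_scores))  # 0.75
print("HL statistic (2 groups):", hosmer_lemeshow(toy_labels, toy_scores, groups=2))
```

The Hosmer-Lemeshow statistic is conventionally compared with a chi-square distribution on (groups - 2) degrees of freedom; a large p-value, as reported in the text, indicates no evidence of poor calibration.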
Further research is needed to understand the differences between factors associated with initial infection risk and those associated with serious illness and mortality once the infection occurs. Healthcare access through a relationship with an internal primary care provider was associated with a lower infection risk; however, this may be a result of higher rates of testing for COVID-19 compared to individuals with no primary care provider. Patients without a primary care provider may have only been tested for COVID-19 after respiratory and other possible COVID-19 symptoms became conspicuous, thus increasing the probability of a positive test. Receiving secure electronic communication through the EMR was associated with lower risk of infection, suggesting that access to health advice and education may reduce risk. Serious mental illness and drug and tobacco use were associated with lower risk; however, further study is necessary to understand the mechanisms behind such associations.

Race and ethnicity appeared to be important predictors of risk. Higher risk of infection among Black, indigenous, and/or people of color may be associated with other sociodemographic and environmental characteristics found to also be significant in this study. African Americans and Latinos are more likely to live in communities with poor air quality [19], work in jobs that cannot be performed remotely [20], and lack access to healthcare [21], which may increase the risk of infection and contribute to racial disparities in mortality. Additionally, chronic conditions such as obesity, stroke, and diabetes, as well as premature death, affect African Americans and Latinos disproportionately compared to whites [13]. Communities of color are also more likely to experience lower socioeconomic status [22] and be employed as essential workers [10].
Additionally, for these and other vulnerable groups, lack of personal transportation is a barrier both to healthcare access [23] and to social distancing, further exacerbating infection risk. For these reasons, communities of color experience more structural barriers to social distancing measures and are more vulnerable to severe illness.

Having limited English proficiency (LEP) can be a barrier to accessing health services and understanding health information, especially when written translations and/or trained translators are not available [24]. Over the course of the pandemic, health information has changed rapidly (e.g., mandates for masking), which can create barriers to accessing information and could leave indigenous and immigrant communities uninformed. During the Ebola epidemic in West Africa, language barriers were an obstacle to slowing the spread of the disease [25]. People with LEP are also more likely to have low health literacy compared to English speakers and are at a higher risk of poor health [26]. Culturally and linguistically appropriate interventions are essential, including communication materials of different formats and reading levels developed through the collaboration of native language speakers and English speakers, as well as the use of community health workers who can engage with underserved groups [27].

Older age may be considered both a clinical and an environmental risk factor, as it moderates both comorbidities (e.g., dementia) requiring caregiving and housing situations (e.g., living in senior communities). Our results showed that some sociodemographic patient characteristics that influence environmental exposure to social contact were also associated with increased rates of COVID-19 infection, such as being married or having a significant other, being employed, lacking access to a personal vehicle, and living in overcrowded housing, each of which significantly increased infection risk.
Religious affiliation was also associated with increased risk, which may be attributed to attendance of large religious services or other behaviors associated with religious identity. People experiencing housing insecurity may experience challenges with physical distancing, especially when housing is crowded. These individuals may also lack hand washing facilities and/or running water [28]. Both factors could facilitate community spread of infectious diseases. Regional differences in infection risk were evident, with Southern California and Western Washington having the highest infection rates (15.7% and 11.3% of tested patients, respectively) while Oregon and Alaska had the lowest rates (4.3% and 4.7%). These regional differences may reflect some combination of population density, proximity to the initial points of COVID-19 entry into the U.S., and state-specific COVID-19 precautions.

This study was limited to patient data from the Providence Health System and publicly available data sets. Although the organization serves a diverse patient population across seven Western U.S. states, the generalizability of this study to the entire U.S. is unclear. With limited testing available and evolving screening guidelines, clinical discernment and personal bias may have affected which individuals received testing and thus may have influenced the rates of testing in certain populations. Additionally, it is impossible to correlate patient data with measures of individual patient behaviors, such as mask use or adherence to social distancing recommendations. Finally, this study focused on factors associated with initial infection risk; however, other factors may further influence outcomes such as disease severity, time in hospital, and mortality.
Our construction of a multi-faceted prediction model of COVID-19 infection risk in our large, multi-state population has important implications for healthcare systems, public health departments, and city and state governments to further reduce the risk of infection and prevent the spread of COVID-19 in communities that may be disproportionately impacted. Knowledge of the complex mixture of clinical, ethnic, linguistic, and environmental factors that contribute to infection risk should enable more targeted public health approaches to decrease COVID-19 infection. Linguistically and culturally appropriate prevention education, healthcare access including routine care and COVID-19 testing, and efforts to address substandard housing and hazardous working conditions are essential to reducing risk among vulnerable groups, especially communities of lower socioeconomic status which experience a greater incidence of infectious diseases [29]. Now, and as communities seek to "re-open," addressing the disparities in infection that contribute to rates of serious illness and mortality is needed to alleviate the disproportionate burden of the pandemic and persisting health disparities.

Supplementary information accompanies this paper at https://doi.org/10.1186/s12939-020-01242-z.

Authors' contributions YR and JB were responsible for study design, data collection, data management, and data analysis. All authors were responsible for data interpretation. YR, HM and WH wrote the first draft of the manuscript. HM and JC were responsible for the scientific literature review. All authors contributed to the final draft. All authors read and approved the final manuscript.

Author's information All authors work in the area of Population Health, focusing on care management approaches for patients, communities and populations, especially the most poor and vulnerable.

This was an internally funded study, with no external financial interest. The study aimed to improve patient and population outcomes and support the healthcare system's response to COVID-19. The corresponding authors had full access to all data in the study and had final responsibility for the decision to submit for publication.

The datasets generated and analyzed during the current study are not publicly available because the Providence IRB stipulated that all patient-level data would reside within the Providence secured computer network, accessible only to the study investigators, and locked up on Providence property. The publicly available data were accessed via a proprietary data vendor and cannot be shared publicly under the vendor's contractual agreement. The underlying publicly available data sources include the 2018 American Community Survey, the Centers for Disease Control and Prevention Air Quality data, and Glottolog.

Ethics approval and consent to participate The Providence Institutional Review Board (IRB) approved this study for all gathered data and analysis. A waiver of informed consent, in accordance with 45 CFR 46.116(d), and a Waiver of Authorization, in accordance with 45 CFR 164.512(i)(2)(ii), were approved on 4/2/2020 under Expedited Review Procedures. The IRB was satisfied that the use or disclosure of protected health information involved no more than a minimal risk to the privacy of individuals.

Not applicable.

The authors declare that they have no competing interests.

References
1. Prediction models for diagnosis and prognosis of covid-19 infection: systematic review and critical appraisal
2. Clinical course and risk factors for mortality of adult inpatients with COVID-19 in Wuhan, China: a retrospective cohort study
3. Case fatality rate in COVID-19: a systematic review and meta-analysis
4. Predictors of mortality for patients with COVID-19 pneumonia caused by SARS-CoV-2: a prospective cohort study
5. Risk factors of fatal outcome in hospitalized subjects with coronavirus disease from a nationwide analysis in China
6. Risk factors for severity and mortality in adult COVID-19 inpatients in Wuhan
7. Risk factors associated with clinical outcomes in 323 COVID-19 patients in Wuhan, China
8. People who are at higher risk for severe illness. National Center for Immunization and Respiratory Diseases (NCIRD), Division of Viral Diseases
9. Infectious disease, social determinants and the need for intersectoral action
10. Importance of collecting data on socioeconomic determinants from the early stage of the COVID-19 outbreak onwards
11. Low-income and communities of color at higher risk of serious illness if infected with coronavirus: Kaiser Family Foundation
12. COVID-19 hospitalizations and deaths across New York City boroughs. JAMA. 2020:e207197
13. Board on Population Health and Public Health Practice. Chapter 2: the state of health disparities in the United States
14. Applied logistic regression
15. Predictive factors for disease progression in hospitalized patients with coronavirus disease 2019 in Wuhan
16. Chronic kidney disease is associated with severe coronavirus disease 2019 (COVID-19) infection
17. Clinical characteristics and outcomes of 112 cardiovascular disease patients infected by 2019-nCoV
18. COVID-19 with different severity: a multicenter study of clinical features
19. Temporal trends in air pollution exposure inequality in Massachusetts
20. Workers who could work at home, did work at home, and were paid for work at home, by selected characteristics, averages for the period
21. Health disparities: gaps in access, quality and affordability of medical care
22. Understanding associations among race, socioeconomic status, and health: patterns and prospects
23. Traveling towards disease: transportation barriers to health care access
24. Health Literacy: A Prescription to End Confusion
25. Ebola outbreak, Sierra Leone: communication: challenges and good practices
26. Low health literacy, limited English proficiency, and health status in Asians, Latinos, and other racial/ethnic groups in California
27. Integrating literacy, culture, and language to improve health care quality for diverse populations
28. Environmental health disparities in housing
29. Committee on Capitalizing on Social Science and Behavioral Research to Improve the Public's Health. Introduction