key: cord-0921711-j18rw71q authors: McLaughlin, John M; Khan, Farid; Pugh, Sarah; Angulo, Frederick J; Schmidt, Heinz-Joe; Isturiz, Raul E; Jodar, Luis; Swerdlow, David L title: County-Level Predictors of COVID-19 Cases and Deaths in the United States: What Happened, and Where Do We Go from Here? date: 2020-11-19 journal: Clin Infect Dis DOI: 10.1093/cid/ciaa1729 sha: c01492ccc736d062d444b21801cf6651f52c73b7 doc_id: 921711 cord_uid: j18rw71q BACKGROUND: The United States has been heavily impacted by the COVID-19 pandemic. Understanding micro-level patterns in US rates of COVID-19 can inform specific prevention strategies. METHODS: Using a negative binomial mixed-effects regression model we evaluated the association between a broad set of US county-level sociodemographic, economic, and health-status-related characteristics and cumulative rates of laboratory-confirmed COVID-19 cases and deaths between January 22, 2020 and August 31, 2020. RESULTS: Rates of COVID-19 cases and deaths were higher in US counties that were more urban or densely-populated or that had more crowded housing, air pollution, women, 20–49-year-olds, racial/ethnic minorities, residential segregation, income inequality, uninsured, diabetics, or mobility outside the home during the pandemic. CONCLUSIONS: To our knowledge, this study provides the most comprehensive multivariable analysis of county-level predictors of rates of COVID-19 cases and deaths conducted to date. Our findings make clear that ensuring that COVID-19 preventive measures, including vaccines when available, reach vulnerable and minority communities and are distributed in a manner that meaningfully disrupts transmission (in addition to protecting those at highest risk of severe disease) will likely be critical to stem the pandemic. The pandemic of 2019 novel coronavirus disease caused by severe acute respiratory syndrome-related coronavirus (SARS-CoV-2) is ongoing, and the United States has been heavily impacted. 
The US population, however, is geographically and sociodemographically diverse, and understanding micro-level patterns in rates of COVID-19 cases and deaths can inform specific prevention strategies and the titration of public-health responses at the federal, state, and local levels. This need is heightened as the US economy, schools, and daily life begin re-opening against the backdrop of uncertainty about whether a resurgence of COVID-19 will emerge with the upcoming flu season. Although previous studies have evaluated the impact of various sociodemographic or environmental factors on the risk of developing or dying from COVID-19 (e.g., race/ethnicity [1-11], poverty [2], air pollution [12], mobility [13], population density [14], and chronic medical conditions [15-18]), these factors have largely been examined in isolation. Moreover, most analyses were conducted early in the pandemic. The Centers for Disease Control and Prevention (CDC) recently presented preliminary data describing the association between an aggregated 'social vulnerability index' and the likelihood of becoming a CDC-designated COVID-19 'hotspot' [19]. However, additional comprehensive evaluations of COVID-19 disease trends are needed to inform future public-health strategies against the complexities of COVID-19. To help pinpoint prevention strategies, including vaccination once available, we evaluated the association between a broad set of county-level environmental, sociodemographic, economic, and health-status-related characteristics and rates of COVID-19 cases and deaths in the United States.

We obtained county-level records of the cumulative number of laboratory-confirmed COVID-19 cases and deaths from the Johns Hopkins University Coronavirus Resource Center available between January 22, 2020 and August 31, 2020.
This source tracks and makes publicly available county-level COVID-19 data reported by CDC and state health departments. Cumulative county-level rates of COVID-19 cases and deaths through August 31, 2020 were expressed per 100,000 county residents. County-level environmental, sociodemographic, economic, and health-status characteristics hypothesized to be associated with transmission or mortality of COVID-19 were obtained from several publicly available databases maintained by the US government or private institutions. These data were collated and then combined with Johns Hopkins county-level COVID-19 data to form the analysis database. Environmental factors included population density, urbanicity, residential crowding (housing with >1 person per room [20]), and air pollution (particles per million [PPM]). Sociodemographic and economic variables included gender, age, race/ethnicity, a residential housing segregation index (0-100 scale, with 100 being most-segregated counties between whites and non-whites [21]), high school education status, unemployment status, state-adjusted median household income, and income inequality (ratio of household incomes at the 80th vs 20th percentiles [22]). Health-status-related variables included prevalence of diabetes, obesity, and smoking, and, as a potential indicator of risky close-contact behavior, rates of sexually transmitted infections (STI) [23]. Finally, as a proxy for adherence to stay-at-home orders and recommendations to minimize travel [24], we obtained Google Community Mobility Reports describing percent change in county-level travel to non-residential locations during the pandemic compared to a pre-pandemic baseline period [25]. The baseline period was defined as the median value from the five-week period between January 3, 2020 and February 6, 2020 [25]. A list of all exposure variables, including definitions and data sources, is provided in eTable 1.
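The two derived quantities described above (a cumulative rate per 100,000 residents and a percent change in mobility relative to a baseline median) can be illustrated with a minimal sketch. The function names and all numbers below are hypothetical and chosen only for illustration; the Google reports already publish the percent change, so the second function only shows how such a value is defined, not how the authors or Google computed it.

```python
from statistics import median


def rate_per_100k(count: int, population: int) -> float:
    """Cumulative count expressed per 100,000 county residents."""
    if population <= 0:
        raise ValueError("population must be positive")
    return count / population * 100_000


def percent_change_from_baseline(value: float, baseline_values: list[float]) -> float:
    """Percent change of `value` relative to the median of the baseline period."""
    baseline = median(baseline_values)
    if baseline == 0:
        raise ValueError("baseline median must be non-zero")
    return (value - baseline) / baseline * 100


# Hypothetical county: 1,250 cumulative cases among 85,000 residents.
case_rate = rate_per_100k(1_250, 85_000)

# Hypothetical visit counts to non-residential locations in the baseline weeks.
baseline_visits = [980.0, 1000.0, 1020.0, 1010.0, 990.0]
mobility_change = percent_change_from_baseline(800.0, baseline_visits)

print(f"{case_rate:.1f} cases per 100,000")
print(f"{mobility_change:+.1f}% mobility vs baseline")
```

Using the median rather than the mean keeps the baseline from being pulled by a single unusual week, which is presumably why a median baseline is used in the mobility reports.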
County-level characteristics were summarized with descriptive statistics. Missing county-level characteristics (in <1% of the US population) were imputed using state-level values (eTable 1). Google mobility data, when missing from the least-populous counties due to privacy concerns, were not imputed (eTable 1). Using the menbreg command in Stata version 14.0 (StataCorp LLC, College Station, Texas), we fit negative binomial mixed-effects regression models (which allow for overdispersion) [26] to estimate county-level predictors of cumulative rates of COVID-19 cases and deaths. To estimate rates, we modeled cumulative cases and deaths by county, controlling for county population size as an independent variable. For all models, we included state (n=51; 50 states and the District of Columbia) as a group-level random intercept to account for potential correlation among counties within the same state (e.g., state-level testing practices, lockdown measures, and other health-related, social, and cultural differences). Because exposure variables were likely to independently predict COVID-19 rates and to confound one another's relationships with those rates, we constructed univariate and multivariable models. If a large change in point estimates occurred between univariate and multivariable models, we constructed stepwise parsimonious models to understand which covariates were key confounders. We assessed multicollinearity using variance inflation factors (VIF) to ensure multivariable models were not overfitted.

Between January 22, 2020 and August 31, 2020, the numbers of laboratory-confirmed COVID-19 cases and deaths in the United States were 5,916,357 and 180,886, respectively. Cases across 3142 US counties ranged from 0 to 241,768, with Los Angeles County, California having the most (4% of all US cases). Only 41/3142 (1%) counties reported no cases.
No deaths were reported in 686/3142 (22%) counties; however, these counties made up only 3% of the US population. The most deaths, 7290, occurred in Kings County, New York. Table 1 summarizes county characteristics. Google mobility data were not available for 309/3142 (10%) counties, which accounted for <1% of the US population. County-level rates of COVID-19 cases ranged from 0 to 14,338 per 100,000 persons (mean=1422, 95% CI=1377-1466; median=1059, interquartile range [IQR]=568-1897; Table 1). The highest COVID-19 rate occurred in Trousdale County, Tennessee, driven by an outbreak of >1300 cases at a prison [27]. Overall, 33/51 (65%) and 44/51 (86%) states had ≥1 county in the top decile and quartile of rates, respectively. eTable 2 compares county characteristics by quartiles of COVID-19 rates. In univariate results, counties with higher levels of the following had higher rates of COVID-19 cases: population density, urbanicity, crowded housing, air pollution, females, 30-49-year-olds, racial/ethnic minorities, residential housing segregation, adults without a high school degree, obesity, STIs, and travel outside the home during the pandemic (all P<.05; Table 2). Counties with higher proportions of adults aged 50-64 and ≥80 years, a higher prevalence of diabetes, or higher household income had lower rates at the univariate level. Multivariable models (n=2833 when restricted to counties with Google mobility data; 51 states) that adjusted for all exposure variables simultaneously revealed generally similar trends to univariate results; however, the magnitude of the association for some variables (i.e., population density, crowded housing) was reduced in multivariable models (Table 2; eTable 3).
Additionally, the following were significant in univariate results but were no longer related to COVID-19 rates in the multivariable model and seemed to be explained by other factors in the model: Asian race, age groups 50-64 and ≥80 years, high school education, obesity, and STI rates (Table 2). eTable 3 shows stepwise modeling for independent variables with large changes in the point estimate between univariate and fully-adjusted models (i.e., population density, crowded housing, and Asian race) to elucidate which other covariates were key confounding factors in these instances. (Table 2). Additionally, for each 1 PPM increase in air pollution or 10% increase in county-level urbanicity, income, proportion racial/ethnic minorities, residential housing segregation, income inequality, or diabetes, COVID-19 rates were 1.09-1.24 times higher in the multivariable model (all P<.05; Table 2).

County-level rates of COVID-19 deaths ranged from 0 to 461 per 100,000 (Table 1), with the highest rate in Hancock County, Georgia, driven by nursing home outbreaks in a rural, predominantly minority, and underserved community [28]. eTable 4 compares county characteristics by quartiles of mortality rate. In univariate results, all county-level variables except diabetes prevalence were related to mortality rates (Table 2).

To our knowledge, this study provides the most comprehensive multivariable analysis of county-level predictors of rates of COVID-19 cases and deaths conducted to date. Our findings, current through the end of August 2020, have significant implications for COVID-19 prevention strategies, including vaccination. While many county-level factors were related to COVID-19 rates, there are two key takeaways from our research.
First, our findings confirm and expand upon earlier reports [1-12] and preliminary data from CDC [19] that the pandemic has taken a disproportionate toll on minority and other vulnerable [29] US populations. Specifically, rates of COVID-19 cases and deaths were higher in counties with more racial/ethnic minorities, residential housing segregation, income inequality, uninsured persons, air pollution, and adults with diabetes. Our findings on this topic, however, are novel in that they confirm these disparities exist even after adjustment for other potentially confounding factors. For example, even after adjustment for mobility during the pandemic, population density, urbanicity, crowded housing, age, education, employment, and health insurance status, and the prevalence of diabetes, obesity, and smoking, for every 10% increase in the proportion of a US county that was black or Hispanic, there was a corresponding 9% and 17% increase in the rate of COVID-19 cases and a 16% and 24% increase in mortality, respectively. Compounding this, more residential housing segregation and income inequality were both independently related to higher county-level rates of cases and deaths. These findings suggest that there may be larger structural forces behind racial/ethnic differences in COVID-19 rates beyond the factors we measured, and they warrant continued research. Recent reports have highlighted that many of the vulnerable populations we identified at increased risk for COVID-19 (e.g., minorities, uninsured, and those without a high school degree) disproportionately serve in 'essential' pandemic front-line jobs (e.g., grocery clerks, These jobs often cannot be done at home, which increases workplace exposure to SARS-CoV-2 [5, 6]. Indeed, we confirmed that counties with more travel outside the home during the pandemic had higher rates of COVID-19 cases and deaths.
Future studies should evaluate the link between vulnerable and minority communities and workplace exposure with individual-level data, and more studies of occupation-specific risks for COVID-19 are needed. In the near term, redirecting public-health resources (e.g., testing, contact tracing, Although our finding that counties with higher state-adjusted household incomes had higher rates of COVID-19 cases and deaths initially seemed counterintuitive given our other findings highlighting vulnerable communities, several potential explanations exist. For example, COVID-19 hit coastal counties, where incomes are highest, especially hard early on. Further, nursing home death rates were also especially high among high-income states on the East coast [35]. Additionally, there may be better access to testing (and thus more confirmed cases) in areas with higher income [36]. Another possibility is that county-level income inequality, rather than income level alone, may better predict vulnerable communities, as we found that higher county-level income inequality predicted higher COVID-19 rates. This is consistent with previous reports showing that even within counties with high median household incomes, vulnerable pockets of communities with more economic and social stress and less access to medical care can exist and often experience disparate health outcomes [37]. This finding ultimately suggests that identifying populations at increased risk for COVID-19 is multifaceted and that a multivariable approach like ours or a multidimensional risk-score approach (as is being explored by CDC [19]) will be needed to accurately pinpoint areas at high risk of becoming COVID-19 hotspots.
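As an aid to reading the effect sizes quoted in this discussion, the sketch below converts a rate ratio into a percent increase and shows how a ratio scales over a larger increment. It assumes the usual log link of a negative binomial model, under which effects are multiplicative; the helper names are hypothetical, and the 20-point extrapolation is an illustration rather than a result reported in the study.

```python
import math


def percent_increase(rate_ratio: float) -> float:
    """Percent change in the rate implied by a rate ratio."""
    return (rate_ratio - 1.0) * 100.0


def scaled_rate_ratio(rate_ratio: float, units: float) -> float:
    """Rate ratio for `units` multiples of the reported increment.

    Assumes a log-link model, so the log rate ratio scales linearly.
    """
    if rate_ratio <= 0:
        raise ValueError("rate_ratio must be positive")
    return math.exp(math.log(rate_ratio) * units)


# A rate ratio of 1.09 per 10% increase corresponds to a 9% higher rate.
print(f"{percent_increase(1.09):.1f}%")

# Under the log-link assumption, two such increments compound
# multiplicatively instead of simply adding to 18%.
two_steps = scaled_rate_ratio(1.09, 2)
print(f"{two_steps:.4f} -> {percent_increase(two_steps):.1f}%")
```

The compounding matters mainly for large differences between counties: small increments are close to additive, but effects diverge from a simple sum as the increment grows.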
Our second major finding confirms anecdotal reports [38] that efforts to interrupt COVID-19 transmission, including with vaccination when available, may be as impactful on mortality as protecting individuals at highest risk for severe disease (e.g., the elderly and those with comorbidities [39, 40]). Specifically, we identified several county-level factors (e.g., population density, urbanicity, crowded housing, and mobility outside of the home during the pandemic) that independently predicted county-level COVID-19 mortality rates, despite not being related to COVID-19 case-fatality or the development of severe disease [39, 40]. One interpretation is that COVID-19 has hit hardest in communities where adhering to social-distancing guidelines may be more difficult due to high population density, an urban setting (with potentially more reliance on public transit and multi-unit housing), or crowded living arrangements (e.g., multigenerational families [41]). These readily available metrics could be used to prioritize early vaccination efforts when the number of doses may be limited. Moreover, while it was perhaps not surprising that counties with more 20-49-year-olds seemed to have higher rates of COVID-19 illness (given presumably more exposure, or a perceived lower risk for severe disease and thus less serious adherence to social-distancing guidelines), it was unexpected that higher proportions of 20-49-year-olds also predicted higher county-level mortality rates. Because individual-level case-fatality rates are markedly lower in this age group [42], this finding suggests that adults <50 years of age are likely driving transmission (and thus indirectly impacting county-level mortality rates).
Similarly, although individual-level reports have previously identified men at increased risk of developing severe COVID-19 [43], we unexpectedly found that counties with more women had higher rates of COVID-19 cases and deaths. Future studies should also explore the role of women in driving transmission (e.g., disproportionately working in healthcare or other 'essential' jobs [30] or caring for children or other family members during the pandemic). Finally, while the proportion of children <20 years of age was not related to higher rates of COVID-19, this age group will be returning to daycare and school, and engaging in more extracurricular activities over the coming months. Thus, their role in determining COVID-19 rates should be continuously monitored to further elucidate the role children play in driving community-level disease rates and the impact that interrupting transmission in this age group might have [44]. It remains unclear whether communities with higher COVID-19 rates to date would again be at highest risk during a potential resurgence this fall or winter, or if herd immunity in these communities is approaching levels needed to meaningfully slow transmission [45, 46]. For example, a recent report suggested that in some hard-hit, vulnerable communities in New York City, antibody levels could already be >50% [47]. Thus, despite our findings to date, it is also possible that communities with lower rates of COVID-19 until now may be more susceptible (because of lower levels of immunity) to future waves of COVID-19. However, while it was hypothesized that communities first hit hard in the spring during the H1N1 influenza pandemic would be less likely to experience a subsequent 'second wave' during the following influenza season (due to higher levels of herd immunity), this was not the case, suggesting that elevated spring illness did not protect against an autumn resurgence [48].
Thus, continually monitoring whether the same trends in COVID-19 rates we report here are observed again throughout the rest of 2020 may provide an indication of the level of immunity in communities that have been most susceptible to date.

Our study was ecological, and our findings should be confirmed with individual-level data. We did not have county-level data about specific social-distancing measures like mask-wearing, bar, restaurant, and retail closures, or other local-level restrictions on large gatherings. However, we included county-level data describing mobility during the pandemic, which is a proxy for social-distancing measures [13]. Another limitation is that our data, apart from our outcome variables and Google mobility data, were historical. Thus, data about unemployment and health insurance status, household income, and other sociodemographic and environmental factors did not necessarily reflect the situation during the pandemic. Additionally, not all exposure data came from the same year. However, we obtained the most-recent estimates from all data sources, and most of the data describing county-level characteristics were based on estimates from the last two years. When modeling COVID-19 mortality rates, 22% of counties reported no deaths. These counties, however, accounted for only 3% of the US population. Moreover, negative binomial regression models, which we used in our analysis, allow for overdispersion (which can result from excess zeros) and straightforward interpretation, and have been shown to model count data with zeros as well as zero-inflated Poisson models [49]. Finally, we did not have data describing county-level SARS-CoV-2 testing practices. Vulnerable communities, which had higher COVID-19 rates in our study, have historically had reduced access to healthcare [8] and to SARS-CoV-2 testing [36].
Thus, disparities in COVID-19 rates among vulnerable and minority communities could be more pronounced after adjusting for local testing practices. Lower testing rates in minority neighborhoods [36] may also explain why we saw more pronounced racial/ethnic differences in mortality compared to rates of confirmed cases. More research about community-specific testing and its impact on disparities in COVID-19 rates is needed.

Our study gives a comprehensive, granular, and contemporary overview of which areas were most affected by COVID-19 in the United States through the summer of 2020. While the outbreak has now spread across the entire country at a macro level without a great deal of discrimination, micro-level county-by-county disparities in how the pandemic spread were more pronounced. A vaccine is likely the only alternative to balancing restrictive measures like forced lockdowns and closures to protect vulnerable and minority populations who have been disproportionately impacted by the COVID-19 pandemic to date, and the dire economic consequences these measures often bring to the same working-class communities. Our findings make clear that ensuring that COVID-19 preventive measures, including vaccines when available, reach vulnerable and minority communities and are distributed in a manner that meaningfully disrupts transmission (in addition to protecting those at highest risk of severe disease) will likely be critical to stem the pandemic. Historically speaking [38, 50, 51], this too will be a formidable public-health challenge.
Table 1 Summary of county-level characteristics across 3142 US counties

Assessing differential impacts of COVID-19 on black communities
Assessment of Community-Level Disparities in Coronavirus Disease 2019 (COVID-19) Infections and Deaths in Large US Metropolitan Areas
Hospitalization and Mortality among Black Patients and White Patients with Covid-19
Disparities in Incidence of COVID-19 Among Underrepresented Racial/Ethnic Groups in Counties Identified as Hotspots During
Characteristics of Adult Outpatients and Inpatients with COVID-19 - 11 Academic Medical Centers
Patients In A Large Health Care System In California
Racial and Ethnic Disparities Among COVID-19 Cases in Workplace Outbreaks by Industry Sector - Utah
The Structural and Social Determinants of the Racial/Ethnic Disparities in the U.S. COVID-19 Pandemic: What's Our Role?
COVID-19 and African Americans
Association of Race With Mortality Among Patients Hospitalized With Coronavirus Disease 2019 (COVID-19) at 92 US Hospitals
Characteristics of Persons Who Died with COVID-19 - United States
Exposure to air pollution and COVID-19 mortality in the United States: A nationwide cross-sectional study
Association between mobility patterns and COVID-19 transmission in the USA: a mathematical modelling study
Does Density Aggravate the COVID-19 Pandemic?
Preliminary Estimates of the Prevalence of Selected Underlying Health Conditions Among Patients with Coronavirus Disease 2019 - United States
Hospitalization Rates and Characteristics of Patients Hospitalized with Laboratory-Confirmed Coronavirus Disease 2019 - COVID-NET, 14 States
Clinical Characteristics and Morbidity Associated With Coronavirus Disease 2019 in a Series of Patients in Metropolitan Detroit
Coronavirus Disease 2019 Case Surveillance - United States
Disparities in COVID-19 Incidence, Severity, and Outcomes
Department of Housing and Urban Development, Office of Policy Development and Research
County Health Rankings & Roadmaps. Residential segregation - white vs non-white
Where health disparities begin: the role of social and economic determinants--and why current policies may make matters worse
A Vaccine That Stops Covid-19 Won't Be Enough
Patient Characteristics and Outcomes of 11,721 Patients with COVID-19 Hospitalized Across the United States
Characteristics Associated with Hospitalization Among Patients with COVID-19
Covid-19 Stalks Large Families in Rural America: Remote regions with crowded households have turned deadlier than some city blocks
COVID-19 Hospitalization and Death by Age
Men and COVID-19: A Biopsychosocial Approach to Understanding Sex Differences in Mortality and Recommendations for Practice and Policy Interventions
Epidemiology and transmission dynamics of COVID-19 in two Indian states
A mathematical model reveals the influence of population heterogeneity on herd immunity to SARS-CoV-2
Pre-existing immunity to SARS-CoV-2: the knowns and unknowns
Million Antibody Tests Show What Parts of N.Y.C. Were Hit Hardest
Investigating the effect of high spring incidence of pandemic influenza A(H1N1) on early autumn incidence
Logistic Regression Using SAS: Theory and Application
Racial and Ethnic Disparities in Vaccination Coverage Among Adult Populations in the U.S
Disparities in uptake of 13-valent pneumococcal conjugate vaccine among older adults in the United States

and-economic-factors/family-social-support/residential-segregation-non-whitewhite. Accessed September 1.
County Health Rankings & Roadmaps. Income inequality. Available at: https://www.countyhealthrankings.org/explore-health-rankings/measures-datasources/county-health-rankings-model/health-factors/social-and-economicfactors/income/income-inequality. Accessed September 1.

i n = 2833. Google mobility data were not available for 309/3142 (10%) counties (due to privacy concerns in less-populous counties), which accounted for <1% of the US population. These missing values were not imputed.