title: Long-Term Downwind Exposure to Air Pollution from Power Plants and Adult Mortality: Evidence from COVID-19
authors: Tanaka, S.
date: 2020-11-24
DOI: 10.1101/2020.11.23.20237107
We estimate the causal effects of long-term exposure to air pollution emitted from fossil fuel power plants on adult mortality. We leverage quasi-experimental variation in daily wind patterns, which is further instrumented by the county's orientation from the nearest power plant. We find that the average county's fraction of days spent downwind of plants within 20 miles in the last 10 years is associated with a 27.6 percent increase in mortality from COVID-19. This effect is more pronounced in fence line communities with high poverty rates and a large proportion of Black population.
There is growing interest in the linkages between pollution from energy facilities and public health in the U.S. Fossil fuel power plants emit some of the largest amounts of hazardous pollutants into the ambient air, and concerns regarding the health impacts of exposure have resulted in a variety of public policies targeting the electric power generation industry. Existing evidence suggests that people living in close proximity to fossil fuel power plants suffer a wide range of adverse health outcomes (Schneider and Banks 2010; Liu et al. 2012; Ha et al. 2015; Gupta and Spears 2017; Amster and Levy 2019). However, the observed relationship using distance as a proxy for exposure may reflect strategic siting of power plants, which is confounded with unobserved heterogeneity in underlying health, and/or taste-based residential sorting into the neighborhoods of plants (Davis 2011; Heblich et al. 2016). Consequently, a recent review of the literature over the past 30 years concluded that the health costs of fossil fuel power plants remain unknown (Kravchenko and Lyerly 2018).
In this study, we conduct the first quasi-experimental investigation of the effects of long-term exposure to pollution emissions from power plants on adult mortality, while shedding light on its health consequences during the COVID-19 pandemic. To address endogeneity in the extent of pollution exposure from power plants, we leverage quasi-random variation in exposure generated by daily wind patterns at power plants. Our empirical strategy effectively compares counties with a greater fraction of days spent downwind in the last 10 years with counties with fewer such days within a given distance radius from power plants. Further, we instrument for downwind frequency using the bearing angle of the county's orientation from the nearest power plant, whose effects are allowed to vary by geography. The identification assumption is that, after controlling for an extensive set of county characteristics and weather variables, the county's orientation to the nearest power plant is unrelated to variation in the county's mortality except through its influence on downwind frequency. We find statistically and economically significant associations between downwind frequency and COVID-19 mortality rates at the county level. Our estimates suggest that the average county's fraction of days spent downwind of power plants within 20 miles in the last 10 years is associated with a 27.6 percent increase in daily mortality from COVID-19. The robustness checks confirm that the effects are stronger in areas directly downwind than in areas lying at a greater angle from that direct line, relative to areas completely upwind within a 20-mile radius. These findings are consistent with the associations we find between wind patterns and PM2.5 concentrations in the monitor-level analysis. Further analysis highlights that these effects are amplified in counties with high poverty rates and a large proportion of Black population.
The falsification tests find no evidence of a comparable association in proximity to nuclear power plants, nor of systematic correlations between downwind frequency and county characteristics that could cause a spurious correlation. In addition, we find little effect of short-term exposure to air pollution from power plants on
We contribute to the recent but small body of studies that exploit quasi-experimental methods for causal inference on the health burdens of fossil fuel power plants. These studies exclusively focus on infants, children, or short-term exposure. For instance, Luechinger (2014) shows that the mandated installation of desulfurization systems at power plants in Germany reduced SO2 pollution and infant mortality rates; Yang et al. (2017) and Yang and Chou (2018) find that the shutdown of a coal-fired power plant in New Jersey resulted in a lower likelihood of low-birthweight births; Cesura et al. (2018) show that the displacement of coal by natural gas as an energy source led to reduced mortality among adults and the elderly. Among them, Barreca et al. (2017) uniquely explore how the contemporaneous effects of the U.S. Acid Rain Program evolve over the years since its inception. However, there is still no quasi-experimental study documenting the causal effect on adult mortality of long-term exposure to pollution emissions from power plants. We contribute new data that offer insights on spatial variation in wind patterns to establish long-term exposure to pollution emissions among fence line populations living in proximity to power plants. Our study is also one of an increasing number of studies that explain spatial variation in the effects of COVID-19 across U.S. counties. The high incidence of COVID-19 mortality in low-income, minority communities has raised questions about the role air quality and pollution play in the epidemiology of COVID-19 mortality.
Existing evidence suggests that COVID-19-associated hospitalization and mortality are greater for adults, in particular those aged 65 and above, and those with underlying medical conditions that are related to long-term exposure to air pollution, such as chronic lung disease, asthma, and cardiovascular disease (Centers for Disease Control and Prevention 2020; Dockery et al. 1993; Pope et al. 2002; Chen et al. 2013; Anderson 2020). An unpublished work by Wu et al. (2020) suggests that county-level long-term averages in PM2.5 are associated with COVID-19 mortality. Yet, Knittel and Ozaltun (2020) find that such an association suffers from omitted variable bias, making it unclear whether the observed relationships reflect a causal effect of air pollution exposure on COVID-19 mortality or whether other factors associated with lower air quality could explain greater mortality. We overcome this endogeneity issue by identifying pollution sources at power plants and exploiting quasi-random wind-generated variation, as well as an instrumental variable approach, in measuring long-term exposure to air pollution. Such quasi-experimental estimates are an important first step toward understanding whether ongoing ambient pollution from power stations in the U.S. raises COVID-19 mortality risk in fence line communities. Further, such data will assist U.S. policymakers in shaping the economic recovery in a manner that best prepares for future pandemics. The rest of the paper is structured as follows. Section II describes the data sources and presents the summary statistics of key variables. Section III presents our empirical strategy. Section IV presents the empirical results, first on the effects of pollution emissions from power plants on air quality, and then on the effects of long-term pollution exposure on COVID-19 mortality, followed by evidence from the robustness checks, falsification tests, and heterogeneities in the effects. Section V concludes. A.
Data sources
COVID-19 deaths. We use the publicly available county-level counts of COVID-19 deaths from the Center for Systems Science and Engineering Coronavirus Resource Center at Johns Hopkins University (Dong et al. 2020). The data report the daily incidence of confirmed cases and deaths at the county level in the U.S. based on various sources, including the World Health Organization (WHO), the Centers for Disease Control and Prevention (CDC), and local state health departments. We compute the death rates per 100,000 population based on the estimated population from the most recent 2018 round of the American Community Survey.
Power plants. We compiled a comprehensive list of all power plants that operated in 2010-2019 from various sources. First, we obtained the list of all power plants from the Emissions & Generation Resource Integrated Database (eGRID). The dataset reports the geocoded addresses, as well as primary fuel sources, of almost all electric power generated in the U.S. every two years between 2010 and 2018. Where plants have switched their primary fuel sources over time, we define the primary fuel source at each plant as the fuel source that is reported most frequently during these years. Because many power plants have switched their primary fuels over our study period, with coal having been largely displaced by natural gas in particular, it is not meaningful for our research purpose to separately identify coal-fired and natural gas power plants; we therefore consider all fossil fuel-fired power plants as comparable pollution emission sources. For this study, we include power plants that use coal, natural gas, and oil as the primary fuel sources.
The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license. This version posted November 24, 2020.
We removed from the list power plants whose total net generation in 2010-2019 is zero or less as reported in Energy Information Administration (EIA)-923. We further refine the plant activities by accounting for the operation status at the plant-year-month level based on the initial operation month and year and the retirement month and year reported in EIA-860. In total, there are 2,740 fossil fuel power plants in the U.S. during our study period, of which 514 plants are coal-fired and 1,642 use natural gas as their primary fuel source.
Meteorological data. The daily wind direction data are obtained from GRIDMET, which reports high-spatial-resolution (about a 4 km by 4 km grid cell) surface meteorological data over the contiguous U.S. (Abatzoglou 2013). We match each power plant with the nearest grid cell in GRIDMET and construct the daily wind direction at each plant, measured in degrees clockwise as the direction "toward" which the wind blows from the power plant and normalized to be zero at northward. The average distance between power plants and the nearest grid cell is 1.0 mile, with a standard deviation of 1.1 miles. In addition, GRIDMET reports daily maximum and minimum temperatures and relative humidity, from which we construct the daily averages at the county level by taking the arithmetic averages of all grid cells that fall within each county.
Air quality. We obtain ambient air quality data for 2010-2019 from the publicly available Air Quality System managed by the U.S. Environmental Protection Agency (EPA). We use daily ambient concentrations of PM2.5, atmospheric particulate matter with a diameter of less than 2.5 micrometers. Across the U.S., more than 1,000 monitoring stations monitor PM2.5.
Demographics. We use the 1-year and 5-year estimates of the 2018 American Community Survey to obtain an extensive set of county characteristics. The list of individual county characteristics is presented under Table 1.
We compute population density in persons per square mile using the land area from the 2010 Census.
Hospital capacity. We collect the data on the number of intensive care unit (ICU) beds from the Homeland Infrastructure Foundation-Level Data. We aggregated the data to the county level.
The summary statistics are presented in Online Appendix Table A.1. 1 Because the wind data are available only in the contiguous U.S., our sample in the analysis excludes Alaska and Hawaii. Our main sample is further restricted to counties whose population-weighted centroids are located within a 20-mile radius of power plants (explained below). These exclusions narrow the sample down to 1,604 counties, or 58.8% of all counties in the contiguous U.S. with the COVID-19 mortality data. In total, 25,261 people died from COVID-19 in our sample counties on April 20, 2020, when the daily death count was at or near the peak. New York City (NYC) had by far the largest death toll (14,604) by itself, whereas the second largest death toll is less than 10% of the value in NYC, 2 and for this reason our sample excludes NYC as an extreme outlier. Overall, our sample accounts for 94.1% of total COVID-19 deaths on April 20, 2020 in the contiguous U.S. excluding NYC. The average mortality rate is 9.32 deaths per 100,000 population, and the average confirmed case rate is 217.6 per 100,000 population. Figure 1 Panel A illustrates the distributions of mortality rates in the U.S.
Our main analysis relies on the presupposition that counties downwind of power plants have greater exposure to pollutants emitted from power plants relative to counties upwind for a given distance from power plants.
A number of epidemiological studies on the effects of power plants on health reviewed by Amster and Levy (2019) have found adverse health effects within a 20-mile radius. This guides us to start our main analysis below with counties within 20 miles of power plants and to empirically determine the distance over which significant associations between downwind frequency and COVID-19 mortality are detected. This subsection explores whether our data support associations between downwind frequency and air quality at these hypothesized radii. The analysis uses the daily concentrations of PM2.5 in the ambient air recorded at ground-level monitoring stations in 2010-2019. We focus on PM2.5 as a measure of air quality because these fine particulates are widely known to be most harmful to human health. 3 The estimated model is similar to the main analysis (as described by Equation (4) below) except that the model is based on panel data at the daily monitor level. In particular, we estimate: where the outcome variable is the PM2.5 concentration level in the ambient air at monitor i on date t.
1 See Online Appendix Section A for a detailed description.
2 The second largest death toll is 1,329. In terms of mortality rates, it is 896.7 per 100,000 people in NYC, and the second largest value is 122.7 per 100,000 people.
3 https://www.epa.gov/pm-pollution/particulate-matter-pm-basics
The independent variable of interest, Downwind, is a dummy variable defined in the same way as in the analysis of mortality below; the air quality monitoring station is defined as downwind of power plants on a day when it is located within 45 degrees of a ray running from the power plant in the wind direction. 4 While the wind direction by itself is exogenous, the model includes a rich set of potential confounding variables. First, the time fixed effects, τ_t, include month-of-year fixed effects and day-of-week fixed effects to control for seasonality and macroeconomic effects for a given month of the year as well as for patterns across days of the week. Second, the inclusion of monitor fixed effects, λ_i, addresses the endogenous placement of monitoring stations where the EPA is concerned about high pollution, by effectively comparing pollution concentrations at a given monitor when it is and is not downwind. Last, the included meteorological variables, W_it, are those commonly considered to affect pollution dispersion and the biochemical processes of pollutant transformation, such as daily precipitation, daily average humidity, and daily average temperature in every 5-degree Celsius bin. ε is an idiosyncratic error. The standard errors are clustered at the monitor level. Note that the analysis above is conducted at the daily level. Pollutants can travel over days, and areas farther downwind may or may not experience a lag in exposure to pollution emissions. One potential way of accounting for this variation in exposure timing is to aggregate the data and conduct a cross-sectional analysis, as in the main analysis for mortality. However, a limitation is that pollution concentrations are not recorded every day, making inference based on the aggregated data sensitive to recording frequency.
With this proviso noted, we analyze the relationship between downwind frequency in the past 10 years and the average PM2.5 by estimating: where the control variables include distance to the nearest plant (Dist), the weather variables (W), which include the average precipitation and the fraction of days in each 5-degree Celsius bin, and the time effect (DOW), the fraction of days in each day of the week, all of whose averages are taken over the days when PM2.5 observations are recorded. The independent variable of interest, Downwind_i, measures the fraction of days the monitor spent downwind of power plants over the ten years between 2010 and 2019, again based on the days when PM2.5 observations are recorded. The regressions are weighted by the total number of observations over the past 10 years. Heteroskedasticity-adjusted standard errors are computed. The analysis so far uses dichotomous distinctions to define the treatment status. Instead, we now plot the marginal impacts based on continuous measures of the difference in angle between the wind direction and monitor orientation, Angle, the distance from the nearest power plant, and the interactions of these angles and distances. In particular, we estimate: (3)
4 When there are multiple plants within 20 miles of an air quality monitor, the monitor is considered to be downwind when it is downwind of at least one power plant.
We are interested in testing whether long-term exposure to pollution emissions from power plants contributes to COVID-19 mortality.
A simple comparison of counties that are near and far from power plants would generate a spurious correlation due to unobserved differences in characteristics that are correlated with distance to power plants. The ideal (though practically infeasible) experiment to test our hypothesis would be to randomly allocate power plants across counties that are otherwise similar. Instead, our analysis closely approximates such an experiment by leveraging the wind direction as a plausibly exogenous source of variation in exposure to pollution emissions from power plants across counties that are similarly close to power plants. In particular, using counties whose population-weighted centroids are within a 20-mile radius of power plants, we estimate: The primary outcome of interest, Y_c, is the mortality rate per 100,000 population in county c on April 20, 2020, when the daily mortality count was among the highest in the U.S. We focus on deaths rather than confirmed cases because the latter are likely to suffer from classical and non-classical measurement errors. For instance, the number of confirmed cases crucially depends on the number of tests conducted; moreover, according to the current knowledge of COVID-19, between 17.9% and up to 50% of positive cases remain asymptomatic (Jagodnik et al. 2020; Gudbjartsson et al. 2020). In this case, a greater number of tests is likely to be conducted among high-risk populations, so that reverse causality biases the estimates (Borjas 2020; Schmitt-Grohé et al. 2020). On the other hand, the number of deaths is far less subject to these concerns.
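As a reading aid, a county-level specification consistent with the symbols used in this section (the outcome Y_c, the regressor Downwind_c, the county characteristics X_c, the state fixed effects µ_s, and the error ε) can be sketched as follows. The linear additive form and the coefficient vector γ are assumptions made for illustration rather than a transcription of the author's Equation (4).

```latex
% Sketch only: linear form and \gamma are assumed, not taken from the source.
\begin{equation*}
  Y_c \;=\; \beta\,\mathrm{Downwind}_c \;+\; X_c'\gamma \;+\; \mu_s \;+\; \varepsilon_c
\end{equation*}
% c indexes counties, s the state containing county c;
% \beta is the coefficient on the long-term downwind frequency.
```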
The explanatory variable of interest, Downwind_c, is the fraction of days the county centroid spent downwind of power plants over the ten years between 2010 and 2019. Consistent with the air quality analysis above, the county is considered to be "downwind" of power plants on a day when the county's centroid is located within 45 degrees of a ray running from the power plant in the wind direction, i.e., within 22.5 degrees on either side of that ray. As robustness checks, we also experiment with alternative angles to define downwind, e.g., 90 and 180 degrees. We consider all power plants within 20 miles of county centroids in constructing the dummy variable for being downwind on a daily basis between 2010 and 2019. 5 Finally, the constructed variable is aggregated to measure the fraction of days a county centroid spent downwind of power plants in 2010-2019. The parameter of interest is β, which estimates the effect on the COVID-19 mortality rate of the fraction of days over the last 10 years spent downwind of power plants within a 20-mile radius. The identification assumption for causal inference requires that, after controlling for the covariates, a county's fraction of days spent downwind of power plants in the neighborhood is unrelated to factors explaining the county's mortality from COVID-19 except through air pollution. Our richest model includes the state fixed effects, µ_s, and an extensive set of county characteristics, X_c. The state fixed effects help control for heterogeneities in responses to COVID-19 at the state level. For instance, states have been responsible for mitigating the effects of an outbreak through declaring states of emergency, funding and expanding COVID-19 testing, and enacting and implementing statewide stay-at-home orders and other legislation related to COVID-19. The list of individual county characteristics included is presented under Table 1. ε is an idiosyncratic error.
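The construction of the downwind measure described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the author's code: the function names are hypothetical, bearings and wind directions are both taken to be in degrees clockwise from north with the wind expressed as the direction toward which it blows, the cone boundary is treated as inclusive, and a county counts as downwind on a day when it is downwind of at least one plant within range.

```python
def angular_difference(a_deg, b_deg):
    """Smallest absolute difference between two compass angles, in [0, 180]."""
    diff = abs(a_deg - b_deg) % 360.0
    return min(diff, 360.0 - diff)


def is_downwind(bearing_plant_to_county, wind_toward, cone_width=45.0):
    """True if the county lies inside a cone of total width `cone_width`
    centered on the direction toward which the wind blows from the plant
    (assumption: the boundary itself counts as downwind)."""
    return angular_difference(bearing_plant_to_county, wind_toward) <= cone_width / 2.0


def downwind_fraction(bearings, daily_winds, cone_width=45.0):
    """Fraction of days on which the county is downwind of at least one plant.

    bearings    -- bearing from each nearby plant to the county centroid
    daily_winds -- one list per day, giving the wind direction at each plant
                   in the same order as `bearings`
    """
    days_downwind = 0
    for winds_today in daily_winds:
        if any(is_downwind(b, w, cone_width) for b, w in zip(bearings, winds_today)):
            days_downwind += 1
    return days_downwind / len(daily_winds)


# Hypothetical example: one plant due west of the county (bearing 90 degrees),
# observed over four days; the county is inside the cone on two of them.
example = downwind_fraction([90.0], [[100.0], [200.0], [80.0], [300.0]])
print(example)
```

Passing `cone_width=90.0` or `cone_width=180.0` corresponds to the alternative angles mentioned as robustness checks.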
The notion that downwind frequency is plausibly exogenous to county characteristics assumes that unconditional exogeneity holds. We test this assumption in two ways. First, we explore changes in the effect of downwind frequency as we add more controls. Significant changes in the estimates of β with controls would indicate potential correlations between downwind frequency and the controls. Second, we directly test associations between downwind frequency and county characteristics. Nevertheless, inclusion of these control variables helps explain substantial variation in the dependent variable without compromising the statistical power for estimating β, because there remains substantial variation in downwind frequency even after accounting for all these variables. Following convention, all regressions are weighted by the county population, as counties with greater population allow more precise estimates of averages, and heteroskedasticity-robust standard errors are computed. A potential threat to identification based on the OLS framework is that low-income households may have sorted into areas with poor air quality determined by the prevailing winds over time. While the extensive evidence presented below suggests that such sorting, if any, does not bias our estimates enough to explain the main results, an analysis based on selection on observables does not preclude potential selection on unobservables. Thus, we employ an instrumental variable (IV) strategy.
5 See Online Appendix A.2 for illustrative figures.
Motivated by Deryugina et al.
(2019), we use the orientation of the county centroid to the nearest fossil fuel plant as an instrument for downwind frequency and allow the effect of the bearing angle on downwind frequency to vary by geography. 6 In particular, our first stage is: The excluded instruments are 1[G_c = g], Bearing^{90b}_c, and their interactions. The variable 1[G_c = g] is an indicator variable for county c belonging to county group g from the set of county groups G. These county groups are constructed by a k-means clustering algorithm that partitions all U.S. counties into 90 spatial clusters based on their centroid coordinates. 7 The variable Bearing^{90b}_c classifies the bearing angle of the county centroid from the nearest fossil fuel power plant into four categories with a 90-degree interval of [90b, 90b + 90). The omitted category is the bearing angle in [270, 360). Intuitively, where winds have a prevailing direction, the orientation of a county centroid relative to the nearest plant affects its propensity to be downwind, provided that the counties being compared are spatially proximate to one another. The exclusion restriction assumes that the bearing angle of the county's orientation from the nearest power plant is not associated with adult mortality other than through its influence on air quality.
6 Deryugina et al. (2019) use the daily wind direction in the county as an instrument for PM2.5, whose relationships are allowed to vary across geography, in estimating the effect of air pollution on adult mortality.
7 Online Appendix Figure A.1 illustrates the distributions of these groups. Note that there is little theoretical guidance as to how many clusters are optimal other than that the monotonicity assumption must hold within each cluster. We chose 90 as these clusters classify the entire country into spatially proximate groups, within which the effects of the bearing angle on downwind frequency are similar, as evidenced by the strong first-stage results.
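The bearing-angle categories used as instruments above can be illustrated with a short sketch. It assumes, for illustration only, that the bearing is the initial great-circle bearing from the plant to the county centroid measured in degrees clockwise from north; the function names are hypothetical and the source does not state how the bearing was computed. In the first stage these dummies would additionally be interacted with the cluster indicators, which is not shown here.

```python
import math


def initial_bearing(lat1, lon1, lat2, lon2):
    """Initial great-circle bearing from point 1 to point 2,
    in degrees clockwise from north, within [0, 360)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    x = math.sin(dlon) * math.cos(phi2)
    y = math.cos(phi1) * math.sin(phi2) - math.sin(phi1) * math.cos(phi2) * math.cos(dlon)
    return math.degrees(math.atan2(x, y)) % 360.0


def bearing_category(bearing_deg):
    """Index b of the 90-degree interval [90b, 90b + 90) containing the bearing."""
    return int((bearing_deg % 360.0) // 90.0)


def bearing_dummies(bearing_deg):
    """Dummies for b = 0, 1, 2; the interval [270, 360) is the omitted category."""
    b = bearing_category(bearing_deg)
    return [1 if b == k else 0 for k in range(3)]


# Hypothetical coordinates: a centroid due east of a plant on the equator.
print(bearing_category(initial_bearing(0.0, 0.0, 0.0, 1.0)))
print(bearing_dummies(300.0))
```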
C. Estimating the effect of short-term downwind frequency on
The analysis thus far concerns whether long-term exposure to pollution emissions from power plants is significantly associated with COVID-19 mortality. However, a question remains as to whether short-term exposure has any effect on COVID-19 mortality. We answer this question based on a panel data analysis. In particular, we use the sample of daily observations from April 1, 2020, when the spatial coverage of the mortality data is nearly complete, to April 20, 2020, whose mortality rate is used in the main analysis. The regression model is: where the outcome is the mortality rate per 100,000 population in county c on date t. The main independent variable of interest, Downwind_ct, is the fraction of days over the last 2 weeks the county centroid spent downwind of power plants within a 20-mile radius. We also control for the average humidity and temperature in every 5-degree Celsius bin in the past 2 weeks (W_ct). The richest model includes the county fixed effects, λ_c, which absorb any time-invariant heterogeneities at the county level, including X_c and µ_s. The standard errors are clustered at the county level to allow for correlations in the error term within a county over time. The identification assumption with the county fixed effects is that, after controlling for weather, day-to-day variation in wind patterns in the past 2 weeks is uncorrelated with other determinants of COVID-19 mortality. We first explore whether our data support associations between downwind frequency and air quality at these hypothesized radii.
8 The analysis uses the daily concentrations of PM2.5 in the ambient air recorded at ground-level monitoring stations in 2010-2019. We find that monitors situated closer to directly downwind of power plants are more exposed to pollutants emitted from power plants, whereas monitors situated 90-180 degrees from directly downwind still detect pollution (Online Appendix Table A.2). For instance, we find that PM2.5 concentrations increase by 3.5% (p = 0.000, n = 1,476,800) on the days when monitors are downwind of power plants within 20 miles, whereas our estimate shows no significant downwind effect (β = -0.150, p = 0.352, n = 195,616) on pollution concentrations at monitors 20 to 50 miles away. Figure 2 plots the marginal impacts based on continuous measures of the difference in angle between the wind direction and monitor orientation, the distance from the nearest power plant, and the interactions of these angles and distances. It makes clear that PM2.5 concentrations are highest in close proximity to a plant, decline more slowly with distance downwind than upwind, yet decay substantially by 20 miles even downwind.
8 See Online Appendix B for further discussions.
We now test whether long-term exposure to pollution emissions from power plants explains COVID-19 mortality. Our empirical strategy effectively compares counties with a greater fraction of days spent downwind in the last 10 years with counties with fewer such days for a given distance from power plants. Figure 1 Panel B illustrates substantial variation in downwind frequency even across spatially proximate counties. We start by illustrating the relationship between downwind frequency and COVID-19 mortality in Figure 3.
Each dot represents the population-weighted average mortality rate in each decile of downwind frequency, after being residualized by the control variables included in the regressions. The figure reveals a clear positive correlation between the two variables, indicating that a greater fraction of days spent downwind is associated with greater mortality rates from COVID-19. Table 1 presents the regression results. 9 We find statistically and economically significant associations between downwind frequency and COVID-19 mortality rates at the county level. The preferred estimate in Column (4), which includes both a full set of county characteristics and the state fixed effects, implies that counties 45-degree downwind of power plants on average experienced 2.57 (p = 0.003, n = 1,604) more deaths per 100,000 population, or 27.6% more, than those outside of these areas within the same distance from plants. To put this into context, a back-of-the-envelope calculation indicates that such an increase in the mortality rate is associated with a 0.287 µg/m³ (p = 0.059, n = 1,051) increase in the average PM2.5 concentration level over the last 10 years, whose mean is 8.730 and standard deviation is 1.948. 10 These calculations confirm the large effects of pollution exposure that are manifested during a pandemic.
9 See Online Appendix Table A.3 for all other coefficients and Section C.1 for further discussions.
10 We must sound a cautionary note that the air quality analysis is based on sparse distributions of selected monitors, which may not be directly comparable to the rather comprehensive mortality analysis. For this reason, the conventional two-stage least squares estimates are also not feasible.
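The percentage figure reported with the preferred estimate can be checked with simple arithmetic. The sketch below assumes that the 27.6% is the point estimate divided by the sample mean mortality rate of 9.32 per 100,000 reported in the summary statistics; the source does not state the baseline explicitly, and the helper name is hypothetical. The same ratio is applied to the PM2.5 figures purely to express them relative to the reported mean and standard deviation.

```python
def relative_change(effect, baseline):
    """Effect expressed as a share of a baseline value."""
    return effect / baseline


# Figures reported in the text.
extra_deaths = 2.57       # additional deaths per 100,000 (Column 4)
mean_mortality = 9.32     # average mortality rate per 100,000
pm25_increase = 0.287     # µg/m³ increase in average PM2.5
pm25_mean = 8.730         # mean PM2.5 over the last 10 years
pm25_sd = 1.948           # standard deviation of PM2.5

# Assumed baseline: the sample mean mortality rate.
print(f"{relative_change(extra_deaths, mean_mortality):.1%}")    # 27.6%
print(f"{relative_change(pm25_increase, pm25_mean):.1%}")        # 3.3%
print(f"{relative_change(pm25_increase, pm25_sd):.3f} SD")       # 0.147 SD
```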
Importantly, the stability of the estimated impact of being downwind reinforces our assumption of unconditional exogeneity, in that downwind frequency is orthogonal to these characteristics. Further, although these control variables add great explanatory power to the regression, the precision of the estimated effect of downwind frequency improves, indicating that substantial residual variation in downwind exposure remains even after controlling for such an extensive set of other factors.

To address the remaining concern that our estimates may be driven by unobserved differences in county characteristics and meteorological conditions that are correlated with both wind direction and mortality, we implement the IV strategy. Column (5) reports the 2SLS estimate, which is larger in magnitude than the OLS estimates, although its range overlaps with theirs. This result lends confidence to the robustness of the OLS estimates. 11

We test the robustness of the main results above to various alternative specifications. 12 We find that the main results are robust to: i) alternative dependent variables; ii) confirmed cases as the dependent variable; iii) alternative angles to define downwind; iv) controlling for the number of power plants within 20 miles; v) controlling for the number of industrial plants that emit toxic pollutants, to address the concern that Black Americans often live near other hazardous sites; and vi) controlling for short-term downwind frequency, to account for contemporaneous effects of pollution exposure. Additional analyses based on Equation (6) 13 show that the statistical correlations between downwind frequency in the past 2 weeks and COVID-19 mortality disappear once we control for county fixed effects, suggesting that other time-invariant factors related to downwind frequency in the past 2 weeks explain COVID-19 mortality.
Further, downwind frequency in the past 2 weeks is not significantly associated with COVID-19 mortality once we control for long-term downwind frequency, which remains statistically significantly associated with COVID-19 mortality. Overall, these findings suggest that long-term exposure to pollution emissions from power plants is an important determinant of across-county variation in COVID-19 mortality, whereas short-term exposure appears to have negligible effects. Together, these findings confirm that the main results are not driven by inappropriate specifications.

11 Note that the 2SLS estimates using various other numbers of geographical clusters are consistently greater than the OLS estimates, while the excluded instruments inevitably become weaker as the number of clusters decreases due to the violation of the monotonicity assumption. Deryugina et al. (2019) classify all U.S. counties into 100 clusters. In our case, while a larger number of clusters leads to a larger first-stage F-statistic, the use of more than 90 clusters does not allow for estimating the first-stage F-statistic, yet the 2SLS estimates are very close to the OLS estimates with 100 and 120 clusters. Given the consistency of the OLS estimates as well as the ambiguity in selecting these clusters, we consider the conservative OLS estimates as the main result, which provides a lower-bound effect.
12 See Online Appendix Table A.4 and Section C.2 for further discussions.
13 See Online Appendix Table A.7 and Section C.5 for further discussions.
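One of the robustness checks above varies the angle used to define downwind. The following is a minimal sketch of how a downwind frequency could be computed for different angle windows; it is not the paper's implementation. It assumes that wind direction is recorded as the compass direction the wind blows from, that a location counts as downwind on a given day when the direction the wind blows toward lies within a chosen half-window of the bearing from the plant to the location, and that all function names and coordinates are hypothetical.

```python
import math


def bearing_deg(lat1, lon1, lat2, lon2):
    """Initial great-circle bearing from point 1 to point 2 (degrees clockwise from north)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    x = math.sin(dlon) * math.cos(phi2)
    y = math.cos(phi1) * math.sin(phi2) - math.sin(phi1) * math.cos(phi2) * math.cos(dlon)
    return math.degrees(math.atan2(x, y)) % 360.0


def angular_gap(a, b):
    """Smallest absolute difference between two compass angles, in [0, 180]."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def downwind_frequency(plant, location, wind_from_deg, half_window_deg):
    """Share of days on which `location` lies downwind of `plant`.

    plant, location: (latitude, longitude) tuples.
    wind_from_deg: daily wind directions, as the direction the wind blows FROM.
    half_window_deg: assumed tolerance on either side of the plant-to-location bearing.
    """
    target = bearing_deg(*plant, *location)
    hits = 0
    for w in wind_from_deg:
        blowing_toward = (w + 180.0) % 360.0
        if angular_gap(blowing_toward, target) <= half_window_deg:
            hits += 1
    return hits / len(wind_from_deg)


# Hypothetical example: a location due south of a plant and six days of wind data.
plant = (40.0, -80.0)
location = (39.9, -80.0)
daily_wind_from = [0.0, 10.0, 30.0, 90.0, 180.0, 350.0]

for half_window in (5.0, 22.5, 45.0):
    share = downwind_frequency(plant, location, daily_wind_from, half_window)
    print(f"half-window {half_window:>4} degrees: downwind on {share:.2f} of days")
```

Widening the window mechanically raises the measured downwind frequency, which is why it is informative that the main results do not hinge on one particular angle.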
Next, we assess the validity of the identification assumption by conducting several falsification tests. 14 The causal interpretation of the results so far hinges on the assumption that the wind direction creates "as good as random" variation in the extent to which counties are exposed to pollution from power plants. This assumption is violated if county characteristics vary systematically with downwind exposure in a way that is correlated with other elevated health risks, and/or if individuals have sorted into locations according to whether they are upwind or downwind of power plants. Either scenario, if true, should produce significant associations between downwind frequency and county characteristics.

The falsification tests find no evidence of an association in proximity to nuclear power plants, which typically do not produce air pollution hazardous to public health, suggesting that other pollution sources that are typically near power plants do not confound our estimates. Further, we find that the estimated effect is quantitatively negligible for counties 20-50 miles away from power plants, suggesting that other pollution sources, e.g., vehicular emissions from major roads, do not explain our estimates. Last, we find no evidence of systematic correlations between downwind frequency and county characteristics that could cause a spurious correlation. Overall, these pieces of evidence together confirm that downwind frequency is associated with COVID-19 mortality rates only within a certain distance of fossil fuel power plants, where pollution exposure can be heightened by the winds.

Last, we find that the effects are more pronounced in counties with a greater share of men, a greater percentage of Black population, greater population density, higher poverty rates, a greater percentage of the population aged 65 and above without health insurance, and a greater percentage of the population with less than a high school degree.
15 These findings highlight that the greater burden of a pandemic is borne by people who are already at high risk from other hazardous sites (Villarosa 2020).

This study presents the first evidence that substantiates the health costs of long-term exposure to pollution emissions from fossil fuel power plants in the context of COVID-19. These findings hold important policy implications for the analysis needed on how to rebuild the U.S. energy system in the aftermath of the pandemic. In particular, the clean energy industry has been identified as a key area of long-run stimulus, not only because wind and solar were the fastest-growing industries before the pandemic but also because climate change continues to be a global threat. Our findings suggest the possibility that investments in clean energy could contribute to building a society more resilient to airborne disease and thereby lessen the threat of pandemics. These results raise important questions about future regulations of older fossil fuel power plants in the U.S. and about what must be studied in determining their costs to society and environmental impacts.

Further, our findings have important implications for the public health risks that unfettered pollution from energy facilities could pose to fence line communities. This kind of epidemiological study is particularly pressing in light of considerations of environmental justice and social equity in the U.S. To quote a recent statement by the United Nations Secretary-General, "[F]iscal firepower must shift economies from grey to green, making societies and people more resilient through a transition that is fair to all and leaves no one behind". 16 Further study could help determine which facilities should be prioritized for closure and replacement with cleaner energy sources as part of energy policy and future economic stimulus spending to reduce the impacts of airborne illness in the U.S.

16 https://www.un.org/press/en/2020/sgsm20063.doc.htm
(3) with the color intensity ranging from blue (the best air quality) to red (the worst air quality) within a 20-mile radius of a plant located at the center, when the wind blows from north to south. The left histogram indicates the number of monitors at every 1-mile distance bin from the plants.

(4).
The sample size is 1,604 counties. The unit of observation is the county. Controls include demographic variables (median age, gender composition, the percentage of the population aged 65 and above, population density, the percentages of the White and Black populations), economic variables (the log of median household income, unemployment rate, the percentage of the population under the federal poverty line, median housing value), education (the percentages without a high school degree, with a high school degree, and with a bachelor's degree and above), health insurance (the percentages without health insurance, with Medicare, and with Medicaid, among the population aged 65 and above), geographic variables (latitude and longitude of the population-weighted centroid), health facility capacity (the number of ICU beds), and meteorological factors (daily average precipitation, humidity, and temperature in every 5-degree Celsius bin). Additional coefficients are presented in Online Appendix Table A.3. Column (5) presents the 2SLS estimate based on Equation (5). The F-statistic of excluded instruments in the first stage is 645.24. Heteroskedasticity-adjusted standard errors are reported in parentheses. ***p < 0.01, **p < 0.05, *p < 0.1.
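The notes above refer to a 2SLS estimate and to the F-statistic of the excluded instruments in the first stage. As a purely illustrative sketch of what these two quantities are, the code below runs 2SLS on synthetic data. Nothing here reproduces Equation (5) or the paper's data: the variable names, the data-generating process, and the direction of the confounding are all hypothetical, and the F-statistic is the simple homoskedastic version rather than a heteroskedasticity-adjusted one.

```python
import numpy as np


def ols(X, y):
    """Least-squares coefficients of y on the columns of X."""
    return np.linalg.lstsq(X, y, rcond=None)[0]


def two_stage_least_squares(y, d, Z, W):
    """2SLS estimate of the effect of d on y.

    y: outcome; d: endogenous regressor; Z: excluded instruments (n x q);
    W: exogenous controls including a constant (n x k).
    Returns (coefficient on d, homoskedastic first-stage F for Z).
    """
    n, q = Z.shape
    X1 = np.column_stack([Z, W])

    # First stage: project d on the instruments and the controls.
    d_hat = X1 @ ols(X1, d)

    # F-test of the excluded instruments: the restricted model drops Z.
    rss_unrestricted = np.sum((d - d_hat) ** 2)
    rss_restricted = np.sum((d - W @ ols(W, d)) ** 2)
    f_stat = ((rss_restricted - rss_unrestricted) / q) / (
        rss_unrestricted / (n - X1.shape[1])
    )

    # Second stage: replace d with its first-stage fitted values.
    beta = ols(np.column_stack([d_hat, W]), y)
    return beta[0], f_stat


# Synthetic data (hypothetical): u is an unobserved confounder.
rng = np.random.default_rng(0)
n = 5000
u = rng.normal(size=n)
z = rng.normal(size=n)                       # instrument, independent of u
d = 0.8 * z + u + rng.normal(size=n)         # "exposure", confounded by u
y = 2.0 * d - 1.5 * u + rng.normal(size=n)   # true effect of d is 2.0

W = np.ones((n, 1))                          # constant only
beta_ols = ols(np.column_stack([d, W]), y)[0]
beta_iv, first_stage_f = two_stage_least_squares(y, d, z.reshape(-1, 1), W)

print(f"OLS:  {beta_ols:.3f}")
print(f"2SLS: {beta_iv:.3f} (first-stage F = {first_stage_f:.1f})")
```

In this synthetic setup the confounder is chosen so that OLS is biased toward zero and 2SLS recovers a value close to the true coefficient. This choice is for illustration only and says nothing about the sign of any bias in the actual county data.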
Development of gridded surface meteorological data for ecological applications and modelling
Impact of Coal-fired Power Plant Emissions on Children's Health: A Systematic Review of the Epidemiological Literature
As the Wind Blows: The Effects of Long-term Exposure to Air Pollution on Mortality
Long-Run Pollution Exposure and Adult Mortality: Evidence from the Acid Rain Program
Demographic Determinants of Testing Incidence and COVID-19 Infections in New York City Neighborhoods
Can natural gas save lives? Evidence from the deployment of a fuel-delivery system in a developing country
Evidence on the Impact of Sustained Exposure to Air Pollution on Life Expectancy from China's Huai River Policy
The Effect of Power Plants on Local Housing Prices and Rents
The Mortality and Medical Costs of Air Pollution: Evidence from Changes in Wind Direction
An Association Between Air Pollution and Mortality in Six U.S. Cities
An interactive web-based dashboard to track COVID-19 in real time
Spread of SARS-CoV-2 in the Icelandic Population
Health externalities of India's expansion of coal plants: Evidence from a national panel of 40,000 households
Associations Between Residential Proximity to Power Plants and Adverse Birth Outcomes
East Side Story: Historical Pollution and Persistent Neighborhood Sorting
Correcting under-reported COVID-19 case numbers: estimating the true scale of the pandemic
What does and does not correlate with COVID-19 Death Rates
The impact of coal-powered electrical plants and coal ash impoundments on the health of residential communities
Association between Residential Proximity to Fuel-Fired Power Plants and Hospitalization Rate for Respiratory Diseases
Air pollution and infant mortality: A natural experiment from power plant desulfurization
Lung Cancer, Cardiopulmonary Mortality, and Long-Term Exposure to Fine Particulate Air Pollution
COVID-19: Testing Inequality in
The Toll From Coal: An Updated Assessment of Death and Disease from America's Dirtiest Energy Source
Pollution Is Killing Black Americans. This Community Fought Back
Exposure to air pollution and COVID-19 mortality in the United States
The impact of environmental regulation on fetal health: Evidence from the shutdown of a coal-fired power plant located upwind of New Jersey