key: cord-338819-wkb318sq
authors: Saez, Marc; Tobias, Aurelio; Barceló, Maria A.
title: Effects of long-term exposure to air pollutants on the spatial spread of COVID-19 in Catalonia, Spain
date: 2020-09-12
journal: Environ Res
DOI: 10.1016/j.envres.2020.110177
sha: 
doc_id: 338819
cord_uid: wkb318sq

BACKGROUND: The risk of infection and death by COVID-19 could be associated with a heterogeneous distribution at a small area level of environmental, socioeconomic and demographic factors. Our objective was to investigate, at a small area level, whether long-term exposure to air pollutants increased the risk of COVID-19 incidence and death in Catalonia, Spain, controlling for socioeconomic and demographic factors. METHODS: We used a mixed longitudinal ecological design with the study population consisting of small areas in Catalonia for the period February 25 to May 16, 2020. We estimated Generalized Linear Mixed models in which we controlled for a wide range of observed and unobserved confounders as well as spatial and temporal dependence. RESULTS: We have found that long-term exposure to nitrogen dioxide (NO(2)) and, to a lesser extent, to coarse particles (PM(10)) have been independent predictors of the spatial spread of COVID-19. For every 1 μm/m(3) above the mean the risk of a positive test case increased by 2.7% (95% credibility interval, ICr: 0.8%, 4.7%) for NO(2) and 3.0% (95% ICr: -1.4%,7.44%) for PM(10). Regions with levels of NO(2) exposure in the third and fourth quartile had 28.8% and 35.7% greater risk of a death, respectively, than regions located in the first two quartiles. CONCLUSION: Although it is possible that there are biological mechanisms that explain, at least partially, the association between long-term exposure to air pollutants and COVID-19, we hypothesize that the spatial spread of COVID-19 in Catalonia is attributed to the different ease with which some people, the hosts of the virus, have infected others. That facility depends on the heterogeneous distribution at a small area level of variables such as population density, poor housing and the mobility of its residents, for which exposure to pollutants has been a surrogate.

(i.e. chronic) exposure to air pollutants (measured as the spatial variation of air pollutants in a given geographic area over a relatively long period of time), Fattorini and Regoli (2020) (in 71 Italian provinces) and Coccia (2020) (in the 55 Italian cities he studied) found that exposure to air pollutants was significantly associated with an increase in the number of cumulative cases of COVID-19. With regard to short-term (i.e. acute) exposure (measured as the temporal variation of air pollutants), Xu et al. (2020) (in 33 locations in China), Zhu et al. (2020) (in 120 cities in China), Jiang et al. (2020) (3 cities in China), Li H et al. (2020) (2 cities in China) and Adhikari et al. (2020) (in Queens, New York) found significantly positive associations between the short-term exposure to air pollutants with newly confirmed cases. Furthermore, as Domingo et al. (2020) point out, in a very recent review, the results of most of the studies suggest that the long-term exposure to air pollutants might lead to more severe and lethal forms of COVID-19.

Most of the studies evaluating socioeconomic and demographic variables (four of the six studies) considered population density (Coccia, 2020; Ahmadi et al., 2020; Pequeno et al., 2020; and You et al. 2020) . Three studies evaluated the influence of income (Azar et al., 2020; You et al., 2020; and Price-Haywood et al., 2020) . You et al. (2020) considered the percentage of the population aged 65 or over and the ratio of total floor area to the number of residential buildings. Bontempi (2020a) considered the role of commercial exchanges, as a proxy for social interactions.

In Spain, the region of Catalonia has been the second most affected by the COVID-19 pandemic (the Madrid region being the most affected), both by number of cases (63,042 cases, 25% of all cases in Spain, 821.37 cases per 100,000 inhabitants -537.17 cases per 100,000 in Spain), and by deaths (5,675 deaths, 20% of all deaths in Spain, 73.94 deaths per 100,000 inhabitants -60.49 deaths per 100,000 in Spain) (Secretaría General de Sanidad, 2020; INE, 2020a) . The geographical distribution of the spread of the pandemic has not been spatially homogeneous in the Catalan territory either, and important differences at the small area level have been observed.

Catalonia is mainly an urban region. Sixty percent of the population reside in 23 cities with more than 50,000 inhabitants (comprising 6.62% of the total Catalan territory) and 52% in 14 cities with more than 100,000 inhabitants (comprising 4.64% of the territory) (IDESCAT, 2020). Among these include the secondlargest city in Spain, Barcelona, and the 36 adjacent municipalities making up the Barcelona Metropolitan Area, i.e. 41.75% of the population of Catalonia (representing only 1.97% of the territory). All have a high population density and exhibit different levels of urban air pollution whose primary source of emission is from road traffic.

Our hypothesis is that the heterogeneity in the spatial distribution of the pandemic could be attributed to the fact that the risk of infection and death by COVID-19 could be associated with, on the one hand, a heterogeneous distribution at a small area level of environmental factors and, on the other hand, to different socioeconomic and demographic factors in each of those small areas.

Our objective in this paper was to investigate, at a small area level and controlling for socioeconomic and demographic factors, whether long-term exposure to air pollutants, such as particulate matter (PM 10 , coarse particles with a diameter of 10 µm or less) and nitrogen dioxide (NO 2 ), increased the risk of COVID-19 incidence and death in Catalonia, Spain.

This work presents as novelties, the use of spatio-temporal models to evaluate the effects of long-term exposure to air pollutants on the spatial spread of COVID-19. In the models we control for a wide range of confounders, both observed and unobserved, and for spatial and temporal dependence. Finally, we have exclusively used open data.

We used a mixed longitudinal ecological design, in which the response variables were observed several times at different points in time for each small area involved in the study.

The study population consisted of small areas of Catalonia: for incidence, the Basic Health Area (ABS, for its acronym in Catalan from here on) and for mortality the 'comarca', an administrative territorial division of Catalonia (region, hereinafter). The study period was between February 25 and May 16, 2020. The design was unbalanced, since neither the ABSs nor the regions were observed the same number of times.

Catalan health planning defines an ABS as the elementary territorial unit through which primary health care services are organized (Atenció Primària Girona, 2020) . The ABSs are either made up of neighbourhoods or districts in urban areas or by one or more municipalities in rural areas. Their delimitation is determined by geographical, demographic, social and epidemiological factors and, in particular, based on the accessibility the population has to services and the efficiency in the organization of health resources (Atenció Primària Girona, 2020) . Each ABS has a Primary Care Team, consisting of general practitioners, paediatricians, dentists, nurses and nursing assistants, social workers and administrative support staff. Depending on the size of the ABS, the number of municipalities and the dispersion of the population, the ABS will have one Primary Care Centre (CAP) and may have more clinics that are organically dependent on the CAP (Atenció Primària Girona, 2020). Specifically, we considered 372 of the 378 ABSs in Catalonia (because in six of the ABSs no positive test was reported), with a population between 371 and 72,321 inhabitants (mean 20,266 inhabitants, standard deviation 13,391, median 18,457 inhabitants, first quartile -Q1-10,554, third quartile -Q3-27,529). The population density was in the range of 0.31-34,590.58 inhabitants/km 2 (mean 3,486.36, standard deviation 6,719.23, median 309.18, Q1 44.83, Q3 3,752.54) . Of the 178 municipalities considered, 46 were divided into more than one ABS, 37 of them into a maximum of five ABSs, eight between six and 14 ABSs and one, (the city of Barcelona) into 67 ABSs (IDESCAT, 2020).

To avoid the identification of the deceased and to guarantee their confidentiality, no cases of mortality are provided at the level of the ABS but rather at the 'region' level (Open Government, 2020) . We had mortality data for all 42 regions of Catalonia, with a population between 3,802 and 2,252,862 inhabitants (mean 176,556 inhabitants, standard deviation 376,639, median 50,943 inhabitants, Q1 19,739, Q3 171,874) . The population density was in the range 0.36-3,997. inhabitants/km 2 (mean 170.38, standard deviation 642.44, median 30.63, Q1 11.16, Q3 83 .52) (IDESCAT, 2020).

To evaluate the incidence, we used daily incident positive cases, which were those that tested positive on some diagnostic test (PCR or fast test) (Open Government, 2020) , at the level of the ABS. For mortality we used daily deaths obtained from the funeral homes' reports to the Catalan Health Department (Open Government, 2020) . These reports declare positive cases as well as suspicious ones. Suspicious cases corresponded to people who presented symptoms at some point and a health professional classified them as a possible case, but they did not have a diagnostic test with a positive result. All data were obtained from the RSAcovid19 records from the Catalan Health Department (Open Government, 2020) .

We used long-term exposure to air pollutants, particulate matter (PM 10 ) and nitrogen dioxide (NO 2 ) in each of small areas from 2011 to 2019.

It is important to note that we evaluated long-term exposure to these air pollutants. That is to say, a subject, by virtue of residing in a certain small area, has been exposed to an (average) level of various environmental variables in the small area where, in our case, they resided at least during the period 2011-2019. We were interested in the effects that geographical variation of such exposure may have on the spatial spread of COVID-19.

We obtained information on the levels of air pollution for 2011-2019 from the 144 monitoring stations in the Catalan Network for Pollution Control and Prevention (XVPCA) located throughout Catalonia (Dades obertes Catalunya, 2020). We predicted the levels of air pollutants to which the inhabitants of each ABS and each region had been exposed to from 2011 to 2019 using a joint Bayesian model. The dependent variables were PM 10 and NO 2 . As explanatory variables we included air pollutants other than the dependent variable (fine particles with a diameter of 2.5 µm or less -PM 2.5 -nitrogen oxide -NO-, sulphur dioxide -SO 2 -, carbon monoxide -CO-, ozone -O 3 -, benzene -C 6 H 6 -, benzopyrene -B-, lead, arsenic, nickel, cadmium; in addition to PM 10 and NO 2 when the dependent variable was NO 2 and PM 10 , respectively) and the surface of the ABS or the region. In the model we controlled for spatial heterogeneity and both spatial and temporal dependence. This model allowed us to avoid the J o u r n a l P r e -p r o o f problems caused by spatial misalignment. If this problem is not corrected properly (as we did), there will be a complex form of measurement error leading to biased (i.e. asymptotically biased) and inconsistent estimates and erroneous standard errors in the estimates of the parameters. Further details concerning this can be found in Barceló et al. (2016) .

In fact, although PM 10 was observed at stations located at only 94 of the 372 ABSs and NO 2 at 66 of them, the model fit was very good in both cases (Figures A1 and A2 in supplementary material) .

We controlled for socioeconomic and demographic variables in the last year (or period) available at the census tract level (the only small area below the municipality for which data are provided). In particular: With the exception of population density, all other variables were observed at the census tract level. Using the population of each of the census tracts as weights (the source of the total population of the census tract and of the population of the census tract by sex was INE (2020a) we calculated the weighted average of the values at the census tracts that composed the ABS and the region to obtain their value at ABS and region levels.

J o u r n a l P r e -p r o o f We specified two generalized linear mixed models (GLMM) with variable response from the Poisson family; one for daily incident positive cases and the other for daily mortality.

Conditional to the true risk in the small area (ABS or region) ݅ on day ‫,ݐ‬ ‫ݔ‬ ௧ , the cases of the response variable (ܻ ௧ ) occurring in each of the small areas on each day was distributed as a Poisson.

where ߠ ௧ , is the mathematical expectation of ܻ ௧ , ‫ܧ‬ሺܻ ௧ ሻ = ߠ ௧ ; ݅ = 1, … , ݊; ‫,…,2,1=ݐ‬ 82; where t = 1 corresponded to February 25 and t = 82 to May 16, 2020; and ‫݊݅ݐ݈ܽݑܲ‬ was the population at risk of being a case (positive case or death) in the small area ݅ and on day ‫.ݐ‬

The link functions of the GLMMs were as follows:

-Daily incident positive cases

Where the subindexes ݅ and ‫ݐ‬ indicated the ABS, and the day, respectively; pollutant i denoted the predicted levels of air pollutant (PM 10 and NO 2 ) in ABS ݅; incomeQ i the average income per person in ABS ݅ (in quartiles, taking the fourth quartile as the reference category); unemploymentQ i the unemployment rate (in quartiles, taking the first quartile as the reference category); aged population i the percentage of population aged 65 and over; foreigners i the percentage of foreigners; poor housing i percentage of houses with less than 45 m 2 of living area in 2011; single person i the percentage of single person households; densityQ i the population density (in quartiles, taking the first quartile as the reference category); Population i the population of the ABS ݅; dummy1 i a dummy variable that collected the ABS where a possible outlier was located (explained below); ߟ , ܵ, ߬ ௧ , ߮ ௧_ denoted random effects (explained below); and ‫ݏߚ‬ were the coefficients of the explanatory and control variables (݁ ఉ was the relative risk associated with each of them).

-Daily mortality J o u r n a l P r e -p r o o f

In addition to considering total deaths, the model for daily deaths was stratified by sex.

Where the subindexes ݅ and ‫ݐ‬ indicated the region, and the day, respectively; pollutant i the air pollutant (PM 10 , included linearly in the model, and NO 2 , included in quartiles, considering the first two as the reference category); income i the average income per person in the region ݅ (included linearly in the model); Population i the population of the region (in the stratified models, the total number of men and the total number of women, both in 2019 (INE, 2020a)); dummy2 i a dummy variable that collected the region where a possible outlier was located (explained below). The definition of the rest of the parameters, variables, and the random effects is the same as described previously.

In order to assess whether the effect of the exposure to pollutants was different along the epidemic curve, the models for daily incident cases and daily deaths (total, men and women) were re-estimated including the interaction between the pollutant variable and the week of the year (from week 9, that of February 25, until week 20, that of May 16, 2020).

When we represented both the number of positive cases and the number of deaths per 100,000 inhabitants by ABS and region, respectively, we observed the existence of an outlier (see Figure A3 in supplementary material). This outlier corresponded to an ABS in the city of Barcelona in which the largest hospital in Catalonia is located. We chose to control this outlier, including dummy variables in the models. These variables took the value 1 when the ABS, or the region, coincided where the outlier was located and 0 otherwise.

The number of positive cases did not correspond to the number of subjects. The problem being that the Government of Catalonia, although it provides the daily data of diagnosed subjects, only publishes them at the level of the whole of Catalonia, without disaggregating. Therefore, the number of positive cases can only be considered as an estimator of the incidence. The measurement error of this estimator, however, was not random. If this error were not controlled, the estimators would be inconsistent.

We constructed a new variable, daily number of tests per confirmed case, t_c, dividing the daily total number of tests (for all of Catalonia) by the daily number of subjects with a confirmed diagnosis (these data were obtained from Open Government, 2020) . This variable indicated the daily number of tests that were performed to detect a subject with a confirmed diagnosis of COVID-19 (at the level of Catalonia). During the study period, an average of 8.74 tests per person were performed each day in Catalonia (standard deviation: 8.62; median 5.5, Q1 3.03, Q3 10.31). Note, however, that as of mid-April the behaviour of this variable ceased to be stationary in mean. That is, the number of tests grew faster than the number of diagnosed subjects. Furthermore, the difference in the growth of both variables increased over time ( Figure A4 in supplementary material).

To control the measurement error resulting from using an estimator of the incidence and not the incidence by itself, in the model we included for daily incident positive cases a random effect indexed in the daily number of tests per confirmed case, t_c, ߮ ௧_ . To control the non-stationarity in mean explained above, we interacted this effect with ߬ ௧ , another random effect indexed on time (see the model for daily incident positive cases).

We included four random effects in the models. First, ߟ , a random effect indexed on the small area (ABS in the model for daily incident positive cases, and region in the model for daily mortality). This random effect was unstructured (independent and identically distributed random effects, (iid), and captured individual heterogeneity, that is to say, unobserved confounders specific to the small area and invariant in time.

Second, in the model we included ߮ ௧_ , a structured random effect (random walk of order one, rw1). Following the integrated nested Laplace approximations (INLA) approach (Rue et al., 2009 and when, as in our case, the random effects are indexed on a continuous variable, they can be used as smoothers to model non-linear dependency on covariates in the linear predictor.

We also included ߬ ௧ , a structured random effect (rw1) indexed on time, in order to control the temporal dependency (that is, the shape of the curve itself).

Finally, we included the structured random effect S(small area) to control spatial dependency. That is to say, the fact that small areas that are close in space show more similar incidence and mortality than areas that are not close.

Following the INLA approach, random effects were defined using a multivariate Gaussian distribution with a zero mean and precision matrix kΣ, where k was a constant and Σ was a matrix that defined the dependence structure of the random effects [21, 22] . In unstructured random effects (iid) Σ was a diagonal matrix of 1s; and in random walk random effects Σ was defined assuming that increments (in rw1, Δ‫ݑ‬ = ‫ݑ‬ ௧ − ‫ݑ‬ ௧ିଵ ) followed a Gaussian distribution with zero mean and a constant precision k (Gómez-Rubio, 2020).

The spatially structured random effect S was normally distributed with zero mean and a Matérn covariance function:

where Κ ఔ is the modified Bessel function of the second type and order ߥ > 0. ߥ is a smoother parameter, ߪ ଶ is the variance and ߢ > 0 is related to the range (ߩ = √8 ߥ ߢ ⁄ ), the distance to which the spatial correlation is close to 0.1 (Lindgren et al., 2011) .

We carried out various sensitivity analyses in which we included explanatory and control variables linearly and non-linearly (quartiles). When we included the variables non-linearly, we tried different reference categories. We tested the interaction between air pollutants and the average income per person. In the model for daily incident positive cases we included, as an additional control variable, an indicator of the number of inhabitants of the city to which the ABS belonged. We estimated the models with or without the dummy variables that controlled the outlier. We estimated the model both without controlling and controlling the measurement error due to the use of the daily number of tests per confirmed case. In the latter case, we tested without interacting and interacting with the random effect that controlled temporal dependence. Also, in this last case, we tried different smoothers (specifically autoregressive of order 1 and random walk of order 2, with curves smoother than that of a rw1).

We compared the models using their predictive accuracy and evaluated them using the Watanabe-Akaike information criterion (WAIC) (Watanabe, 2010) and the Deviance Information Criterion (DIC) (Spiegelhalteer et al., 2002) .

To evaluate the existence of a geographical pattern in the incidence of and mortality from COVID-19, we represented the relative risks (RRs), estimated in the GLMM for daily incident positive cases and for daily mortality, on a map of the region under study (i.e. Catalonia). We represented the maps at two points in time: before and after the peak of the pandemic (which occurred in the first week of April). Maps at the ABS and at the region levels were obtained from the Department of Health, Government of Catalonia (2020) and from the Institut Geogràfic i Geològic de Catalunya (2020), respectively.

We also computed exceedance probabilities which are the probability that the smoothed relative risks were above 1 (Richardson et al., 2004) . Richardson et al. recommend using, as a specific interpretation rule, the cut-off 80% (and 20%). In this sense, when the exceedance probability is greater than 80% (less than 20%), a reasonable sensitivity will be achieved. That is, a large proportion of areas in which an excess (or a defect) of risk has been estimated correspond to areas that actually have an excess (or a defect) of risk (Richardson et al., 2004) . This cut-off can be used as a measure of the statistical significance of the smoothed risk and, furthermore, to help assess the existence of agglomerations of excess cases (i.e. clusters). The exceedance probabilities were also represented on a map of the study area.

Inferences were made following a Bayesian perspective, using the INLA approach (Rue et al., 2009 and . We used priors that penalize complexity (called PC priors). These priors are robust in the sense that they do not have an impact on the results and, in addition, they have an epidemiological interpretation .

All analyses were carried out using the free software R (version 4.0.0) (R Core Team, 2020), through the INLA package (Rue et al., 2009 and R INLA project, 2020) . The maps were represented using the leaflet package (Cheng et al., 2018) .

Descriptive results are shown in Tables 1 (bivariate analyses) and A1 (univariate analyses) (in supplementary material). All variables have fairly asymmetric distributions. For this reason, only robust summary statistics (median and quartiles) should be interpreted.

The results of the bivariate analyses show that, in general, the most polluted ABSs and regions were those that presented the highest number of positive cases and deaths per 100,000 inhabitants, respectively. However, note that they grew more or less monotonically as PM 10 levels did, only when deaths per 100,000 inhabitants were considered (Table 1b ). In the rest of the cases, the possible relationships between air pollutants and the number of positive cases and deaths (both per 100,000 inhabitants) did not appear to be monotonous. In this sense, the highest number of positive cases per 100,000 inhabitants occurred in the third quartile of PM 10 and NO 2 (Table 1a) . Note, in particular, the possible relationship between NO 2 and deaths per 100,000 inhabitants at the region level, with an imperfect V-shape where the right wing is longer than the left (Table 1b) .

The higher the average income per person in the ABS and in the region, the greater the number of positives and deaths per 100,000 inhabitants, respectively. The greater the population density of the ABS or the region, the greater the number of positive cases and deaths per 100,000 inhabitants. The possible relationship between the unemployment rate and the number of positive cases and deaths (both per 100,000 inhabitants) was nonlinear, with a maximum in the second (number of positive cases per 100,000 inhabitants, Table 1a ) and in the third quartile (deaths per 100,000 inhabitants, Table 1b ). Note that, after the maximum, the highest number of positive cases was in the first and third quartiles (Table 1a ) and the highest number of deaths in the second quartile (Table 1b) .

In the sensitivity analysis, regarding the functional form of the explanatory variables (linear or nonlinear, the latter represented parametrically, including the variable in quartiles) and the reference categories when the variable was entered non-linearly, the models with a better fit (lower WAIC and DIC) corresponded to those specified above. The interaction between air pollutants and the average income per person was not statistically significant in the two models. In the model for daily incident positive cases, the indicator of the number of inhabitants of the city to which the ABS belonged was not statistically significant either. Furthermore, both in the interaction and in the indicator, the models that included them did not show a better fit. However, the models that included the dummy variables to control the outlier, as well as those that controlled the measurement error due to the use of the daily number of positive cases (in interaction with the random effect that controlled temporal dependency) provided a much better adjustment (lower WAIC and DIC). Finally, the best smoother for the random effect indexed by the number of tests per day (in Catalonia) was the random walk of order 1 (rw1).

In Tables 2 and 3 we show the estimation results of the GLMM models with which we specified the association between air pollutants and the daily incident positive cases and daily deaths, controlling, in both cases, for socioeconomic and demographic variables, unobserved confounders and the spatial and the temporal dependency.

To facilitate the interpretation of the Relative Risks (RR) in the case of the variables introduced in the models linearly, we centred these variables, subtracting the mean. Furthermore, in Tables 2 and 3, in addition to the RR and their credibility intervals at 95% (95% ICr, from now on), the probability of the parameter estimator (the log (RR)) as an absolute value being more than 0 is also shown. Unlike the p-value in a usual environment (i.e. frequentist), this probability allows us to make inferences about possible associations.

For NO 2 , for every 1 µm/m 3 above the mean, the risk of a positive result increased by 2.7% (Relative Risk 1.027, 95% ICr: 1.008-1.047). In the case of PM 10 , for every 1 µm/m 3 above the mean, the risk of a positive result increased by 3.0% (95% ICr: 0.986-1.074) (Table 3) . However, in the latter case, the association was found only marginally significant (i.e. the 95% credibility interval contained the unit, but not the 90% credibility interval).

Associations between the levels of air pollution and the risk of dying were found for NO 2 (total deaths and deaths of men) and for PM 10 (only in deaths of men), although in all cases they were only marginally significant (Table 4) . Regions with levels of NO 2 exposure in the third quartile had 28.8% (95% ICr: 0.872-1.902) (total deaths) and 38.6% (95% ICr: 0.915-2.108) (deaths in men) greater risk than the regions located in the first two quartiles. The relative risk was even higher in the regions located in the fourth quartile, with a 35.7% (95% ICr: 0.848-2.161) greater risk in total deaths and 44.3% (95% ICr: 0.899-2.303) in male deaths. For every 1 µm/m 3 above the mean for PM 10 , the risk of death in men increased by 12.6% (95% ICr: 0.925-1.350).

With regard to the socioeconomic and demographic control variables, with the exception of average income per person and poor housing, both in mortality (Table 3) , associations were found for both the risk of a positive case and of death, although many of them were only marginally significant. The higher the population density, the greater the percentage of population aged 65 years old and the higher the percentage of poor housing in the small area (ABS for the three variables, region only in the first two), the greater the risk of a positive result and death. Conversely, the higher the unemployment rate and the percentage of foreigners in the small area (both in ABS and in the region), the lower the risk of a positive result and death. Note that with respect to the most economically favoured ABS (quartile 4), those ABSs located in the third and in the second quartiles (in this order) had the highest risk of a positive result (Table 2 ) (the average income per person was not associated with the risk of death). The higher the percentage of single person households, the lower the risk of a positive result (Table 2 ), but the higher the risk of death (Table 3) .

At least in terms of risk of death (data on small areas segregated by sex are only provided for mortality), sex could be a confounding variable of the association of the risk with air pollutants, single person households, and poor housing, since in these cases, the credibility intervals of 90% and 95% in the stratum of women contained the unit (Table 3) .

The association between the air pollutants and the risk of a positive case and of a death throughout the evolution of the pandemic are shown in Figures 1 and 2 , respectively. We observed that RRs increased up to a few weeks after the lockdown and then decreased (albeit oscillating). For mortality this week was from March 23 to 29 (third week after lockdown) (Figure 2 right) and for positive results it was somewhat earlier, the week from March 16 to 22 (second week after lockdown) for PM 10 and the week from March 9 to 15 (the week after lockdown) for NO 2 (Figure 2 left) . It should be noted, however, that the credibility intervals of the RRs overlapped (that is, we could not reject the null hypothesis that the RRs were the same) and that there were few statistically significant RRs (especially for PM 10 both in daily incident positive cases and in deaths) and most of them were only marginally significant (especially in NO 2 in deaths) (Figures 2).

The maps of the smoothed relative risks of the study area and of the exceedance probabilities are shown in Figures 3 (positive result) and 4 (death). The smoothed relative risks should be interpreted in relation to the smoothed RR average throughout Catalonia in the period considered (before and after the peak of the pandemic). Both the maps of smoothed relative risk of a positive result ( Figure 3 ) and those of a death (Figure 4) , closely resemble the maps of the exceedance probabilities (Figures A5a and Figure A5b in supplementary material, respectively). Most of the small areas (ABSs or regions), in which we estimated either an excess (RR> 1, exceedance probability>80%) or a risk defect (RR<1, exceedance probability<20%) corresponded, effectively, with areas with an excess or a defect of risk, respectively. In the maps of the smoothed relative risks of a positive result, the same geographic pattern was observed both before and after the peak of the pandemic (Figure 3 ). It is observed how the largest smoothed RRs were concentrated in three parallel J o u r n a l P r e -p r o o f axes from north-south direction, one on the coast, another in the interior and a third on the western limit of Catalonia. The first two coincided with densely populated ABSs and large road and rail infrastructures. The third, however, was made up of ABSs with very little population, suggesting that smoothed RRs could be very unstable. In the maps of the smoothed relative risks of death ( Figure 4 and Figure A5b in supplementary material), the highest (and statistically significant, i.e. exceedance probabilities>80% or <20%) smoothed RRs are observed in an area in the center of Catalonia, with the lower limit in the city of Barcelona ( Figure A5b in supplemental material).

In Figure A6 (in supplementary material) we show those ABSs and regions with statistically significant variation in the smoothed Relative Risks before and after the peak of the pandemic (first week of April 2020). Very few ABSs, and even fewer regions, varied their smoothed RR. Specifically, 18 ABS (out of a total of 372) went from having a smoothed RR of a positive result greater than unity to a RR less than unity, and most of them were ABS from the city of Barcelona (11 of 18). Only five went from having a smoothed RR less than unity to one greater than unity and only four regions out of a total of 42 went from having a smoothed RR of death greater than one to another less than one. All of these regions had very low population densities (Alt Camp, Noguera, Urgell and Vall d'Aran). No region was found in the opposite case.

We have found that long-term exposure to nitrogen dioxide (NO 2 ) and, to a lesser extent, to coarse particles (PM 10 ), have been independent predictors of the spatial spread of COVID-19 in Catalonia, from late February to mid-May 2020.

As we have found, significantly positive associations between air pollutants and COVID-19 cases were found by all studies that, in the systematic review we carried out, evaluated the association with air pollutants. The problem is that not all of them are totally comparable. Only Coccia (2020) and Fattorini and Regoli (2020) evaluated, as we did, long-term exposure to air pollutants. However, the dependent variable used by both studies (number of cumulative cases) differed from ours. In addition, as a difference with respect to our study, Fattorini and Regoli (2020) 

allowed, as in our case, non-linear relationships (approximated in a non-parametric way through GAM models) between the pollutants and the new daily cases of COVID-19. Secondarily, part of the discrepancy could also be due to the probability family used for the response variable (Gaussian in Zhu et al., 2020; and in Li H et al., 2020; Poisson in Jiang et al., 2020) . Adhikari et al. (2020) , who evaluated the influence between meteorological variables and air pollutants on the incidence and mortality from COVID-19, found that, among pollutants, daily maximum eight-hour ozone concentrations were significantly and positively associated with new confirmed cases related to COVID-19 (although not so PM 2.5 ). However, neither meteorological variables nor air pollutants showed significant associations with deaths related to COVID-19.

All studies evaluating the association between population density and COVID-19, found, as we did, that increases in population density were associated with increased COVID-19 morbidity and mortality. Although they differ in the study population (55 Italian cities in Coccia, 2020; Iran in Ahmadi et al., 2020 ; the 27 state capital cities of Brazil in Pequeno et al., 2020;  and the 13 districts of Wuhan, China in You et al., 2020) , in the response variable (number of cumulative cases (Coccia, 2020; Pequeno et al., 2020; You et al., 2020-; infection rate -Ahmadi et al., 2020-: and deaths -Coccia, 2020-) , in the variables they controlled for (meteorological - Coccia, 2020; Ahmadi et al., 2020; Pequeno et al., 2020-; air pollutants -Coccia, 2020-; and socioeconomic variables -Pequeno et al., 2020; You et al., 2020-) Only three studies evaluated the influence of income. Unlike us, Azar et al. (2020) and You et al. (2020) , found that the most economically favored areas show less morbidity from COVID-19. You et al. (2020) found that increasing GDP per unit of land area (13 districts of the city of Wuhan, China) was associated with a decreased COVID-19 morbidity rate. Azar et al. (2020) , whose study population consisted of a cohort of individual patients residing in 21 counties from Northern California, USA (ten of them in San Francisco Bay), found that COVID-19 positive patients residing in ZIP codes within the third and fourth quartiles of income were less likely to be admitted to hospital than those residing in the bottom ZIP code quartile. However, Price-Haywood et al. (2020) , who also used an individual data cohort from the Ochsner Medical Centre, New Orleans, Louisiana, USA, found that black race was not associated with higher in-hospital mortality than white race, after adjustment for differences in sociodemographic and clinical characteristics on admission.

However, we have important differences with respect to Azar et al. (2020) and You et al. (2020) . Firstly, the dependent variable differed from ours, the COVID-19 morbidity rate (number of cumulative cases divided by the population of the district) in You et al. (2020) Azar et al.) . You et al. (2020) , as in our case, found that both the percentage of the population aged 65 or over and the ratio of total floor area to the number of residential buildings (indicator of poor housing) were risk factors of morbidity due to COVID-19.

None of the studies reviewed considered the unemployment rate, the percentage of foreigners or of single person households in the area.

The Instituto de Salud Carlos III (Ministry of Health, Government of Spain) (2020) has already carried out two rounds of the National Study of sero-Epidemiology of SARS-CoV-2 Infection in Spain (ENE-Covid). ENE-Covid is a large population-based sero-epidemiological longitudinal study, whose objectives are to estimate the prevalence of SARS-Cov2 infection in Spain by determining antibodies against the virus and evaluating its temporal evolution. The estimated prevalence of IgG antibodies against SARS-Cov-2 in Spain is 5.2% (95% CI:4.9%-5.5%) (5.0%; 95% CI: 4.7%-5.4%, in the first round). When stratified by employment situation, the highest prevalence is presented by retirees (6.0%, 95% CI:5.3%-6.7%), followed by active workers (5.7%, 95% CI:5.5.2%-6.2%). The lowest prevalence, after those who are dedicated to charitable activities (2.7%, 95% CI:0.6%-10.8%), is presented by the unemployed (3.2%, 95% CI: 2.5%-4.0%; men 3.8%, 95% CI: 2.8%-5.0%; women 2.9%, 95% CI:2.1%-3.9%).

These results are in line with ours. The higher the percentage of unemployment the small area (ABS or region) has, the less positive results and fewer deaths from COVID-19 it presented. Although it is true that the information on the unemployment rate referred to 2011 (the worst year of the Great Recession in Spain) and that, today, the percentages are much lower, the spatial distribution of the rate is practically identical (that is, those areas with lower rates continue to have lower rates and vice versa).

None of the studies reviewed evaluated the influence of the percentage of foreigners. Our results, however, are in line with those we found for the average income. The higher the percentage in a certain small area, the lower the income of that area and, therefore, the lower incidence and mortality from COVID-19.

Nor did any study evaluate the percentage of single person households in the small area. Our findings, although contrary in positive results and deaths, could be explained by Figure A7 (in supplementary material) . When considering ABSs (positive results), that is the general population, single person households are inhabited by high-income people. However, when the region (deaths), that is the elderly population, is used as a small area, single person households are inhabited by low-income people. Thus, the RRs for this variable were in line with the RR for income.

It has been argued that there could be potential biological mechanisms that may explain the association between air pollutants and respiratory viral infections, including influenza, pneumonia, and SARS (Ciencewicki and Jaspers, 2007) . Wu et al. (2020) , in a very recent study evaluating the effects of long-term average exposure to fine particulate matter (PM 2.5 ) on the risk of COVID-19 death in the United States at the small area level (3,087 counties), point out that exposure to PM 2.5 adversely affects the respiratory and cardiovascular systems, increasing mortality risk. Furthermore, exposure exacerbates the severity of COVID-19 infection symptoms and worsens the prognosis of COVID-19 patients .

Although we do not rule out that these mechanisms could partially explain our findings, we are more inclined to suppose that, in our work, air pollutants have actually been surrogates of another variable, the mobility of residents in the small area (ABS or region). There are several pieces of evidence that support this. First, that NO 2 was the pollutant that we found statistically significantly associated in the two small areas (that is, for a positive case result and for death). PM 10 was not found statistically significant when we considered the region (that is, deaths) and, in addition, in this case NO2 was found only marginally significant. This suggests that the effects of exposure to air pollutants that we found were related to urban traffic. In fact, in the city of Barcelona, while 60% of NO 2 originates from traffic, this origin is only 21% for PM 10 (Barcelona City Council, 2020). Conversely, while 71% of PM 10 is generated outside the municipality, only 13% of NO 2 is generated outside (Barcelona City Council, 2020). It is not unreasonable to suppose that these figures can be extrapolated to the entire Barcelona Metropolitan Area, which comprises 41.75% of the total population of Catalonia. Residing in an area with significant urban traffic (i.e. with high long-term exposure to the pollutants generated by it, especially NO 2 ) means residing in an area with greater mobility and, therefore, a higher risk of contagion.

On the other hand, the fact that we found that sex has been a confounding variable of the association between exposure to air pollutants and the risk of dying from COVID-19, contributes to this first evidence. Deaths, in both men and women, occurred mainly in the elderly population. However, elderly women in this age group are much less mobile than men (for whom statistically significant relationships were found).

This evidence can also be confirmed by the fact that, after an increase in RRs up to one to three weeks after the lockdown (the incubation time, first, and worsening for the seriously ill and eventual death, later) the RRs associated with exposure to air pollutants barely varied (credibility intervals overlapped), even though exposure to air pollutants was substantially reduced. The same phenomenon could be behind the finding that the smoothed RRs were very similar before and after the peak of the pandemic (the first week of April).

The second piece of evidence is provided by the effects the socioeconomic variables have on the risk of a positive outcome and death. In the most economically favored areas (with more average income per person, lower percentage of foreigners from low-and middle-income countries, and a lower J o u r n a l P r e -p r o o f unemployment rate) mainly people with middle-high incomes reside there. In the period that we have analyzed, these people have been mobile very little. During the confinement, they have stayed at home, teleworked, and those who had to go anywhere have done so with their own vehicles. Residents in the most economically disadvantaged areas have also been mobile very little. Most of those who were working have lost their jobs due to the lockdown and the subsequent closure of economic activity. However, residents in the intermediate areas (those located in quartiles 2 and 3 of average income per person) have been the most mobile. They had to go to work (at least since the restrictions on work in non-essential activities were lifted after Easter in the second week of April) and they commuted, mainly, by public transport.

High mobility, and therefore a greater possibility of contact, has been found to be associated with a higher spread of the disease (Ahmadi et al., 2020; Pequeño et al., 2020) .

Another argument in favor of our hypothesis could be drawn from the discussion about the route of transmission of the virus. Some authors have suggested that the rapid spread of the SARS-CoV-2 could be explained not only by person-toperson transmission, but also by air pollution-to-human transmission (ie airborne transmission) (Frontera et al., 2020; Hadei et al., 2020; Morawska and Cao, 2020) . Coccia (2020) , even, has suggested that the transmission dynamics of COVID-19 could be due to air pollution-to-human transmission, rather than the direct human-to-human transmission (Domingo et al., 2020) . Bontempi (2020b) , however, argues that it is not possible to conclude that COVID-19 diffusion mechanism also occurs through the air, by using PM 10 as a carrier. In particular, she showed that Piedmont cities, presenting lower detected infections cases in comparison to Brescia and Bergamo in the investigated period, had most sever PM 10 pollution events in comparison to Lombardy cities. In fact, Bontempi et al. (2020) argue that the current pandemic's diffusion patterns are caused by a multiplicity of environmental, economic and social factors and that the spread of the infection is more nonlinear and depends on measures, health care system's efficiency and other factors, which could even explain different trajectories. For example, Bontempi (2020a) found a strong correlation (even higher than with air pollutants) between commercial exchanges (that exemplify the social interactions) and COVID-19 diffusion in Italy. Thus, it could happen that the contagion was greater in those areas that, in addition to greater mobility, had high economic/commercial exchanges.

Our study might have some limitations. First of all, we used an ecological design. The potential ecological fallacy should be taken into account and, when interpreting the results, no inferences should be made at the individual level. Also, there might still be unmeasured confounding bias inherent in this type of design. However, we have tried to control for bias, including in the models both observed variables, demographic and socioeconomic variables, as well as, unstructured and structured confounders, random effects, that captured unobserved confounders at the small area level, and spatial and temporal dependence, respectively. On the other hand, although data are not yet available at the individual level, we are sure that when we have the data, we will be able to contrast the findings of our study with them.

Second, all the variables we had were measured with error. As we explained, we control for non-random measurement errors, both those associated with using an estimator of the incidence of COVID-19, and those associated with estimating long-term exposure to air pollutants. However, we were unable to control other measurement errors. Governments and agencies do not use the same COVID-19 death definition. For example, according to the Ministry of Health (Government of Spain) in Catalonia, until June 21, 2020, there were 5,666 deaths (Secretaría General de Sanidad, 2020). However, according to the Government of Catalonia there were 6,456 deaths (Open Government, 2020) . For the Government of Catalonia, one death by COVID-19 is both the one who had a positive result on some test (PCR or fast test) and the one who presented symptoms at some point and a sanitary professional classified them as a possible case, but they did not have a diagnostic test with a positive result (Open Government, 2020) . For the Government of Spain, until May 21, one death by COVID-19 was someone who presented a positive result by PCR. From then on, it uses the same definition as in Catalonia, although it has not yet updated the information prior to that date.

As we said, with the exception of population density, all other socieconomic and demographic variables were observed at the census tract level and then they were averaged, weighted by population, to obtain their value in the small area. However, not all residents in the small area actually had these mean values of the variables, leading to a measurement error, most likely random. If the explanatory variables are measured with error, the estimators will be inconsistent (Greene, 2018) . Nevertheless, if the between-area variability of the variable measured with error is much greater than the within-area variability of such variable then the effect of measurement error on the estimator consistency may be negligible (Elliott and Savitz, 2008) . In our case, this occurred with the ABSs but not with the regions. As we pointed out, the boundaries of the ABSs are determined based on geographical, demographic, social and epidemiological factors, as well as on the accessibility the population has to services. A region is an administrative unit that, in most cases, is quite heterogeneous. In any case, when, as in the region, this criterion is not met, the presence of measurement errors tends to underestimate the effect of the variable measured with error (Greene, 2018) .

Third, we were unable to distinguish whether the effects of the variables that we found associated with the spread of COVID-19 were more related to initial contagion or subsequent diffusion mechanisms (Bontempi et al., 2020) . Our hypothesis is that we have provided evidence of the different starting conditions with which they found the small areas (through both observed and unobserved confounders, such, in particular, individual heterogeneity) for, in the words of Bontempi et al. (2020) , the development of the different contagion trajectories in each of those small areas.

Fourth, there are a number of limitations inherent in spatio-temporal ecological designs. In them it is essential to minimize within-area exposure variability and maximize between-area exposure variability [49] . As we have mentioned, this is true when we used ABSs. On the other hand, these designs assume that the exposure that a person has suffered is the same in that of the area in which their residence has been identified. In our case, that person's residence was identified from their health card (Open Government, 2020) . However, the person does not have to have always resided in the same area and/or could have been exposed in different areas. Fortunately, this measurement error, although unavoidable in any spatio-temporal ecological design, is also random. Another problem is known as the 'modifiable areal unit problem' (MAUP). The MAUP, which refers to data aggregation in units for analysis, is a potential source of bias that affects spatial studies using aggregated data. Wang and Di (2020) found that the association between NO2 and COVID-19 deaths varies when the data is aggregated at different levels. Thus, they found that NO 2 was a protective factor on mortality in the Hubei Province, however, it was a risk factor when considering an aggregation of Wuhan districts (from 13 to four districts). Similar differences were discovered in the Henan Province, where the positive city-level association became negative when the aggregation strategy was employed (Wang and Di, 2020) . In our case, perhaps part of the nonsignificance of PM 10 in daily mortality could be attributed to MAUP.

Finally, from a methodological point of view, in the models we have assumed that space and time were independent. In other words, the spatial variability of the pandemic did not vary over time. Although the results seem to confirm our assumption, we estimated another GLMM for daily incident positive cases, with the same specification, but assuming that, although space and time were not independent, they were separable. The estimators of the RRs were very similar to those shown in this work. However, we believe that these could vary with a dependent and non-separable time and space specification. In any case, this aspect deserves further research.

We believe that these limitations are offset by the strengths of our study. In particular, we highlight four. First, although we are not the only ones to consider small areas below the municipality (You et al., 2020 , analyzed the districts of Wuhan, China), we are in using a spatio-temporal model to evaluate the effects of long-term exposure to air pollutants on the spatial spread of COVID-19. Second, we used models in which we control for a wide range of confounders, observed and unobserved, and for spatial and temporal dependence, that is, for the spread of the disease in the territory and over time. Third, we have shown the robustness of our results to a different specification errors (i.e. outliers, nonrandom measurement errors). Finally, we have exclusively used open data.

Although it is possible that there are biological mechanisms that explain, at least partially, the association between long-term exposure to air pollutants and COVID-19, we hypothesize that the spatial spread of COVID-19 in Catalonia is attributed to the different ease with which some people, the hosts of the virus, have infected others. That facility depends on the heterogeneous distribution at a small area level of variables such as population density, poor housing and the mobility of its residents, for which exposure to pollutants has been surrogate. 

Adjusted by the dummy variable to control the outlier, by the number of tests per day, by individual heterogeneity (at the Basic Health Area level), by spatial dependence and by temporal dependence.

Reference category in square brackets.

Prob(abs(log(RR))>0) higher than 0.95. Prob(abs(log(RR))>0) higher than 0.90.

J o u r n a l P r e -p r o o f 

Adjusted by the dummy variable to control the outlier, by individual heterogeneity (at the region level), by spatial dependence and by temporal dependence. 95%CrInt 95%: Credibility interval. Reference category in square brackets. Prob(abs(log(RR))>0) higher than 0.95. Prob(abs(log(RR))>0) higher than 0.90.

J o u r n a l P r e -p r o o f 

Short-term effects of ambient ozone, PM 2.5 and meteorological factors on COVID-19 confirmed cases and deaths in Queens

Investigation of effective climatology parameters on COVID-19 outbreak in Iran

The coronavirus pandemic and aerosols: Does COVID-19 transmit via expiratory particles?

Institut Català de la Salut

Disparities in outcomes among COVID-19 patients in a large health care system in California

Long term effects of traffic noise on mortality in the city of Barcelona

Barcelona Air Quality Improvement Plan

Commercial exchanges instead of air pollution as possible origin of COVID-19 initial diffusion phase in Italy: More efforts are necessary to address interdisciplinary research

First data analysis about possible COVID-19 virus airborne diffusion due to air particulate matter (PM): the case of Lombardy

Understanding COVID-19 diffusion requires an interdisciplinary, multi-dimensional approach

Centers for Disease Control and Prevention (CDC). Coronavirus Disease 2019 (COVID19. Symtoms of Coronavirus

Epidemiological and clinical characteristics of 99 cases of 2019 novel coronavirus pneumonia in Wuhan, China: A descriptive study

leaflet: Create Interactive Web Maps with the JavaScript 'Leaflet' Library

Air pollution and respiratory viral infection

Factors determining the diffusion of COVID-19 and suggested strategy to prevent future accelerated viral infectivity similar to COVID-19

Immission data from the measurement points of the Air Pollution Monitoring and Forecasting Network

Layer with the basic areas of health, health sectors, areas of care management, and health regions

Influence of airborne transmission of SARS-CoV-2 on COVID-19 pandemic. A review

Design issues in small-area studies of environment and health

European Centre for Disease Prevention and Control (ECDC)

Role of the chronic air pollution levels in the Covid-19 outbreak risk in Italy

Regional air pollution persistence links to COVID-19 infection zoning

Bayesian Inference with INLA

Econometric Analysis, 8th Edition

A letter about the airborne transmission of sars-cov-2 based on the current evidence

Nacional de Estadística. Indicators for census tracks

Spanish Census of Population and Housing

4736177012&menu=resultados&secc=1254736195461&idp=1254734710990#!t abs-1254736195557

INE. Instituto Nacional de Estadística. Household Income Distribution Atlas 2020b

Enterprises/Downloads/Geoinformation-layers/Administration-boundaries, last accessed on

Ministry of Health. Government of Spain. National Study of sero-Epidemiology of SARS-CoV-2 Infection in Spain (ENE-Covid)

Effect of ambient air pollutants and meteorological variables on COVID-19 incidence

Air pollution and temperature are associated with increased COVID-19 incidence: A time series study

Clinical characteristics of 25 death cases with COVID-19: A retrospective review of medical records in a single medical center

An explicit link between Gaussian fields and Gaussian Markov random fields: the stochastic partial differential equation approach

Airborne transmission of SARS-CoV-2: the world should face the reality

Case-Fatality Rate and Characteristics of Patients Dying in Relation to COVID-19 in Italy

Generalitat de Catalunya. Open data and COVID-19

Air transportation, population density and temperature predict the spread of COVID-19 in Brazil

Hospitalization and mortality among Black patients and White patients with Covid-19

R: A language and environment for statistical computing. R Foundation for Statistical Computing

Interpreting posterior relative risk estimates in disease-mapping studies

Approximate Bayesian inference for latent Gaussian models using integrated nested Laplace approximations (with discussion)

Bayesian computing with INLA: A review

Update 135. Coronavirus disease (COVID-19)

Penalising model component complexity: A principled, practical approach to constructing priors (with discussion)

Bayesian measures of model complexity and fit (with discussion)

United Nations Development Program

Modifiable areal unit problem and environmental factors of COVID-19 outbreak

Asymptotic equivalence of Bayes cross validation and widely applicable information criterion in singular learning theory

WHO Virtual press conference on COVID-19

Exposure to air pollution and COVID-19 mortality in the United States: A nationwide cross-sectional study

Possible environmental effects on the spread of COVID-19 in China

Distribution of COVID-19 morbidity rate in association with social, and economic factors in Wuhan, China: Implications for urban development

Association between short-term exposure to air pollution and COVID-19 infection: evidence from China

This study was carried out within the 'Cohort-Real World Data' subprogram of CIBER of Epidemiology and Public Health (CIBERESP). We appreciate the comments of two anonymous reviewers of a previous version of this work who, without doubt, helped us to improve our work. The usual disclaimer applies.J o u r n a l P r e -p r o o f

The manuscript is an original contribution that has not been published before, whole or in part, in any format, including electronically. All authors will disclose any actual or potential conflict of interest including any financial, personal or other relationships with other people or organizations, that could inappropriately influence or be perceived to influence their work, within three years of beginning the submitted work.

Data and code are available upon request to Marc Saez (marc.saez@udg.edu).

Not applicable.

MS had the original idea for the paper. MS designed the study. The bibliographic search and the writing of the introduction were carried out by all the authors. The methods and statistical analysis were chosen and performed by MS. MAB created the tables and figures. All authors wrote the results and the discussion. The writing and final editing was done by all authors. All authors reviewed and approved the manuscript.