key: cord-1026171-fh311q2n authors: Pineda Rojas, A. L.; Cordo, S. M.; Saurral, R. I.; Jimenez, J. L.; Marr, L. C.; Kropff, E. title: Relative humidity predicts day-to-day variations in COVID-19 cases in the city of Buenos Aires date: 2021-02-01 journal: nan DOI: 10.1101/2021.01.29.21250789 sha: 4925ae67d8ab78bc5c9def2812b56363f5ed74ce doc_id: 1026171 cord_uid: fh311q2n

Possible links between the transmission of COVID-19 and meteorology have been investigated by comparing positive cases across geographical regions. Little is known, however, about the degree to which meteorological conditions drive the daily dynamics of COVID-19 spread at a given location. The main limitation is that individual waves of the disease are typically abrupt and eventful, making correlations somewhat anecdotal. In contrast, we here present a long-term case study for the city of Buenos Aires, which has suffered a single prolonged wave of spread during 2020, with most significant changes in policy and population behavior taking place before the main local outbreak. We found that humidity plays a prominent role in modulating the variation of COVID-19 positive cases through a negative-slope linear relationship, with an optimal lag of 9 days between the meteorological observation and the positive case report. This relationship is specific to winter months, when relative humidity predicts up to half of the variance in positive cases. Our results provide a tool to anticipate local surges in COVID-19 cases after events of low humidity. More generally, they add to accumulating evidence pointing to dry air as an important driver of global COVID-19 transmission.

Increasing evidence points to aerosols as a main mode of transmission of COVID-19, mostly occurring in indoor environments 1, 2. Factors such as temperature and humidity can influence the transmission of respiratory viruses, as suggested by studies in laboratory-controlled conditions 3, 4.
The stability of influenza and other viruses in suspended aerosols, mimicking airborne transmission conditions, increases with low temperature and humidity 5, 6 . The exchange of water and heat with the environment also affects the physics of the droplets, determining their fate through deposition or dispersion 7, 8 . In addition, low levels of humidity also affect the immune response of the host, as demonstrated in laboratory animals 9 . Experiments of airborne transmission between animals confirm the overall facilitatory role played by low humidity in this complex chain of events 10 . Since the start of the pandemic, several studies have assessed the relationship between the number of daily COVID-19 cases and meteorological conditions in different regions of the world [11] [12] [13] [14] [15] [16] . Indoor air conditions are not generally monitored, but widely available outdoor meteorology can be used as a proxy of indoor conditions, albeit mediated by levels of heating and ventilation 3 . Variables such as humidity, temperature, solar radiation, precipitation or wind speed have been found to co-vary with positive cases. These studies have typically taken two approaches. The first is to compare the spread of the disease across geographical regions. In this kind of study, variability in meteorological conditions is attained by including enough spatially distant locations, but differences other than meteorological (e.g., mitigation policies) between locations are a confounding factor. The second approach is to correlate the rising phase (or bursts) of the outbreak on a single location with a number of meteorological variables that might have modulated it. The challenge in this kind of study relates to the fact that many meteorological variables have marked yearly variations, and correlations with an abrupt increase in the number of COVID-19 cases might be spurious, simply reflecting two parallel but independent trends. 
In addition, the dynamic of local mitigation policies and the response of the population represent confounding elements for this approach. Due to these shortcomings, the question of whether or not meteorology can influence COVID-19 spread is still open, with sensible arguments on both sides 17, 18.

medRxiv preprint doi: https://doi.org/10.1101/2021.01.29.21250789; this version posted February 1, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. All rights reserved. No reuse allowed without permission.

Here we present a long-term study of the influence of meteorological variables on the spread of COVID-19 in the city of Buenos Aires (CBA), Argentina. CBA has a humid subtropical climate with four distinct seasons, and is a useful case study for several reasons. First, over 5% of its 2.9 million inhabitants have suffered from COVID-19 in 2020, as confirmed by PCR tests, during a single long wave of the spread starting at the beginning of May and reaching its peak by the end of August (Figure 1A). Second, although the area of the city is considerable (209 km²), it is located on flat terrain, so that surface meteorological conditions can be assumed to be horizontally homogeneous 19. Third, the most disruptive policies were enforced when there were still fewer than 50 daily cases (lockdown: March 20th; facemasks mandatory in public spaces including shops, etc.: April 15th, 2020) in the hope of flattening the curve. Once it was clear that cases were rising, these measures were renewed roughly every 15 days, effectively extending the lockdown until November 9th, 2020. This period included the whole winter (June to September).
Although during this period the population developed a certain level of relaxation toward some of the guidelines, this change in attitude progressed slowly, minimizing its possible influence on day-to-day variations in positive cases.

Meteorological data from the station located at the domestic airport (AEP; World Meteorological Organization station number 87582) for January to November, 2020 were obtained from public databases provided by the National Oceanic and Atmospheric Administration (NOAA, US) 20 and OGIMET (www.ogimet.com). We included as meteorological variables for our study the surface daily mean values of relative humidity (RH), temperature (T), wind speed (WS), pressure (P), precipitation (PP) and sky cover (SC), as well as the surface daily minimum (Tmin) and maximum (Tmax) values of hourly temperature. Data on COVID-19 cases (SARS-CoV-2 positive PCR tests) in CBA for the same period were obtained from the daily reports by the Health Ministry of Argentina, compiled and curated by Sistemas Mapache (github.com/SistemasMapache/Covid19arData). Data on the date on which symptoms began, available for only 60% of positive cases, were obtained from the Health Ministry of Argentina open data webpage (datos.salud.gob.ar/dataset). All data were processed through custom scripts written in MATLAB (MathWorks, Natick, MA).

Pre-processing

In order to study the mid-range dynamics of variability of new COVID-19 positive cases in CBA, avoiding the slow (wave of the pandemic) and fast (day-of-week effects) modulations, we defined the weekly difference of any given variable X(t) as the value taken by this variable on day t minus its value 7 days before:

$$DX(t) = X(t) - X(t - 7\ \text{days}) \qquad (1)$$

Equation (1) was applied to daily COVID-19 positive cases to obtain DCovid(t) and to its meteorological counterparts (e.g. DRH(t) for relative humidity).
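The weekly difference defined above can be sketched in a few lines. The example below is in Python rather than the MATLAB used by the authors, and the function name, the `None` padding for the first week and the toy series are all assumptions made for illustration, not part of the original analysis.

```python
from typing import List, Optional, Sequence


def weekly_difference(x: Sequence[float], period: int = 7) -> List[Optional[float]]:
    """Return D X(t) = X(t) - X(t - period) for a daily series.

    The first `period` days have no value one week earlier, so they are
    returned as None instead of being guessed.
    """
    out: List[Optional[float]] = [None] * min(period, len(x))
    for t in range(period, len(x)):
        out.append(x[t] - x[t - period])
    return out


# Hypothetical toy series (not real data): a linear trend of 2 cases per day
# plus a fixed weekend dip of 30 cases.
cases = [100 + 2 * t - (30 if t % 7 in (5, 6) else 0) for t in range(21)]
d_cases = weekly_difference(cases)
print(d_cases[7:])  # every defined value equals 14: the weekly pattern cancels
```

In this toy case the day-of-week dip disappears entirely and only the weekly increment of the trend (2 cases per day over 7 days) remains, which is the property the transformation is used for.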
To study the relationship between variables we used linear models and the Pearson correlation coefficient, including a potential lag between the day of the meteorological observation and that of the COVID-19 positive case report. Significance for a linear relationship against the null hypothesis of no contribution of a variable was obtained as a p-value through a t-test using the MATLAB function fitlm(). The optimal lag for a given variable was defined as the lag that maximized the significance and the absolute value of its correlation with DCovid(t). Note that the D transformation introduced in Equation 1, while convenient for a number of reasons, has a small disadvantage. Since a given value of RH(t) affects both DRH(t) and DRH(t + 7), a strong correlation at a given lag is typically mirrored by moderately strong correlations of opposite sign at lags 7 days earlier or later. To avoid any ambiguity, in all cases the optimal lag was determined by the global maximum of significance, regardless of other local maxima. Furthermore, while the D transformation is central to our results, an independent analysis that did not use it (see section 'Event-triggered averages' below) was applied to the same data to confirm the typical size of optimal lags.

Optimization of the multivariate linear model

Full 10-fold cross-validation of all possible combinations of the 8 meteorological variables (all linear models with a number of variables ranging from 0 to 8) was implemented. For each combination of variables, the data were divided into 10 random subgroups of equal size. Each subgroup was set aside once and used as a test set for the linear model trained with the remaining 9 groups. The one-standard-error criterion was used to choose the best model, defined as the one with the fewest variables whose test sum of squared residuals was not higher than the lowest one plus one standard error 21.
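As a rough illustration of the selection step, the Python sketch below (not the authors' MATLAB scripts) enumerates every variable subset and applies the one-standard-error rule to per-fold test errors. Fitting the linear models on each training fold is omitted. The function names, the labels `DPP` and `DSC` (formed by analogy with `DRH`) and all error values are hypothetical, and the standard error is assumed to be that of the best model's errors across folds.

```python
import math
import statistics
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

Subset = Tuple[str, ...]


def all_subsets(variables: Sequence[str]) -> List[Subset]:
    """Every combination of variables, from the empty model to the full one."""
    return [c for k in range(len(variables) + 1) for c in combinations(variables, k)]


def one_standard_error_choice(fold_errors: Dict[Subset, List[float]]) -> Subset:
    """Pick the smallest subset whose mean test error is within one SE of the best."""
    mean_err = {s: statistics.mean(e) for s, e in fold_errors.items()}
    best = min(mean_err, key=mean_err.get)
    # Standard error of the best model's mean error across folds (assumption).
    se = statistics.stdev(fold_errors[best]) / math.sqrt(len(fold_errors[best]))
    threshold = mean_err[best] + se
    eligible = [s for s in fold_errors if mean_err[s] <= threshold]
    # Fewest variables first; ties broken by the lower mean error.
    return min(eligible, key=lambda s: (len(s), mean_err[s]))


VARIABLES = ["DRH", "DT", "DTmin", "DTmax", "DWS", "DP", "DPP", "DSC"]
print(len(all_subsets(VARIABLES)))  # 256 candidate models for 8 variables

# Hypothetical per-fold test errors for three candidate models (5 folds shown).
fold_errors = {
    (): [10.0, 11.0, 9.5, 10.5, 10.0],
    ("DRH",): [7.0, 7.6, 6.8, 7.4, 7.2],
    ("DRH", "DT"): [6.9, 7.5, 6.9, 7.3, 7.1],
}
print(one_standard_error_choice(fold_errors))  # ('DRH',)
```

In the toy numbers the two-variable model has the lowest mean error, but the single-variable model falls within one standard error of it and is therefore preferred as the simpler description.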
All linear combinations of variables were tested, including up to first order interactions, for two variations of the data: a) each variable with its own optimal lag and b) all variables using the optimal lag for DRH. To corroborate the results we repeated the analyses using the MATLAB lasso() function, which implements the choice of the optimal model by penalizing the sum of the absolute value of linear coefficients under the one-standard-error criterion 21.

Event-triggered averages

To study the evolution of meteorological variables relative to peaks of DCovid(t) or the evolution of COVID-19 positive cases relative to extreme events of RH, the relevant events were identified, temporal windows of regular size defined around them, and variables averaged by overlapping these temporal windows. The 7-day average of COVID-19 positive cases was used to understand the evolution of cases relative to extreme RH events along these short temporal windows. This evolution was computed as a percentage relative to the value on the day of the event by normalizing the 7-day average for each window by its value on that day.

The curve of daily new COVID-19 cases in CBA exhibits a rich dynamic with different scales of temporal evolution (Figure 1A). As in many such datasets, fast and slow dynamics are observed. The slow dynamic reflects waves of spread of the pandemic, which in the particular case of CBA took during 2020 the form of a single, slowly modulated wave starting in May and reaching its peak in August. Day-of-week fluctuations (especially weekend vs. weekday differences) drive the fast dynamic, reflecting a multiplicity of factors that presumably range from the number of potentially dangerous social encounters to variations in testing and bureaucratic processing of the data. In addition, a third, intermediate dynamic range is apparent in this dataset, represented by fluctuations with an irregular periodicity in the range of 2-4 weeks.
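The event-triggered averaging described in the Methods can be sketched as follows. This is a minimal Python illustration, not the authors' MATLAB code. It assumes that events are supplied as day indices, that events too close to either end of the series are skipped, and that the optional normalization expresses each window as a percentage of its value on the event day; the function name and the toy series are hypothetical.

```python
from typing import List, Sequence


def event_triggered_average(
    series: Sequence[float],
    events: Sequence[int],
    before: int,
    after: int,
    normalize: bool = False,
) -> List[float]:
    """Average `series` over windows [e - before, e + after] around each event e.

    Events whose window would run past either end of the series are skipped.
    With normalize=True, each window is converted to a percentage of its value
    on the event day before the windows are averaged.
    """
    windows = []
    for e in events:
        if e - before < 0 or e + after >= len(series):
            continue  # incomplete window
        w = [float(v) for v in series[e - before : e + after + 1]]
        if normalize:
            ref = w[before]  # value on the day of the event
            if ref == 0:
                continue  # percentage is undefined
            w = [100.0 * v / ref for v in w]
        windows.append(w)
    if not windows:
        raise ValueError("no event has a complete window")
    n = len(windows)
    return [sum(w[i] for w in windows) / n for i in range(before + after + 1)]


# Hypothetical toy series and event days (illustration only).
series = [10, 20, 30, 40, 50, 60, 70]
avg = event_triggered_average(series, events=[2, 4], before=1, after=1)
print(avg)  # [30.0, 40.0, 50.0]
```

With `normalize=True` the central entry of the result is 100 by construction, so the remaining entries read directly as percentage changes relative to the event day.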
In the present work, we characterized these mid-range fluctuations following the hypothesis that they are at least in part rooted in meteorological conditions. To do so, we applied Equation 1 to all variables. Compared to other methods for filtering out day-of-week fluctuations, such as the 7-day average, this procedure has the advantage of allowing for instantaneous day-to-day fluctuations, without smoothing the variables along the temporal dimension. We first studied the Pearson correlation between DCovid(t) and its meteorological counterparts (Figure 1B). Since meteorology is not expected to affect the outcome of positive COVID-19 cases reported on the same day, but rather those reported some days later, lags ranging from 0 to 20 days were included in the analysis. We found that DRH, DT and DTmin had a significant correlation (p < 0.01) with DCovid with an optimal lag of 9 days. The same happened for DTmax with a lag of 1 day, for DWS with a lag of 14 days and for DP with a lag of 15 days. As discussed in the Methods section, some variables exhibited a 7-day periodicity in the p-value curve, a direct consequence of Equation 1, but only the global minimum was considered to determine the optimal lag. Other variables did not exhibit a significant correlation with DCovid(t) regardless of the lag. Of all variables, DRH(t - 9 days) exhibited the most extreme correlation with DCovid(t) (Pearson correlation: -0.47; t(218): -7.9; p: 2 × 10^-13). The evolution of DCovid(t) and negative DRH(t - 9 days) was strikingly similar (Figure 1C), especially so when considering the numerous confounders expected to mediate a relationship between meteorology and the reporting of COVID-19 positive cases (individual variability in symptomatic response, test processing time or indoor vs. outdoor RH relationship, to cite only a few).
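The lag scan can be sketched as below. This is a Python illustration under stated assumptions, not the authors' MATLAB analysis: both series are assumed to be aligned day by day and of equal length, all lags are evaluated on the same set of report days so that the sample size is constant, and the optimal lag is taken as the one with the largest absolute t value. Function names and the synthetic data (with a built-in lag of 9 days) are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r: float, n: int) -> float:
    """t value (n - 2 degrees of freedom) for a Pearson correlation r."""
    if abs(r) >= 1.0:
        return math.copysign(math.inf, r)
    return r * math.sqrt((n - 2) / (1.0 - r * r))


def lag_scan(
    d_met: Sequence[float], d_covid: Sequence[float], max_lag: int = 20
) -> List[Tuple[int, float, float]]:
    """Return (lag, r, t) for d_covid(t) against d_met(t - lag), lag = 0..max_lag.

    Every lag uses the same report days t >= max_lag, so the sample size is
    constant and the largest |t| corresponds to the smallest p-value.
    """
    y = d_covid[max_lag:]
    out = []
    for lag in range(max_lag + 1):
        x = d_met[max_lag - lag : len(d_met) - lag]  # meteorology on day t - lag
        r = pearson_r(x, y)
        out.append((lag, r, t_statistic(r, len(y))))
    return out


def optimal_lag(scan: List[Tuple[int, float, float]]) -> Tuple[int, float, float]:
    """Entry of the scan with the largest absolute t value."""
    return max(scan, key=lambda item: abs(item[2]))


# Synthetic data (not real): cases respond negatively to humidity 9 days earlier.
rng = random.Random(0)
d_rh = [rng.gauss(0, 1) for _ in range(200)]
d_covid = [0.0] * 9 + [-2.0 * d_rh[t - 9] + rng.gauss(0, 0.5) for t in range(9, 200)]
best_lag, best_r, best_t = optimal_lag(lag_scan(d_rh, d_covid))
print(best_lag, round(best_r, 2))  # the scan is expected to recover the 9-day lag
```

As a consistency check on the reported statistics, `t_statistic(-0.47, 220)` evaluates to about -7.86, in line with the reported t(218) of -7.9, assuming the 218 degrees of freedom correspond to 220 paired days.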
To understand the relative importance of DRH compared to the other potential predictors, we applied two cross-validation methods (full 10-fold cross-validation and Lasso) to identify the linear combination of 8 meteorological variables (including individualized lags and first order interactions) that best modelled DCovid(t). Both methods led to an identical conclusion, pointing to Equation 2 as the best linear model of DCovid(t), where bX stands for the linear coefficient for variable X.

The optimal lag of 9 days includes time for the development of symptoms, testing and bureaucratic processing of data. To dissect this time window, we analyzed a second database that included, for around 60% of confirmed positive cases, the date on which symptoms began. Defining DCovidS in terms of the number of patients with symptoms triggered at a given date, we found that its optimal lag relative to DRH was 5 days.

We then classified the data by month of the COVID-19 report date (t) from March to October, 2020. For each month, we plotted DCovid(t) vs. DRH(t - 9 days) and obtained the individual slopes (linear coefficient bRH in Equation 2) and correlation coefficients (Figure 1D). We observed that the modulation of DCovid by DRH was only highly significant during the winter months (June to August), with similar values of bRH. For transition months (May and September) the relationship weakened toward not significant, regardless of whether the overall number of COVID-19 positive cases was low (as in April) or high (as in October). Given that cross-validation procedures showed no significant interactions, this seasonal effect cannot be explained by the meteorological variables considered here.
Mean monthly temperature, however, correlated with bRH across months (Pearson correlation: 0.88; t(6): 4.6; p: 4 × 10^-3). We speculate that rather than causality, this correlation could reflect indirect mechanisms such as seasonal changes in behavior that do not rely on day-to-day meteorology (such as habits regarding ventilation, the use of heating or outdoor vs. indoor gathering).

Our results could provide a tool to anticipate local surges in COVID-19 due to low RH. To demonstrate this, we first studied the average evolution of RH in anticipation of events of extreme DCovid (Figure 2A). For the 7 peaks in DCovid that were higher than 200 positive cases, we observed a trough in RH extending roughly from day 15 to day 5 prior to the peak and reaching an average value of 55% at its minimum. Conversely, we asked how the 7-day average of COVID-19 cases evolved after extremely low values of RH (Figure 2B). We observed an increase in COVID-19 cases starting 5 to 7 days after troughs of RH of different magnitude, peaking on average at day 11. For the 5 lowest humidity events (< 40%) that took place since the start of the pandemic, this implied an average increase of more than 20% in the 7-day average of COVID-19 positive cases.

This study provides a fresh perspective on the day-to-day dynamic of the COVID-19 pandemic. Further efforts should be directed to understand which of our results can be replicated in other locations of the world and which are specific to the CBA outbreak (and why). In addition, information regarding the type of transmission in each case (close proximity vs. shared-room scale), if available, could shed light on the mechanisms by which RH modulates the number of COVID-19 cases. In close proximity, the physics of exhaled aerosols is likely to play a predominant role, while in shared-room transmission the viral decay rate could be more important.
If, instead, RH modulated transmission mainly through its impact on the respiratory immune system, no difference would be expected between these groups. Further development in this area would improve our understanding of not only COVID-19 but possibly other respiratory diseases that pose a threat to human health and welfare.

Airborne transmission of SARS-CoV-2
Mechanistic insights into the effect of humidity on airborne influenza virus survival, transmission and incidence
Seasonality of respiratory viral infections. Annual review of virology 2020
High Humidity Leads to Loss of Infectious Influenza Virus from Simulated
Relationship between humidity and influenza A viability in droplets and implications for influenza's seasonality
How far droplets can move in indoor environments - revisiting the Wells evaporation-falling curve. Indoor air
Violent expiratory events: on coughing and sneezing
Low ambient humidity impairs barrier function and innate resistance against influenza infection
Influenza virus transmission is dependent on relative humidity and temperature
The effects of atmospheric stability with low wind speed and of air pollution on the accelerated transmission dynamics of COVID-19
Significance of geographical factors to the COVID-19 outbreak in India. Modeling earth systems and environment
Relationship between COVID-19 and weather: Case study in a tropical country
The effect of climate on the spread of the COVID-19 pandemic: A review of findings, and statistical and modelling techniques
Misconceptions about weather and seasonality must not misguide COVID-19 response
Factors determining the diffusion of COVID-19 and suggested strategy to prevent future accelerated viral infectivity similar to COVID
High PM10 concentrations in the city of Buenos Aires and their relationship with meteorological conditions
An introduction to statistical learning
Incubation period of 2019 novel coronavirus (2019-nCoV) infections among travellers from Wuhan, China

The manuscript was written through contributions of all authors. All authors have given approval to the final version of the manuscript.

CBA, city of Buenos Aires; RH, relative humidity; T, temperature; WS, wind speed; P, pressure; PP, precipitation.