key: cord-1044419-feuvmcnt authors: Barros, V.; Manes, I.; Akinwande, V.; Cintas, C.; Bar-Shira, O.; Ozery-Flato, M.; Shimoni, Y.; Rosen-Zvi, M. title: A causal inference approach for estimating effects of non-pharmaceutical interventions during Covid-19 pandemic date: 2022-03-02 journal: nan DOI: 10.1101/2022.02.28.22271671 sha: 4430076370d9f9d81e5ce59101038dec314df372 doc_id: 1044419 cord_uid: feuvmcnt In response to the outbreak of the coronavirus disease 2019 (Covid-19), governments worldwide have introduced multiple restriction policies, known as non-pharmaceutical interventions (NPIs). However, the relative impact of control measures and the long-term causal contribution of each NPI are still a topic of debate. We present a method to rigorously study the effectiveness of interventions on the rate of the time-varying reproduction number Rt and on human mobility, considered here as a proxy measure of policy adherence and social distancing. We frame our model using a causal inference approach to quantify the impact of five governmental interventions introduced until June 2020 to control the outbreak in 113 countries: confinement, school closure, mask wearing, cultural closure, and work restrictions. Our results indicate that mobility changes are more accurately predicted when compared to reproduction number. All NPIs, except for mask wearing, significantly affected human mobility trends. From these, schools and cultural closure mandates showed the largest effect on social distancing. We also found that closing schools, issuing face mask usage, and work-from-home mandates also caused a persistent reduction on Rt after their initiation, which was not observed with the other social distancing measures. Our results are robust and consistent across different model specifications and can shed more light on the impact of individual NPIs. The coronavirus disease 2019 (Covid-19) pandemic has caused an enormous impact on 2 the economy and on global public health. As of January 1 st 2022, the disease had over 3 290 million cases and more than 5 million deaths recorded in over 200 countries and 4 territories [1] . In response to the state of emergency declared by the World Health 5 Organization (WHO) in January 2020, governments worldwide have introduced multiple 6 restriction policies, known as non-pharmaceutical interventions (NPI), to mitigate the 7 spread of the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). Today, 8 two years from the first outbreak registered in Wuhan China, the relative impact of 9 control measures and the long-term causal contribution of each NPI are still a topic of 10 debate. 11 February 24, 2022 1/18 throughout the world, which was substantially reduced in the subsequent waves. Thus, 31 we focus on the initial outbreak period, when the long-term consequences of the virus 32 were still poorly understood, vaccines were still not available, and policy-makers were 33 not certain which control measures would be effective. With this approach, we hope to 34 suggest a method to infer the efficiency of restriction measures and to inform future 35 urgent preparedness response plans in the time-critical phase of a pandemic. 36 We estimate the impact of individual NPIs on social distancing and Covid-19 spread 37 using causal analysis methodology, taking into account confounding factors such as 38 concurrent NPIs, Covid-19 morbidity measures, and country-level socio-economic and 39 demographics factors. We tested two different variables as outcomes in the causal effect 40 estimation. First, we analysed how NPIs influence the mobility trends across different 41 categories of places such as residential and retail/recreation areas. In the second 42 approach, we evaluated the effect on the growth rate of the reproduction number R t , 43 i.e., the rate by which the pandemic spreads. To the best of our knowledge, this is the 44 first study to quantify NPI effectiveness using a causal inference framework in such a 45 wide geography coverage, while accounting for confounding biases and performing 46 sensitivity analyses to assess the robustness of our findings. 47 Materials and methods 48 Data collection and pre-processing 49 Non-pharmaceutical interventions 50 We extracted the data on the restriction policies in 113 countries using the Worldwide 51 Non-pharmaceutical Interventions Tracker for Covid-19 (WNTRAC) [20] , a 52 comprehensive database consisting of over 8, 000 Covid-19-related NPIs implemented 53 worldwide. WNTRAC was updated periodically until October 5 th 2021, and we use the 54 latest release for the analyses described hereby, which covers the period until the day 55 before. For our experiments, we selected a subset of the five most well-defined and 56 frequently imposed NPIs in the data: confinement, entertainment/cultural sector 57 closure, work restrictions, mask-wearing, and school closure. Table 1 shows each NPI 58 with its respective description. To infer people's dynamic behavioral response to restriction policies, we obtained Google 61 mobility data [21] . These reports include the per-day change in movement across 62 different categories of places compared to a baseline day before the pandemic outbreak. 63 Similar to [17] , we chose the change in duration of time spent in residential areas as a 64 primary metric to measure social distancing and policy adherence. We additionally 65 considered the effect of NPIs on the changes of movement in retail and recreation areas, 66 generally seen as nonessential visits. For each category, we smoothed weekly patterns by 67 using the seven-day rolling averaged mobility. When values were missing, we performed 68 a linear interpolation. We used country-level information for all countries in our analysis, 69 except for the US, which contained state-level data for both NPI and mobility data. (WDI) [22] , we selected a subset of variables, including access to electricity, outdoor air 75 pollution, and forest area. We also included country-level health information, ranging 76 from life expectancy at birth, smoking rate, and prevalence of undernourishment. As we 77 have indicators per year, we take the most recent metrics available per country. Similar 78 covariates were used in previous works [3, 4] . 79 We also included variables describing population distribution, age-structure, and 80 human development index from Our World in Data (OWID) [23] . To avoid producing 81 highly biased estimates with missing values imputation, we only analysed countries with 82 data available for at least 70% of the WDI and OWID variables and imputed the 83 remaining missing features with the mean. We characterize each country with a cluster 84 ID obtained through DBSCAN clustering [24] (scikit-learn, version 1.0.1) performed on 85 the vector of socio-economical and health features (see Table 1S robust analytical estimates and extensive use in the disease epidemiology literature. Because estimating the serial interval distribution may not be possible in the early 102 phase of an outbreak, or may be associated with significant uncertainty as countries still 103 had to establish their documentation practices, we exclude the period before 100 104 cumulative cases of each country from our dataset. Therefore, the window of analysis 105 for each country starts on the day the cumulative cases exceeded 100 and ends on June 106 1 st 2020. When concatenating all data sources, we found an intersection of 113 countries 107 that were used in our final analysis. Countries like China, Iran and part of central 108 African countries were excluded due to missing mobility data (see geography selection in 109 Appendix E2). TreeExplainer from the SHapley Additive exPlanations (SHAP) [30] package was fit on 120 the validation set to estimate the association and contribution of each feature to the 121 XGBoost model. 122 We evaluated the accuracy of the predictions using the mean-squared error (MSE). 123 To compare between the performance of different models at different scales, we 124 normalized the estimates and the observed values in the test set before computing the 125 MSE. The 95% confidence intervals (CI) were obtained with 100 bootstrapping 126 iterations. Potential outcome framework 128 We formulated causal effects in terms of the potential outcome framework [31] . Each 129 country i at each point in time t is characterized by a feature vector X i,t , consisting of 130 dynamic variables -the cumulative and new number of cases/deaths per million -and a 131 static variable -a socioeconomic cluster ID, which is country-specific and constant over 132 time. In addition, we included information regarding the status of the restrictions. This 133 is a feature vector containing a set of binary features corresponding to whether other 134 NPIs are in place at that time point. We denote Y . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted March 2, 2022. ; https://doi.org/10.1101/2022.02.28.22271671 doi: medRxiv preprint i,t , the difference in the country-level mobility defined as in [32] and i,t , the change in SARS-CoV-2 transmission represented by a ratio of the 138 reproduction number R t , defined as in [9] . We evaluated the outcomes with respect to a 139 time-lag w (in days), as follows: In this study, we considered the effect on the outcomes 2 weeks after the NPIs were 141 enacted ± 1 week (i.e., w = 7, 14 and 21). Our goal was to estimate the Average 142 Treatment Effect on the Treated (AT T ), defined as where AT T M and AT T R denote the causal effects of human mobility trends and R t , 145 respectively. Thus, a null effect of NPIs on mobility would be equivalent to AT T M = 0. 146 Similarly, a null effect on R t corresponds to AT T R = 1. To identify such a causal effect, we make several assumptions: (i) conditional cohort study exclusively separating samples (in this case, days) from the treatment and 156 control groups. For each NPI k ∈ K, we defined an event as the day NPI k was enacted 157 in a certain country, given that this intervention was not in action the day before. We 158 denote this day t * k . We defined the treatment group for NPI k as the set of t * k in the 113 159 countries during their respective period of analysis (Fig 1) . The control group includes 160 all days in the cohort except t * k ± w ∀k where w is the number of days before and after 161 the event, and it is also the same time-lag period defined in Eq 1 and Eq 2. The control 162 cohort represents a period where none of the five NPIs were enacted (an event did not 163 occur), but they might still be in place during that period. Unlike the treatment groups, 164 which are NPI-specific, the control groups are the same for all NPIs used as treatment. 165 We denote day t * k (in red) as the day when NPI k was enacted, considering that this NPI was not active the day before (i.e., event date). The treatment group for NPI k consists of t * k across all countries. The control group are the remaining days (in blue) outside the time interval t * k ± w, where w is a lag period. February 24, 2022 5/18 . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted March 2, 2022. ; https://doi.org/10.1101/2022.02.28.22271671 doi: medRxiv preprint Covariate balancing 166 We used Adversarial Balancing (AdvBal) [33] to estimate the mean potential outcome 167 of the control group. In brief, AdvBal borrows principles from generative adversarial 168 networks to assign weight to each sample in some source data, such that the resulting 169 weighted data becomes similar to a given target data. Unlike the original paper, we 170 defined only the treated group as the target population, and not the entire population, 171 i.e., we fixed the weights of the treated samples and adjusted the weights of the control 172 in order to balance the groups. The implementation of AdvBal and the resulting 173 balancing evaluation of the models were done with the Python package causallib [34] 174 (version 0.7.1; https://github.com/IBM/causallib). The final effects and the associated 175 95% confidence intervals were computed with 1, 000 bootstrap samples. Sensitivity analysis 177 We performed sensitivity analyses to test our conclusions under different scenarios. 178 First, we reran our framework with a different model to estimate treatment effect from 179 observational data. We used inverse probability weighting (IPW) [35] , a longstanding Second, we performed a complementary analysis to test whether the estimated 186 effects were consistent when using an alternative study design. In this approach, the 187 treatment groups are the same as in the original study (Fig 1) . On the other hand, the 188 control group consists of events from the remaining NPIs (apart from the NPI used as 189 treatment). For example, if we were to estimate the effects of NPI k ∈ K, the treated 190 group would contain only the events for this specific NPI, t * k , whereas the control group 191 would be the events of all other NPIs K -k , as long as these events do not overlap with t * k . 192 As a preliminary step, we examined the frequency at which each NPI was imposed in 195 the 113 countries (Fig 2) . Since governments introduced and lifted lockdowns and social 196 distancing measures consistently throughout time, we observed a continuous growth in 197 the number of confinement events (Fig 2A) . This behavior was the opposite for mask NPIs are co-linear in this period, i.e., they frequently co-occurred. For example, out of 214 57 confinement events, 12 of them occurred within a period of two weeks (seven days 215 earlier or later) of a work restriction event in the same country. When high co-linearity 216 exists, individual effect estimates are more challenging, as it is harder to identify what 217 in fact drives the change in SARS-CoV-2 transmission. To overcome this problem and 218 to control for the influence of individual NPIs on the estimated effects, we opted to add 219 the status of NPIs as confounders to our causal model (recall section Potential outcome 220 framework). features are kept the same but the outcomes undergo different permutations. We found 228 that for all scenarios considered in this study (different time lags and different outcome 229 variables), the features we utilized showed enough statistical power to allow a good 230 prediction ( Fig 3C) . Associations between the outcome variables and baseline features 231 can be visualized using SHAP "beeswarm" plots, that show the top-10 contributing 232 features for the outcomes prediction using a time-lag of 7 days (Fig 3A-B) . The 233 strongest predictor of R t was its own value 7 days before, implying a clear positive 234 association between R t measures within a one-week period (Fig 3A) . Such strong a 235 association was not observed with other important features, such as confirmed cases and 236 deaths. Changes in mobility within several categories also impacted the R t prediction. 237 In particular, an increase in mobility in workplaces and retail contributed to a higher 238 SARS-CoV-2 transmission. The feature importance analysis of residential mobility 239 prediction revealed that mobility categories are highly correlated: as the time spent at 240 home increases, the movement outside residence decreases (Fig 3B) . . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted March 2, 2022. ; SHAP analysis in A and B is based on predictions using a time-lag of 7 days. Each dot represents a single data sample in the validation set (i.e., country at a date). The dot color represents the feature value (red=high, blue=low). The farther a dot is from 0 on the x-axis, the more effect (positive or negative) this feature had on the prediction model for this particular sample. (C) Model performance (log of normalized MSE) of R t (solid dark blue line) and residential mobility (solid green lines) in the validation set for different time lags. The dashed lines of the same color correspond to random prediction derived by permutation tests with their respective models. Both models significantly outperformed random prediction. We report the estimated causal effects AT T M and AT T R without balancing weights 252 (unadjusted) and with balancing weights generated by the AdvBal algorithm and IPW 253 (Fig 4) . All estimates are presented with 95% confidence intervals and were obtained 254 after assessing whether the balancing weights approaches were able to reduce biases. 255 Our results suggest that the full extent of NPIs, except mask wearing, significantly 256 affected human mobility change as early as 7 days after their initiation. However, 257 changes in mobility varied between these 4 restrictions. School and cultural closures 258 caused a quick and sustained increase in the time spent inside the home, whereas 259 confinement and work-from-home orders had a slower and plateauing effect over time. 260 Estimates from the AdvBal algorithm suggest that, following the introduction of NPIs, 261 the time spent at home 14 days later was estimated to increase by 9.06% [95% CI: In both plots, the opacity of the markers represents the ability of the balancing weights method to balance the treatment groups: the smaller the ASMD is, the more opaque the markers are. Fully opaque markers indicate an ASMD < 0.1, half-transparent markers indicate 0.1 ≤ ASMD ≤ 0.25 and most transparent ones represent ASMD > 0.25. Apart from mask wearing mandates, all NPIs caused a significant increase in time spent at home. Of those, school and cultural closures were the most effective. Closing schools, issuing face mask usage, and work-from-home mandates also caused a persistent reduction in R t after their initiation, which was not observed with the other social distancing measures. Code used for generating figure is available at https://github.com/barakm-ki/symptoms-dynamicsof-COVID-19-infection/blob/master. . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted March 2, 2022. ; https://doi.org/10.1101/2022.02.28.22271671 doi: medRxiv preprint suggesting a comparable effect between each other. IPW and the AdvBal algorithm 277 showed similar trends in their resulting effects, but the balancing of covariates was 278 marginally better on the latter (2S Fig, Appendix E4) . We constructed a dataset that combines rich information about countries and their 286 reaction to the urgent need to control the pandemic spread. The data include 287 information on social-economics and health benefits, NPIs, and mobility data from more 288 than 100 countries. We showed that the data have predictive power, and that the 289 prediction of changes in mobility after imposing NPIs is more accurate than the 290 prediction of the reproduction number. We employed a causal inference approach to 291 quantify the effect of NPIs on the rate of R t (i.e., transmission of SARS-CoV-2) and on 292 the change of human mobility, which is considered a proxy measure of population 293 adherence and social distancing. The purpose of this study was to help infer the 294 efficiency of interventions in the early months of a pandemic, when a number of control 295 measures had already been imposed by multiple countries in the absence of vaccines. The most common use of causal inference seeks to estimate the average treatment 297 effect (ATE). Such an analysis would answer questions such as "what would have 298 happened if every country applied the NPI?". However, in this case an analysis of this 299 type proved to be unfeasible, since it was not possible to balance the confounding 300 differences between the treated and untreated countries. We therefore opted to 301 measuring the impact of NPIs in the countries that chose to impose the NPIs, which is 302 known as the average effect on treated (ATT). This answers the question of "what was 303 the effect of applying the NPIs in the countries that applied it?". This approach allowed 304 covariate balancing, which provided more reliable estimates of the effect of NPIs. Our findings showed that mask wearing did not significantly impact mobility 306 patterns in the first wave. Although a number of countries favored face mask usage 307 early in their outbreaks [39] , the main reason people changed their behavior was social 308 distancing policies. We found that issuing confinement, work-from-home orders, or 309 school/cultural closure mandates resulted in high levels of policy compliance even one 310 week after their initiation, as measured by changes in movement in residential areas. This result was consistent with recent findings [32, 40] . The estimated effect on R t showed that not all NPIs significantly contributed to a 313 decrease in SARS-CoV-2 transmission. In particular, school closure achieved a sustained 314 decline on the rate of R t , similar to what was found in observational studies of the first 315 wave [4, 5, 7-9] and second European wave [41] . Because infected children can experience 316 mild or no symptoms more frequently than older individuals [42] and tend to have more 317 social contact than adults [43] , it is expected that closing schools would considerably 318 contribute to reduce the transmission. In contrast, on its own, closing cultural 319 establishments does not seem to have an effect on the reproduction number, nor do 320 work restrictions in the first 14 days after imposing the NPIs. Notably, we did not find 321 substantial differences in the results when performing sensitivity analysis. Our study extends previous first-wave estimate studies [2, 4, 5, 7-10] in a number of 323 ways. First, we used the potential outcome framework to infer NPI effects. In its 324 simplest form, our causal model made use of standard causal inference methods to 325 correct observed biases and obtain valid effects with more transparent confidence 326 February 24, 2022 9/18 . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted March 2, 2022. ; https://doi.org/10.1101/2022.02.28.22271671 doi: medRxiv preprint intervals. Second, we addressed the issue of concurring NPIs by using their status as 327 covariates in the causal model. By ensuring that all NPI-related covariates are well 328 balanced between treatment groups, we enhanced the power to detect independent NPI 329 effects. Third, we account for heterogeneity of countries by including a social 330 economical cluster indicator in the dataset. Fourth, we conducted complementary 331 analyses under alternative scenarios to test our conclusions not only with a different 332 cohort study design, but also with another balancing weights generation approach. 333 We acknowledge several limitations in our analyses. Even with data from multiple 334 countries that had diverse sets of interventions in place, inferring NPI effects still 335 remained a challenging task. First, the R t estimation was based on epidemiological 336 parameters that are only known with uncertainty, due to many mild or asymptomatic 337 cases that make it difficult to model the timing for the onset of symptoms and serial 338 interval distributions. On top of that, R t also relies on the data of confirmed cases, 339 which were generally unreliable in the early days of the pandemic due to lack of testing 340 availability and not-established documentation practices. To account for this, we began 341 our analysis at each country's 100 th case. Secondly, the data are retrospective and 342 observational, meaning that unobserved factors could confound the results. Third, we 343 were unable to assess the effect of lifting interventions. Since the NPI events in 344 WNTRAC are automatically extracted from Wikipedia articles, which report the 345 introduction of NPIs more frequently than their relaxation, the number of lifting events 346 documented in the database did not have enough statistical power for causal inference. 347 Yet, we believe we set the ground for a thorough analysis of NPIs and we were able to 348 draw conclusions regarding the effect different NPIs had on the pandemic spread. Because we had more locally fine-grained mobility and NPI data available for the United States, we used the state-level information of all 50 states. Identification assumptions The Rubin Causal Model [31] used in this study is an approach to causal inference that is based on a framework of potential outcomes. Under this framework, causal effects are estimated with comparisons of potential outcomes under the two different treatments: one that received the intervention, and another that received a different intervention (e.g., placebo or no treatment). In our context, these two groups are defined as days in countries where NPIs were imposed vs. days in countries where NPIs were not imposed. Causal estimates from this source are unbiased when three assumptions are met, namely (i) conditional exchangeability, (ii) Stable Unit Treatment Value Assumption (SUTVA) and (iii) positivity. Conditional exchangeability states that the counterfactual outcomes are conditionally independent of the treatment given the set of covariates. Under this assumption, the treatment groups are "exchangeable", i.e., there are no unmeasured confounders that are a common cause of both treatment and the outcome [45] . In other words, if all confounders are measured, then we can assume that exchangeability holds within the strata dictated by the confounders, and we can estimate the causal effect by using methods that eliminate the confounding (e.g.,, by emulating a randomized controlled trial [RCT] using balancing weights methods). This is because randomization ensures that the covariates associated with the outcome are equally distributed between the treatment groups. In reality, in observational studies as the one we studied here, one cannot empirically verify that conditional exchangeability holds or there is no unmeasured confounding. Thus, causal inference relies on subject-matter knowledge to identify possible confounders in the data, so that the assumption is at least approximately true. SUTVA stipulates that the outcome of one unit should not be affected by another unit's treatment assignment. Although SUTVA plays a central role in the identification of causal effects, this assumption does not hold in many settings. For example, when a certain country bans inbound flights from its neighbouring countries (introduction of a new NPI), the observable outcome (mobility change, R t rate or any other Covid-19 morbidity measure) of the banned countries are certainly affected by this treatment assignment. Another classical example is given in epidemiology, where the possibility of an individual becoming infected depends on whether the population is vaccinated. In our study of NPI effectiveness, SUTVA is an unrealistic assumption. Yet, we claim that our final estimated effects are still a good enough approximation. We direct the reader to the work of Hudgens and Halloran [46] . In their paper they do a review of previous studies that estimated causal effects in the presence of interventions' interference. Positivity, also called a lack of covariate overlap, is the assumption that any sample has a positive probability of receiving all values of the treatment variable. In other words, to identify causal effects pertaining to a treatment A, there must be some probability of receiving A given a certain baseline of covariates X, otherwise treatment versus control causal effects cannot be identified. Mathematically, P r(A = a|X = x) > 0 for all x where P r(X = x) = 0. The positivity assumption has the very important consequence of ensuring that features in both treatment groups are equal in their distribution. In our work, we empirically verify to some extent whether positivity holds by checking whether the distribution of covariates is similar between the two treatment groups. By inspecting the success of the weighting method used (in our case, either IPW or the AdvBal algorithm [33] , we are ensuring that positivity holds in our dataset. In the next section, we discuss the balancing evaluation of the applied weights in more detail. February 24, 2022 16/18 . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted March 2, 2022. ; https://doi.org/10.1101/2022.02.28.22271671 doi: medRxiv preprint Dashboard-WHO Coronavirus (COVID-19) Dashboard With Vaccination Data Estimating the effects of non-pharmaceutical interventions on the number of new infections with COVID-19 during the first epidemic wave Assessing mandatory stay-at-home and business closure effects on the spread of COVID-19 Inferring the effectiveness of government interventions against COVID-19 The impact of non-pharmaceutical interventions on SARS-CoV-2 transmission across 130 countries and territories. medRxiv The temporal association of introducing and lifting non-pharmaceutical interventions with the time-varying reproduction number (R) of SARS-CoV-2: a modelling study across 131 countries. The Lancet Infectious Diseases The effect of large-scale anti-contagion policies on the COVID-19 pandemic Causal impact of masks, policies, behavior on early covid-19 pandemic in the US Impacts of introducing and lifting nonpharmaceutical interventions on COVID-19 daily growth rate and compliance in the United States Modeling COVID-19 scenarios for the United States Strong Social Distancing Measures In The United States Reduced The COVID-19 Growth Rate: Study evaluates the impact of social distancing measures on the growth rate of confirmed COVID-19 cases across the United States Differential effects of intervention timing on COVID-19 spread in the United States Heterogeneity in the Effectiveness of Non-Pharmaceutical Interventions during the first SARS-CoV2 wave in the United States. Frontiers in public health Impacts of social distancing policies on mobility and COVID-19 case growth in the US Association between statewide school closure and COVID-19 incidence and mortality in the US School closure and management practices during coronavirus outbreaks including COVID-19: a rapid systematic review. The Lancet Child & Adolescent Health AI-assisted tracking of worldwide non-pharmaceutical interventions for COVID-19 World Development Report 2021: Data for Better Lives. The World Bank Human development index, age structure, population distribution A density-based algorithm for discovering clusters in large spatial databases with noise World Health Organization. Coronavirus disease 2019 (COVID-19): situation reports A new framework and software to estimate time-varying reproduction numbers during epidemics Improved inference of time-varying reproduction numbers during infectious disease outbreaks Serial interval of COVID-19 among publicly reported confirmed cases. Emerging infectious diseases Xgboost: A scalable tree boosting system A unified approach to interpreting model predictions Estimating causal effects of treatments in randomized and nonrandomized studies Measuring the effect of Non-Pharmaceutical Interventions (NPIs) on mobility during the COVID-19 pandemic using global mobility data Adversarial balancing for causal inference An Evaluation Toolkit to Guide Model Selection and Cohort Definition in Causal Inference The central role of the propensity score in observational studies for causal effects If the world fails to protect the economy, COVID-19 will damage health not just now but also in the future Permutation tests for studying classifier performance Prognostic score-based balance measures can be a useful diagnostic for propensity score methods in comparative effectiveness research An evidence review of face masks against COVID-19 Mobility restrictions were associated with reductions in COVID-19 incidence early in the pandemic: evidence from a real-time evaluation in 34 countries Understanding the effectiveness of government interventions against the resurgence of COVID-19 in Europe Age-dependent effects in the transmission and control of COVID-19 epidemics Social contacts and mixing patterns relevant to the spread of infectious diseases AutoEpsDBSCAN: DBSCAN with Eps automatic for large dataset Causal inference Toward causal inference with interference Clustering of countries based on socio-economic and health variables We characterized each country by a cluster ID representing its socio-economic and health status. We used DBSCAN (Density-Based Spatial Clustering of Applications with Noise) [24] because the number of clusters generated was not limited to a pre-defined constant. The parameters eps and Minpts were decided based on experimental trials. First, we arbitrarily set five countries to be the minimum number of samples in a neighborhood and used the Euclidian metric to calculate distances between instances. The parameter eps, i.e., the maximum distance two countries can be from one another while still belonging to the same cluster, was selected according to a k-dist graph [44] ; this graph calculates the distance between a point and its k-th nearest points (in this case, k = 5). We then plotted the distances in ascending order on a k-distance graph and chose the value at the point of maximum curvature, where the graph has the highest slope. Finally, noise samples were manually assigned to the defined clusters using domain knowledge. This procedure resulted in four clusters (1S Fig) . Finally, we one-hot encoded the cluster identifier and represented the socioeconomic status with four binary variables: cluster 1, cluster 2, cluster 3, and cluster 4. The list of features used in the DBSCAN algorithm are detailed in 1S Table with their descriptive statistics. Countries are colored by their respective cluster ID. The DBSCAN algorithm found a total of 4 clusters: a large group comprising mostly African, South/Central American, and South East Asian countries (in dark green); another cluster predominantly with Eastern European countries (light green); the third cluster with the USA, Canada, and part of Western Europe (light orange); and lastly, a group of countries composed of the UK, along with a few Western European and Asian countries (dark orange). Geography selection The geography was selected based on the intersection of countries with available NPI data, mobility trends and at least 70% non-missing variables from socioeconomical and health databases ( Balancing evaluation A fundamental step required to produce reliable effect estimates is to control for systematic differences between the treatment and control groups. To this end, balancing weights methods, such as adversarial balancing (AdvBal) [33] , generate weights that reduce the observed confounding biases, and thus can be used to emulate a randomized controlled trial (RCT) by re-weighting the population. To evaluate the performance of AdvBal, we analysed the distribution of all covariates across the treatment groups before and after re-weighting samples with AdvBal. The covariate balancing plot in 2S Fig is an example of an evaluation plot when work restrictions was assigned as treatment. In the plot, the difference in distribution between treatment groups is quantified by the absolute standard mean difference (ASMD), defined as the absolute value in the difference in means of a covariate between the treatment groups, divided by its standard deviation in the treated group. A small ASMD value represents a good balance, while a value larger than some threshold is considered imbalanced. In this study, we considered a threshold of 0.1 as a reasonable cutoff for acceptable ASMD [38] . We observed that before balancing, many features were biased between the treated and untreated groups. AdvBal was able to minimize this discrepancy, bringing the ASMD to less than 0.1 for all covariates. NPIs effect on places of recreation The mobility data for retail and recreation areas represents the change (relative to the period before the pandemic) in the number of visitors to places like restaurants, shopping centers, and libraries. Complementary analysis of effect of NPIs In an alternative cohort study design, the treatment group is composed of all events of a certain NPI of interest, whereas the control group contains all events of the remaining NPIs. Thus, instead of estimating the effect of NPIs compared to a period when no NPIs are imposed, we aimed at estimating the effects of individual NPIs relative to others. We consider this scenario a more Appendix E7