Title: Lockdown effects in US states: an artificial counterfactual approach
Authors: Carneiro, Carlos B.; Ferreira, Iúri H.; Medeiros, Marcelo C.; Pires, Henrique F.; Zilberman, Eduardo
Date: 2020-09-28

We adopt an artificial counterfactual approach to assess the impact of lockdowns on the short-run evolution of the number of cases and deaths in some US states. To do so, we exploit the different timing with which US states adopted lockdown policies, and divide them into treated and control groups. For each treated state, we construct an artificial counterfactual. On average, and in the very short run, the counterfactual accumulated number of cases would be two times larger if lockdown policies were not implemented.

The evolution of Covid-19 has posed several challenges to policymakers. Decisions have to be made in a timely fashion, without much undisputed evidence to support them. Because the disease is new, and despite the enormous research effort to understand it, estimates of the transmission, recovery and death rates remain uncertain. Nevertheless, these are key pieces of information to assess potential pressures on health system capacity, as well as the need for a lockdown policy and its intensity if implemented.

Not surprisingly, similar regions have implemented different strategies regarding lockdowns. The leading example in the media is the looser social distancing policy in Sweden versus strict policies in its Scandinavian peers. By informally comparing the evolution of the pandemic in Sweden and Denmark (or Norway), many commentators argue that several Covid-19 cases and deaths in Sweden would have been avoided in the short run had a strict lockdown been in place. Aiming to provide a quantitative assessment of the short-run effects of lockdowns, this paper takes this exercise seriously in the context of US states.
Given that the timing of lockdown adoption differs across US states, we adopt techniques based on the synthetic control (SC) approach of Abadie and Gardeazabal [2003] and Abadie et al. [2010] to assess the impact of lockdowns on the short-run evolution of the number of cases and deaths in the treated US states. More specifically, we consider an extension of the original SC method called Artificial Counterfactual (ArCo), which was put forward by Carvalho et al. [2018]. Due to the nonstationary nature of the data, the correction of Masini and Medeiros [2019] is necessary. Our results point to a substantial short-run taming of the cumulative number of cases due to the adoption of lockdown policies. On average, for treated states, the counterfactual accumulated number of cases, according to the method adopted here, would be two times larger were lockdown policies not implemented.

A key feature of our approach is that it is purely data-driven. In the beginning of the crisis, the majority of papers written by economists to evaluate the effectiveness of lockdowns relied on epidemiological models for analysis, including the most recent ones that incorporate behavioral responses. These models are hard to discipline quantitatively. Many calibrated parameters remain uncertain,⁴ and models that incorporate behavioral responses need time to mature and agree on a reliable set of ingredients and moments to be matched. Model-free approaches like ours or Medeiros et al. [2020] should complement policy discussions or forecasting exercises based on those models, especially from a quantitative point of view.

There are related papers using state or county level US data.⁵ At least one of them uses a synthetic control approach, but it is restricted solely to California. Other papers, such as Brzezinski et al. [2020] and Sears et al. [2020], use variations in the timing of statewide adoption of containment policies, and difference-in-differences models, to document substantial reductions in mobility and improvements in health outcomes. The key identification assumption in these papers is that variations in the timing are random after controlling for covariates. Brzezinski et al. [2020] also consider an instrumental-variable approach. Fowler et al. [2020] and Grassi and J. Sauvagnat [2020] follow similar empirical strategies but at the county level, and also find substantial reductions in cases and fatalities in counties that adopted stay-at-home orders and state-mandated business closures, respectively. Our analysis, which rests on an alternative identification assumption and method, should be seen as complementary. As the pandemic evolves, and more data become available, we expect more related empirical evidence to be consolidated.

The paper is organized as follows. Section 2 describes the data, while Section 3 presents the empirical strategy. The results are discussed in Section 4. Finally, Section 5 concludes the paper. Additional results are included in the Appendix.

Data on Covid-19 (confirmed) cases are obtained from the repository at the Johns Hopkins University Center for Systems Science and Engineering (JHU CSSE). We consider the cumulative cases for a subset of the 50 US states and the District of Columbia. Instead of using chronological time across the states, we consider epidemiological time, which means that day one in a given state is the day that the first Covid-19 case was confirmed there. The econometric approach adopted here relies on the fact that some states adopted a lockdown strategy (the treatment), whereas others did not adopt social distancing measures (the control group) and are used to construct the counterfactual.
Lockdown strategies include a mix of state-wide non-pharmaceutical measures aiming to limit social interactions, such as restrictions on nonessential activities and requirements that residents stay at home.⁶

⁴ See, for example, Atkeson [2020a] on the uncertainty regarding estimates of the fatality rate.
⁵ There are also related papers for other countries. For example, Fang et al. [2020] for China.
⁶ The timing of those policies in each state was obtained, and double checked, from several press articles, e.g., https://www.businessinsider.com/us-map-stay-at-home-orders-lockdowns-2020-3 and https://www.nytimes.com/interactive/2020/us/coronavirus-stay-at-home-order.html.

In this section, we describe how we assign states to control and treatment groups, and then describe the method used to construct the counterfactuals.

Aiming to balance control and treatment states, and at the same time obtain enough observations to estimate the model properly before the lockdown policy was implemented, we divide US states into three groups. For a state to be included in the analysis, a state-wide lockdown policy must have been established at least twenty days after the first case. We assume that whenever an individual becomes infected, it takes an average of ten days to show up as a confirmed case in the statistics.⁷ Hence, the in-sample period used to estimate the synthetic control ("before" the lockdown policy) for each treated state (to be defined below) is the number of days between the tenth day after the first confirmed case and the tenth day after the lockdown strategy was implemented. We choose to start the in-sample period from the tenth day as a way to smooth the initial volatility of the data. We adopt the criterion that a state must have at least twenty observations in the in-sample period to be included in the analysis. This criterion excludes states that adopted a state-wide lockdown strategy too early, such as Connecticut, New Jersey, and Ohio, among others.
These are the unmarked states in Table 1, which reports the dates of the first case and of the lockdown policy, as well as the difference in days between them, and also helps visualize the three groups of states.

The remaining states must be divided into treated and control groups. The idea is to find a synthetic control for each of the treated states. The group of potential controls should consist of states that adopted a lockdown policy too late (or never adopted one), such that counterfactuals are not contaminated by lockdown policies implemented in those states. At the same time, and for similar reasons, the lockdown strategies adopted in treated states must be in place during the period of analysis.⁸

⁷ This assumption is motivated by the incubation period of the virus. According to the World Health Organization, "[...] the incubation period for COVID-19, which is the time between exposure to the virus (becoming infected) and symptom onset, is on average 5-6 days, however can be up to 14 days." See https://www.who.int/docs/default-source/coronaviruse/situation-reports/20200402-sitrep-73-covid-19.pdf.
⁸ In Appendix A.1, Table A.1 shows the reopen dates for the treated states.

Fortunately, there are horizons that can balance both goals: enough states to build the synthetic controls and a relatively extensive period to construct the counterfactuals. In particular, we restrict the analysis to the first 58 epidemiological days. This figure accommodates at least ten control states to build the synthetic controls, while maximizing the number of out-of-sample days to run the counterfactuals. In this sense, our analysis concerns the very short-run impact of lockdowns, up to nearly three weeks.
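As an illustration of the sample construction described above, the following Python sketch converts dates to epidemiological days and assigns a state to the treated, control, or excluded group. It is a minimal sketch under stated assumptions, not the authors' code: the function names, the exact day-counting convention, and the rule that a state serves as a potential control when its lockdown date plus ten days falls on or after the 58th epidemiological day are all hypothetical choices made for exposition.

```python
from datetime import date
from typing import Optional

LAG_DAYS = 10              # infection-to-confirmation delay assumed in the text
MIN_DAYS_TO_LOCKDOWN = 20  # lockdown at least 20 days after the first case
LAST_EPI_DAY = 58          # last epidemiological day used in the analysis


def epidemiological_day(first_case: date, day: date) -> int:
    """Day 1 is the day the first case was confirmed (assumed convention)."""
    return (day - first_case).days + 1


def intervention_day(first_case: date, lockdown: date) -> int:
    """T0: days from the first case to the lockdown, plus the ten-day lag."""
    return (lockdown - first_case).days + LAG_DAYS


def classify_state(first_case: date, lockdown: Optional[date]) -> str:
    """Return 'treated', 'control', or 'excluded' (hypothetical rule)."""
    if lockdown is None:
        return "control"  # never adopted a state-wide lockdown
    if intervention_day(first_case, lockdown) >= LAST_EPI_DAY:
        return "control"  # lockdown too late to affect the analysis window
    if (lockdown - first_case).days < MIN_DAYS_TO_LOCKDOWN:
        return "excluded"  # too few in-sample observations
    return "treated"


def sample_windows(first_case: date, lockdown: date):
    """In-sample and out-of-sample epidemiological days for a treated state."""
    t0 = intervention_day(first_case, lockdown)
    in_sample = range(LAG_DAYS, t0 + 1)              # day 10 .. T0
    out_of_sample = range(t0 + 1, LAST_EPI_DAY + 1)  # T0 + 1 .. 58
    return in_sample, out_of_sample
```

For a hypothetical state whose first case is confirmed on 1 March 2020 and which locks down on 21 March 2020, `sample_windows` yields days 10 to 30 for estimation and days 31 to 58 for the counterfactual.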
The treated states are marked in blue in Table 1, and include twenty states: Alabama, Colorado, Florida, Georgia, Kansas, Kentucky, Maine, Maryland, Mississippi, Missouri, Nevada, New Hampshire, New York, North Carolina, Oregon, Pennsylvania, Rhode Island, South Carolina, Tennessee, and Texas. The potential control states are marked in red, and include ten states: Arizona, Arkansas, California, Illinois, Iowa, Massachusetts, Nebraska, North Dakota, South Dakota, and Washington. Nonetheless, due to the lack of variation within the in-sample period, we exclude four states from this control pool, as we explain below. Importantly, Oklahoma, Utah and Wyoming only implemented partial lockdowns (not reported in the table). Therefore, they are hard to classify as either treated or control states. We opt to exclude them from the analysis.

Figure 1 illustrates the empirical strategy, which is formalized in the next subsection. It plots the evolution of (log) cumulative cases along epidemiological time. The first vertical dashed line represents the tenth day after the first confirmed case. The in-sample period lies between the first and second vertical dashed lines, which mark the tenth day and the following twenty days, respectively. Similarly, the out-of-sample period lies between the second and third vertical dashed lines, which mark the 31st and 58th epidemiological days, respectively. Blue lines represent the treated states, whereas red lines represent the potential control states. The turning points from solid to dashed blue lines represent the days lockdowns were implemented (plus ten days) in treated states. Note that New York is clearly an outlier among the treated states, exhibiting a very large number of cases (more on that below). We use the red lines to build synthetic controls for each solid blue line up to the turning point, and then construct counterfactuals by simulating the synthetic controls forward up to the 58th day.
The idea is to compare them with the blue dashed lines that capture actual cases. As Figure 1 highlights, some states display a lack of variation within the in-sample period. To give an example, Washington had only one confirmed case during the first 36 days after its first confirmed Covid-19 infection. Hence, we exclude it from the control group. For similar reasons, we also exclude Arizona, Illinois, and Massachusetts from the control pool. The analysis ended up relying on six control states.

We propose a two-step approach using the artificial counterfactual (ArCo) method introduced by Carvalho et al. [2018], with the correction of Masini and Medeiros [2019], to estimate the number of cases for each US state. Let $t = 10, 11, \ldots, 58$ represent the number of days after the first confirmed case of Covid-19 in a given state. Define $y_t$ as the natural logarithm of the number of confirmed cases $t$ days after the first case of the disease in this specific treated state, and $\boldsymbol{x}_t$ as an $n$-dimensional vector containing the logarithm of the number of reported cases for the control states, also $t$ days after the first case has been reported, as well as a logarithmic trend, $\log(t)$. The inclusion of the trend is important to capture the shape of the curve.

The model is estimated as follows. We use the weighted least absolute shrinkage and selection operator (WLASSO), as described in Masini and Medeiros [2019], to select the control states that will be used to estimate our counterfactual. The goal of the LASSO is to balance the trade-off between bias and variance, and it is a useful tool to select the relevant peers in an environment with very few data points:

$$\widehat{\boldsymbol{\beta}} = \arg\min_{\boldsymbol{\beta}} \left[ \sum_{t=10}^{T_0} \left( y_t - \boldsymbol{\beta}'\boldsymbol{x}_t \right)^2 + \lambda \sum_{j=1}^{n} w_j |\beta_j| \right], \qquad (1)$$

where $w_j$, $j = 1, \ldots, n-1$, are nonnegative weights (absolute values, constructed as in Masini and Medeiros [2019]) and $w_n = 1$. $T_0$ is, for each state, the number of days from the first reported case until the lockdown plus ten extra days, and $\lambda > 0$ is the penalty parameter, which is selected by the Bayesian Information Criterion (BIC), in accordance with Medeiros and Mendes [2016].
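To make the estimation step concrete, the following is a minimal pure-Python sketch of a weighted LASSO fitted by coordinate descent, with the penalty chosen from a grid by BIC. Several elements are assumptions made for illustration and are not taken from the paper: the objective is scaled as $(1/2T)\,\mathrm{RSS} + \lambda \sum_j w_j|\beta_j|$, an unpenalized intercept is estimated alongside the slopes, the BIC takes the common form $T\log(\mathrm{RSS}/T) + \mathrm{df}\,\log T$, and the weights are passed in as given rather than constructed as in Masini and Medeiros [2019]. All function names are hypothetical.

```python
import math


def soft_threshold(value, threshold):
    """Soft-thresholding operator used by LASSO coordinate descent."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def predict(intercept, beta, x_row):
    """Fitted (or counterfactual) value for one day's regressors."""
    return intercept + sum(b * x for b, x in zip(beta, x_row))


def weighted_lasso(X, y, weights, lam, n_iter=500):
    """Minimise (1/2T) * RSS + lam * sum_j weights[j] * |beta_j|.

    X is a list of T rows (one per in-sample day); y is a list of T values.
    """
    T, p = len(X), len(X[0])
    beta = [0.0] * p
    intercept = sum(y) / T
    for _ in range(n_iter):
        for j in range(p):
            rho = z = 0.0
            for t in range(T):
                # Residual that leaves regressor j out of the fit.
                partial = y[t] - predict(intercept, beta, X[t]) + beta[j] * X[t][j]
                rho += X[t][j] * partial
                z += X[t][j] ** 2
            beta[j] = soft_threshold(rho / T, lam * weights[j]) / (z / T) if z > 0 else 0.0
        # Unpenalised intercept: shift by the mean of the current residuals.
        intercept += sum(y[t] - predict(intercept, beta, X[t]) for t in range(T)) / T
    return intercept, beta


def bic(X, y, intercept, beta, tol=1e-8):
    """T * log(RSS / T) + df * log(T), with df the number of nonzero slopes."""
    T = len(y)
    rss = sum((y[t] - predict(intercept, beta, X[t])) ** 2 for t in range(T))
    df = sum(1 for b in beta if abs(b) > tol)
    return T * math.log(max(rss / T, 1e-12)) + df * math.log(T)


def select_by_bic(X, y, weights, lam_grid):
    """Fit the weighted LASSO for each penalty in lam_grid; keep the lowest BIC."""
    fits = []
    for lam in lam_grid:
        intercept, beta = weighted_lasso(X, y, weights, lam)
        fits.append((bic(X, y, intercept, beta), lam, intercept, beta))
    _, lam, intercept, beta = min(fits, key=lambda fit: fit[0])
    return lam, intercept, beta
```

Given the selected coefficients, calling `predict` on the control-state regressors for each out-of-sample day traces out a counterfactual path in logs. A production implementation would more likely rely on an established solver; the hand-written loop is used here only to keep the sketch self-contained.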
The weight correction in the LASSO is necessary in order to control for the nonstationarity of the data; see Masini and Medeiros [2019] for a detailed discussion. The counterfactual for $t = T_0 + 1, \ldots, T$ (here $T = 58$) is computed as

$$\widehat{y}_t = \widehat{\boldsymbol{\beta}}'\boldsymbol{x}_t, \qquad (2)$$

where $\widehat{\boldsymbol{\beta}}$ is the WLASSO estimate and $\boldsymbol{x}_t$ is the vector of control-state regressors. We also report 95% confidence intervals based on the resampling procedure proposed in Masini and Medeiros [2019].

We are interested in examining the effects of lockdown policies not only on the number of cases, but also on the number of deaths. However, we cannot implement the strategy described above because there is not enough variation in deaths during the in-sample period. Some states, for instance, implemented state-wide lockdown policies before the first confirmed death. Thus, we propose an alternative method.

We consider a counterfactual for the number of deaths based on the counterfactual estimated for the number of cases. This is not as straightforward as in the traditional synthetic control method because the ArCo methodology described above includes an intercept in the estimation, which is measured in the log of the number of cases, and not only a convex combination of other states. Intuitively, the methodology described above chooses a combination of states that is at a fixed distance from the treated unit in the in-sample period, and not a convex combination of states that exactly matches the actual number of cases. The intercept controls for all time-invariant characteristics that define the counterfactual.

Then, we proceed as follows. Let $d_{i,t}$ be the number of accumulated deaths in state $i$ on day $t$. Also, let $\widehat{\boldsymbol{\beta}}_i$ be the vector of estimated coefficients for state $i$ as in (1), used to construct the counterfactual for cases. In addition, let $\boldsymbol{D}_t$ be a vector of the number of deaths for all states in the control pool at time $t$. We define the counterfactual number of deaths in that state as:

$$\widehat{d}_{i,t} = \widehat{\boldsymbol{\beta}}_i'\boldsymbol{D}_t - \widehat{\boldsymbol{\beta}}_i'\boldsymbol{D}_{\bar{t}_i} + d_{i,\bar{t}_i}, \qquad (3)$$

where $\bar{t}_i$ is the day that state $i$ implemented the lockdown policies.
That is, we maintain the weights estimated above and adjust the intercept so that the counterfactual series for deaths matches the number of actual observed deaths at the beginning of the quarantine. For the sake of exposition, we relegate the results on cumulative deaths to Appendix A.4.

To illustrate how the method works, Figure 2 presents the ArCo counterfactuals for the states of Alabama, Colorado, and Maine. The timing of the policy intervention corresponds to the lockdown date plus ten days. The counterfactual analysis makes clear the importance of lockdown policies in mitigating the acceleration of the number of Covid-19 confirmed cases in the treated states. As shown in Figure 2a, for example, our results point to a substantial increase in the number of cases in Alabama if it had not adopted an early lockdown. Similarly, Figures 2b and 2c reveal the same behavior for the cumulative curves in the other selected states. Counterfactuals are constructed with the estimated weights and cumulative cases of the six states that compose the control group. These weights are reported in Table A.2 in Appendix A.2. In Appendix A.3, we present similar counterfactual plots for the remaining treated states.

To ensure that the proposed methodology produces a proper counterfactual analysis, we generate placebo results by producing a "synthetic control" for each control state, using the remaining control states as the control pool. Results are displayed in Figure 3, which shows the ratio of the estimated counterfactual cumulative cases to the actual ones for treated states except New York (black lines) and non-treated states (red lines). We assume that the placebo intervention takes place on the 36th epidemiological day, marked by the vertical dashed line, which is the median (and the mean) timing of the policy interventions in the treated states.
It is reassuring that for half of the placebo counterfactuals the ratios fluctuate around one, whereas for the majority of treated states the ratios grew above one at some point (likely around the actual timing of the policy intervention). The latter result means that lockdown policies were effective in taming the spread of the virus, whereas the former suggests that results are not driven by chance.

Regarding South Dakota, the only placebo counterfactual that reached a ratio well above one, we use Google Mobility Data (described in Appendix A.5) to show that mobility in residential areas increased, whereas mobility in outdoor areas decreased substantially, compared to the period before the pandemic (see Figures A.41 and A.42 in Appendix A.5). This suggests that South Dakota's population endogenously decided to stay at home more, and avoided environments prone to the risk of contamination. At the time, a proper lockdown policy was not necessary, and South Dakota's non-conformity to the placebo test does not seem to invalidate our approach.

In contrast, for Nebraska and California, the counterfactuals point to a smaller number of cases than the actual ones, which goes against the finding that lockdowns were effective in reducing cases of Covid-19. The case of California is quite emblematic, as the number of cases during the estimation window remained very small and with very low variation. However, the number of cases started to grow at a fast rate well after the cut-off date. The state of Nebraska displays a similar pattern.

To gauge the quantitative impact of lockdown policies, for each state, whether treated or a control used as placebo, we compute the ratio of the estimated counterfactual cumulative cases ("without" a lockdown strategy in place) to the actual ones on the 58th epidemiological day, which is the last day used to compute the counterfactual.
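The ratio just described can be computed directly from the log-scale series. The sketch below assumes that both the counterfactual and the actual series are stored as log cumulative cases, with the last element corresponding to the 58th epidemiological day; the function names are hypothetical and the numbers in the usage note are made up for illustration, not taken from the paper's tables.

```python
import math
from statistics import mean, median


def final_day_ratio(log_counterfactual, log_actual):
    """Counterfactual-to-actual ratio of cumulative cases on the last day.

    Both inputs are sequences of log cumulative cases over the same days.
    """
    return math.exp(log_counterfactual[-1] - log_actual[-1])


def summarize_ratios(ratios_by_state):
    """Mean and median of the final-day ratios across states."""
    values = list(ratios_by_state.values())
    return {"mean": mean(values), "median": median(values)}
```

For instance, a state with 400 counterfactual and 200 actual cumulative cases on the last day has a ratio of about 2, and three illustrative ratios of 1, 2, and 3 have a mean and a median of 2.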
Table 2 reports the mean and median of the ratios across states, whereas Table A.3 in Appendix A.2 reports these ratios for each state. The first row corresponds to the case in which controls are used as placebos, whereas the second considers the treated states only. As we discuss below, New York is clearly an outlier, whose ratio reached an implausible value of 16.5, as reported in Table A.3. Hence, our preferred specification is displayed in the third row, which excludes New York from the pool of treated states. We also compute two other versions of these ratios using the lower bound (lb) and upper bound (up) of the 95% confidence interval in the numerator.

The ratios are clearly above one for the treated units, whether New York is excluded or not. According to our preferred specification, counterfactual estimates suggest that the number of cases would be nearly two times larger were lockdown policies absent. Again, it is reassuring that among the controls used as placebos, these average ratios remain around one.

Regarding the effects of lockdowns on cumulative deaths, we present the results for all treated states in Appendix A.4. For some states, the counterfactual cumulative deaths exhibit patterns similar to those for cumulative cases. But for many other states, the effects are not statistically significant, at least for the first days after the policy implementation. One possible explanation is that there is a delay between cases and deaths, as the latter are a consequence of the former. Hence, deaths only show up in the official statistics days after cases. Perhaps, if we could estimate counterfactuals for longer periods, the synthetic accumulated deaths would further decouple from the actual ones. In addition, since the weights on the controls are estimated using the (log) cumulative number of cases, the counterfactuals for cumulative deaths are arguably noisier.
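The intercept adjustment behind the deaths counterfactual of Section 3.3 can be sketched as follows, under one reading of equation (3): the weights estimated for cases are applied to the control states' deaths, and the resulting series is shifted so that it equals the treated state's observed deaths on the lockdown day. The sketch assumes that the weight vector contains only control-state coefficients (how the trend coefficient enters is not specified here), that deaths are supplied in whatever units the weights are meant to combine, and that all names are hypothetical.

```python
def deaths_counterfactual(weights, control_deaths, treated_deaths_at_lockdown, lockdown_index):
    """Counterfactual cumulative deaths from the lockdown day onward.

    weights: one estimated weight per control state (trend term assumed excluded).
    control_deaths: list of rows, one per day, each with one value per control state.
    treated_deaths_at_lockdown: observed deaths in the treated state on the lockdown day.
    lockdown_index: row of control_deaths corresponding to the lockdown day.
    """
    def weighted_sum(row):
        return sum(w * d for w, d in zip(weights, row))

    # Shift the weighted control series so it matches the treated state at lockdown.
    anchor = weighted_sum(control_deaths[lockdown_index])
    return [
        weighted_sum(row) - anchor + treated_deaths_at_lockdown
        for row in control_deaths[lockdown_index:]
    ]
```

By construction, the first element of the returned series equals the observed number of deaths in the treated state on the lockdown day; later elements inherit the growth of the weighted control series.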
As discussed above and presented in Table A.3 in Appendix A.2, we obtain an implausible ratio (of counterfactual to actual cumulative cases) of 16.5 for New York. This section puts a lens on this state. In particular, Figure 4 displays the estimated cumulative number of cases for New York "without" lockdown, as well as extrapolations of the cumulative number of cases based on the mean and median growth rates of the last ten days of the in-sample period.

As reported in Table 1, among the treated states, New York was the fastest to react to the pandemic, and established a state-wide lockdown policy only 20 days after the first case. Figure 4 extrapolates the last in-sample observations by using both the observed mean and median growth rates for the last ten days, which yields a pattern similar to the result obtained by applying the synthetic control approach. Due to the progression of the virus, particularly in New York City, the in-sample observed rates are quite high when compared to other states, as illustrated in Figure 1, which can be explained not only by the dynamics of the city but also by its high population density. Hence, New York is clearly an outlier and might not be amenable to our synthetic control approach, which justifies reporting results excluding New York above.

In this paper, as opposed to most of the early and incipient literature on lockdown effects during the Covid-19 crisis, we consider a purely data-driven approach to assess the impact of lockdowns on the short-run evolution of the number of cases and deaths in some US states. Also, as opposed to some recent papers that use a difference-in-differences approach, we adopt a variant of the synthetic control approach, ArCo, due to Carvalho et al. [2018] and Masini and Medeiros [2019]. On average, according to the synthetic controls, the counterfactual accumulated number of cases would be two times larger were lockdown policies not implemented in treated states.
In the first two columns of Table A.1 we show the date of the first confirmed case in every treated state we analyze and its reopen date (plus ten days), whenever available at the time we started to circulate this paper. In the third column, we show the difference (in days) between the first confirmed case and the reopen date plus ten days. These figures illustrate why we had to limit our sample to only 58 epidemiological days. For example, if we had used 60 days in our analysis, we would have had to exclude Alabama and Maine from our treated states, given that they would not be in a state-wide lockdown in the last days of the out-of-sample period.

We report in the first seven rows of Table A.2 the coefficients estimated by the LASSO model for each treated state over the in-sample period (up to the lockdown date plus ten days). The last two rows display the mean and the median (across the out-of-sample period) of the ratio between the actual cumulative cases and the counterfactual cases for every state. With only two exceptions (Missouri and Nevada), every state has an out-of-sample mean and median of the observed-to-predicted ratio below one. This means that, on average, the realized cumulative cases were smaller than the counterfactual, which highlights that lockdowns had a meaningful impact on slowing down the spread of Covid-19 in these states.

Table A.3 reports the ratio of the counterfactual cumulative cases to the actual ones on the 58th day after the first confirmed case in each state. It also reports the lower and upper limits of the 95% confidence interval. Among the 20 treated states, the ratio is larger than one in 18 of them. For Missouri and Nevada, there is no evidence of the effectiveness of lockdown policies. For Mississippi and South Carolina, the impacts of lockdowns are only modest. Note that New York is clearly an outlier, with a ratio of around 16.5. We discuss this case in the main text.
In contrast, among the non-treated states, we obtain ratios close to one for three out of six cases. We assume that the cut-off of the placebo intervention is the 36th epidemiological day, which is the median (and the mean) timing of the policy interventions in the treated states. As discussed in the main text, South Dakota, which displays a ratio well above one, experienced a large reduction in outside mobility even without official lockdown measures. California and Nebraska, which display ratios below one, had very few Covid-19 confirmed cases during the period before the cut-off.

In this section, in Figures A.21-A.40, we report the counterfactual estimates for cumulative deaths based on the methodology described in Section 3.3. As we discuss in the main text, although for some states the counterfactuals exhibit shapes similar to those for cumulative cases, for many other states the effects are not statistically significant, at least for the first days after the policy implementation.

We know that lockdowns affect the Covid-19 dynamics by imposing social distancing and mobility restrictions. To help understand the results described in this paper, we analyze the mobility data available from Google Mobility Reports (https://www.google.com/covid19/mobility/). Google mobility data show how visits and length of stay at different places change compared to a baseline, before the outbreak of the pandemic. In particular, the baseline is the median value, for the corresponding day of the week, during the five weeks between January 3rd and February 6th, 2020. In order to understand how the population in each group (treated and control states) behaved during the Covid-19 crisis, we compute the median of mobility changes across our sample period, i.e. the 48 days following the tenth day after the first confirmed case in each state. Also, the data concern mobility changes for six categories, five of which relate to outdoor activities.
Namely, these are grocery & pharmacy, transit stations, parks, retail & recreation, and workplaces. The remaining one, residential, concerns indoor activities. Hence, to capture outdoor mobility changes, we aggregate the aforementioned five categories into a single one, defined as the median of the original five categories. In contrast, mobility changes in residential areas capture indoor mobility changes.

The two boxplots in Figures A.41 and A.42 present the median of mobility changes in all analyzed states in residential and in outdoor areas, respectively. We report results for treated and control states separately. Regarding mobility changes in residential areas, on average, residents of every state analyzed spent more time in these areas after the pandemic outbreak. However, those from treated states spent even more time indoors. Nevertheless, there are outliers. For instance, residents of South Dakota spent a lot more time in residential areas than before the pandemic, which helps explain the results found for this state in the placebo test.

We found similar results for mobility changes in outdoor areas. Clearly, residents of treated states spent time in outside areas less often than residents of control states (always compared to the period before the pandemic). In New York, for example, there was a 50% decrease in outdoor mobility. Once more, South Dakota is an outlier in the control group, reinforcing the thesis that its population voluntarily decided to stay at home more. Indeed, residents of South Dakota spent almost 20% less time in outside areas, while those from the median state in the control group spent nearly 8% less.

The economic costs of conflict: A case study of the Basque Country
Synthetic control methods for comparative case studies: Estimating the effect of California's tobacco control program
Pandemic, shutdown and consumer spending: Lessons from Scandinavian policy responses to Covid. Working paper, University of Copenhagen
A simple planning problem for Covid-19 lockdown
How deadly is Covid-19? Understanding the difficulties with estimation of its fatality rate
What will be the economic impact of Covid-19 in the US? Rough estimates of disease scenarios
An SEIR infectious disease model with testing and conditional quarantine
Covid-19 infection externalities: Trading off lives vs. livelihoods. Working Paper 27009
The Covid-19 pandemic: Government vs. community action across the United States. Working paper
ArCo: An artificial counterfactual approach for high-dimensional panel time-series data
When do shelter-in-place orders fight Covid-19 best? Policy heterogeneity across states and adoption time
The macroeconomics of epidemics
Human mobility restrictions and the spread of the novel coronavirus (2019-nCoV) in China
The effect of stay-at-home orders on Covid-19 cases and fatalities in the United States. Working paper
Did California's shelter-in-place order work? Early coronavirus-related public health effects
Costs and benefits of closing businesses in a pandemic
Optimal mitigation policies in a pandemic: Social distancing and working from home
The effect of social distancing measures on intensive care occupancy: Evidence on Covid-19 in Scandinavia. Working paper
Counterfactual analysis with artificial controls: Inference, high dimensions and nonstationarity
ℓ1-regularization of high-dimensional time-series models with non-Gaussian and heteroskedastic innovations
Short-term Covid-19 forecast for latecomers. Working paper
Villas-Boas. Are we #stayinghome to flatten the curve? Working paper