key: cord-0595997-0xv5a5sv
authors: Zhang, Bo; Heng, Siyu; Ye, Ting; Small, Dylan S.
title: Social Distancing and COVID-19: Randomization Inference for a Structured Dose-Response Relationship
date: 2020-11-12
journal: nan
DOI: nan
sha: 847539aa081353ce906451e70940827416e00169
doc_id: 595997
cord_uid: 0xv5a5sv

Social distancing is widely acknowledged as an effective public health policy combating the novel coronavirus. But extreme social distancing has costs and it is not clear how much social distancing is needed to achieve public health effects. In this article, we develop a design-based framework to make inference about the dose-response relationship between social distancing and COVID-19 related death toll and case numbers. We first discuss how to embed observational data with a time-independent, continuous treatment dose into an approximate randomized experiment, and develop a randomization-based procedure that tests if a structured dose-response relationship fits the data. We then generalize the design and testing procedure to accommodate a time-dependent, treatment dose trajectory, and generalize a dose-response relationship to a longitudinal setting. Finally, we apply the proposed design and testing procedures to investigate the effect of social distancing during the phased reopening in the United States on public health outcomes using data compiled from sources including Unacast, the United States Census Bureau, and the County Health Rankings and Roadmaps Program. We rejected a primary analysis null hypothesis that stated the social distancing from April 27, 2020, to June 28, 2020, had no effect on the COVID-19-related death toll from June 29, 2020, to August 2, 2020 (p-value<0.001), and found that it took more reduction in mobility to prevent exponential growth in case numbers for non-rural counties compared to rural counties.

Social distancing is widely acknowledged as an effective public health policy combating the novel coronavirus. But extreme forms of social distancing like isolation and quarantine have costs and it is not clear how much social distancing is needed to achieve public health effects. In this article, we develop a design-based framework to test the causal null hypothesis and make inference about the dose-response relationship between reduction in social mobility and COVID-19 related public health outcomes. We first discuss how to embed observational data with a time-independent, continuous treatment dose into an approximate randomized experiment, and develop a randomization-based procedure that tests if a structured dose-response relationship fits the data. We then generalize the design and testing procedure to accommodate a time-dependent treatment dose in a longitudinal setting. Finally, we apply the proposed design and testing procedures to investigate the effect of social distancing during the phased reopening in the United States on public health outcomes using data compiled from sources including Unacast ™ , the United States Census Bureau, and the County Health Rankings and Roadmaps Program. We rejected a primary analysis null hypothesis that stated the social distancing from April 27, 2020, to June 28, 2020, had no effect on the COVID-19-related death toll from June 29, 2020, to August 2, 2020 (p-value < 0.001), and found that it took more reduction in mobility to prevent exponential growth in case numbers for non-rural counties compared to rural counties.

1.1. Social distancing, a pilot study, and dose-response relationship. Social distancing is widely acknowledged as one of the most effective public health strategies to reduce transmission of the novel coronavirus (Lewnard and Lo, 2020) . There seemed to be ample evidence from China (Lau et al., 2020) and Italy (Sjödin et al., 2020 ) that a strict lockdown and practice of social distancing could have a substantial effect on reducing disease transmission, but social distancing has economic, psychological and societal costs (Acemoglu et al., 2020; Atalan, 2020; Grover et al., 2020; Sheridan et al., 2020; Venkatesh and Edirappuli, 2020) . How much social distancing is needed to achieve the desired public health effect? In this article, we measure the level of social distancing using data on daily percentage change in total distance traveled compared to the pre-coronavirus level (data compiled and made available by Unacast ™ ) and investigate the causal relationship between social distancing and COVIDrelated public health outcomes.

We conducted a pilot study in March to investigate the effect of social distancing during the first week of President Trump's 15 Days to Slow the Spread campaign (March 16-22, 2020 ) on the influenza-like illness (ILI) percentage two and three weeks later. We tested the causal null hypothesis and found some weak evidence (p-value = 0.08) that better social distancing had an effect on ILI percentage three weeks later. In Supplementary Material A, we described in detail our pilot study. A protocol of the design and analysis was posted on arXiv (https://arxiv.org/abs/2004.02944) before outcome data were available and analyzed.

In addition to the causal null hypothesis, the "dose-response relationship" between the degree of social distancing and potential public health outcomes under various degrees of social distancing is also of great interest. Infectious disease experts seemed to express sentiments that the effect of social distancing on public health outcomes might be small or even negligible under a small degree of social distancing, but much more substantial under a large degree of social distancing. In an interview with the British Broadcasting Corporation (BBC Radio 4, 2020) , director of the National Institute of Allergy and Infectious Diseases (NIAID), Dr. Anthony S. Fauci said: " We never got things down to baseline where so many countries in Europe and the UK and other countries did -they closed down to the tune of about 97 percent lockdown. In the United States, even in the most strict lockdown, only about 50 percent of the country was locked down. That allowed the perpetuation of the outbreak that we never did get under very good control". Perhaps Dr. Fauci was proposing a hypothesis that the treatment dose, i.e., level of social distancing, played a very important role, and the causal effect of social distancing as a public health strategy combating coronavirus transmission is likely to be very different depending on the extent to which it is practiced (see, e.g., Gelfand et al. (2021) ). We would like to formalize and test the hypothesis concerning a dose-response relationship between social distancing and public health outcomes.

1.2. Reopening, causal null hypothesis, dose-response kink model, and connection to epidemiological models. Starting late April and early May, many states in the U.S. started phased reopening. States and local governments differed in their reopening timelines; people in different states and counties also differed in their social mobility during the process: some ventured out; some continued to stay at home as much as possible. Figure 1 plots the 7-day rolling average of percentage change in total distance traveled of all counties in the U.S., from mid-March to late May. It is evident that as many counties started to ease social distancing measures, we saw less reduction in distance traveled; in fact, in many counties, distance traveled started to return to and even supersede the pre-coronavirus level.

In this article, we leverage the county-level social mobility data since phased reopening in the U.S. to study the relationship between social mobility and its effect on public health outcomes. Let t 0 denote a baseline period, T some endpoint of interest, z t0:T a longitudinal measurement of change in social mobility in county n from t 0 to T , and Y n,T (z t0:T ) county n's potential public health outcome at time T under the social mobility trajectory z t0:T , e.g., the number of patients succumbing to the COVID-19 at time T . Our first scientific query is about the causal null hypothesis: had the social mobility trajectory changed from z t0:T to z t0:T , would the potential public health outcome at time T change at all? In other words, does Y n,T (z t0:T ) = Y n,T (z t0:T ) hold for all z t0:T = z t0:T ? Suppose that we have enough evidence from observational data to reject this causal null hypothesis, our second query then is about the dose-response relationship between the level of social distancing and its effect on the potential public health outcome. To illustrate, one such dose-response relationship (among many other candidates) is the following dose-response kink model (see Figure 2 for an illustration):

H K 0 : Y n,T (z) = Y n,T (z * ), ∀z ≤ τ, and Y n,T (z) − Y n,T (τ ) = β(z − τ ), ∀z > τ, ∀n = 1, 2, · · · , N, for some τ and β, where z captures some aggregate dose of the social mobility trajectory z t0:T , e.g., the average reduction in social mobility from t 0 to T , and z * a reference dose level. Model (1) states that the potential health-related outcome (e.g., daily death toll, test positivity rate, etc) at time T would remain unchanged as the potential outcome under the reference level when the aggregate dose z is less than a certain threshold τ , and then increases at a rate proportional to how much z exceeds the threshold. Model (1) succinctly captures two key features policy makers may be most interested in: τ the minimum dose that "activates" the treatment effect, and β how fast the potential outcome changes as the dose changes after exceeding the threshold. Model (1) may remind readers of the "broken line regression" models in regression analysis; see Zhang and Singer (2010, Chapter 4 ). The key difference here is that Model (1) and other dose-response relationships in this article are about the contrast in potential outcomes, not the observed outcomes in a regression analysis.

Our analysis in this article complements standard analyses based on epidemiological models, e.g., the SIR (susceptible-infected-recovered) compartment models (Brauer and Castillo-Chavez, 2012) . The primary interest of epidemiological models is to understand infectious disease dynamics, in particular how the public health outcome trajectory evolves over time.

To investigate a dose-response relationship, we only posit a parsimonious model on the contrast between potential outcomes at time T under different doses, e.g., Y n,T (z) − Y n,T (τ ) in (1), not on the disease dynamics that generate the outcome Y n,T (z). In other words, a parsimonious dose-response relationship does not preclude nonlinear infectious disease dynamics, e.g., those based on the compartment models; moreover, our primary inferential target, the causal null hypothesis, does not impose any restriction on the infectious disease mechanism.

1.3. Our contribution. We have three goals in this article. First, we propose a simple, model-free randomization-based procedure that tests if a causal null hypothesis or a structured dose-response relationship, e.g., the dose-response kink model, fits the data in a static setting with a time-independent, continuous or many-leveled treatment dose. To be specific,

The dose-response kink model an empirical researcher posits a structured dose-response relationship that she finds scientifically meaningful, parsimonious, and flexible enough to describe data at hand; our developed procedure can then be applied to test if such a postulated dose-response relationship is sufficient to describe the causal relationship. If the hypothesis is rejected, empirical researchers are then advised to re-examine the scientific theory underpinning the postulated model; otherwise, the model seems a good starting point for data analysis. In this way, our method can be deemed as a model-free "diagnostic test" for a dose-response relationship, and more broadly a test of the underlying scientific theory. In our application, the treatment and outcome are both longitudinal. Our second goal is to generalize the proposed design and testing procedure to the longitudinal setting. We define a notion of cumulative dose for a time-varying treatment dose trajectory, and discuss how to embed observational longitudinal data into an approximate randomized controlled trial in order to permute two treatment trajectories. Finally, we closely examine our assumptions in the context of an infectious disease transmission mechanism and apply the developed design and testing procedure to characterize the dose-response relationship between reduction in social mobility and public health outcomes during the reopening phases in the U.S. using county-level data we compiled from sources including Unacast ™ , the United States Census Bureau, and the County Health Rankings and Roadmaps Program (Remington, Catlin and Gennuso, 2015) . The rest of the article is organized as follows. Section 2 and 3 study how to investigate a dose-response relationship using nonbipartite matching in a static setting. Section 4 incorporates interference and considers the spillover effects. Section 5 extends the method to longitudinal studies. Section 6 describes the design of the case study and Section 7 presents results and extensive sensitivity analyses. Section 8 concludes with a discussion.

2. Investigating the dose-response relationship via nonbipartite matching.

2.1. Observational data with a continuous treatment dose in a static setting. Suppose there are N = 2I units, indexed by n = 1, 2, · · · , N . Each unit is associated with a vector of observed covariates X n , an observed treatment dose assignment Z obs n , and an observed outcome Y obs n . The vector of observed covariates X n are collected before the treatment assignment and not affected by the treatment. Let Z n denote the treatment dose assignment of unit n, Z the set of all possible treatment doses, z ∈ Z a realization of Z n , and |Z| the cardinality of Z. For a binary treatment, |Z| = 2; for a continuous treatment dose, |Z| is an infinite number. In most applications, Z is an ordered set (either partially ordered or totally ordered) with a (partial or total) order defined in light of the application.

Let Y n (z) denote the potential outcome that unit n exhibits under the dose assignment z assuming no interference among units (Rubin, 1980 (Rubin, , 1986 . Each unit n is associated with a possibly infinite array of potential outcomes {Y n (z), z ∈ Z}. We will assume consistency so that Y obs n = Y n (Z obs n ). A causal estimand is necessarily a contrast between potential outcomes. Each unit n is associated with a collection of unit-level causal effects Table 1 summarizes all information regarding these N units, where we let Z = {0, 1, 2, · · · } be a countable set for ease of exposition. Table  1 is referred to as a science table in the literature (Rubin, 2005) . In a causal inference problem, the fundamental estimands of interest are the arrays of potential outcomes in Table 1 ; the task of uncovering the arrays of potential outcomes is challenging because one and only one of the potentially infinite array of potential outcomes for each unit is actually observed. 

Covariates

One unique feature of problems with a continuous treatment dose assignment is that the unit-level causal effect is an infinite set of comparisons between any two potential outcomes Y n (z) and Y n (z ), unlike with a binary treatment where the unit-level causal effect unambiguously refers to a comparison between Y n (1) and Y n (0). Let z * ∈ Z denote an arbitrary reference dose. Observe that Y n (z) − Y n (z ) = Y n (z) − Y n (z * ) − {Y n (z ) − Y n (z * )}, and the collection of contrasts {Y n (z) − Y n (z * ), z ∈ Z} is sufficient in summarizing all pairwise comparisons of potential outcomes. With a binary treatment, a "summary causal effect" (Rubin, 2005) is defined as a comparison between Y n (1) and Y n (0) over the same collection of units, e.g., the mean unit-level difference for females. With a continuous treatment dose, we first summarize the causal effects with a "unit-level dose-response relationship" for each unit n. For example, one simple unit-level dose-response relationship states that Y n (z) − Y n (z * ) = τ 0 , ∀z ∈ Z; in words, for unit n, the causal effect when comparing treatment dose z to the reference dose z * is equal to a constant τ 0 regardless of the dose z ∈ Z. We may then summarize such unit-level dose-response relationships for a collection of units. For example, one such summary may state that a structured dose-response relationship f (z; z * , θ) holds for all counties in the U.S.; this summary can be represented by the following null hypothesis:

, for all counties in the U.S. indexed by n, for some θ. We first develop a simple, randomization-based testing procedure to assess hypotheses of the form H 1 0 . The work most relevant to our development is Ding, Feller and Miratrix (2016) , who studied testing the existence of treatment effect variation in a randomized controlled trial with a time-independent binary treatment. In a randomization-based inferential procedure, the potential outcomes (i.e., the infinite collection {Y n (z), ∀z ∈ Z, ∀n} in Table 1 , are held fixed and the only probability distribution that enters statistical inference is the randomization distribution that describes the treatment dose assignment. The key step here is to properly embed the observational data into an approximately randomized experiment (Rosenbaum, 2002 (Rosenbaum, , 2010 Bind and Rubin, 2019) , as we are ready to describe.

2.2. Embedding observational data with a time-independent, continuous treatment into an as-if randomized experiment via nonbipartite matching. In a randomized controlled experiment, physical randomization creates "the reasoned basis" for drawing causal inference (Fisher, 1935) . In the absence of physical randomization as with retrospective observational data, one strategy is to use statistical matching to embed observational data into a hypothetical randomized controlled trial (Rosenbaum, 2002 (Rosenbaum, , 2010 Rubin, 2007; Ho et al., 2007; Stuart, 2010; Bind and Rubin, 2019) by matching subjects with the same (or at least very similar) estimated propensity score or observed covariates and forging two groups that are well-balanced in observed covariates.

One straightforward design to handle observational data with a continuous treatment is to dichotomize the continuous treatment based on some prespecified threshold and create a binary treatment out of the dichotomization scheme. For instance, let Z denote a measure of social distancing; one can define counties with the social distancing measure above the median as the "above-median," or "treated" group, and the others as the "below-median," or "control" (or "comparison") group. One may then pair counties in the "above-median" group to those in the "below-median" group via a standard bipartite matching algorithm (for instance, via the R package optmatch by Hansen, 2007) , and test the null hypothesis that social distancing has no effect on the outcome. Such a strategy is often seen in empirical research, probably because of its simplicity; however, dichotomizing the continuous treatment inevitably censors the rich information contained in the original, continuous dose and prevents researchers from studying the dose-response relationship.

To address this limitation, Lu et al. (2001 Lu et al. ( , 2011 proposed optimal nonbipartite matching. In a nonbipartite matching, units with similar observed covariates but different treatment doses are paired. Suppose there are N = 2I units, e.g., counties in the U.S in our application. In the design stage, distances {δ ij , i = 1, · · · , N, j = 1, · · · , N } are calculated between each pair of units and a N × N distance matrix is constructed (Lu et al., 2001 (Lu et al., , 2011 Baiocchi et al., 2010) . Some commonly used distances δ ij include the Mahalanobis distance between observed covariates X i and X j and the rank-based robust Mahalanobis distance. Researchers may further modify the distance to incorporate specific design aspects of the study. For instance, in a study that involves effect modification, researchers are advised to match exactly or near-exactly on the effect modifier (Rosenbaum, 2005) , e.g., the geographic location of the county, and such an aspect of design can be pursued by adding a large penalty to δ ij if county i and j are not from the same geographic region.

An optimal nonbipartite matching algorithm then divides these N = 2I units into I nonoverlapping pairs of two units such that the total within-matched-pair distance is minimized. Nonbipartite matching allows more flexible pairing compared to bipartite matching based on a dichotomization scheme, and preserves the continuous nature of the treatment, which is essential for investigating a dose-response relationship.

Suppose that we have formed I matched pairs of 2 units so that index ij, i = 1, · · · , I, j = 1, 2, uniquely identifies a unit. We follow Rosenbaum (1989) and Heng et al. (2019) and define the following potential outcomes after nonbipartite matching. DEFINITION 2.1 (Potential Outcomes After Nonbipartite Matching). Let Z obs i1 ∨ Z obs i2 = max(Z obs i1 , Z obs i2 ) and Z obs i1 ∧ Z obs i2 = min(Z obs i1 , Z obs i2 ) denote the maximum and minimum of two observed treatment doses in each matched pair i. We define the following two potential outcomes for each unit ij:

where we abuse the notation and use subscripts T and C to denote the potential outcomes under the maximum and minimum of two observed doses within each matched pair, respectively.

Write F = {X ij , Y T ij , Y Cij , i = 1, · · · , I, j = 1, 2}, where Y T ij and Y Cij are defined in Definition 2.1, Z obs ∨ = (Z obs 11 ∨ Z obs 12 , · · · , Z obs I1 ∨ Z obs I2 ), and Z obs ∧ = (Z obs 11 ∧ Z obs 12 , · · · , Z obs I1 ∧ Z obs I2 ). As always in randomization inference (Rosenbaum, 2002 (Rosenbaum, , 2010 Ding, Feller and Miratrix, 2016) , we condition on observed covariates, potential outcomes, and observed dose assignments, i.e., we do not model X or the potential outcomes, and rely on the treatment assignment mechanism to draw causal conclusions. The law that describes the treatment dose assignment in each matched pair i is

, and π i2 = 1 − π i1 . In an ideal randomized experiment, experimenters use physical randomization (e.g., coin flips) to ensure π i1 = π i2 = 1/2: for matched pair i with two treatment doses Z obs i1 and Z obs i2 , a fair coin is flipped; if the coin lands heads, the first unit is assigned Z obs i1 and the second unit Z obs i2 , and vice versa if the coin lands tails. The design stage of an observational study aims to approximate this ideal (yet unattainable) hypothetical experiment by matching units with similar covariates X so that π i1 ≈ π i2 after matching. In this way, nonbipartite matching embeds observational data with a continuous treatment dose into a randomized experiment; this induced randomization scheme will serve as our "reasoned basis" for inferring any causal effect including a dose-response relationship. As is always true with retrospective observational studies, a careful design may alleviate, but most likely never eliminate bias due to the residual imbalance in X or unmeasured confounding variables. The departure from randomization, i.e., π i1 = π i2 , is investigated via a sensitivity analysis (Rosenbaum, 1989 (Rosenbaum, , 2002 (Rosenbaum, , 2010 .

3.1. Randomization inference for τ = τ 0 and β = β 0 in the dose-response kink model. Endowed with the randomization scheme induced by nonbipartite matching, we now turn to statistical inference. We first consider testing the dose-response kink model for a fixed τ = τ 0 and β = β 0 for all units, i.e.,

Under H τ0,β0 0,kink , the entire dose-response relationship for subject ij is known up to Y ij (z * ). Fortunately, we do observe one point on the dose-response curve, namely Y ij (Z obs ij ); hence, the entire dose-response curve for the subject ij is fixed, and both potential outcomes

can then be imputed for each unit ij. In matched Table 2 illustrates the imputation scheme by imputing the missing potential outcome for each subject under the null hypothesis H τ0,β0 0,kink with τ 0 = 0.3 and β 0 = 1. Imputed science table when testing the dose-response kink model with τ 0 = 0.3 and β 0 = 1. Two units in each pair i are arranged so that i1 has a smaller dose and i2 a larger dose. For each unit, one and only one potential outcome is observed and the other one imputed under H τ 0 ,β 0 0,kink .

Observe One Potential Outcome Imputed Potential Outcomes

I} denote the collection of transformed outcomes, and F tr max (·) its CDF. The null hypothesis H τ0,β0 0,kink can then be tested by comparing the following Kolmogorov-Smirnov-type (KS) test statistic

evaluated at the observed data to a reference distribution generated based on the imputed science table (e.g., Table 2 ) and enumerating all 2 I possible randomizations: within each matched pair i, unit i1 receives Z obs i1 ∨ Z obs i2 and exhibits Y obs

In principle, any test statistic can be combined with this randomization scheme to yield a valid test. We motivate the test statistic (4) in Supplementary Material B. Note that when τ 0 = ∞ or β 0 = 0, H τ0,β0 0,kink reduces to the following causal null hypothesis: H 0,null : Y ij (z) = Y ij (z * ), ∀z ∈ Z, ∀i = 1, · · · , I, j = 1, 2, and the developed procedure can be used to test H 0,null .

We illustrate the procedure using the following example. We generate (1) with τ = 1 and β = 0.5. We test the null hypothesis H τ0,β0 0,kink with τ 0 = 1 and β 0 = 0.5 using the test statistic (4). The left panel of Figure 3 plots the empirical distribution F min (y) (blue) and F tr max (y) (red), and t KS (1, 0.5) = 0.08 for the observed data. Instead of enumerating all 2 I = 2 200 possible treatment dose assignments, we draw with replacement 100, 000 samples from all 2 200 possible configurations. The right panel of Figure 3 plots the reference distribution based on these 100, 000 samples. Such a "sampling with replacement" strategy is referred to as a "modified randomization test" in the literature (Dwass, 1957; Pagano and Tritchler, 1983 ) and known to still preserve the level of the test. In this way, a p-value equal to 0.445 is obtained in this simulated dataset and the null hypothesis H τ0,β0 0,kink with τ 0 = 1 and β 0 = 0.5 is not rejected. The p-value is exact as the procedure does not resort to any asymptotic theory and works in small samples. , τ = 1, and β = 0.5. We test the null hypothesis H τ 0 ,β 0 0,kink with τ 0 = 1 and β 0 = 0.5. The left panel plots F min (y), the empirical CDF of Y obs min (blue) and F tr max (y), the empirical CDF of the transformed outcomes Y obs max (red). The test statistic t KS (1, 0.5) evaluated at the observed data is 0.08. The right panel plots the exact reference distribution of the test statistic given the sample and under the null hypothesis. The reference distribution is generated using 100, 000 Monte Carlo draws from the 2 200 randomization configurations. The red dashed line plots the position of the observed test statistic. The exact p-value in this case is 0.445.

Testing the dose-response kink model. Let H K 0 denote a composite hypothesis that is equal to the union of H τ0,β0 0,kink over all τ = τ 0 and β = β 0 , i.e.,

In other words, the activation dose τ and the slope β are nuisance parameters to be taken into account. One strategy testing H K 0 is to take the supremum p-value over the entire range of (τ, β); another commonly used strategy (Berger and Boos, 1994) is to first construct a confidence set around (τ, β) and then take the supremum p-values over the (τ, β) values in this confidence set. This latter strategy is particularly useful when the treatment dose and/or the outcome of interest are not bounded so that τ and β are not bounded; for some applications in the causal inference literature, see Nolen and Hudgens (2011), Ding, Feller and Miratrix (2016) , and Zhang et al. (2021) . In Supplementary Material C, we discuss how to construct a bounded level-γ confidence set for (τ, β) based on inverting a variant of the Wilcoxon rank sum test statistic and its properties. Being able to reject H K 0 suggests evidence against the postulated dose-response relationship; otherwise, the model is deemed sufficient to characterize the dose-response relationship for the data at hand. We illustrate the procedure using the following example. We gen- Figure 4 plots the p-values in log scale against τ 0 and β 0 . The maximum p-value is obtained at τ 0 = 3.8 and β 0 = 0.4 and equal to 0.004. The null hypothesis H K 0 , i.e., the dose-response relationship follows a kink model, can be rejected at level 0.05 for this simulated dataset. 

We test H τ 0 ,β 0 0,kink and plot the p-value in log scale against τ 0 and β 0 values. The maximum p-value is obtained at τ 0 = 3.8 and β 0 = 0.4 and equal to 0.004. The null hypothesis H K 0 is hence rejected at level 0.05 for this simulated dataset.

3.3. Testing any structured dose-response model. Our discussion above suggests a general model-free, randomization-based framework to test any structured dose-response relationship. Here, we say a dose-response relationship is "structured" if it is characterized by a few structural parameters. Consider the following structured dose-response relationship model:

where z * ∈ Z is a reference dose, and f (· ; z * , θ) is a univariate function that satisfies f (z * ; z * , θ) = 0 and is parametrized by a p-dimensional vector of structural parameters θ ∈ R p . Algorithm 1 summarizes a general procedure testing H dose-response 0 at level α. In Supplementary Material D, we briefly discuss and illustrate how to sequentially test a few dose-response relationships ordered in their model complexity. 

1. Construct CI θ , a level-γ confidence set for the structural parameter θ; 2. For each θ 0 ∈ CI θ , do the following steps: a) Compute the test statistic t obs . For each unit ij with Z ij = Z obs i1 ∨ Z obs i2 , i.e., the unit with maximum dose in each matched pair i, define the following transformed outcome

c) Generate a reference distribution. Sample with replacement MC = 100, 000 dose assignment configurations from the 2 I possible configurations. For each sampled dose assignment configuration Z k ,

KS (θ 0 ), k = 1, 2, · · · , MC}; d) Compute the p-value p θ 0 by comparing t obs to the reference distribution F θ 0 , i.e.,

3. Let pmax = sup θ∈CI θ p θ and reject the null hypothesis H dose-response 0

4. Relaxing the SUTVA: dose-response relationship under interference.

Potential outcomes under interference. We relax the stable unit treatment value assumption in this section and consider inference for a structured dose-response relationship under interference. To this end, we collect the treatment doses of all study units in our matched-pair design and use Z = (Z 11 , Z 12 , · · · , Z I1 , Z I2 ) to represent the treatment dose configuration with z being its realization. We further let Z obs denote the observed treatment dose configuration of all 2I study units and

unit ij's potential outcome that is random only through the randomness in the treatment dose configuration Z. The SUTVA states that for all pairs of z and z ,

Definition (6) is in a most general form and useful when the scientific interest lies in testing the null hypothesis of no direct or spillover effect under arbitrary interference pattern. To further explore the dose-response relationship in the presence of the spillover effect, researchers need to model the local interference structure possibly based on units' spatial relationship (e.g., closeness of counties in our case study). To this end, we assume study units are connected through an undirected network with a symmetric, 2I × 2I adjacency matrix G. Matrix G has its rows and columns arranged in the order corresponding to unit 11, 12, · · · , I1, I2 after nonbipartite matching. If unit ij and i j are connected, then the corresponding entry in G is equal to 1 and otherwise 0. The diagonal entries of G are defined to be 0.

Our reasoned basis for testing any causal null hypothesis under interference will still be the randomization scheme endowed by the nonbipartite matching. We have two goals. First, we show that the test developed for H 0,null under the SUTVA remains a valid level-α test for a null hypothesis of no direct or spillover effect under arbitrary interference pattern. Second, we relax the dose-response relationship H 0,kink by modeling various forms of local interference pattern using the adjacency matrix G.

No direct or spillover effect. Following Rosenbaum (2007); Bowers, Fredrickson and Panagopoulos (2013); Athey, Eckles and Imbens (2018), a null hypothesis of no direct or spillover effect states that H 0,direct or spillover : Y ij ( z) = Y ij ( z ), ∀i = 1, · · · , I, j = 1, 2, and all pairs of treatment dose configurations of 2I study units z and z . Under H 0,direct or spillover , the unit-level potential outcome of each study unit under any treatment dose configuration z can still be imputed; in fact, Y ij ( z) = Y ij ( Z obs ) for any z. Any test statistic (e.g., the Kolmogorov-Smirnov statistic used in Algorithm 1) that depends on units' potential outcomes (possibly under interference) is random only through its dependence on the treatment dose configurations of all study units; therefore, the null distribution of the test statistic can again be inferred by enumerating 2 I different configurations of Z as discussed in Section 3. In other words, the testing procedure for H 0,null is still exact and has correct level for testing H 0,direct or spillover . Moreover, since H 0,direct or spillover does not impose any interference pattern, rejecting H 0,null implies rejecting H 0,direct or spillover under arbitrary interference pattern. 4.3. Dose-response relationship under local interference modeling. Testing the null hypothesis is often regarded a starting point of causal analysis (Imbens and Rubin, 2015) . Next, we build up a causal hypothesis regarding a dose-response relationship allowing for local interference. Our construction is guided by the following general principles adapted from the literature on interference (Hong and Raudenbush, 2006; Bowers, Fredrickson and Panagopoulos, 2013; Athey, Eckles and Imbens, 2018) Principle I: The total effect of treatment dose configuration z compared to a reference dose configuration z * can be decomposed into a dose-response direct effect due to ij's own treatment dose z ij and a spillover effect due to other study units' treatment doses so that

is a dose-response direct effect described in Section 3, z −ij (resp. z * −ij ) treatment doses (resp. reference treatment doses) of all study units except ij, and g(·) a function modeling the spillover effect. For a binary treatment, z * = 0 is referred to as a uniformity trial (Rosenbaum, 2007) . Principle II: The spillover effect depends only on the aggregate, excess treatment doses of ij's neighbors with respect to the reference dose configuration so that

• is the ij-th row of the adjacency matrix G.

Principle III: The spillover effect is always dominated by the dose-response direct effect in the sense that

. One simple modeling strategy of g( z − z * , G ij, • ) that satisfies (7) is to scale the magnitude of the dose-response direct effect towards zero.

To illustrate the three principles above, we consider a concrete example of causal hypothesis under local interference. We consider a causal null hypothesis that states that the direct effect is proportional to the dose difference, i.e., f (z ij ; z * ij , θ) = β(z ij − z * ij ). We then model the local interference pattern by scaling the direct effect using a logistic function so

. According to this specification, the spillover effect modeled by g( z − z * , G ij, • ) trivially satisfies the third principle above as the multiplication factor C is always upper bounded by 1. The causal null hypothesis then becomes

Statistical inference in the presence of interference parameters (k, s) depends on one's perspective on (k, s) (Bowers, Fredrickson and Panagopoulos, 2013) . Inference may proceed by regarding interference parameters as sensitivity parameters and researchers could report how confidence sets of the dose-response relationship parameters in the direct effect (e.g., β in H 0,interference ) change as interference parameters change. For fixed interference parameters (k 0 , s 0 ), we can test β = β 0 in H 0,interference by imputing potential outcomes for each study unit and each of the 2 I treatment dose configurations Z under H 0,interference , choosing a test statistic t(Y( Z), Z) that is a function of potential outcomes of all study units Y( Z) and random only via its dependence on Z, generating the randomization-based reference distribution of t(Y( Z), Z), and comparing the observed test statistic t(Y( Z obs ), Z obs ) to this reference distribution.

5.1. Treatment dose trajectory and potential outcome trajectory. In our application, the treatment dose evolves over time and the public-health-related outcomes, e.g., county-level COVID-19 related death toll, may depend on the treatment dose trajectory. We first consider the no-interference case. Let t 0 denote a baseline period and t 1 , t 2 , · · · , t i , · · · , T subsequent treatment periods. Fix t 0 ≤ t i ≤ t j and let Z be the set of all possible treatment doses at each time point. Let

denote the random treatment dose trajectory of one study unit from t i to t j (Robins, 1986; Bojinov and Shephard, 2019) , z ti:tj one realization of Z ti:tj , and Z obs n,ti:tj = (Z obs n,ti , · · · , Z obs n,tj ) the observed treatment dose trajectory of unit n from t i to t j . In our application, t 0 denotes the start of the phased reopening and Z ti:tj the trajectory of daily percentage change in total distance traveled from t i to t j . We are interested in the effect of a sustained period of treatment on some future outcome. We assume that the treatment dose at time t temporally precedes the outcome at time t. Fix a time t and let Y n,t (z t0:t ) = Y n,t (z t0 , z t1 , · · · , z t ) denote the potential outcome of unit n at time t under the treatment dose trajectory Z n;t0:t = z t0:t . We assume consistency so that Y obs n,t = Y n,t (Z obs n;t0:t ). Finally, we let Y n;ti:tj (z t0:tj ) denote unit n's potential outcome trajectory from time t i to t j under the treatment dose trajectory z t0:tj . 5.2. Covariate history and sequential randomization assumption. One unique feature of longitudinal data is that the observed outcome trajectory up to time t − 1 may confound the treatment dose at time t; this is particularly true in our application: if the COVID-19 related case and death numbers were high during the last week in a county, then residents may be more wary of the disease and reduce social mobility this week. Following the literature on longitudinal studies, we let L n,t denote the time-dependent covariate process of unit n up to but not including time t; L n,t contains both time-independent covariates X n and time-dependent covariates like the observed outcomes {Y obs n,t0 , Y obs n,t1 , · · · , Y obs n,t−1 }. We further assume the sequential randomization assumption (SRA) (Robins, 1998) , which states that conditional on the treatment history up to time t − 1 and covariate process up to time t, the treatment dose assignment at time t is independent of the potential outcome trajectories, i.e., Y n;t0:T (z t0:T ) |= Z n,t | Z n;t0:t−1 = z n;t0:t−1 , L n,t , ∀z t0:T .

This assumption holds if residents' adopting the social distancing measures at time t depends on (1) their history of adopting social distancing measures, (2) time-independent covariates, and (3) observed daily COVID-19 related case numbers and death toll up to time t − 1. See also Mattei, Ricciardi and Mealli (2019) for a relaxed version of this assumption.

Cumulative treatment dose, W-equivalence, and dose-response relationship in a longitudinal setting. One general recipe for drawing causal inference from longitudinal data is to model the marginal distribution of the counterfactual outcomes Y n,t (z t0:t ), or the marginal joint distribution of Y n;ti:tj (z t0:t ), as a function of the treatment trajectory and baseline covariates; see Robins (1986 Robins ( , 1994 ; Robins, Greenland and Hu (1999); Robins, Hernán and Babette (2000) for seminal works. For example, one simplest model may state that N units are i.i.d. samples from a superpopulation such that the counterfactual mean of the outcome at time t depends on the treatment dose trajectory and the time-independent covariates X through a known functional form g(·), i.e., E[Y t (Z t0:t ) | X] = g(Z t0:t , X; β), and the interest lies in efficient estimation of the structural parameters β.

In the infectious disease context, modeling the potential outcomes is a daunting task and our interest here lies in testing a structural dose response relationship in a less modeldependent way. To proceed, we generalize the notion of "dose" from the static to longitudinal setting. Consider the following weighted difference between two treatment dose trajectories z ti:tj and z ti:tj :

where W is a shorthand for the weight function W(t ) = {w(t ) | 0 ≤ w(t ) ≤ 1 and ti≤t ≤tj w(t ) = 1}. Let z * ti:tj denote a reference trajectory, e.g., z * ti:tj = (−0.5, · · · , −0.5) corresponding to 50% reduction in total distance traveled from t i to t j . For each treatment dose trajectory z ti:tj , we define its "cumulative dose" as the weighted difference between z ti:tj and z * ti:tj .

DEFINITION 5.1 (Cumulative Dose). Let z ti:tj be a realization of the treatment dose trajectory Z ti:tj . Its cumulative dose with respect to the reference trajectory z * ti:tj and the weight function W is CD(z ti:tj ; z * ti:tj , W) = z ti:tj − z *

where · W is defined in (8).

REMARK 1. The cumulative dose of a treatment dose trajectory is defined with respect to a reference trajectory and a weight function. The choices of the reference trajectory and weight function should be guided by expert knowledge so that the cumulative dose reflects some scientifically meaningful aspect of the treatment dose trajectory. For instance, in a longitudinal study of the effect of zidovudine (AZT), an antiretroviral medication, on mortality, Robins, Hernán and Babette (2000) defined the cumulative dose to be the aggregate AZT dose during the treatment period, i.e., the reference dose z * t0:t = (0, · · · , 0) and CD(z t0:t ; z * t0:t , W) = t0≤t ≤t z t .

A collection of treatment dose trajectories is said to be "W-equivalent" if they have the same cumulative dose with respect to the same weight function and reference trajectory.

DEFINITION 5.2 (W-Equivalence). Two treatment dose trajectories z ti:tj and z ti:tj are said to be W-equivalent w.r.t.to the reference trajectory z * ti:tj , written as z ti:tj W ≡ z ti:tj , if CD(z ti:tj ; z * ti:tj , W) = CD(z ti:tj ; z * ti:tj , W). Treatment dose trajectories that are equivalent to z ti:tj form an equivalence class and is denoted as [z ti:tj ] W = z ti:tj | CD(z ti:tj ; z * ti:tj , W) = CD(z ti:tj ; z * ti:tj , W) .

Equipped with Definition 5.1 and 5.2, we are ready to state a major assumption that facilitates extending a dose-response relationship to longitudinal settings.

ASSUMPTION 1 (Potential outcomes under W-equivalence). Let [z t0:t ] W be an equivalence class as defined in Definition 5.2 with respect to · W and a reference trajectory z * t0:t . Then unit-level potential outcomes at time t, Y n,t (·), satisfies: Y n,t (z t0:t ) = Y n,t (z t0:t ), ∀ z t0:t , z t0:t ∈ [z t0:t ] W .

EXAMPLE. In the study of AZT's effect on mortality, Z t0:t represents the AZT dose trajectory from t 0 to t. Let Y n,30 = 1 if unit n dies at time t = 30 and 0 otherwise. Assumption 1 applied to Y n,30 then states that patient n's 30-day mortality status depends on the AZT trajectory from t 0 to t only through some "cumulative dose" captured by CD(z t0:t ; z * t0:t , W) (e.g., the aggregate dose; see Remark 1). REMARK 2. Although Assumption 1 and its variants are often assumed in the literature on longitudinal studies to reduce the number of potential outcomes (Robins, Hernán and Babette, 2000, Section 7), its validity needs to be evaluated on a case-by-case basis. We evaluated Assumption 1 in the infectious disease dynamics context using standard compartment model before invoking it in our application; see Supplementary Material H for details.

We now extend the dose-response relationship to a longitudinal setting. DEFINITION 5.3 (Unit-Level Dose-Response Relationship in Longitudinal Studies). Let CD(z t0:t ; z * t0:t , W) be a cumulative dose defined in Definition 5.1 and f n (·; θ n ) a univariate dose-response model parametrized by θ n such that f n (0; θ n ) = 0. Suppose that Assumption 1 holds. A unit-level dose-response relationship for unit n states that (9) Y n,t (z t0:t ) − Y n,t (z * t0:t ) = f n CD(z t0:t ; z * t0:t , W); θ n .

REMARK 3. Observe that when z t0:t = z * t0:t , the LHS of (9) evaluates to 0 and the RHS evaluates to f n CD(z * t0:t ; z * t0:t , W); θ n = f n {0; θ n } = 0.

REMARK 4. Let z t0:t and z t0:t be two treatment dose trajectories such that z t0:t = z t0:t but CD(z t0:t ; z * t0:t , W) = CD(z t0:t ; z * t0:t , W). For the dose-response relationship (9) to be well-defined, we necessarily have Y n,t (z t0:t ) = Y n,t (z t0:t ), which is guaranteed by Assumption 1.

REMARK 5. Similar to the static setting considered in Section 2, the dose-response relationship (9) can be thought of as a parsimonious summary of unit-level causal effects from a sustained period of treatment.

Embedding longitudinal data into an experiment and testing a dose-response relationship. Let i = 1, 2, · · · , I be I pairs of two units matched on the covariate process L i1,t = L i2,t but Z obs i1;t0:t = Z obs i2;t0:t . Units i1 and i2 are each associated with the following two potential outcomes at time t: Y ij,t (Z obs i1;t0:t ) and Y ij,t (Z obs i2;t0:t ), i = 1, · · · , I, j = 1, 2, in parallel with Definition 2.1 in the static setting. Write F t = {L ij,t , Y ij,t (Z obs i1;t0:t ), Y ij,t (Z obs i2;t0:t ), i = 1, · · · , I, j = 1, 2}. Let ij denote the unit with the minimum cumulative dose in matched pair i and ij the other unit, and write Z obs ∧;t0:t = {Z obs 1j ;t0:t , · · · , Z obs Ij ;t0:t }, and Z obs ∨;t0:t = {Z obs 1j ;t0:t , · · · , Z obs Ij ;t0:t }. By iteratively applying the sequential randomization assumption, it is shown in the Supplementary Material E that π i1 =P (Z i1;t0:t = Z obs i1;t0:t , Z i2;t0:t = Z obs i2;t0:t | F t , Z obs ∧;t0:t , Z obs ∨;t0:t ) =P (Z i1;t0:t = Z obs i2;t0:t , Z i2;t0:t = Z obs i1;t0:t | F t , Z obs ∧;t0:t , Z obs ∨;t0:t ) = π i2 = 1/2.

REMARK 6. In the static setting, it suffices to match on observed covariates to embed data into an approximate experiment; in the longitudinal setting, one needs to match on the covariate process L t including the time-independent covariates and observed outcomes during the treatment period. REMARK 7. Our framework is different from the balance risk set matching of Li, Propert and Rosenbaum (2001) . According to Li, Propert and Rosenbaum (2001) 's setup, units receive a binary treatment at most once in the entire study period. Our framework is also different from Imai, Kim and Wang (2018) . Imai, Kim and Wang (2018) 's primary interest is the treatment effect of an intervention at a particular time point t; hence, Imai, Kim and Wang (2018) pair a subject receiving treatment at time t to subjects with the same treatment dose and covariate process up to time t − 1 but not receiving the treatment at time t. In sharp contrast, we are focusing on the causal effect of a sustained period of treatment, similar to the setup in Robins, Hernán and Babette (2000) . The entire treatment dose trajectory is the unit to be permuted and our design reflects this aspect.

Consider testing the following dose-response relationship in a longitudinal study:

t0:t , W); θ , ∀i = 1, · · · , I, j = 1, 2, for some θ, where z * t0:t is a reference trajectory, CD(z t0:t ; z * t0:t , W) a cumulative dose, and f (·; θ) a doseresponse relationship of scientific interest. Within each matched pair are two observed treatment dose trajectories Z obs i1;t0:t and Z obs i2;t0:t . We observe the potential outcome that i1 exhibits under Z obs i1;t0:t , i.e., Y i1,t (Z obs i1;t0:t ) = Y obs i1,t ; moreover, we are able to impute Y i1,t (·) evaluated at Z obs i2;t0:t , under H L 0 and Assumption 1:

(11) Y i1,t (Z obs i2;t0:t ) = Y obs i1,t + f CD(Z obs i2;t0:t ; z * t0:t , W); θ − f CD(Z obs i1;t0:t ; z * t0:t , W); θ .

Similarly, we have Y i2,t (Z obs i2;t0:t ) = Y obs i2,t and can impute: Table 3 summarizes the observed and imputed information. The problem has now been reduced to the static setting, except that instead of permuting the two scalar treatment doses, we now permute two treatment dose trajectories. Randomization-based testing procedure like the one discussed in Section 3 in a static setting can be readily applied to testing (1) θ = θ 0 for a fixed θ 0 and (2) the validity of a postulated dose-response relationship H L 0 . 

Time lag and lag-incorporating weights. One unique aspect of our application is that there is typically a time lag between social distancing and its effect on public-healthrelated outcomes. We formalize this in Assumption 2.

ASSUMPTION 2 (Time Lag). The treatment trajectory is said to have a " -lagged effect" on unit n's potential outcomes at time t if Y n,t (z t0 , z t1 , · · · , z t− , z t− +1 , · · · , z t ) = Y n,t (z t0 , z t1 , · · · , z t− , z t− +1 , · · · , z t ), for all z t0 , · · · , z t− , z t− +1 , · · · , z t , and z t− +1 , · · · , z t .

In words, Assumption 2 says that the outcome of interest at time t depends on the entire treatment dose trajectory only up to time t − . Assumption 2 holds in particular when Y n,t measures the number of people succumbing to the COVID-19 at time t. Researchers estimated that the time lag between contracting COVID-19 and exhibiting symptoms (i.e., the so-called incubation period) had a median of 5.1 days and could be as long as 11.5 days (Lauer et al., 2020) , and the time lag between the onset of the COVID-19 symptoms and death ranged from 2 to 8 weeks (Testa et al., 2020; World Health Organization, 2020) . Therefore, it may be reasonable to believe that the number of COVID-19 related deaths at time t does not depend on social distancing practices days immediately preceding t for some properly chosen . Assumption 2 may be further combined with Assumption 1 to state that unit n's potential outcomes at time t depend on the entire treatment dose trajectory z t0:t only via some cumulative dose from time t 0 to t − by defining the cumulative dose with respect to some lag-incorporating weight function W lag that assigns 0 weights to treatment doses immediately preceding time t.

REMARK 8. Suppose that the time lag assumption holds for potential outcomes Y n,t (·), · · · , Y n,t+ −1 (·), and let g : R l → R be a function that maps these potential outcomes to an aggregate outcome g{Y n,t (·), · · · , Y n,t+ −1 (·)}. One immediate consequence of Assumption 2 is that g{Y n,t (·), · · · , Y n,t+ −1 (·)} depends on the entire treatment dose trajectory Z t0:t+ −1 only via Z t0:t−1 ; moreover, we may invoke Assumption 1 and further state that g{Y n,t (·), · · · , Y n,t+ −1 (·)} depends on the entire treatment dose trajectory Z t0:t+ −1 only via some cumulative dose of Z t0:t−1 . Dose-response relationships, statistical matching, and testing procedures described in Section 5.3 and 5.4 then hold by replacing Y n,t (·) with the aggregate outcome g{Y n,t (·), · · · , Y n,t+ −1 (·)} where appropriate. Details are provided in Supplementary Material F. 5.6. Incorporating interference. One may further allow the outcome of interest of unit ij to depend not only on its own cumulative dose, but also the cumulative doses of neighboring units, as described in Section 4. Let Z t0:t = (Z 11;t0:t , · · · , Z I2;t0:t ) denote the random treatment dose trajectories from t 0 to t of all study units, z t0:t its realization, z * t0:t = (z * t0:t , · · · , z * t0:t ) a collection of reference dose trajectories, and Y ij,t ( Z t0:t ) the potential outcome. Finally, collect the cumulative doses of all study units in z cumu = (CD(z 11;t0:t ; z * t0:t , W), · · · , CD(z I2;t0:t ; z * t0:t , W)). We stress that each entry of Z t0:t is itself a random trajectory, while each entry of z cumu is a scalar. Synthesizing our development in Section 4 and 5.4, we consider testing a dose-response relationship in a longitudinal study under interference by modeling the contrast between Y ij,t ( z t0:t ) and Y ij,t ( z * t0:t ). Combining Principle I and II in Section 4 with Assumption 1, we have a causal null hypothesis of the form

where f CD(z ij;t0:t ; z * t0:t , W); θ captures the dose-response direct effect and g( z cumu , G ij, • ) models a spillover effect that depends only on the cumulative doses of ij's neighboring units. Simple parametric models as described in Section 4.3 can be readily applied to model the spillover effect. By imputing under H L 0,interference and permuting the two treatment dose trajectories within each matched pair as described in Section 5.4, one can readily conduct randomization-based inference to construct confidence sets for structural parameters in the dose-response relationship while treating interference parameters in the g(·) model as sensitivity parameters.

6. Social distancing and COVID-19 during phased reopening: study design.

6.1. Data: time frame, granularity, cumulative dose, outcome, and covariate history. The first state in the U.S. that reopened was Georgia at April 24th, 2020. We hence consider data from April 27th, the first Monday following April 24th, to August 2nd, the first Sunday in August in the primary analysis. We choose a Monday (April 27th) as the baseline period and a Sunday (August 2rd) as the endpoint because social distancing and public-health-related outcomes data exhibited consistent weekly periodicity (Unnikrishnan, 2020) .

We analyze the data at a county-level granularity and use the county-level percentage change in the total distance traveled compiled by Unacast ™ as the continuous, time-varying treatment dose. We consider a two-month treatment period from April 27th (Monday) to June 28th (Sunday). According to the data compiled by Unacast ™ , counties cut social mobility by at most 50% during most of the phased reopening; hence, we set the reference dose trajectory to be −0.5 throughout the treatment period and define a notion of cumulative dose with respect to this reference dose trajectory and a uniform weighting scheme that assigns equal weight to each day during the treatment period. In a sensitivity analysis, we further repeated all dose-response analyses using different notions of cumulative dose based on different weighting schemes. In the Supplementary Material H, we assess the appropriateness of Assumption 1 in the context of standard epidemiological models using simulation studies. The primary outcome of interest is the cumulative COVID-19 related death toll per 100, 000 people from June 29th (Monday) to August 2nd (Sunday), a total of five weeks. The countylevel COVID-19 case number and death toll are both obtained from the New York Times COVID-19 data repository (The New York Times, 2020) .

As discussed in Section 5.4, we matched counties similar on covariates, including timeindependent covariates and time-dependent covariate processes, in order to embed data into an approximate randomized experiment. Specifically, we matched on the following timeindependent baseline covariates: female (%), black (%), Hispanic (%), above 65 (%), smoking (%), driving alone to work (%), flu vaccination (%), some college (%), number of membership associations per 10, 000 people, rural (0/1), poverty rate (%), population, and population density (residents per square mile). These county-level covariates data were derived from the census data collected by the United States Census Bureau and the County Health Rankings and Roadmaps Program (Remington, Catlin and Gennuso, 2015) . Moreover, we matched on the number of new COVID-19 cases and new COVID-19 related deaths per 100, 000 people every week from April 20th -26th to June 23th -29th.

6.2. Statistical matching, matched samples, and assessing balance. A total of 1, 211 matched pairs of two counties were formed using optimal nonbipartite matching (Lu et al., 2001 (Lu et al., , 2011 . We matched exactly on the covariate "rural (0/1)" for later subgroup analysis and balanced all other 32 covariates. We added a mild penalty on the cumulative dose so that two counties within the same matched pair had a tangible difference in their cumulative doses, and added 20% sinks to eliminate 20% of counties for whom no good match can be found (Baiocchi et al., 2010; Lu et al., 2011) . Following the advice in Rubin (2007) , the design was conducted without any access to the outcome data in order to assure the objectivity of the design.

Within each matched pair, the county with smaller cumulative dose is referred to as the "better social distancing" county, and the other "worse social distancing" county. Appendix A shows where the 1, 211 better social distancing counties and the other 1, 211 worse social distancing counties are located in the U.S., and Figure 5 plots the average daily percentage change in total distance traveled and the average daily COVID-19 related death toll per 100, 000 people during the treatment period (April 27th to June 28th) in two groups. It is evident that two groups differ in their extent of social distancing, but are very similar in their daily COVID-19 related death toll per 100, 000 people during the treatment period. Finally, Appendix B summarizes the balance of all 33 covariates in two groups after matching. All variables have standardized differences less than 0.15 and are considered sufficiently balanced (Rosenbaum, 2002) . In Supplementary Material G.1, we further plot the cumulative distribution functions of important variables in two groups. A detailed pre-analysis plan, including matched samples and specification of a primary analysis and three secondary analyses, can be found via doi:10.13140/RG.2.2.23724.28800. 7. Social distancing and COVID-19 during phased reopening: outcome analysis. and average daily COVID-19 related death toll per 100,000 people (solid lines) in 1, 211 better social distancing counties (blue) and 1, 211 worse social distancing counties (red). We saw a sharp contrast in the level of social distancing but little difference in the COVID-19 related death during this treatment period.

7.1. Primary analysis: causal null hypothesis regarding the death toll. Fix t 0 = April 27th and T = June 28th. Let Z t0:T = z t0:T denote a treatment dose trajectory from t 0 to T and Y t (·) the potential COVID-19 related deaths at time t. We specify the time-lag parameter = 35 so that T + corresponds to August 2nd. As discussed in Remark 8, we consider the aggregate outcome Y agg (·) = g{Y T +1 (·), · · · , Y T + (·)} = T +1≤t≤T + Y t (·). Our primary analysis tests the following causal null hypothesis for the 1, 211 × 2 = 2, 422 counties in our matched samples: H 0,primary : Y ij,agg (z t0:T ) − Y ij,agg (z * t0:T ) = 0, ∀z t0:T , ∀i = 1, · · · , I = 1211, j = 1, 2. This null hypothesis states that the treatment dose trajectory from April 27th to June 28th had no effect whatsoever on the COVID-19 related death toll from June 29th to August 2nd.

The top left panel of Figure 6 plots F min (CDF of the better social distancing counties' observed outcomes) and F tr max (CDF of the worse social distancing counties' observed outcomes under H 0,primary ); we calculate the test statistic t KS = 0.735 and contrast it to a reference distribution generated using 1, 000, 000 samples from all possible 2 1211 randomizations; see the top right panel of Figure 6 . In this way, an exact p-value equal to 2.06 × 10 −4 is obtained and the causal null hypothesis H 0,primary is rejected at 0.05 level. Moreover, as detailed in Section 4 and 5.6, rejecting the null hypothesis H 0,primary also implies rejecting the null hypothesis of no direct or spillover effect under arbitrary interference pattern.

We further conducted two sensitivity analyses to assess the no unmeasured confounding assumption and the time lag assumption we made in the primary analysis. In the first sensitivity analysis, we allowed the dose trajectory assignment probability π i1 and π i2 as in (10) to be biased from the randomization probability and then generated the reference distribution with this biased randomization probability; specifically, we considered a biased treatment assignment model where log(Γ i ) = log{π i1 /π i2 } in each matched pair i was proportional to the absolute difference in the cumulative doses of two units in the pair (π i1 = π i2 = 1/2 and Γ i = 1 in a randomized experiment for all i). We found that our primary analysis conclusion would hold up to Γ i having a median as large as 3.82. See Supplementary Material G.3.1 for details. In the second sensitivity analysis, we repeated the primary analysis using a shorter time lag l = 28 days and the result was similar; see Supplementary Material G.3.2 for details.

Our primary analysis results suggested that different social distancing trajectories during the treatment period had an effect on the COVID-19-related death toll in the subsequent weeks. This causal conclusion stands under arbitrary interference pattern and is robust to unmeasured confounding. 7.2. Secondary analysis I: secondary outcome. Let Y ij,case,agg denote the cumulative COVID-19 cases per 100, 000 people from June 29th to July 12th (corresponding to time lag l = 14 days), as specified in our pre-analysis plan. We test the following null hypothesis concerning the secondary outcome Y ij,case,agg : H 0,secondary : Y ij,case,agg (z t0:T )−Y ij,case,agg (z * t0:T ) = 0, ∀z t0:T , ∀i = 1, · · · , I = 1211, j = 1, 2.

The exact p-value is less than 10 −5 ; see the bottom panels of Figure 6 . In a sensitivity analysis, we repeated the test with a shorter time lag l = 10 days and the result was similar; see Supplementary Material G.3.3 for details. Our result suggests strong evidence that social distancing from April 27th to June 28th had an effect on cumulative COVID-19 cases per 100, 000 people from June 29th to July 12th in our matched samples.

7.3. Secondary analysis II: explore dose-response relationship. Let z * t0:T denote a reference trajectory equal to the −0.50 for all t 0 ≤ t ≤ T (corresponding to 50% reduction in total distance traveled from April 27th to June 28th), W lag a weight function that assigns equal weight to all t such that t 0 ≤ t ≤ T and 0 otherwise, and a cumulative dose CD(z t0:T ; z * t0:T , W lag ) defined with respect to z * t0:T and W lag . We invoke Assumption 1 so that Y agg (·) depends on Z t0:T = z t0:T only via CD(z t0:T ; z * t0:T , W lag ), and consider testing the dose-response kink model concerning the aggregate case number Y ij,case,agg : H 0,kink : Y ij,case,agg (z t0:T ) − Y ij,case,agg (z * t0:T ) = 0, ∀z t0:T such that CD(z t0:T ; z * t0:T , W lag ) ≤ τ, and log{Y ij,case,agg (z t0:T )} − log{Y ij,case,agg (z * t0:T )} = β · {CD(z t0:T ; z * t0:T , W lag ) − τ }, ∀z t0:T such that CD(z t0:T ; z * t0:T , W lag ) > τ, ∀i = 1, · · · , I = 1211, j = 1, 2.

This dose-response relationship states that the potential COVID-19 cases from June 29th to July 12th remains the same as the potential outcome under Z t0:T = z * t0:T , i.e., strict social distancing that reduces total distance traveled by 50% everyday from April 27th to June 28th, when the cumulative dose (defined w.r.t. z * t0:T and W lag ) is less than some threshold τ ; after the cumulative dose exceeds this threshold, the COVID-19 case number increases exponentially at a rate proportional to how much the cumulative dose exceeds the threshold. We tested (13) for different τ = τ 0 and β = β 0 combinations; the maximum p-value obtained at (τ 0 , β 0 ) = (0.48, 10.0) is equal to 0.417 and hence the kink model (13) cannot be rejected. The left panel of Figure 7 plots the level-0.1 and level-0.05 confidence sets of (τ, β). The right panel of Figure 7 plots three dose-response curves with baseline (i.e., 50% reduction) case number equal to 1 per 100, 000 people for three selected (τ 0 , β 0 ) pairs in the level-0.1 confidence set. In a sensitivity analysis, we repeated the analysis by considering two different specifications of cumulative dose, one assigning more weight to early days of the phased reopening and the other towards the end of the phased reopening. Confidence sets results look similar under different specifications; see Supplementary Material G.3.4 for details.

The confidence set of the threshold parameter τ is tightly centered around 0.5, suggesting that as a county's average percentage change in total distance traveled from April 27th to June 28th increases from −50% to around −5% to 5%, the potential COVID-19 cases number from June 29th to July 12th would largely remain unchanged; however, once beyond this threshold, the case number would rise exponentially and could incur an increase as large as 10-fold when a county's average distance traveled increased by about 20% compared to the pre-coronavirus level. 7.4. Secondary analysis III: subgroup analysis and differential dose-response relationship. We also conducted subgroup analyses by repeating the primary and secondary analyses described in Section 7.1 to 7.3 on 462 matched pairs of 2 non-rural counties and 749 matched pairs of 2 rural counties. P-values when testing the primary analysis hypothesis H 0,primary concerning the death toll and secondary analysis hypothesis H 0,secondary concerning the case number are 0.004 and less than 10 −5 , respectively, in the non-rural subgroup, and 0.008 and 0.009, respectively, in the rural subgroup. We also allowed a differential doseresponse relationship between social distancing and case numbers in rural and non-rural counties and constructed confidence sets for (τ, β) separately; see Figure 8 . We further repeated the subgroup analyses under different specifications of the cumulative dose and the results were similar; see Supplementary Material G.3.4 for details.

A comparison of confidence sets for the non-rural counties (top left panel of Figure 8 ) and rural counties (bottom left panel of Figure 8 ) revealed an intriguing pattern: while the level-0.1 confidence set of the activation threshold τ is centered around the range of 0.4 − 0.6 for the rural counties, it contains 0 for the non-rural counties; moreover, the level-0.1 confidence set of rural counties covers a much larger range of β values compared to that of the non-rural counties. Together, these results suggest that the activation dose required to trigger exponential growth in case numbers in rural counties seemed much larger than that in non-rural counties; however, once exponential growth in case numbers was incurred, the growth seemed more rapid in rural counties. 7.5. Dose-response relationship under local interference. We next applied the methodology developed in Section 4 and 5.6 to obtain corrected confidence sets of (τ, β) under local interference. To this end, we collect 2 × 1, 211 copies of reference dose trajectory z * t0:T in z * t0:T and the cumulative doses of all study units during the treatment period in z cumu = (CD(z 11;t0:T ; z * t0:T , W lag ), · · · , CD(z I2;t0:T ; z * t0:T , W lag )). We consider relaxing the dose-response relationship by incorporating local interference as follows:

H 0,kink,interference : Y ij,case,agg ( z t0:T ) − Y ij,case,agg ( z * t0:T ) = 0, ∀ z t0:T such that CD(z ij;t0:T ; z * t0:T , W lag ) ≤ τ, and log{Y ij,case,agg ( z t0:T )} − log{Y ij,case,agg ( z * t0:T )} =β · {CD(z ij;t0:T ; z * t0:T , W lag ) − τ } · 1 +

Spillover Effect Factor C , ∀ z t0:T such that CD(z ij;t0:T ; z * t0:T , W lag ) > τ, ∀i = 1, · · · , I = 1211, j = 1, 2. According to this null hypothesis, there is no direct effect if county ij's cumulative dose is below some threshold τ ; hence, there is no spillover effect in this case by Principle III described in Section 4. Once county ij's cumulative dose is above the threshold, this triggers exponential growth captured by the dose-response direct effect plus a spillover effect. The magnitude of the spillover effect is equal to the direct effect multiplied by a spillover effect factor C. This spillover effect factor depends on the average cumulative dose of ij's neighbors but is always upper bounded by 1 so that the spillover effect is no larger than the direct effect (see Section 4.3). In rare cases when a county has no neighbor, C is defined to be 0 so that there is no spillover effect. We used the county adjacency file provided by the United States Census Bureau (U.S Census Bureau) as our adjacency matrix G.

The interference parameters (k, s) in the above model are easy to interpret and specify. For instance, (k, s) = (5.0, 1.0) corresponds to a small spillover effect of approximately 1% of the direct effect when neighbors' average cumulative dose is 0.10 (corresponding to an average 40% reduction in social mobility compared to the pre-pandemic level during the treatment period) and approximately 8% of the direct effect when neighbors' average cumulative dose is 0.50 (corresponding to social mobility remaining the same as the pre-pandemic level during the treatment period). In this way, the interference parameters (k, s) carry concrete meanings and can be easily tuned and communicated to the audience.

The left panel of Figure 9 plots the level-0.1, 0.05, and 0.005 confidence sets of (τ, β) when the interference parameters (k, s) = (5, 1). The right panel of Figure 9 further illustrates the inferred dose-response direct effects (solid lines) and the associated spillover effects (dotted lines) under (k, s) = (5, 1) for (τ, β) = (0.44, 2.5) (red lines) and (0.48, 5.0) (blue lines).

The level-0.05 confidence set of the dose-response direct effect contains similar τ values but considerably smaller β values compared to assuming no interference and modeling the total effect using a dose-response kink model (see the left panel of Figure 7 ). This makes intuitive sense as the total effect has now been decomposed into the dose-response direct effect and a spillover effect due to neighboring counties. under interference parameters (k, s) = (5, 1). Maximum p-value is obtained at (τ 0 , β 0 ) = (0.44, 2.5) (red marker). Three isopleths (0.1, 0.05, and 0.005) are plotted. Right panel: dose-response direct effect (solid lines) and the associated spillover effects (dotted lines) for selected (τ 0 , β 0 ) in the 0.05 confidence set with baseline Y ij,case,agg (z * t 0 :T ) equal to 1 per 100, 000. Two red lines correspond to (τ 0 , β 0 ) = (0.44, 2.5) and blue lines (τ 0 , β 0 ) = (0.48, 5.0). 8. Discussion. We studied in detail the effect of social distancing during the early phased reopening in the United States on COVID-19 related death toll and case numbers using our compiled county-level data. To address the statistical challenge brought by a time-dependent, continuous treatment dose trajectory, we developed a design-based framework based on nonbipartite matching to embed observational data with time-dependent, continuous treatment dose trajectory into a randomized controlled experiment. This embedding induces a randomization scheme that we then used to conduct randomization-based, model-free statistical inference for causal relationships, including testing a causal null hypothesis, a structured dose-response relationship and a causal null hypothesis under local interference modeling.

Upon applying the proposed design and testing procedures to the mobility and COVID-19 data, we found very strong evidence against the causal null hypothesis and supportive of a causal effect of social distancing during the early phases of reopening on subsequent COVID-19-related death and case numbers. Our finding complements many recent studies based on standard epidemiological models Koo et al., 2020) and structural equation modeling (Chernozhukov, Kasahara and Schrimpf, 2021; Bonvini et al., 2021 ) from a unique perspective, and once again confirms the important role of social distancing (as captured by a reduction in mobility in this article) in combating the novel coronavirus. Our transparent comparison of two groups of similar counties makes our findings digestible and easy to communicate to the general public.

In a dose-response analysis, we found that the confidence set of the dose needed to activate exponential growth was tightly centered and its magnitude suggested that once the total distance traveled returned to or even superseded the pre-coronavirus level, it would have a devastating effect on the COVID-19 case numbers by contributing to exponential growth. Moreover, in a subgroup analysis where we allowed a differential dose-response relationship, we found that more stringent social distancing would be needed to avoid devastating exponential growth for non-rural counties; however, once the exponential growth was incurred, the growth appeared more rapid in rural counties. This striking difference in dose-response relationship between rural and non-rural communities agrees with experts' assessment of the transmission dynamics. Given its clinical features, the rate of virus reproduction is likely higher in large, urban areas due to more reproductive opportunities afforded by denser populations (Souch and Cossman, 2020) and this may explain the absence of an "activation dose" in non-rural counties (see top left panel of Figure 8 ). On the other hand, although rural residents have less social interaction compared to non-rural counterparts, they often have more underlying medical conditions and are more likely to present for treatment at more advanced stages of disease (Callaghan et al., 2021) , which may partly explain why rural communities seemed to incur more drastic exponential growth in case numbers once the activation dose was exceeded (see bottom left panel of Figure 8 ).

The design-based approach and analysis proposed in this article has its limitations. First, we used social mobility data as a proxy measure for social distancing. It would be interesting to look at other aspects of social distancing, e.g., closure of borders, reduction in aviation travel, etc, in future works. Second, in order to permute two treatment dose trajectories in a longitudinal setting, one necessarily needs to match on observed outcomes during the treatment period and compare outcomes after the treatment period; therefore, in a longitudinal setting, the method is suited only for applications where the effect of a time-varying treatment is not immediate, e.g., effect of precautionary measures on the death toll. Third, when the sample size is limited, the interference parameters are treated as sensitivity parameters that researchers vary, rather than population parameters for which researchers construct confidence sets. The proposed method also has its unique strengths: it embeds the noisy observational data into an approximate randomized controlled trial and has a clear "reasoned basis" (Fisher, 1935) when testing the causal null hypothesis, and researchers can always conduct a sensitivity analysis to investigate how causal conclusions would change when the randomization assumption is relaxed. The method developed in this article can be readily applied to many practical problems where there is a continuous exposure and the scientific interest lies in testing a dose-response relationship. Understanding a dose-response relationship is central to many scientific disciplines like public health (Gorell et al., 1999; Farrelly et al., 2005) , pharmacology (Tallarida and Jacob, 2012) , and toxicology (Calabrese and Baldwin, 2003) , among many others. Supplementary Material F provides details on generalizing the dose-response relationship to an aggregate outcome. Supplementary Material G provides further details on the case study, including maps of the 1, 211 better and worse social distancing counties in the matched samples, the balance table, a closer examination of the distributions of some important variables after matching, separate analyses of rural and non-rural counties, and numerous sensitivity analyses. Supplementary Material H assesses Assumption 1 using a standard epidemiological model. code and data.zip Data and R code implementing the statistical matching and randomization inference. 

Best of Today

Amulti-risk SIR model with optimally targeted lockdown Technical Report

Is the lockdown important to prevent the COVID-19 pandemic? Effects on psychology, environment and economy-perspective

Exact p-values for network interference

Building a stronger instrument in an observational study of perinatal care for premature infants

P values maximized over a confidence set for the nuisance parameter

Bridging observational studies and randomized experiments by embedding the former in the latter

Time series experiments and causal estimands: exact randomization tests and trading

Causal Inference in the Time of Covid-19

Reasoning about interference between units: A general framework

Mathematical Models in Population Biology and Epidemiology 2

Hormesis: the dose-response revolution

Rural and urban differences in COVID-19 prevention behaviors

Causal impact of masks, policies, behavior on early covid-19 pandemic in the US

Institutional, not home-based, isolation could contain the COVID-19 outbreak

Randomization inference for treatment effect variation

Modified randomization tests for nonparametric hypotheses

Evidence of a dose-response relationship between "truth" antismoking Ads and youth smoking prevalence

The Design of Experiments

The relationship between cultural tightnesslooseness and COVID-19 cases and deaths: a global analysis

Smoking and Parkinson's disease: a dose-response relationship

Psychological impact of COVID-19 lockdown: An online survey from India

Optmatch: Flexible, optimal matching for observational studies

Instrumental Variables: to Strengthen or not to Strengthen? arXiv preprint

Matching as nonparametric preprocessing for reducing model dependence in parametric causal inference

Evaluating kindergarten retention policy: A case study of causal inference for multilevel observational data

Matching methods for causal inference with time-series cross-section data

Causal inference in statistics, social, and biomedical sciences

Interventions to mitigate early spread of SARS-CoV-2 in Singapore: a modelling study

The positive impact of lockdown in Wuhan on containing the COVID-19 outbreak in China

The incubation period of coronavirus disease 2019 (COVID-19) from publicly reported confirmed cases: estimation and application

Scientific and ethical basis for social-distancing interventions against COVID-19

Balanced risk set matching

Matching with doses in an observational study of a media campaign against drug abuse

Optimal nonbipartite matching and its statistical applications

Bayesian Inference for Sequential Treatments Under Latent Sequential Ignorability

Randomization-based inference within principal strata

Report of the WHO-China joint mission on coronavirus disease

On obtaining permutation distributions in polynomial time

The county health rankings: rationale and methods

A new approach to causal inference in mortality studies with a sustained exposure period-application to control of the healthy worker survivor effect

Correcting for non-compliance in randomized trials using structural nested mean models

Marginal structural models

Estimation of the causal effect of a time-varying exposure on the marginal mean of a repeated binary outcome

Marginal Structural Models and Causal Inference in Epidemiology

Sensitivity analysis for matched observational studies with many ordered treatments

Observational Studies

Heterogeneity and causality: Unit heterogeneity and design sensitivity in observational studies

Interference between units in randomized experiments

Design of Observational Studies

Randomization analysis of experimental data: The Fisher randomization test comment

Statistics and causal inference: Comment: Which ifs have causal answers

Causal inference using potential outcomes: Design, modeling, decisions

The design versus the analysis of observational studies for causal effects: parallels with the design of randomized trials

Social distancing laws cause only small losses of economic activity during the COVID-19 pandemic in Scandinavia

Only strict quarantine measures can curb the coronavirus disease (COVID-19) outbreak in Italy

A commentary on rural-urban disparities in COVID-19 testing rates per 100,000 and risk factors

Matching methods for causal inference: A review and a look forward

The dose-Response relation in pharmacology

Visualizing the lagged connection between COVID-19 cases and deaths in the United States: An animation using per capita state-level data

Coronavirus (Covid-19) Data in the United States

Globally Coherent Weekly Periodicity in the Covid-19 Pandemic. medRxiv

Social distancing in covid-19: what are the mental health implications?

Recursive Partitioning and Applications

Bridging preference-based instrumental variable studies and cluster-randomized encouragement experiments: study design, noncompliance, and average cluster effect ratio