Causal Inference for Spatial Treatments. Michael Pollmann. October 31, 2020.

I propose a framework, estimators, and inference procedures for the analysis of causal effects in a setting with spatial treatments. Many events and policies (treatments), such as openings of businesses, construction of hospitals, and sources of pollution, occur at specific spatial locations, with researchers interested in their effects on nearby individuals or businesses (outcome units). However, the existing treatment effects literature primarily considers treatments that could be assigned directly at the level of the outcome units, potentially with spillover effects. I approach the spatial treatment setting from a similar experimental perspective: What ideal experiment would we design to estimate the causal effects of spatial treatments? This perspective motivates a comparison between individuals near realized treatment locations and individuals near unrealized candidate locations, which is distinct from current empirical practice. Furthermore, I show how to find such candidate locations and apply the proposed methods with observational data. I apply the proposed methods to study the causal effects of grocery stores on foot traffic to nearby businesses during COVID-19 lockdowns. How can we do causal inference with spatial treatments? In the setting of this paper, a spatial treatment, such as the opening of a "million dollar plant" (Greenstone and Moretti, 2003; Greenstone et al., 2010), occurs at a geographic location, and the outcome of interest, such as earnings, is measured for separate individuals who are located nearby. This distinction between units of treatment assignment and outcome units has received little attention in theoretical work on causal inference. In the absence of guidance from theoretical work, most recent empirical studies using highly detailed location data rely on adaptations of the familiar difference-in-differences method. Unfortunately, these adaptations to the spatial treatment setting implicitly rely either heavily on functional form assumptions or on partly incongruent nonparametric assumptions to identify causal effects. This is in stark contrast to settings with individual-level treatments, where many researchers prefer causal inference based on quasi-experimental methods with simpler, more transparent assumptions that obtain credibility by emulating an "ideal experiment" the researcher wished to have run. In this paper, I propose (quasi-)experimental methods for spatial treatments that are motivated by an ideal experiment where the spatial locations of treatments are random. These methods are based on a simple insight: Suppose the ideal experiment randomly chooses some locations from a larger set of candidate locations. Then quasi-experimental methods should compare individuals near locations that were chosen to individuals near locations that were not chosen for treatment. For a formal characterization of estimands and estimators, I extend the potential outcomes framework for individual-level treatments to allow treatments to be randomized across space and to directly affect nearby individuals. Within this framework, I derive finite sample design-based standard errors similar to those of Neyman (1923, 1990) for randomized experiments with individual-level treatment assignment for a fixed population.
In the "million dollar plant" example, my proposals using micro location data are analogous to the approach Greenstone and Moretti (2003) take with aggregate data, while most current empirical work takes a conceptually distinct approach. Suppose we want to estimate the average effect of a million dollar plant on individuals who are, say, 1 mile away. The method employed by most recent empirical work compares individuals on an "inner ring" around the million dollar plant with radius 1 mile to individuals on an "outer ring," who are, say, 5 miles away from the same million dollar plant. Since many observable and unobservable characteristics correlate with distance from any one point in space (Lee and Ogburn, 2020; Kelly, 2019) , this comparison of inner and outer ring is often unattractive: If treatment always occurs in the city center, the inner ring, or treated, individuals are urban individuals, while the outer ring, or control, individuals are suburban and rural individuals. Researchers attempt to ameliorate this issue by adding a pre vs. post comparison in a difference-in-differences approach, where outcomes for urban and suburban individuals are allowed to be on different levels, but must evolve along parallel trends. In contrast, Greenstone and Moretti (2003) take a different approach with data aggregated at the county level. They compare counties that "won" the bidding war for a million dollar plant to "runners-up" counties that were also very seriously considered as locations for million dollar plants, but ultimately "lost" (Greenstone and Moretti, 2003) . In short, the methods I propose compare individuals who are 1 mile away from the million dollar plants to individuals who are 1 mile away from 2 locations that would have been chosen for the plants in the losing counties. Since these counterfactual locations are rarely known in observational studies, I show how to find suitable candidate locations in practice. With micro location data, the methods then estimate the same detailed estimands targeted by the difference-in-differences approach. They have an attractive quasi-experimental interpretation and are valid by design if the choice of treatment location is as good as random within a set of plausible candidate locations. The difference-in-differences approach of current empirical practice relies on either partly incongruent nonparametric or functional form assumptions that are not guaranteed to be satisfied even in a true randomized experiment. The comparison of individuals on an inner ring to those on an outer ring inherently makes two assumptions: First, the treatment must not affect individuals on the outer ring directly. This is most easily achieved by choosing an outer ring with large radius, such that these "control" individuals are far away. Second, the individuals on inner and outer rings must be comparable. This is most easily achieved by choosing an outer ring close to the inner ring, in conflict with the first assumption. Even when the differences in levels between inner and outer ring are differenced out with individual fixed effects in panel data, the parallel trends assumption is particularly strong in spatial treatment settings. Suppose, for instance, that treatment only occurs in city centers. Then the assumption may require individuals living in downtown areas to be on parallel trends to those on the outskirts of the city. 
Furthermore, researchers typically estimate the effect not just at one distance but at multiple distances, usually using the same outer ring control group. This effectively requires that individuals at all distances up to the outer ring are on the same parallel trend, with additively separable time fixed effects. These assumptions are not just approximations to make finite sample analysis feasible where asymptotically an analogous nonparametric specification identifies treatment effects: Identification in this approach rests upon the functional form assumptions even asymptotically, and even with experimental data. Instead, I recommend estimators that are formally valid under the quasi-experimental variation in treatment location sometimes used to informally justify the assumptions of the difference-in-differences approach. The difference-in-differences approach generally yields the most credible estimates if the treatment is known not to have an effect past a short, known distance. Then individuals on the outer ring are likely to be comparable. Sometimes, these comparisons are then justified by the fact that the exact location of the treatment was as good as random. For instance, Linden and Rockoff (2008, p. 1110), referencing Bayer et al. (2008), argue that for their treatment, sex offenders moving into neighborhoods, "the nature of the search for housing is also a largely random process at the local level. Individuals may choose neighborhoods with specific characteristics, but, within a fraction of a mile, the exact locations available at the time individuals seek to move into a neighborhood are arguably exogenous." The estimators proposed in this paper allow researchers to make use of such credible identifying variation directly, rather than relying on an ultimately arbitrary outer ring. I demonstrate the quasi-experimental methods I propose in an application studying the causal effects of grocery stores on foot-traffic to nearby restaurants during COVID-19 lockdowns in April of 2020. In this application, I observe the exact spatial locations of grocery stores as well as other businesses in the San Francisco Bay Area. I show how to find "control" neighborhoods that are similar to neighborhoods of actual grocery stores except for the absence of one marginal grocery store. The outcome of interest, foot-traffic to restaurants, is measured as the number of customers whose smartphone location is shared with Safegraph. I find that restaurants at distances of less than 0.05 miles from a grocery store have substantially more weekly customers than restaurants near counterfactual grocery store locations in comparable neighborhoods lacking the marginal grocery store. This suggests a positive externality of grocery stores on nearby businesses, akin to anchor stores in shopping malls, at least when customer mobility is reduced as during the COVID-19 pandemic. While I argue in favor of a design-based, quasi-experimental approach in this paper, the difference-in-differences approach has its own advantages, such that both approaches are complementary. Specifically, the comparison with an "outer ring" effectively removes time-specific noise that is shared within a larger region but distinct across regions. In contrast, the methods proposed in this paper focus mostly on eliminating confounding due to differences in the spatial neighborhoods, such as population density, of treated and control individuals.
Whether spatial variation, temporal variation, or functional form assumptions yield the most credible estimates of causal effects depends on the particular empirical setting. Researchers may find studies particularly credible if several distinct identification strategies lead to similar conclusions. Doubly-robust estimators (e.g. Robins and Rotnitzky, 1995; Belloni et al., 2017) , which model both the outcome (conditional expectation) and assignment process (propensity score), may offer an attractive bridge between approaches. The framework developed in this paper allows me to extend the proposed methods to settings where multiple treatment locations are close to one another, as in the application to foot-traffic caused by grocery stores, which are often near other grocery stores. The existing difference-in-differences approach, in contrast, is not applicable when treatment locations are too close to one another. In the framework of this paper, I can allow for such interference between spatial treatments and illustrate the complications it causes. In recent work, Zigler and Papadogeorgou (2018) and Aronow et al. (2020) specifically study such interference in a spatial treatment setting. They derive average effect estimands that are identified despite interference. In the present paper, I instead define the estimands of interest based on an ideal experiment that rules out interference by design. In extensions that complement the work by Zigler and Papadogeorgou (2018) and Aronow et al. (2020) , I then discuss assumptions under which these estimands are identified even when there is interference. In addition, I demonstrate how to find additional candidate treatment locations, where treatment could have occurred but did not, with observational data, increasing the number of settings the proposed methods can be applied to. Furthermore, I view the framework developed in this paper as particularly helpful for deriving standard errors of estimators of the effects of spatial treatments. By providing formulas for finite sample design-based standard errors, I sidestep the often difficult decisions regarding clustering and "spatially correlated errors" (e.g. Conley, 1999) that arise in practice for virtually any application using spatial relationships between observations. Aronow et al. (2020) also provide some design-based standard errors, but focus on asymptotic normality and sampling-based variances in the style of Conley (1999) for the estimator most similar to the ones proposed in this paper. The results in their work therefore complement those in this paper. The interpretation of the standard errors I propose is simple: They reflect the variation in the estimator that arises from randomizing treatment locations, holding the individuals in the sample fixed. This is the same variation that is needed for internal validity of the causal effect estimates (Abadie et al., 2020) . In the baseline setting, the variance estimators I derive are similar to clustering at the level of treatment assignment (Abadie et al., 2017) . The approach I take in this paper, generalizes straightforwardly to settings with a contiguous region or multiple treatments close to one another. Clustering, in contrast, is based on sharp, sometimes arbitrary, boundaries and the absence of interference between clusters. Finally, the framework highlights nuances in interpretation that have received little attention in the literature thus far. 
Most recent empirical work estimates the effects of spatial treatments at multiple distances. However, the average effects at different distances are not generally comparable. Since some individuals are often more likely -before realization of treatment assignment -to be close to treatment locations, their treatment effects typically get more weight in average effect estimands at shorter distances, and less weight in average effect estimands at longer distances. In other words, we cannot generally interpret effect-by-distance curves or the change in effect between distances as average within-individual effects. Even the aggregate weight placed on individuals near any one treatment location varies with distance. Both of these effects can lead to estimates of average treatment effects that increase in distance, even though individual-level treatment effects are decreasing in distance for every individual. The framework in this paper allows me to characterize estimators with alternative weights on individuals to mitigate such issues. In addition, in this framework I can show how to aggregate individual-level treatment effects to estimate the aggregate effects of treatment at a location on all nearby individuals. The framework and methods discussed in this paper may also prove useful for causal inference questions not directly related to spatial treatments. First, other non-spatial settings also feature "treatments" that are not directly assigned to individuals but affect them based on some measure of distance. In this paper, I briefly discuss Bartik (1991) -, or shift-share, instruments, where for instance industry-level shocks affect all cities depending on industry composition. 1 The perspective taken in this paper resembles that of Borusyak and Hull (2020) , with non-random distances from candidate treatment locations but random variation in which candidate locations are realized. Second, I develop an approach to finding suitable unrealized candidate locations in observational data based on flexible machine learning methods. This approach may extend to other settings with dependency between observations where it is sometimes challenging to find (good) control observations, such as event studies and other time series settings. Third, separating treatment assignment and outcome individuals in this framework further clarifies distinctions between design-based and sampling-based inference (Abadie et al., 2020) . While design-based inference captures variation in treatment locations, sampling-based inference can reflect sampling of individuals at fixed locations, within fixed regions (infill asymptotics (Cressie, 1993) in the spatial statistics literature), of a growing contiguous space (expanding domain asymptotics (Cressie, 1993) ), or of independent regions (clustering). The present paper focuses on design-based inference specifically; in-depth comparisons of different modes of inference are beyond its scope. My current analysis is limited in at least three important ways. First, I assume that outcome individuals have fixed locations. This is problematic if individuals move, or migrate, strategically in response to the treatment. Second, the framework is not directly applicable to settings where we are interested in the causal effects of spatially correlated characteristics of places, such as in the literature on social mobility (e.g. Chetty et al., 2014) . Instead, the present paper focuses on treatments that occur at discrete locations in space. 
While the ideal experiment of randomizing treatment locations also creates a spatially correlated covariate of interest (distance from treatment), the randomization distribution it induces is much simpler to characterize. Third, alternative estimators that are more robust or more efficient in certain settings may exist. While I attempt to offer theory and estimators for a variety of spatial treatment settings, the primary focus of this paper lies in developing a coherent conceptual framework that allows me to characterize, discuss, and exploit the ideal experiment with spatial treatments. In particular, the present paper provides no formal justification for the use of methods from the literature on sample splitting and double robustness (e.g. Chernozhukov et al., 2018). The need to consider many relative spatial locations for finding suitable unrealized candidate locations makes this a high-dimensional estimation problem in observational settings, suggesting the importance of methods and insights from that literature. The remainder of this paper is organized as follows. The final part of the introduction highlights the wide range of empirical applications for which this work is relevant, as well as connections to the theoretical literature. Section 2 develops a potential outcomes framework for spatial treatments. Section 3 contains the main results on identification, estimation, and inference under the ideal experiment. Section 4 discusses how to extend these results to additional settings of empirical relevance. Section 5 shifts the focus from experimental to observational data, proposing assumptions and methods that allow researchers to emulate the ideal experiment. Section 6 shows how to apply these methods in practice. In the conclusion, I discuss limitations of the present paper and fruitful directions for future research on causal inference for spatial treatments. Empirical Relevance The methods I propose are relevant for a diverse range of questions from many applied fields in economics and other social sciences. Recent studies estimating the effects of spatial treatments using individual-level outcome and location data include Stock (1989, 1991); Linden and Rockoff (2008); Currie et al. (2015); Aliprantis and Hartley (2015); Sandler (2017); Diamond and McQuade (2019); Chalfin et al. (2019); and Rossin-Slater et al. (2019). Notably, Dell and Olken (2020) explicitly consider counterfactual treatment locations in a quasi-experimental setting, as well as the permutation distribution based on counterfactual assignments. Much more existing empirical work studying spatial treatments is limited to aggregated outcome data. If micro location data had been available for these studies at the time, researchers would likely have asked questions that can be answered using the methods I propose in this paper. Experimental and observational studies of spatial treatments in economics using aggregate data fitting into the framework of this paper include Duflo (2001), Miguel and Kremer (2004), and Cohen and Dupas (2010) in development economics; Greenstone et al. (2010) and Feyrer et al. (2017) in public and labor economics; Jia (2008) in industrial organization; and Keiser and Shapiro (2019) in environmental economics. Furthermore, a recent literature has documented large geographic variation in a diverse range of outcomes (for instance Chetty et al., 2014; Chetty and Hendren, 2018; Finkelstein et al., 2016, 2019; Bilal, 2019).
Many potential sources or causes of this inequality, as well as many potential remedies such as place-based policies, involve spatial treatments. Related Theoretical Literature This paper sits at the intersection of the literatures on causal inference, spatial statistics, and econometrics. A small number of recent theoretical papers have similarly studied spatial treatments, albeit with a different focus. Most closely related, Zigler and Papadogeorgou (2018), Aronow et al. (2020), and Imai et al. (2018) focus on settings with interference between treatment locations and show that only some average treatment effects are identified without additional semiparametric assumptions. In contrast, I define estimands of interest in a setting without interference, and discuss application-specific assumptions to retain identification of these estimands under interference. Furthermore, I discuss a broader range of estimands and estimators, in particular for observational data where unrealized candidate treatment locations are rarely known. The simpler baseline setting also highlights interpretation and weighting issues that are obscured in the presence of interference. Papadogeorgou et al. (2020) take a conceptually different approach. They develop a framework based on spatial point patterns (cf. Cressie, 1993) rather than fixed units of observation to answer a distinct question. For them, the locations of outcome units vary with the treatment assignment, and the number of outcome units is the object of interest. Instead of contrasting different treatment assignments to define effects, their estimand contrasts entire assignment mechanisms (stochastic interventions, Muñoz and van der Laan, 2012). McIntosh (2008) proposes an estimator for settings where individuals known to be unaffected by the treatment exist as a natural comparison group. Pouliot (2018) also studies a setting where the locations of outcomes and covariates are spatially misaligned, but not in the context of spatial treatments and causal inference. Within the causal inference literature, the setting of this paper most closely relates to work on interference and networks. Some work in causal inference explicitly considers spatially correlated treatments (Delgado and Florax, 2015; Druckenmiller and Hsiang, 2019), but is not directly applicable to the patterns generated by spatial treatments. The literature on interference is concerned with spillover, or indirect, effects of treatments assigned to individuals in violation of the stable unit treatment value assumption (Rosenbaum, 2007; Hudgens and Halloran, 2008; Tchetgen Tchetgen and VanderWeele, 2012; Aronow and Samii, 2017; Vazquez-Bare, 2017; Sävje et al., 2017; Sävje, 2019; Basse et al., 2019). Treatment effects in network settings typically originate from individual-level treatment assignment and propagate through the network (e.g. Basse et al., 2019). In contrast to the interference and networks literatures, the present paper is concerned with a setting where the units of treatment are separate from outcome units. While the effect "spills over" to the outcome units, there is no interference between different treatment units if they are few and far apart, as in the baseline setting of this paper. Consequently, the estimands and estimators of interest in spatial treatment settings generally differ from those in interference and network settings.
For spatial treatment settings with interference, the spatial relationships between observations allow me to make semiparametric functional form assumptions to limit interference; see section 4.2 for details. Similar assumptions may sometimes also be plausible if treatments are directly assigned to individuals but have spillover effects on other individuals. The design-based finite sample inference developed in this paper complements sampling-based large sample asymptotic theory developed in the spatial statistics and econometrics literature. Conley (1999) proposed standard errors taking into account cross-sectional (spatial) dependence in a GMM framework; see also Case (1991); Lahiri et al. (2002); Lee (2004); Andrews (2005); Kelejian and Prucha (2007); Bester et al. (2011); Lahiri and Robinson (2016); Kuersteiner and Prucha (2020) and references therein for alternative results. Spatial proximity is also commonly used to motivate cross-sectional dependence in the literature on clustered sampling (Moulton, 1986, 1990; Moulton and Randolph, 1989; Hansen, 2007; Donald and Lang, 2007; Barrios et al., 2012; Cameron and Miller, 2015; Abadie et al., 2017). The spatial statistics and econometrics literature is primarily concerned with descriptive estimands, modeling the spatial correlations existing in outcome data even in the absence of spatial treatments. Textbook treatments of such models in spatial statistics and econometrics include Cressie (1993); Cressie and Wikle (2011); Anselin (1988); Anselin et al. (2004); Anselin and Rey (2010); LeSage and Pace (2004); Arbia (2014). Since treatment assignment (or distance from treatment) does not vary within location, "increasing domain asymptotics" (Cressie, 1993), that is, asymptotics in the number or size of regions or clusters, are likely needed for consistency of causal effect estimates. Spatial treatment applications, however, typically feature a large number of individuals near a smaller number of treatment locations, such that alternative "infill asymptotics" (Cressie, 1993) may offer better approximations. The primary contribution of this paper to this literature is a focus on the estimation of causal effects and design-based in-sample inference, rather than descriptive estimands and sampling-based inference. This paper also connects to the literature on estimation of treatment effects under unconfoundedness and doubly-robust estimation. Specifically, I propose a formal notion of unconfoundedness (cf. Rosenbaum and Rubin, 1983; Imbens and Rubin, 2015) that is appropriate for spatial treatments. With individuals and treatment locations distributed across space, a large number of covariates, such as population density or average income at different distances, are predictive of both outcomes and treatment assignment probabilities. Doubly-robust estimators are particularly promising in observational settings with spatial treatments: they have attractive consistency and efficiency properties based on the combination of outcome and treatment (propensity score) modeling. Recent work has adapted these estimators to high-dimensional settings (Belloni et al., 2014, 2017; Farrell, 2015; Chernozhukov et al., 2018). Initial results suggest that such estimators may also perform well in spatial treatment settings.
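To illustrate the generic doubly-robust construction referenced here, the following is a minimal sketch under stated assumptions, not the estimator developed in this paper: candidate neighborhoods (around realized and unrealized locations) are treated as units, spatial covariates such as counts of residents or businesses in distance rings enter both an outcome model and a propensity model, and the two are combined in the standard augmented inverse probability weighting (AIPW) form. The data layout and variable names are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

def aipw_effect(X, w, y):
    """Augmented inverse probability weighting (doubly-robust) estimate.

    X: (n, k) array of spatial covariates for each candidate neighborhood,
       e.g. counts of residents or businesses in distance rings around the location.
    w: (n,) array of 0/1 indicators that the candidate location was realized (treated).
    y: (n,) outcome aggregated over nearby individuals (e.g. at distance d +/- h).
    """
    mu1 = LinearRegression().fit(X[w == 1], y[w == 1]).predict(X)  # outcome model, treated
    mu0 = LinearRegression().fit(X[w == 0], y[w == 0]).predict(X)  # outcome model, control
    e = LogisticRegression(max_iter=1000).fit(X, w).predict_proba(X)[:, 1]  # propensity score
    # standard AIPW score: outcome-model contrast plus propensity-weighted residuals
    psi = (mu1 - mu0
           + w * (y - mu1) / e
           - (1 - w) * (y - mu0) / (1 - e))
    return psi.mean()
```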
Finally, this paper contributes to the recent literature illustrating creative uses of modern machine learning methods for economic analyses (see Mullainathan and Spiess, 2017; Glaeser et al., 2018; Athey, 2018; Gentzkow et al., 2019, for recent reviews). I propose a method for finding unrealized candidate treatment locations based on an adversarial task: finding unrealized locations that are indistinguishable (to the algorithm) from realized treatment locations. In closely related work, generative adversarial networks (Goodfellow et al., 2014) have been used to create samples for simulation studies. Kaji et al. (2020) similarly propose "adversarial estimation" to estimate structural models using generative adversarial networks. In each of these applications, the aim is to generate synthetic samples which look indistinguishable from the real data. In the application of this paper, only the unrealized candidate treatment locations are synthetic, while the outcome and covariate data around them are real. In this paper, I argue in favor of convolutional neural networks in particular, based on the similarity between spatial data and image data, the application that sparked more recent developments in this method (Krizhevsky et al., 2012). Relative spatial positions are similar to relative positions of pixels, and different covariates at each location correspond to the different color channels of images. For economic applications using satellite data (see Donaldson and Storeygard, 2016, for a review), convolutional neural networks have also shown promise (e.g. Jean et al., 2016; Engstrom et al., 2017). Convolutional neural networks are particularly attractive for spatial settings because they build on relevant economic intuition for regularization: While the geographic space might be large and high-dimensional, the immediate spatial neighborhood often matters the most, and relative distances matter similarly at different absolute locations. Through careful design decisions, the methods I propose for the spatial treatment setting retain some interpretability in addition to the good performance commonly associated with "black box" machine learning algorithms.

In this section, I propose an extension of the potential outcomes notation (cf. Imbens and Rubin, 2015) that treats the level of treatment assignment as conceptually distinct from the level at which we measure outcomes. This distinction separates the intervention that is the cause of the effect from the individuals for whom the effect is measured. It allows me to formally characterize estimands of interest, and to derive estimators and their properties in the following sections. With spatial treatments, potential outcomes of individuals are functions not of an individual-level binary or continuous treatment, but of a set of candidate treatment locations. We are interested in the effects of spatial treatments. Let S denote the set of candidate treatment locations, shown as triangles in figure 1. The set of candidate treatment locations is assumed to be finite; in the example of figure 1, there are just two locations in the region shown. This reflects an inherent scarcity that is common to most applications: Only a small number of locations are ultimately realized, and most locations are infeasible, unsuitable, implausible, or unlikely for the treatment. In spatial settings, the candidate locations are typically given by latitude and longitude or other (relative) coordinates, such that S ⊂ R^2.
Throughout this paper, the set S is finite, as virtually any practical application will be based on some discretized, or rounded, locations. One can, however, take S as defining a finely spaced grid over R^2. This is convenient to conceptualize situations where treatment could be realized anywhere with some positive probability.

Figure 1: Illustration of the setup. While typically only relative locations matter, locations are often given by their "GPS coordinates" as latitude and longitude. In the figure, the candidate treatment locations at which the treatment may occur are given by triangles. The small circles indicate the locations of individuals. The researcher typically estimates the treatment effects, caused by treatment at one of the candidate locations and experienced by the individuals, conditional on distance from treatment. When the (weighted) Euclidean distance function is used, individuals within a narrow distance bin from a candidate location are located on a ring, here displayed as an area shaded gray. If driving time is used instead to measure distance, individuals at a given distance need not be located on a circular ring. The figure shows data from a single region. In the baseline setting of this paper, the researcher has data from multiple such regions, with treatment realized only in some of them. If treatment is realized at multiple (both) candidate locations (triangles) within the same region, there is potential interference between them, complicating estimation and inference. In the baseline setting, the probability of treatment at locations and in regions describes a two-stage process. In the first stage, a fixed number of regions is chosen randomly for treatment somewhere in the region. In the second stage, a single candidate location in each chosen region is chosen randomly to receive treatment.

The random variable T ⊂ S denotes the set of realized treatment locations. We measure the outcome of interest for units indexed by i. For the remainder of this paper, I refer to these outcome units as individuals, but in some settings i may be a business, census tract, or similar, typically small, unit with a fixed geographic location. Denote the set of all individuals by I. Individual i has a spatial location, or residence, L_i, shown as a small circle in figure 1. Throughout this paper, I assume that the locations of individuals are fixed; there is no migration. In some applications, L_i corresponds to, for instance, the workplace of individual i rather than their residence. The location of i is in the same space as the candidate treatment locations, such that typically L_i ∈ R^2 is a pair of latitude and longitude coordinates.

Potential Outcomes: Define the potential outcome Y_i(T) for each individual i ∈ I as the outcome for individual i if treatment is realized at the locations T ⊂ S. To simplify notation, and consistent with standard potential outcomes notation, let the potential outcome of individual i in the absence of any realized treatment be Y_i(0) ≡ Y_i(∅).

Treatment Effects: The treatment effects of primary interest contrast some treatment vector T ⊂ S with the absence of realized treatments, T′ = ∅. Specifically, I define the effect of T on an individual i ∈ I as τ_i(T) ≡ Y_i(T) - Y_i(0). Oftentimes, the treatment vector of interest, T, is a singleton, T = {s} for a single candidate location s ∈ S. With slight abuse of notation, define τ_i(s) ≡ τ_i({s}) = Y_i({s}) - Y_i(0). I define meaningful average treatment effects in section 3. These average treatment effects average across both individuals i and treatment vectors T.
Distances Distances between treatment locations and individuals are central to defining interesting average treatment effects in section 3. For instance, the researcher may estimate the average effect of a treatment at a distance of 1 mile. In figure 1, the areas shaded gray highlight all locations approximately 1 mile away from any candidate treatment location. The distance between treatment location s ∈ S and individual i ∈ I is given by a distance function

Distance Function: d(s, i) ≥ 0.

Importantly, the distance between two locations must be observable (to the researcher) and must not be affected by treatment assignment, ruling out migration in response to the treatment. The distance function is used for two purposes: first, to estimate heterogeneous average treatment effects by distance from treatment; second, to assume distances at which treatments have no effect, in order to limit interference and thereby aid estimation and inference. When locations are given as Cartesian coordinates, we can use the Euclidean distance in R^2:

Euclidean Distance: d(s, i) = ||s - L_i||.

When locations are given by latitude and longitude, the Great Circle distance is more accurate than a Euclidean distance with fixed weights on latitude and longitude. For some applications in the social sciences, driving distances are arguably more relevant. Suppose the spatial treatment corresponds to an employer opening a new location. Then an individual's access to the treatment, and hence treatment effect, likely depends on driving time rather than straight line distance. However, computing driving times between many locations may be computationally and financially expensive. When using straight line distances instead, some interpretability, but not validity, is lost. (A short computational sketch of these distance calculations appears at the end of this subsection.) We can also study the effects of state-wide policies and other clustered assignments in this framework. In this setting, each candidate treatment location s ∈ S corresponds to one cluster, or state. The appropriate distance function for this setting is a cluster membership function that equals 0 if individual i resides in the state corresponding to s and is positive otherwise. In the simplest case of state-wide policies, we use this distance function to estimate the treatment effect at a distance of 0. This corresponds to estimating the treatment effect of the policy by comparing individuals in treated states to individuals in untreated states. We can generalize the cluster membership function to be smooth in distance to a treated state: For individuals in treated states, this distance is 0. For individuals in untreated states, the distance is smallest if they are most exposed to treated states. Exposure may measure, for instance, distance to the state border, shared media markets, number or cost of flights between airports, or the relevance of the industries of the treated state to individual i's occupation.
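As a concrete illustration of these distance calculations, here is a minimal sketch, not from the paper, that computes great-circle distances from a candidate location to individuals and flags who falls in the distance bin d ± h. The coordinates, function names, and the 3,958.8-mile Earth radius are illustrative choices.

```python
import numpy as np

EARTH_RADIUS_MILES = 3958.8  # approximate mean Earth radius

def great_circle_miles(lat1, lon1, lat2, lon2):
    """Haversine great-circle distance in miles between (lat1, lon1) and (lat2, lon2)."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * np.arcsin(np.sqrt(a))

def in_distance_bin(dist, d, h):
    """Indicator 1{|dist - d| <= h} for the distance bin d +/- h."""
    return np.abs(dist - d) <= h

# Example: which individuals are within 1 +/- 0.5 miles of a candidate location?
cand_lat, cand_lon = 37.7749, -122.4194          # hypothetical candidate location
ind_lat = np.array([37.79, 37.80, 37.70])        # hypothetical individual latitudes
ind_lon = np.array([-122.42, -122.40, -122.45])  # hypothetical individual longitudes
dist = great_circle_miles(cand_lat, cand_lon, ind_lat, ind_lon)
print(in_distance_bin(dist, d=1.0, h=0.5))
```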
Regions In many applications, it is convenient to group individuals and treatment locations into regions. For instance, in a sample of data from different cities, individuals and treatment locations of each city may form a separate region. When regions are not directly coded in the data, one can sometimes define regions based on geographic proximity such that treatment locations only have effects within their own regions. That is, no individual is close enough to candidate treatment locations from two or more distinct regions to be affected by both of them. Figure 1 shows data from one such region. In the baseline setting of this paper, the researcher has access to data from multiple such regions, but this requirement is relaxed in section 4.2. Throughout, I denote regions by subscripts r = 1, . . . , R. Let S_r ⊂ S be the set of candidate treatment locations within region r. The set of realized treatment locations within region r is T_r. If treatment is realized within region r, T_r ≠ ∅, let D_r = 1; otherwise, T_r = ∅, let D_r = 0. If D_r = 1, I say that region r "is treated" or "is a treated region." Analogously, if D_r = 0, I say that region r "is a control region." Let I_r ⊂ I be the set of individuals with residence in region r. The region where individual i resides is given by r(i), such that i ∈ I_{r(i)}.

Interference The notation in this paper can be seen as an extension of the notation of the literature on interference (cf. Aronow and Samii, 2017). Consider first a setting with individual-level treatments. Let W_i be the treatment assigned to an individual i = 1, . . . , n, and W ∈ {0, 1}^n be the vector stacking all of the W_i. In the absence of interference, that is, under the stable unit treatment value assumption (cf. Imbens and Rubin, 2015), the observed outcome of individual i is Y_i = Y_i(W_i). With interference, the outcome of individual i may depend not only on her own treatment assignment, but also on the treatment assignments of other individuals. That is, the potential outcomes of i are a function of the entire vector W rather than only her own W_i, and her observed outcome is Y_i = Y_i(W). Notationally, spatial treatments generalize this setting by allowing the treatment vector to have a dimension other than n, the number of individuals. For the closest analogy, enumerate the candidate treatment locations by k = 1, 2, . . . , K, where K is the finite number of candidate treatment locations. The random variable of realized treatment locations takes on values T ∈ {0, 1}^K, such that T_k ≡ 1 whenever the k-th candidate location is treated, and T_k ≡ 0 otherwise. The realized outcome for individual i is then Y_i = Y_i(T), where T is K- rather than n-dimensional. Consider the example given in figure 1. Some individuals are at a distance of 1 mile from both treatment locations. If the treatment has an effect at that distance, the treatment states of both candidate locations jointly determine the observed outcome. The two candidate locations can interfere because, conditional on the treatment state of just one of them, the outcome for some individuals still varies depending on the treatment state of the other candidate location. The literature on interference is typically interested in answering (at least) one of two questions. First, what is the effect of changing i's treatment status, holding the treatment status of i's neighbors fixed? Second, what is the effect of changing the treatment status of i's neighbors, holding the treatment status of i fixed? With spatial treatments, neither of these questions is of primary interest. If i is 1 mile away from a realized treatment location, then a neighbor of i, say i′, is also approximately 1 mile away from the same realized treatment location. A counterfactual where i is 1 mile away from a realized treatment location, while her neighbor i′ is not, is typically not feasible or relevant in practice. The treatment does not spill over from i to i′; it affects both of them directly, such that decompositions into direct and indirect effects (cf. Hudgens and Halloran, 2008) are not well defined. Interference in the spatial treatment setting refers to multiple treatment locations affecting the same individual, rather than the treatment or effect of one individual spilling over to another individual. Formally, a treatment location s affects an individual i if, for some set of treatment locations T ⊂ S, the outcome of i changes when s is included or excluded: Y_i(T ∪ {s}) ≠ Y_i(T ∖ {s}).
Two treatment locations s, s′ ∈ S interfere with one another if there is an individual affected by treatment at both locations. In spatial treatment settings, it is often natural to assume that treatment locations that are far away from an individual do not affect her. Formally, assume that whenever d(s, i) > d_max for some sufficiently large distance d_max, Y_i(T ∪ {s}) = Y_i(T ∖ {s}) for all T ⊂ S. Assumption 1 formally states that there is no interference across regions.

Assumption 1 (No Interference Across Regions). Individuals in region r are unaffected by treatment locations in regions r′ ≠ r. That is, for i ∈ I_r and T ⊂ S, Y_i(T) = Y_i(T ∩ S_r).

Regions are sufficiently far apart that individuals in one region are unaffected by treatment locations in another region. The results in this paper, however, fundamentally rely on the absence of interference between treatment locations that are far apart, not on separate regions. Section 4.2 discusses a setting where all data available to the researcher come from a single large contiguous region. If the region is sufficiently large and realized treatment locations are sufficiently scarce, it is still possible to estimate causal effects without strong additional assumptions. The separate region framework, however, helps clarify key concepts by simplifying estimators, and it is applicable to a large number of empirical studies. The assumption that treatment locations only affect individuals within the same region is similar in spirit to assumptions that interference or spillovers are limited to family members, classrooms, or other subgroups in settings with individual-level treatments (e.g. Vazquez-Bare, 2017).

The assignment mechanism (Imbens and Rubin, 2015) determines the probabilities with which treatment is realized at each of the candidate treatment locations. The marginal probability that treatment is realized at a location s ∈ S is given by Pr(s ∈ T). In the main part of the paper, I consider a two-stage assignment mechanism that imposes structure on Pr(s ∈ T) as well as on the conditional probabilities Pr(s ∈ T | s′ ∈ T) and Pr(s ∈ T | s′ ∉ T). In the first stage, either a fixed number of regions is chosen to receive treatment, or assignment is through independent Bernoulli trials (coin flips). In the second stage, a single candidate location receives treatment in each treated region (a simulation sketch of this two-stage assignment appears below). I discuss methods for some observational settings that deviate from this assignment mechanism in section 4.2. Suppose the randomization of treatments across regions takes the form of a completely randomized experiment with a fixed number of treated regions. Assumption 2 formalizes this design together with an assumption that each region is equally likely to be treated. Define q_r ≡ Pr(D_r = 1) for r = 1, . . . , R to be the probability that region r receives treatment. Note that the completely randomized design differs from experiments that are paired or stratified at the region level. Results for stratified experiments are generally similar and can be obtained by substituting the appropriate covariances of treatment indicators in the proofs. Estimating the variance of estimators under paired designs is often difficult (e.g. Bai et al., 2019, for individual-level treatment assignment), but does not contribute conceptually to our understanding of the spatial treatment setting.
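The two-stage assignment mechanism described above can be simulated directly. The following is a minimal sketch, not from the paper: the first stage draws a fixed number of treated regions completely at random (as in assumption 2 below), and the second stage picks a single candidate location within each treated region, here with equal within-region probabilities as an illustrative assumption.

```python
import numpy as np

rng = np.random.default_rng(0)

def assign_spatial_treatment(candidate_locations, n_treated_regions, rng):
    """Two-stage assignment: completely randomized across regions, then one
    location per treated region.

    candidate_locations: dict mapping region id -> list of candidate location ids.
    Returns a dict mapping region id -> realized location id (None if control).
    """
    regions = list(candidate_locations)
    # first stage: fixed number of treated regions, all subsets equally likely
    treated_regions = rng.choice(regions, size=n_treated_regions, replace=False)
    realized = {r: None for r in regions}
    for r in treated_regions:
        # second stage: each candidate location in region r equally likely here
        realized[r] = rng.choice(candidate_locations[r])
    return realized

# Example: 6 regions with 2-3 candidate locations each, 3 treated regions.
candidates = {r: [f"s{r}_{k}" for k in range(rng.integers(2, 4))] for r in range(6)}
print(assign_spatial_treatment(candidates, n_treated_regions=3, rng=rng))
```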
Assumption 2 (Completely Randomized Experiment). Regions are chosen for treatment according to a completely randomized design (e.g. Imbens and Rubin, 2015, ch. 4.4) where each region has equal marginal probability of receiving treatment somewhere, q_r = q for all regions r. That is, all assignment vectors (D_1, . . . , D_R) ∈ {0, 1}^R with Σ_r D_r = R_1 ≡ qR are equally likely, and assignments with Σ_r D_r ≠ R_1 have zero probability.

As an alternative to completely randomized designs with a fixed number of treated regions, I also consider designs where treatment is decided by an independent coin flip for each region, potentially with different probabilities. Assumption 3 below formalizes this assumption.

Assumption 3 (Bernoulli Trial). Regions are chosen for treatment according to a Bernoulli trial (e.g. Imbens and Rubin, 2015, ch. 4.3) where region r has marginal probability q_r of receiving treatment somewhere and assignment is independent across regions. That is, the probability of an assignment (D_1, . . . , D_R) ∈ {0, 1}^R is Π_r q_r^{D_r} (1 - q_r)^{1 - D_r}, such that the number of treated regions varies.

In the main part of the paper, I consider a setting with exactly one treated location in each treated region. This restriction of the assignment mechanism rules out interference by design under the minimal assumption that treatments have no effects across regions. For each candidate treatment location in a region, s ∈ S_r, define the probability of treatment conditional on the region receiving treatment as p(s) ≡ Pr(s ∈ T | D_r = 1). Then, by the definition of conditional probabilities, Pr(s ∈ T) = Pr(s ∈ T | D_r = 1) Pr(D_r = 1) = p(s) q_r. The notational distinction between treatment of regions and treatment of particular locations within regions is motivated by an asymmetry in which potential outcomes are observed: In control regions, the control potential outcomes are observed for all individuals near each (unrealized) candidate treatment location. In treated regions, in contrast, only the treated potential outcomes corresponding to one particular treatment location are observed for all individuals. This asymmetry is apparent in the estimators and variances throughout section 3.

Individual-level effects express the average effects of treatment locations on individuals. The most intuitive estimator of the average effect of a spatial treatment on nearby individuals takes the simple average of outcomes of individuals near realized treatment and subtracts from it the average outcome of properly chosen control individuals. In this section, I first show who the proper control individuals are under the ideal experiment of random variation in treatment locations. Then I present properties of this estimator and discuss its interpretation as the average treatment effect on the treated. The average outcome of individuals who are treated at a distance d ± h from a treated location is

Ȳ^T(d) = [ Σ_i D_{r(i)} 1{|d(T_{r(i)}, i) - d| ≤ h} Y_i ] / [ Σ_i D_{r(i)} 1{|d(T_{r(i)}, i) - d| ≤ h} ],

where D_{r(i)} = 1 if and only if individual i is in a region r(i) that is treated. The indicator function equals 1 if and only if the distance between individual i and the realized treatment location in her region, T_{r(i)}, is within the distance bin of distances between d - h and d + h. For instance, to estimate the average outcome for individuals who are between 1 and 2 miles from treatment, calculate Ȳ^T(1.5) with h = 0.5. The choice of control individuals to compare this average of treated individuals to is less obvious. Recent empirical studies compare the treated to controls on an outer ring; that is, to individuals i′ in treated regions (D_{r(i′)} = 1) who are farther away from treatment. Effectively, this estimates the treatment effect at distance d as Ȳ^T(d) - Ȳ^T(d′), where d′ is much larger than d. In analogy to individual-level randomized experiments, one might also consider taking the simple average of outcomes of individuals in control regions, Σ_i (1 - D_{r(i)}) Y_i / Σ_i (1 - D_{r(i)}).
While either of these strategies is valid under further assumptions or in particular settings, below I argue in favor of a different strategy that is justified by the experimental design. One particular choice of (weighted) control average, Ȳ^C(d), is justified by the experimental design of the ideal experiment considered in this paper. Most importantly, the estimator Ȳ^C(d) only averages over individuals who are at approximately distance d from some candidate location s ∈ S_{r(i)}. The remaining weighting is similar to inverse probability weighting estimators of the average effect of the treatment on the treated (ATT) in settings with individual-level treatments (cf. Imbens, 2004). To see that the control average Ȳ^C(d) provides the appropriate counterfactual for the simple average of the treated, Ȳ^T(d), consider the expected value of the terms in the numerator of the latter; see appendix A.1 for the details. The difference between this expected value and the terms of the estimator Ȳ^C(d) is that the latter can only average over individuals in control regions, with D_{r(i)} = 0, requiring an additional inverse probability weight, which depends on the region-level treatment probability q_{r(i)}, in Ȳ^C(d). The estimator Ȳ^C(d) therefore aligns, in expectation, the weights placed on each control potential outcome Y_i(0) with those placed on the corresponding treated potential outcome Y_i(s) by Ȳ^T(d). Consequently, the estimator τ̂(d) ≡ Ȳ^T(d) - Ȳ^C(d) estimates a weighted average of the differences Y_i(s) - Y_i(0), which are the individual-level treatment effects τ_i(s) defined in section 2 above. The particular inverse probability weights make τ̂(d) an estimate of the average treatment effect on the treated at a distance of d ± h (a computational sketch of Ȳ^T(d), Ȳ^C(d), and τ̂(d) follows after remark 1 below). Theorem 1 states approximate finite sample properties of this estimator: (i) approximate unbiasedness for the ATT at distance d, and (ii) an approximate variance formula. The variance expression is built from region-level aggregates: the average potential outcome, under treatment at location s, of individuals at distance d from s; its within-region average, with weights proportional to the probability of treatment at each location; the analogous within-region average of control potential outcomes for the same individuals; and the averages of these within-region averages across regions. The number of individuals at distance d from location s, its within-region average, and the expected number of individuals at distance d from realized treatment across regions also enter the expression. The theorem is a special case of theorem 3 below with a particular choice of weights.

Remark 1. The approximation in theorem 1 arises because the denominators of the estimator τ̂(d) are stochastic. The proof proceeds by deriving the finite sample properties of an infeasible demeaned estimator τ̃(d) with non-stochastic denominators that satisfies τ̂(d) - τ̃(d) = O_p(R^{-1}), where R is the number of regions; details are given in the appendix. Even with relatively few regions, the approximation is likely to perform well in practice. Similar issues arise with individual-level treatments if treatment is decided by successive coin flips, rather than by fixing the number of treated. In spatial treatment settings, however, it is rarely feasible to hold the number of individuals near treatment fixed when randomizing the assignment of treatment locations. When all candidate locations have equal numbers of individuals in the distance bin, the approximations in the theorem above hold with equality.
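To fix ideas, here is a minimal sketch of the treated average, the weighted control average, and their difference, assuming the two-stage design above with a common region-level treatment probability q and known within-region location probabilities p(s). It uses a self-normalized (Hajek-type) control average with weights proportional to p(s) q / (1 - q); this weighting and the data layout (one distance column per candidate location id) are illustrative assumptions, not the paper's exact formulas.

```python
import numpy as np
import pandas as pd

def att_at_distance(individuals, candidates, realized, d, h, q):
    """Hajek-type estimate of the effect at distance d +/- h.

    individuals: DataFrame with columns region, y, and one distance column per
        candidate location id (distance from that location to the individual).
    candidates:  DataFrame with columns region, location, p (within-region
        treatment probability conditional on the region being treated).
    realized:    dict region -> realized location id (None for control regions).
    q:           marginal probability that a region is treated.
    """
    treated_y, control_y, control_w = [], [], []
    for _, ind in individuals.iterrows():
        r = ind["region"]
        if realized[r] is not None:                      # treated region
            if abs(ind[realized[r]] - d) <= h:           # in bin around realized location
                treated_y.append(ind["y"])
        else:                                            # control region
            for _, cand in candidates[candidates["region"] == r].iterrows():
                if abs(ind[cand["location"]] - d) <= h:  # in bin around a candidate
                    control_y.append(ind["y"])
                    control_w.append(cand["p"] * q / (1 - q))  # inverse probability weight
    y_treated = np.mean(treated_y)
    y_control = np.average(control_y, weights=control_w)
    return y_treated - y_control
```

Because a control individual contributes once for every nearby candidate location, this control average automatically places more weight on individuals who are within the distance bin of several candidate locations, mirroring the weighting discussion around figure 2 below.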
Remark 2. The expected value given by theorem 1 is also relevant for other estimators using Ȳ^T(d) as the mean of the treated but relying on a different control comparison group and auxiliary assumptions to justify the comparison. When researchers argue that randomization in the spatial locations for treatments allows them to estimate the treatment effect using Ȳ^T(d) (or close analogs), they therefore implicitly estimate the average treatment effect on the treated. I believe there is value in making the estimation target explicit: As I argue in section 3.1.2 below, the ATT as defined above does not necessarily allow the most meaningful comparisons of the effects at different distances.

Remark 3. The control average Ȳ^C(d) used by the proposed estimator τ̂(d) simplifies to the simple average over all individuals in control regions if each individual is equally likely to be at distance d from realized treatment. This typically requires that treatment can be realized at any location within a region with equal probability (p(s) is constant within r), and the probability q_r that a region is selected for treatment must be proportional to its area. Then the unconditional treatment probability Pr(s ∈ T) is constant for all locations s, not just for a small, finite set of candidate locations. Figure 2 illustrates and contrasts this with the more common setting where only a small number of candidate locations have positive probability of receiving treatment.

Remark 4. The variance given in the theorem is the design-based variance (Abadie et al., 2020) of the estimator. It expresses the variation in the estimate arising from assigning treatment randomly to one candidate location in a fixed number of randomly chosen regions. The individuals whose outcomes are measured are held fixed across these repeated samples; the only difference between samples lies in which potential outcome is observed for each individual. The thought experiment behind the variance above is therefore similar to performing a permutation, or placebo, test. Aronow et al. (2020) also suggest permutation tests as an alternative basis for inference in the spatial treatment setting.

Remark 5. The first three terms in the variance expression are similar to the variance of the difference in means estimator in a completely randomized experiment with individual-level treatments (cf. Imbens and Rubin, 2015, ch. 6). In the ideal spatial experiment considered in this section, treatment is randomized similarly to a completely randomized experiment across regions, with outcomes aggregated within regions (and distance bins). The first term is the variance of aggregated treated potential outcomes, the second is the variance of aggregated control potential outcomes, and the third resembles a variance of treatment effects, such that the first two terms minus the third resemble the variance of the difference in means under repeated sampling of fixed individuals but varying treatment assignment, the framework of this paper.

Remark 6. There is a distinct asymmetry between treated and control outcomes in the expressions for the variance: There are two terms capturing different variances of treated potential outcomes, but only one variance of control potential outcomes. In a treated region, Ȳ^T(d) only averages over potential outcomes corresponding to the realized treatment location, but not those of other, unrealized, candidate treatment locations. The variance of this estimator therefore depends both on how treated potential outcomes vary across regions and within region across candidate locations.
If most of the variance is across regions, the final term is large in absolute value and negative, reducing the part of the overall variance of the estimator that is due to the variance of treated potential outcomes. Since most of the variance is across regions, little is lost by only observing outcomes corresponding to one treatment location in regions with treatment. In a control region, in contrast, we observe the control potential outcomes Y_i(0) that are the counterfactual to all candidate treatment locations S_{r(i)} in the region. Ȳ^C(d) therefore averages over potential outcomes for all candidate locations within each region, and the control variance term is the variance of such averages of Y_i(0) within region, across candidate locations.

Remark 7. The last two terms in the variance expression arise due to the two-stage randomization in the ideal experiment. After randomizing between regions, the ideal experiment also randomizes between the candidate treatment locations within each treated region. When each region only has a single candidate treatment location, the variance can be simplified to only use the first three terms (scaled), as there is no second-stage randomization in that case.

Estimation of Variance Without further assumptions, one can only estimate the first two terms of the variance, the variances of aggregated treated and aggregated control potential outcomes, to form a conservative estimator of the approximate finite sample variance of τ̂(d). If there is a single candidate treatment location per region, the fifth term can be combined and estimated along with the first (and second) term. The third and fourth terms are (approximately) variances of treatment effects, which are unidentified. However, the third term is larger in absolute value than the fourth term (see appendix A.2.6), such that these two terms jointly make a non-positive contribution to the variance. Intuitively, the third term is approximately the unconditional variance of treatment effects, while the fourth is the variance of the conditional expectation (conditional on region) of treatment effects. By the law of total variance, the difference is the expectation of the conditional variance, which is necessarily non-negative. Hence, dropping both terms yields a conservative estimate of the variance. If there are multiple candidate treatment locations per region, one can still estimate the fifth term under semiparametric assumptions on potential outcomes, such as constant treatment effects. Specifically, the fifth term is a variance of region-level averages of location-level average treated potential outcomes. A closely related variance of these location-level averages is readily estimable, as in the estimation of the first variance term. One can also estimate both types of variances for control potential outcomes because, in control regions, the relevant control potential outcomes for individuals at distance d from any candidate treatment location are observed. If treatment effects are constant, one can therefore estimate the fifth term by scaling its estimable counterpart by the ratio of the average within-region variance to the across-region variance of average control potential outcomes.

The estimand τ(d) is not generally appropriate when the researcher is interested in how the effect of the treatment changes with distance from treatment. As an alternative, I propose the estimand τ^loc(d) with a more attractive interpretation when comparing effects at different distances.
Figure 2: The figures show regions with individuals (small circles) and candidate treatment locations (triangles), highlighting in gray the areas that are distance 1 away from a candidate treatment location. Suppose each candidate treatment location is equally likely to be realized. In panel (a), all individuals who are distance 1 away from a candidate treatment location receive equal weight in the estimand $\tau(1)$. In estimation, if the region is in the control group, we take the simple average of outcomes of the highlighted individuals. In panel (b), some individuals are distance 1 away from both candidate treatment locations, so these individuals receive greater weight in the estimand $\tau(1)$. In estimation, if the region is in the control group, we take the average of outcomes of the highlighted individuals, but individuals who are located in both gray rings receive twice the weight. In panel (c), candidate treatment locations are everywhere (for illustration, only candidate treatment locations along a grid are displayed). If we assume that candidate treatment locations extend past the boundaries of the region, then all individuals in the region receive equal weight in the estimand $\tau(1)$. In estimation, if the region is in the control group, we take the simple average of outcomes of all individuals.
Additionally, one can interpret $\tau^{\text{loc}}(d)$ as the expected average effect at distance $d$ of a new treatment location. Figure 3 illustrates the problem with interpreting the difference between the estimands $\tau(d)$ and $\tau(d')$ as the pattern of treatment effects across distance from treatment. Suppose the researcher is interested in comparing the average treatment effect at a short distance $d = d_{\text{short}}$ and a long distance $d' = d_{\text{long}}$. Suppose further that there are two types of candidate treatment locations, each type equally likely to be realized. The first type of candidate treatment location has many individuals located at the short distance and few individuals at the long distance. These first candidate locations all have relatively small treatment effects at both distances, decreasing in distance from treatment. The second type of candidate treatment location has few individuals located at the short distance and many individuals at the long distance. These second candidate locations all have relatively large treatment effects at both distances, also decreasing in distance from treatment. In the example in figure 3, the estimand $\tau(d)$ is increasing in distance even though the treatment effect of any single treatment location is decreasing in distance from treatment. The estimand $\tau(d_{\text{short}})$ places most weight on the first type of candidate locations because most individuals at the short distance from treatment are near this type of location. In contrast, the estimand $\tau(d_{\text{long}})$ places most weight on the second type of candidate locations. Hence, $\tau(d_{\text{long}}) > \tau(d_{\text{short}})$. This inequality states that the average treatment effect at a long distance for the average individual at the long distance from a candidate treatment location is larger than the average effect at a short distance for the average individual at the short distance from candidate treatment locations. It does not imply that the average effect of any single treatment is increasing in distance from treatment. Instead, the average individual at a long distance may simply be both a different type of individual (in terms of observables and unobservables) and exposed to a different treatment location on average. An alternative estimand, $\tau^{\text{loc}}(d)$, defined below and also shown in figure 3, avoids such issues in interpretation by placing the same aggregate weight on each candidate treatment location irrespective of the distance $d$.
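Before the formal definition, a minimal numerical sketch of how the two weighting schemes aggregate the same location-level effects; all counts, probabilities, and effect values below are hypothetical toy inputs, not quantities from the paper.

```python
import numpy as np

# Toy configuration: two candidate treatment locations, each realized with
# probability 1/2. Location A has many nearby individuals with small effects;
# location B has few nearby individuals with large effects (as in figure 3).
effects_at_d = {"A": np.array([1.0, 1.0, 1.0, 1.0]),  # 4 individuals at distance d of A
                "B": np.array([5.0])}                  # 1 individual at distance d of B
prob = {"A": 0.5, "B": 0.5}

# tau(d): weight each individual-location pair by the location's treatment
# probability, so locations with more nearby individuals get more total weight.
num = sum(prob[s] * e.sum() for s, e in effects_at_d.items())
den = sum(prob[s] * e.size for s, e in effects_at_d.items())
tau_d = num / den

# tau_loc(d): first average within each location, then weight only by the
# treatment probability, so every location gets the same aggregate weight.
tau_loc_d = sum(prob[s] * e.mean() for s, e in effects_at_d.items()) / sum(prob.values())

print(tau_d, tau_loc_d)  # 1.8 vs. 3.0 in this toy example
```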
The estimand $\tau^{\text{loc}}(d)$ first separately averages the potential outcomes of nearby individuals for each candidate treatment location. These averages are then averaged again, with weights proportional only to the probability of treatment at the location. In contrast, the estimand $\tau(d)$ uses weights proportional to the product of the treatment probability and the number of individuals near the treatment location. Formally, $\tau^{\text{loc}}(d)$ averages the location-specific effects $\bar{\tau}_s(d)$, where $\bar{\tau}_s(d)$ is the average effect of candidate location $s$ on individuals at distance $d \pm h$ from it, using weights that depend only on the treatment probability of the location and not on distance from treatment. Hence, the weight placed on the average effect of a given location does not depend on the distance from treatment. To also nonparametrically control for observable differences in pre-treatment variables, one can estimate $\tau^{\text{loc}}(d)$ separately using only individuals with covariate values falling into groups defined by those variables. The comparison of $\tau^{\text{loc}}(d)$ and $\tau^{\text{loc}}(d')$ then compares individuals with the same average exposure to the different candidate locations and similar individual characteristics. Holding the aggregate weight per treatment location constant across distance from treatment is attractive when the treatment effects are expected to be heterogeneous by region or location. Such heterogeneity is particularly plausible in many spatial treatment settings: Oftentimes, the exact implementation of the treatment differs substantially from location to location. For instance, the million dollar plants in the study of Greenstone and Moretti (2003) are each operated by distinct companies, which may differ in their labor demand and wage setting. Hence, heterogeneous treatment effects arise not only due to differences between individuals, but also due to differences in the implementation of the treatments. Since spatial treatments are often larger, rarer, and more complex, their implementation tends to vary more than, say, the administration of a drug in medical trials to different patients, or the content of a job training program across training sites or cohorts. Additionally, the estimand $\tau^{\text{loc}}(d)$ has an attractive interpretation as the expected effect at distance $d$ of a new treatment location. Consider the following hierarchical model. First, when treatment is realized at location $s$, its average effect at distance $d$ is drawn from a distribution $F_d$, $\bar{\tau}_s(d) \sim F_d$. Second, the individual-level effect of location $s$ on individual $i$ is the sum of $\bar{\tau}_s(d)$ and a mean zero individual-specific component. Then, as the width of the distance bin, $2h$, goes to 0, the estimand $\tau^{\text{loc}}(d)$ approaches the mean of the distribution $F_d$. Hence, one can interpret $\tau^{\text{loc}}(d)$ as the expected average individual-level treatment effect of a new treatment location drawn in the same way as existing realized treatment locations. I propose an estimator $\hat{\tau}^{\text{loc}}(d)$ of $\tau^{\text{loc}}(d)$ that takes the corresponding weighted averages of observed outcomes. Theorem 2 gives the approximate properties of the finite sample distribution of $\hat{\tau}^{\text{loc}}(d)$: Under assumptions 1 (no interference across regions) and 2 (completely randomized design), the estimator $\hat{\tau}^{\text{loc}}(d)$ has an approximate finite sample distribution over the assignment distribution with mean equal to the estimand and variance as given by theorem 3 with the weights specified above. The theorem is a special case of theorem 3 below. Remark 8. The estimator here is exactly unbiased because, under a completely randomized design, the sum of weights is constant across assignment realizations. This is different from theorem 1 above, where the number of treated individuals varies.
Here, treated individuals are averaged by candidate treatment location, and the number of treated locations is constant across assignment realizations by assumption 2. More generally, the same ideas allow estimation of any weighted average of individual-level treatment effects that places non-zero weight only on the effects of candidate treatment locations with positive probability of realization. Write these estimands of individual-level average effects of the spatial treatment on individuals at a distance $d$ from treatment as in equation (4), where $\tau_i(s)$ is the effect of treatment at location $s$ on individual $i$, and $\omega(s, i)$ are weights specified by the researcher. The estimand can therefore be any weighted average of the effects of single treatments on individuals, with weights as specified by the researcher. For the average effect of the treatment at distance $d$, the weights $\omega(s, i)$ are non-zero only when location $s$ and individual $i$ are (approximately) distance $d$ apart. While I define the ATT estimands $\tau(d)$ and $\tau^{\text{loc}}(d)$ for distance bins $d \pm h$ above by using the rectangular, or uniform, kernel function $1\{|d(s, i) - d| \leq h\}$, the weights $\omega(s, i)$ can generally use any kernel function in place of distance bins to estimate the effects at distance $d$. The average effect of the treatment on the treated estimands in equations (1) and (2) are special cases of equation (4): choose the weights corresponding to $\hat{\tau}(d)$ for the first ATT estimand, and the weights corresponding to $\hat{\tau}^{\text{loc}}(d)$ for the second. I propose an inverse probability weighting estimator (cf. Imbens, 2004) to estimate the weighted average treatment effect in equation (4). In short, the estimator is the difference between weighted average outcomes of individuals near realized treatment locations and weighted average outcomes of individuals in regions without treatments, as sketched below. The weights need to account for two aspects: First, the researcher specifies the desired weights $\omega(s, i)$ in the estimand. Second, individuals near locations with high treatment probability are relatively more likely, across samples of repeated treatment assignment, to appear in the sum of "treated" individuals than in the sum of "control" individuals due to the experimental design. The estimator cancels out the probability weighting induced by averaging over individuals (not) near realized treatment for the treated (control) average. To estimate the average effect of the treatment on the treated, the treatment probabilities can be included in the weights $\omega(s, i)$. In the proposed estimator for the weighted average treatment effect in equation (4), the weights $\omega(s, i)$ are fixed and specified by the researcher. The treated and control inverse probability weights, in contrast, are stochastic due to their dependence on the treatment assignment random variables. Specifically, the treated weight of location $s$ is 0 unless treatment is realized at $s$, in which case it is equal to the inverse of the probability of this event. Similarly, the control weight of individual $i$ is 0 unless there is no treatment in the region of individual $i$, in which case it is equal to the inverse of the probability of no treatment in the region. Consequently, the stochastic weights are equal to 1 in expectation. The estimator divides each term by the sum of realized weights, such that it is the difference between a convex combination of treated outcomes and a convex combination of control outcomes. Theorem 3 gives the approximate finite sample properties of the estimator under assumption 3. Proof: See appendix A.2. Remark 9. The variance in theorem 3 can be estimated analogously to that of theorem 1; see also appendix A.2.6.
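A minimal computational sketch of such an inverse probability weighting contrast, in the simplest case of one candidate location per region; the function and variable names, the toy data, and the normalization by realized weights shown here are illustrative assumptions rather than the paper's exact estimator.

```python
import numpy as np

def ipw_estimate(y, dist, treated_region, p_treat, d, h):
    """Inverse probability weighting contrast at distance d (bin half-width h).

    y: outcomes; dist: distance of each individual to the single candidate
    location in her region; treated_region: 1 if that location was realized;
    p_treat: treatment probability of that region's candidate location.
    """
    in_bin = np.abs(dist - d) <= h                       # researcher-specified uniform kernel
    w_att = p_treat * in_bin                             # ATT weights include the treatment probability
    w_t = w_att * treated_region / p_treat               # stochastic treated weight: 1/Pr(realized) if realized
    w_c = w_att * (1 - treated_region) / (1 - p_treat)   # control weight: 1/Pr(no treatment in region)
    treated_mean = np.sum(w_t * y) / np.sum(w_t)         # convex combination of treated outcomes
    control_mean = np.sum(w_c * y) / np.sum(w_c)         # convex combination of control outcomes
    return treated_mean - control_mean

# Toy data: 6 individuals in 3 regions; only the first region's location is realized.
y = np.array([5.0, 4.0, 2.0, 2.5, 2.2, 1.8])
dist = np.array([1.0, 1.1, 0.9, 1.0, 1.2, 1.0])
treated_region = np.array([1, 1, 0, 0, 0, 0])
p_treat = np.array([0.4, 0.4, 0.3, 0.3, 0.3, 0.3])
print(ipw_estimate(y, dist, treated_region, p_treat, d=1.0, h=0.25))
```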
Remark 10. The approximate finite sample variance is smaller under Bernoulli trials than under a completely randomized design. This is due to the nature of the approximation, which does not penalize the variance as heavily when, for instance, few treated regions are available under an imbalanced assignment. In practice, the difference between both designs is negligible due to a factor of one less than the number of regions in the denominator, such that the scaled difference converges to zero and there is no difference between the two designs under standard asymptotics in the number of regions. The aggregate effect of a single treatment on all affected individuals is of importance for cost-benefit and welfare analysis. In this section, I propose estimators of aggregate effects that build on estimators of individual-level effects. In experiments with spatial treatments, there are two units of observation: outcome individuals and spatial treatments. The individual-level treatment effects of the previous section are average effects per outcome individual. The aggregate treatment effects of this section are average effects per spatial treatment. Suppose the researcher is interested in the aggregate effect that a single treatment location has on all affected individuals. Define the estimand $\tau^{\text{agg}}$ as the sum across individuals of the individual-level effects $\tau_i(s) = Y_i(s) - Y_i(0)$ of treatment location $s$, averaged across candidate treatment locations $s$ with weights $\eta(s)$. In this section, I focus on the average aggregate treatment effect on the treated, with weights proportional to the probability that location $s$ is the realized treatment location. These weights place larger weight on the effects of treatment locations that are more likely to be realized. The estimand therefore answers the question: What is the expected aggregate effect of a treatment location under the observed policy of assigning treatments to locations? One can estimate the aggregate effect by aggregating outcomes at the region level. This yields a propensity score weighting estimator of an average treatment effect on the treated, where the outcome variable of interest is the sum over the outcomes of all individuals in a region. When there is a single candidate treatment location per region, standard results from the literature on experiments with individual-level treatments apply (cf. Imbens, 2004), with regions taking the role of individuals. Estimators based on region-aggregate outcomes are likely to have very large variance. Each region-aggregate outcome is the sum of outcomes of individuals in the region. If there is substantial variance in the number of individuals per region and outcomes are positive, the aggregate outcome of regions with many individuals can be substantially larger than the aggregate outcome of smaller regions. For instance, suppose that the number of individuals per region is Poisson distributed with mean $\lambda$, and individual-level outcomes are i.i.d. within and across regions, with mean $\mu$ and variance $\sigma^2$. Then region-aggregate outcomes have variance $\lambda (\sigma^2 + \mu^2)$ by the law of total variance. Hence, aggregate potential outcomes have large variance, which leads to large variance of the estimator (cf. Imbens, 2004). Variation in region sizes generates a large variance of the region-aggregate estimator $\hat{\tau}^{\text{agg}}_1$ in two ways. First, if there is variance in the number of individuals per region, then in finite samples, some treatment assignments will be such that there are more individuals in treated regions than in control regions.
Suppose outcomes are positive and constant, for instance all individuals have the exact same value for the outcome. Then the treatment effect estimate $\hat{\tau}^{\text{agg}}_1$ in such a sample is positive and sensitive to the scale of the outcome value. Hence, the estimator $\hat{\tau}^{\text{agg}}_1$ can have large variance even when there is no variance in potential outcomes. Second, variation in region sizes increases the variance in a sampling-of-regions thought experiment. Even if the average individual-level treatment effect were known, needing to estimate the number of times the effect is realized on average per region can create substantial variance. The design-based variances considered in this paper condition on the individuals in the sample. With a known number of individuals and known individual-level average treatment effect, it is possible to form an estimator of aggregate treatment effects with a design-based variance equal to zero, in contrast to the variance results for the estimator $\hat{\tau}^{\text{agg}}_1$ above. I therefore recommend an estimator of average aggregate effects that reduces the variance by building on the estimator of the average individual-level effect at a distance $d$. Let $\hat{\tau}^{\text{agg}}_2 = \sum_{d \in \mathcal{D}} \bar{n}(d) \, \hat{\tau}(d)$, where $\bar{n}(d)$ is the average number of individuals at distance $d$ from candidate treatment locations, using the same distance bins (uniform kernel and bandwidth equal to bin width) for both $\hat{\tau}(d)$ and $\bar{n}(d)$. The set of distances $\mathcal{D}$ contains the midpoints of the bins that partition the full space into distance bins. For instance, if one uses distance bins $[0, 1], (1, 2], \ldots, (9, 10]$ for a treatment that is known not to have effects past a distance of 10 miles, then $\mathcal{D} = \{0.5, 1.5, \ldots, 9.5\}$. The theoretical properties of the estimator $\hat{\tau}^{\text{agg}}_2$ follow from those of $\hat{\tau}(d)$ in theorem 1 above. Theorem 4 states that, under the ideal experiment, the estimator $\hat{\tau}^{\text{agg}}_2$ has an approximate finite sample distribution over the assignment distribution with mean equal to the aggregate estimand and variance given by the sum of the covariances of its terms. Remark 11. For approximate unbiasedness, the estimator $\hat{\tau}^{\text{agg}}_2$ must be based on $\hat{\tau}(d)$, not $\hat{\tau}^{\text{loc}}(d)$. Intuitively, when "integrating" the effect $\hat{\tau}(d)$ against the number of individuals at this distance, one needs to ensure that $\hat{\tau}(d)$ is an unbiased estimate of the effect for these particular $\bar{n}(d)$ individuals. Remark 12. The variance follows from theorem 3 and theorem 1. The covariances can be derived analogously. Since $\hat{\tau}^{\text{agg}}_2$ is a sum, its variance is a sum of the covariances of the terms. In the design-based perspective, the analysis is conditional on the individuals in the sample. Hence, the number of individuals in each bin, $\bar{n}(d)$, is fixed. The estimators $\hat{\tau}(d)$ for distances $d \in \mathcal{D}$ are therefore the only stochastic components. Remark 13. The optimal choice of distance bins (and bandwidths) remains an open question. If individuals are spread uniformly across space, equal-width rings with larger radii have larger area and hence contain more individuals. More generally, in densely populated areas, smaller bins may be preferable, and under suitable sequences of populations (infill asymptotics or growing number of regions), it may be possible to allow $h \to 0$ and $|\mathcal{D}| \to \infty$. Generally, in the formula above, additional distance bins decrease the (squared) weights $\bar{n}(d)$ at the cost of increasing the variances of the individual $\hat{\tau}(d)$. I discuss issues in imposing parametric assumptions on the decay of treatment effects over distance from treatment and estimation by least squares regression. First, I show how to impose a parametric model on the individual-level effects at different distances. Second, I show how to estimate aggregate effects based on such a model. A minimal computational sketch of the nonparametric aggregate estimator $\hat{\tau}^{\text{agg}}_2$ follows before turning to parametric models.
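In this sketch, the per-bin effect estimates, individual counts, and variances are hypothetical placeholders; in an application, $\hat{\tau}(d)$ would come from the distance-bin estimator above and $\bar{n}(d)$ from counting individuals around all candidate locations, treated or not.

```python
import numpy as np

# Bin midpoints partitioning [0, 10] miles into 1-mile rings, as in the example above.
d_mid = np.arange(0.5, 10.0, 1.0)

# Hypothetical per-bin ATT estimates tau_hat(d) and average numbers of individuals
# per candidate location n_bar(d); placeholders, not real estimates.
tau_hat = np.array([2.0, 1.6, 1.2, 0.9, 0.6, 0.4, 0.25, 0.15, 0.05, 0.0])
n_bar = np.array([3.0, 8.0, 14.0, 20.0, 25.0, 31.0, 38.0, 44.0, 50.0, 57.0])

# Aggregate effect per treatment location: per-bin effects weighted by how many
# individuals a location has, on average, in each distance bin.
tau_agg_2 = float(np.sum(n_bar * tau_hat))

# Illustrative variance propagation using only per-bin variances; remark 12
# notes that cross-bin covariances also enter the exact expression.
var_tau_hat = np.array([0.30, 0.20, 0.15, 0.12, 0.10, 0.08, 0.07, 0.06, 0.05, 0.05])
se_diag_only = float(np.sum((n_bar ** 2) * var_tau_hat)) ** 0.5
print(tau_agg_2, se_diag_only)
```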
Most simple linear parametric models for the decay of average treatment effects over distance from treatment take the form $\tau(d) = \sum_k \beta_k f_k(d)$, where the $f_k$ are known functions of distance and the $\beta_k$ are coefficients to be estimated. In many settings, one needs to impose a distance after which the treatment has no effect, even within region, to obtain reasonable estimates from parametric models. Assumption 4 below formalizes this assumption. Assumption 4. The treatment has no effect after a distance $d_{\max}$ if, for any individual $i \in \mathcal{I}$, set of treatment locations $T \subset S$, and location $s \in T$ such that $d(s, i) > d_{\max}$, the potential outcomes satisfy $Y_i(T) = Y_i(T \setminus \{s\})$. Without such a restriction, any simple functional form for the $f_k$ will typically offer a poor approximation for at least some distances from treatment. One can improve the approximation to the treatment effect at short distances by using functions that only fit the treatment effect pattern up to the maximum distance $d_{\max}$: Relatively simple functions may well approximate the average treatment effects at distances $d \in (0, d_{\max})$. This imposes contextual knowledge that average treatment effects are negligible at large distances from treatment. It also resembles a "bet on sparsity" (Hastie et al., 2001): If treatment effects really are negligible at distances longer than $d_{\max}$, the estimators proposed below will likely perform well. If treatment effects are not negligible even at long distances, then no (parametric) estimator will perform well. For instance, one can impose a linear functional form on the treatment effect decay by choosing $f_1(d) = 1$, $f_2(d) = d$. The coefficient on $f_2$ then estimates the rate of decay. A quadratic functional form is imposed by $f_1(d) = 1$, $f_2(d) = d$, $f_3(d) = d^2$. In principle, the analysis in this section can be extended also to functional forms that are non-linear in the parameters, such as an exponential decay of treatment effects with unknown rate of decay, $\tau(d) = \beta_1 \exp(-\beta_2 d)$. To estimate the parameters, suppose initially that there is only a single candidate treatment location in each region. This allows the definition of the distance of individual $i$ from the candidate treatment location uniquely as $d_i$, irrespective of realized treatment. Then estimate the weighted linear regression of outcomes on the treatment effect functions and a control function, with weights reflecting those in section 3.1.1 depending on the estimand, such as $\tau(d)$ or $\tau^{\text{loc}}(d)$ for different versions of the average effect of the treatment on the treated. The function $h$ models the average control potential outcomes at each distance from candidate treatment locations. For semiparametric estimation, specify the treatment effect decay $\tau(d)$ parametrically, and estimate $h$ nonparametrically, as a partially linear model (e.g. Robinson, 1988). In this paper, I instead focus on parametric linear estimation, which imposes known parametric functions for both the decay and $h$ and estimates their coefficients. The same caveat about setting a maximum distance applies also to $h$. Since there is no interest in effects at distances larger than $d_{\max}$, the constant $\beta_0$ captures the mean outcome for individuals at such larger distances. In practice, one typically wants to impose not only a zero treatment effect after distance $d_{\max}$ (assumption 4), but a treatment effect that tends to zero continuously at $d_{\max}$. To this end, estimate the linear regression with transformed covariates; a computational sketch follows below. Figure 4 illustrates what it means to impose this restriction. In panel (a), without the restriction, the estimated treatment effect will jump to 0 discontinuously at $d_{\max}$. Imposing the restriction in panel (b), the estimated treatment effect is continuous also at $d_{\max}$.
Figure 4: The figure shows a scatter plot of outcomes against distance from treatment. Both regression estimators use a quadratic in distance that is set to 0 at a distance of $d_{\max}$. The restricted estimator further restricts the regression coefficients such that the function is continuous at $d_{\max}$.
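A minimal sketch of that restricted regression, assuming a quadratic decay, a single candidate location per region, and unweighted least squares (the estimand-specific weights from section 3.1.1 are omitted for brevity); all data below are simulated placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d_max = 500, 5.0

# Hypothetical data: distance to the region's single candidate location, a
# treated-region indicator, and an outcome with a decaying treatment effect.
dist = rng.uniform(0, 10, n)
treated = rng.integers(0, 2, n)
effect = np.where(dist <= d_max, 2.0 * (1 - dist / d_max) ** 2, 0.0)
y = 1.0 + 0.1 * dist + treated * effect + rng.normal(0, 0.5, n)

near = (dist <= d_max).astype(float)
# Unrestricted quadratic decay, set to zero past d_max (a jump at d_max is allowed):
X_unres = np.column_stack([
    np.ones(n), dist,                 # control function h(d): constant plus linear term
    treated * near,                   # decay intercept
    treated * near * dist,
    treated * near * dist ** 2,
])
# Restricted version: substitute out the decay intercept so the estimated
# effect equals zero, continuously, at d_max.
X_res = np.column_stack([
    np.ones(n), dist,
    treated * near * (dist - d_max),
    treated * near * (dist ** 2 - d_max ** 2),
])
beta_unres, *_ = np.linalg.lstsq(X_unres, y, rcond=None)
beta_res, *_ = np.linalg.lstsq(X_res, y, rcond=None)
print(beta_unres, beta_res)  # restricted coefficients imply a decay that hits 0 at d_max
```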
The restriction generally reduces the variance of the estimator, in particular for estimating aggregate effects, as discussed below. In practice, most functional forms for the decay imply not just a zero effect after distance $d_{\max}$, but also a non-zero effect at distances slightly shorter than $d_{\max}$. The same parametric functional form can be imposed to estimate the average aggregate effects of the treatment. Under the parametric model, the average aggregate treatment effect on the treated is a known linear combination of the decay coefficients. Solving this expression for the first decay coefficient and plugging it into the regression specification above yields a one-step regression specification in which the coefficient on the first (transformed) covariate is the estimate of the average aggregate treatment effect. The transformed covariates are readily computed by realizing that they are equal to the original covariates multiplied or shifted by average covariates. The average here is taken across all regions, both treated and untreated, such that this estimator has similarly attractive properties as the nonparametric estimator $\hat{\tau}^{\text{agg}}_2$ above, in leveraging that the numbers of individuals near candidate treatment locations are available irrespective of assignment. When there is more than one candidate treatment location per region, augment the regression approach as follows. The distance variable is not uniquely defined, since there are multiple "distances from candidate treatment locations" for each individual. Suppose individual $i$ in a control region is 1 mile away from one candidate treatment location and 5 miles away from a different candidate treatment location. Then the outcome of $i$ should be used to estimate the control mean $h(d)$ for both distances, $d = 1$ and $d = 5$. One can therefore duplicate observation $i$. Specifically, if individual $i$ is in a region with $|S_{r(i)}|$ candidate treatment locations, then include $i$ a total of $|S_{r(i)}|$ times in the regression, with each copy using the distance to a different candidate treatment location. This ensures that the regression error has conditional mean zero given distance and treatment status, and hence results in consistent linear regression estimates. The framework and estimators proposed in the previous sections can readily be adapted to variations in the setting that are of empirical relevance, such as panel data and settings with interference. Panel data can serve two distinct purposes in settings with spatial treatments. First, one can use pre-treatment outcomes to reduce the variance of treatment effect estimators. Second, with panel data one can base identification of causal effects on a "parallel trends" assumption that is familiar from difference-in-differences methods. I show that existing empirical work relies on a version of this assumption that is not justified by the ideal experiment discussed in section 3 above, or by other (quasi-) random variation in the location of spatial treatments. Reducing Variance. Under the ideal experiment, the nonparametric estimators proposed above are (approximately) unbiased, but may have large variance. The variance may be particularly large if potential outcomes of individuals in different regions are on substantially different levels. Then, for some treatment assignment realizations, treatment is predominantly realized in the regions with large potential outcomes, such that the estimate ex-post overstates the true average treatment effect.
Symmetrically, under the inverse of this assignment, treatment is predominantly realized in the regions with small potential outcomes, and the estimate understates the true effect. Ex-ante (on average across treatment assignments) the estimator is unbiased. The large differences between estimates for different treatment assignments imply, however, a large design-based variance of the estimator. If the researcher has a pre-treatment outcome for each individual, she can difference out the different levels of the potential outcomes in different regions. To implement this, simply take the difference between post-treatment and pre-treatment outcome for each individual, − pre , and then use the same estimators as before. This does not substantively affect the approximate unbiasedness of the nonparametric estimators. Denote, for instance, the estimator^( ) with differenced outcomes as^d iff ( ). It follows immediately that where the pre-treatment outcomes, pre , are fixed across treatment assignments. The treated and control weights, ( ( ) , ( ) , ) and ( ( ) , ) mirror the weights in¯( ) and¯( ), respectively. Hence, they are (approximately) equal in expectation by the arguments for unbiasedness of^( ) itself (appendix A.2.4). Hence, the second term is equal to zero in expectation. Subtracting the pre-treatment outcomes from the potential outcomes within the estimand ( ) does not change the estimand at all because the pre-treatment outcomes cancel between the treated potential outcome and the control potential outcome. For the regression estimators, one can alternatively use pre-and post-outcomes as separate observations and include individual fixed effects in the regression for a similar effect. Differencing out the levels of the potential outcomes greatly reduces the variance of the estimator if the temporal persistence in potential outcomes is large. This is most easily seen in the case with one candidate treatment location per region. Then subtracting the pre-treatment outcomes affects the marginal variances¯( ) ( ) and¯( 0) ( ). I recommend using the same formula for the variance estimator as before, but applied to the differenced outcomes, to obtain an estimate of the variance of^d iff ( ). Loosely speaking, 12 the variance of^d iff ( ) is smaller if the differenced outcomes have smaller marginal variance: var( ( ) − pre ) < var( ( )) and var( (0) − pre ) < var( (0)). Hence, using the differenced outcomes is likely to reduce the variance if the coefficients in (population) regressions of post-treatment potential outcomes on pre-treatment outcomes is at least 0.5. One can also incorporate multiple pre-periods into this approach for variance reduction. Relying on randomized treatment assignment for identification, additional pre-periods can be useful for differencing out the levels of the potential outcomes more precisely. Specifically, if there are period specific unobservable components affecting outcomes, averaging over outcomes from multiple pre-periods may provide a more precise estimate of the level of control potential outcomes in the post-period. Since the target of estimation is the postperiod effect, however, it is attractive to give greater weight to pre-periods that are closer in time to the post-period. Intuitively, the goal is to use the pre-period outcomes to make a one-step-ahead forecast of post (0). Small adjustments have to be made if the pre-period data is for different individuals than those observed in the post-period. 
Since the individuals are distinct, there is no single pre for a post-period individual . Instead, the goal is to deterministically construct an estimate^p re based on pre-period outcomes of individuals with locations near the location of , . One can then use the same estimators as before with the transformed outcomes −^p re . If the construction of^p re does not depend on treatment assignment and post-period outcomes, formal results, such as theorem 3, continue to hold for the differenced outcomes. Averaging over pre-period outcomes for different individuals estimates the expected outcome conditional on location. However, it fails to remove individual-specific fixed effects that are not correlated with the location of the individual. Intuitively, the loss due to only having pre-period outcomes for different individuals is greater if individual-specific components are large and individual-time specific "noise" is small. In practice, pre-period outcomes for different individuals remain useful as long as neighbors' outcomes are sufficiently predictive of own outcomes, and neighborhood-level outcomes are sufficiently stable across time. Identification Based on Parallel Trends One can alternatively use the panel structure of the data to rely on a "parallel trends" assumption for identification of causal effects in a difference-in-differences approach. In practice, such an approach uses the same estimators as proposed for variance reduction. Under the ideal experiment, treatment assignment is independent both of the post-period potential outcomes and of trends between the pre-period and the post-period, conditional on the known randomization probabilities. Hence, the experimental setting allows the use of the pre-period data but does not require it, as discussed before. In observational settings, discussed in section 5, using pre-period data augments the assumption underlying identification. Whether treatment assignment is more plausibly conditionally independent of levels or of trends depends on the setting. Existing empirical work oftentimes uses panel data without control regions in which no treatment occurred (for instance Linden and Rockoff, 2008; Currie et al., 2015; Diamond and McQuade, 2019) . Instead, these papers compare individuals near a treated location to individuals farther away from the same treated locations. These farther-away individuals are the control group in a difference-in-difference setup. Figure 5 illustrates which individuals these estimators are based on. When estimating the treatment effect ( ) at distance , individuals in an "inner ring" at radius from realized treatment constitute the treatment group. Individuals who are substantially farther away in an "outer ring" around the same realized treatment location constitute the control group. Typically, the same outer ring individuals serve as control units irrespective of the distance at which the treatment effect is estimated. In a difference-in-differences setup, estimators used in much existing empirical work hence rely on a different parallel trends assumption. Specifically, individuals on the inner ring need to be on the same trend as individuals on the outer ring. For each distance for which the researcher estimates an effect, she obtains a different set of inner ring individuals. When the effect at each distance is estimated using the same outer ring individuals, she therefore needs to assume that individuals at any distance from treatment (up to the farther distance) are on parallel trends. 
Effectively, this is the semiparametric functional form assumption that control potential outcomes in all neighborhoods within a region share the same additively separable time-specific component. For these existing estimators, one additionally needs to assume that the farther-away individuals are unaffected by the treatment. If individuals on the outer ring were directly affected, their outcomes in the post-period would not generally reflect the control potential outcomes of individuals on inner rings even when the parallel trends assumption holds. Researchers therefore typically restrict the control group to individuals who are substantially farther away from treatment than the treated individuals. However, this assumption is partly incongruent with the parallel trends assumption: The farther the control individuals are away from treatment, the less likely the parallel trends assumption is to hold. The choice of distance for the outer ring needs to carefully balance these two competing assumptions.
Figure 5: Existing estimators focus only on regions that received treatment. In this figure, the realized treatment location is shown as a filled-in triangle. The treatment group consists of individuals in an "inner ring" at a given distance of interest from treatment, here displayed as small filled-in circles. The control group consists of individuals in an "outer ring" who are farther away from realized treatment, here displayed as hollow squares. Existing estimators use pre- and post-treatment data for both groups in a difference-in-differences setup. Typically, when researchers estimate the effect at multiple distances, the same control group is used for all distances. Panel (a) shows the estimator for a short distance; panel (b) shows the estimator for a medium distance.
As with other difference-in-differences estimators, demonstrating an absence of pre-trends can strengthen the credibility of the parallel trends assumption. For inner vs. outer ring estimators, researchers need to argue that the absence of pre-trends suggests parallel trends even into the post-treatment period based on randomness of timing, not randomness of treatment locations. When treatment effects at multiple distances are estimated, control potential outcomes at each distance from treatment must be on parallel trends with one another and with the outer ring control individuals. For instance, Diamond and McQuade (2019, figures 3, 4, 5) illustrate the absence of pre-trends in plots of three-dimensional data (time since treatment, geographic distance from treatment, outcome). In contrast to the more familiar two-dimensional plots from non-spatial settings, it is unfortunately challenging to include standard errors in such figures. It is therefore oftentimes difficult to accurately assess the magnitude and sometimes even the direction of possible pre-trends visually. I therefore recommend formal sensitivity analysis and estimation of the partially identified set of treatment effects under small violations of the parallel trends assumption. Recent theoretical work has proposed promising approaches to this problem for non-spatial settings that likely extend to the setting of spatial treatments (for instance Manski and Pepper, 2018; Freyaldenhoven et al., 2019; Rambachan and Roth, 2019). In settings with panel data, existing estimators and the estimators proposed in this paper both compare individuals near realized treatment to individuals farther away.
For existing estimators, far-away individuals are in an outer ring around the realized treatment locations. For the estimators proposed in this paper, far-away individuals are in other, untreated, regions, near candidate treatment locations that appear similar to real treatment locations. However, the assumption for estimators using individuals on an outer ring as a control group is not generally justified by an ideal experiment of randomizing treatment locations. Suppose the researcher is interested in the effect of the treatment at some distance $d$. If individuals on the outer ring were the proper control individuals under an ideal experiment, then for each individual on the outer ring, there must be at least one candidate treatment location at distance $d$ from the individual. Similarly, there must be candidate treatment locations such that individuals who are at distance $d$ from the realized treatment would be in the outer ring relative to these locations. This suggests that candidate treatment locations are everywhere and realized with equal probability, as in panel (c) of figure 2. This assumption is, in general, testable, and it is typically violated for treatments that are of interest to social scientists; an example is given below. Hence, these estimators are generally based on functional form assumptions, such as additive separability of time-specific effects, rather than on an ideal experiment that involves randomized treatment locations. A first example illustrates when existing estimators based on an outer ring are relatively more attractive. Suppose the researcher is interested in the effect of a spatial treatment at a distance of 0.1 miles, and knows that individuals at a distance of 0.3 miles are unaffected by it. In this setting, individuals at either distance from treatment are likely similar. They live in the same neighborhood and experience the same conditions except for exposure to treatment. The parallel trends assumption between inner and outer ring individuals is plausible. Instead, one should primarily focus on supporting the argument that the treatment has no effect after a distance of 0.3 miles. A second example illustrates when the estimators proposed in this paper based on untreated regions are relatively more attractive. Suppose the researcher is interested in the effect of a spatial treatment that is typically realized in the city center. She is only willing to assume that the treatment has no effect after a distance of more than 10 miles. Then a comparison of individuals close to the treatment to individuals farther than 10 miles from treatment may compare individuals who live close to the city center to individuals living in suburban neighborhoods. The parallel trends assumption between these individuals is less plausible. Since treatment is typically realized in city centers, simple tests are likely to reject the hypothesis that treatment is as likely at locations in rural areas as at locations in the city center. Instead, it may be more attractive to compare the individuals near treatment to individuals who live close to the city centers of other, untreated, cities. With panel data, one can then assume that the inner-city neighborhoods of treated and untreated cities are on parallel trends. If one can argue that the location of the treatment was chosen (quasi-) randomly from a set of candidate treatment locations across multiple cities, the assumption is satisfied by design. The estimators proposed in this paper allow this information to be used directly for identification and estimation of causal effects.
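A minimal simulated sketch of the comparison described in this second example: differenced outcomes for individuals near realized city-center locations versus individuals near unrealized candidate locations in untreated cities. The city structure, noise levels, and the true effect are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy panel: 20 cities, each with one downtown candidate location; treatment is
# realized in a random half of the cities. Outcomes are observed pre and post
# for individuals near the candidate location (e.g., within 1 mile).
n_cities, n_per_city, effect = 20, 30, 1.5
treated_city = rng.permutation(np.repeat([1, 0], n_cities // 2))

city_level = rng.normal(0, 2, n_cities)            # persistent city-level differences
y_pre = np.repeat(city_level, n_per_city) + rng.normal(0, 1, n_cities * n_per_city)
trend = rng.normal(0.5, 0.3, n_cities)             # city-specific trends
y_post = (y_pre + np.repeat(trend, n_per_city)
          + effect * np.repeat(treated_city, n_per_city)
          + rng.normal(0, 1, n_cities * n_per_city))

d = np.repeat(treated_city, n_per_city).astype(bool)
# Proposed comparison: individuals near realized locations vs. individuals near
# unrealized candidate locations in untreated cities, with differenced outcomes
# removing persistent city-level differences.
dy = y_post - y_pre
tau_hat = dy[d].mean() - dy[~d].mean()
print(tau_hat)  # close to the true effect of 1.5 on average across assignments
```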
In this section, I discuss two assumptions on how realized treatment locations that are close to one another interfere. Under either of these assumptions, average treatment effects very similar to those in section 3 are identified and readily estimated. The two assumptions I focus on in this section are: (i) treatment locations have additively separable effects; and (ii) only the nearest realized treatment location matters. Additively separable treatment effects are an appropriate specification if the effect of each treatment is independent of the realization of other treatments. For instance, the effects of toxic waste plants (cf. Currie et al., 2015) or air-polluting power plants (cf. Zigler and Papadogeorgou, 2018) on exposure to pollution are likely approximately additive. Typically, only the nearest realized treatment location matters if individuals only access, or visit, a single realized treatment location. For instance, if a developing country quasi-randomly chooses locations to construct new schools (cf. Duflo, 2001), it may be plausible to assume that only the nearest school matters to an individual. For the effects of infrastructure projects, such as additional bus or subway stops, on commute times and real estate prices (cf. Gupta et al., 2020), the appropriate assumption may depend on the type of stops that are added. An additive effects specification for bus or subway stops may be a good approximation if each stop gives access to a different transit line. A specification where only the nearest stop matters may be more appropriate for stops of the same line. In contrast, if the treatments interact in some way leading to diminishing or increasing returns in the number of nearby treatment locations, different parametric assumptions on the functional form of these returns may be necessary. This section serves as an example of how to incorporate such assumptions on interference into the analysis of causal effects. I focus on two settings: a first setting where, if a region is treated, a fixed number of candidate locations in the region are realized (completely randomized design within region), and data from untreated regions are available; and a second setting where treatment assignment to candidate locations is independent (Bernoulli trials), but all data come from a single (contiguous) region. Suppose that if region $r$ receives treatment, exactly $\tilde{n}$ of the $|S_r|$ candidate treatment locations are realized, each with equal marginal probability. Assuming a completely randomized experiment between the candidate treatment locations within a region greatly simplifies the formulas in this section without mechanically resolving key conceptual issues. In practice, it is sometimes more plausible to assume that the assignment mechanism guarantees some minimum distance between realized treatment locations. It may be possible to obtain analogous results for such more complicated assignment mechanisms. Continue to consider a setting with some regions with no realized treatment location. The presence of regions without realized treatment locations is a crucial simplification because it allows identification of control potential outcomes. Within this setting, one can see how the assumptions on treatment effects limit interference and allow estimation of average treatment effects similar to those in section 3. To give an example, suppose a company operating chain stores (quasi-) randomly chooses which cities to enter, and opens multiple stores in chosen cities.
Then there are multiple realized treatment locations close to one another (in the same city), but also control regions with (unrealized) candidate treatment locations. I discuss a setting without untreated regions further below. Even in settings with control regions, one needs to make an assumption on interference to identify and estimate the treatment effects as defined in section 3.1.1. Suppose one makes no such assumption. If some treated regions have multiple, for instance two, realized treatment locations, then it is impossible to identify the average (across all candidate locations) causal effect of implementing one treatment location. But even the effect of implementing multiple treatment locations at once is difficult to estimate in the detail of interest. Presumably, one would be interested in the average effect of implementing two treatment locations at distances $d_1$ and $d_2$, respectively. A nonparametric estimate of this effect is likely based on very few individuals, since few treated individuals are at distance $d_1$ from one treatment and at distance $d_2$ from another treatment. For a given pair of realized treatment locations, there are at most two locations where circles around them with radii $d_1$ and $d_2$ intersect. Limiting estimation to only individuals residing close to such intersection points is therefore oftentimes impractical. It implies a dramatic reduction in sample size relative to estimating the effect of a single treatment at a given distance, based on all individuals around this distance ring. If there are more than two realized treatment locations, or treated regions vary in the number of realized treatment locations, this estimation issue worsens. A simple example, illustrated by figure 6a, helps to build intuition for the estimators proposed below. Suppose the researcher has data from multiple regions $r$. Each region has three candidate treatment locations: $s_{r,1}$, $s_{r,2}$, and $s_{r,3}$. If region $r$ receives treatment, the assignment mechanism randomly chooses exactly two of the three candidate locations to be realized. Hence, each candidate location has a marginal conditional probability of 2/3 of being realized, and the set of realized treatment locations in region $r$ contains exactly two of the three candidate locations.
Figure 6: An example of a region with three candidate treatment locations (panel a): $s_{r,1}$ (blue), $s_{r,2}$ (red), $s_{r,3}$ (yellow). Suppose exactly two of these treatment locations are realized whenever the region is treated, such that there is interference. Under the assumption that only the nearest realized treatment location matters, panel (b) illustrates the locations for which we can estimate effects for individuals in each area. For individuals in the orange area, we can estimate the effects of the red and yellow locations. For individuals in the green area, we can estimate the effects of the blue and yellow locations. For individuals in the purple area, we can estimate the effects of the blue and red locations.
I present and discuss two assumptions on interference, and how to estimate effects under them, in turn. Assumption 5 (Additive Separability of Treatment Effects). Let $T \subset S$ be an arbitrary subset of the candidate treatment locations, and let $s \in T$ be an arbitrary location in this subset. The effects of spatial treatments are additively separable if, for all individuals $i \in \mathcal{I}$, $Y_i(T) = Y_i(T \setminus \{s\}) + \tau_i(s)$. Assumption 5 formally states that the effects of all treatment locations are additively separable.
Intuitively, the assumption requires that there are no diminishing (or increasing) returns to having additional treatment locations nearby. Under assumption 5, one can still identify the average treatment effects defined in section 3. These estimands are weighted averages of individual-level treatment effects $\tau_i(s)$ of candidate treatment locations $s$ on individuals $i$ who are distance $d$ apart. For exposition, I focus on the example with three candidate treatment locations, two of which are realized in treated regions. Under additive separability (assumption 5), the potential outcome for any realized pair of locations equals the control potential outcome plus the two corresponding location-specific effects. Hence the treatment effect of interest satisfies $\tau_i(s_{r,1}) = \frac{1}{2} [ Y_i(\{s_{r,1}, s_{r,2}\}) + Y_i(\{s_{r,1}, s_{r,3}\}) - Y_i(\{s_{r,2}, s_{r,3}\}) - Y_i(\emptyset) ]$, where each of the potential outcomes on the right-hand side has positive probability of realization. One can therefore estimate $\tau_i(s_{r,1})$ with an estimator $\hat{\tau}^{\text{additive}}_i(s_{r,1})$ that replaces each potential outcome by an inverse-probability-weighted sample analog. Each term in the sum is an unbiased estimator of the corresponding potential outcome, such that $E(\hat{\tau}^{\text{additive}}_i(s_{r,1})) = \tau_i(s_{r,1})$. One can then average such estimators to estimate, for instance, the ATT estimand $\tau(d)$. See appendix A.3 for a generalization. Assumption 6 (Only Nearest Realized Treatment Location Matters). Let $T \subset S$ be an arbitrary subset of the candidate treatment locations, and $i \in \mathcal{I}$ an arbitrary individual. Only the nearest realized treatment location matters if, whenever $s \in T$ satisfies $d(s, i) \leq d(s', i)$ for all $s' \in T$, we have, for $s' \in T \setminus \{s\}$, $Y_i(T) = Y_i(T \setminus \{s'\})$. Assumption 6 states that if $s' \in T$ is not the nearest realized location to individual $i$, it does not affect her. The assumption also implies that if individual $i$ is at equal distance to two treatment locations $s_1$ and $s_2$, then both have the same effect on her. Under assumption 6, only some of the average treatment effects of section 3 are nonparametrically identified in general. Specifically, it is impossible to identify the effect of a candidate treatment location on an individual if the treatment location is never the nearest realized location for the individual. Consider the example of three candidate locations with two realized locations. The effect of location $s_{r,1}$ is unidentified for individuals nearer to both locations $s_{r,2}$ and $s_{r,3}$. Panel (b) of figure 6 highlights areas in which each candidate treatment location is nearest with positive probability before realization of treatment assignment. Generally, the estimand $\tau(d)$ from section 3 is identified nonparametrically under assumption 6 if it only places weight on individual-level effects $\tau_i(s)$ for which $s$ is the nearest realized location to individual $i$ with positive probability, where the probability is taken over draws from the assignment distribution holding the individual, the location, and the region fixed. In the example illustrated in figure 6, one can estimate $\tau_i(s_{r,1})$ for individuals in the purple and green shaded areas. Under assumption 6, for an individual for whom $s_{r,1}$ is the nearest realized location, the observed outcome reveals the potential outcome of receiving treatment only from $s_{r,1}$. An unbiased estimator $\hat{\tau}^{\text{nearest}}_i(s_{r,1})$ compares such outcomes, reweighted by the corresponding assignment probabilities, with outcomes of individuals in untreated regions. One can then average estimates $\hat{\tau}^{\text{nearest}}_i(s)$ across individuals and locations to estimate average treatment effects similar to those in section 3. However, the estimator $\hat{\tau}^{\text{nearest}}_i(s_{r,1})$ is undefined for individuals in the orange shaded area. Since location $s_{r,1}$ is never the nearest realized treatment location for these individuals, it is impossible to estimate its effect on individuals in that area. That is, only average treatment effects that place weight $\omega(s_{r,1}, i) = 0$ on individuals in the orange area are identified nonparametrically. See appendix A.3 for examples of identified estimands and the general setting.
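A minimal simulation sketch of the additive-separability logic in the three-location example, verifying by Monte Carlo that the inverse-probability-weighted combination of observed outcomes is unbiased for the effect of the first location; the design probabilities and potential outcomes below are toy values.

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(2)

# Three candidate locations; if the region is treated (probability q), exactly
# two of them are realized, each pair with probability 1/3. Potential outcomes
# for one individual under additive separability: y0 plus location effects.
q, y0 = 0.5, 10.0
tau = {1: 2.0, 2: 0.5, 3: -1.0}
pairs = list(combinations([1, 2, 3], 2))

def estimate_once():
    """One assignment draw and the IPW estimate of tau_i(s_1) under additivity."""
    if rng.random() < q:
        realized = pairs[rng.integers(3)]
        y_obs = y0 + sum(tau[s] for s in realized)
        sign = 1.0 if 1 in realized else -1.0     # pairs containing s_1 enter positively
        return 0.5 * sign * y_obs / (q / 3)       # divide by the probability of that pair
    return 0.5 * (-1.0) * y0 / (1 - q)            # an untreated region reveals Y_i(empty set)

draws = np.array([estimate_once() for _ in range(200_000)])
print(draws.mean(), tau[1])  # mean across assignments is close to tau_i(s_1) = 2.0
```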
Suppose the researcher has data on a single contiguous region with individuals , outcomes and candidate treatment locations S. The realized treatment locations are ⊆ S, with assignment to locations , ′ ∈ S independent when ̸ = ′ . Assumption 7 formalizes this assumption, which is a straightforward extension of assumption 3 above. Assumption 7 (Independent Treatment Assignment -Single Region). Assignment of treatment to locations is independent: For ∈ S,˜⊆ S with ̸ ∈˜⊆ S, As before, the researcher is interested in the weighted average treatment effect with known weights ( , ), for instance for a distance-bin estimator of the average effect of the treatment on the treated. To estimate the average this average treatment effect, consider the estimator which is the difference in average outcomes for individuals near realized candidate locations, Remark 14. While all treatment probabilities are known to the experimenter in experimental analyses, they typically need to be estimated in observational studies. To this end, first note that one can generally write, for any potential assignment ∈ 2 S , Second, Pr( ⊂ ) Pr( = ) cancels between numerator and denominator. Hence, to estimate Pr( ∈ | ⊂ ) in practice, it is convenient to parameterize this conditional probability and estimate where ( ) are other (spatial) covariates specific to treatment location (and its neighborhood). The framework, estimators, and analysis of this paper are applicable more generally to settings where treatment assignment is separate from the units of observation, and the effect of treatment is moderated by some observable, not necessarily geographic, distance from treatment. I give two examples in this section: firm entry in markets with differentiated products, and shift-share designs based on randomness of the shifts. For the first example, suppose the researcher is interested in the effects of firm entry on competition in markets with differentiated products. She has data for several markets on prices charged by firms ∈ I for products with horizontal or vertical locations in characteristics space. In some markets, a new firm enters with a product with characteristics . Here, the estimand ( ) measures the average effect an entrant has on the price of a product at distance in characteristics space. For short distances , it captures competitive effects or deterrence behavior by firms selling products very similar to the product of the entrant. For longer distances , it captures ripple effects that arise if in equilibrium firms with more different products react to the price changes of firms with products similar to the entrant's. These estimands are therefore informative about the nature of competition. Firm entry, however, is not generally random. Theoretical models of competition and profits may therefore help to determine the probability of firm entry at any given point in characteristics space, conditional on the locations of existing competitors in characteristic space. For instance, expected profits of the entrant may come from a structural model based on distance to competitors in characteristics space (cf. Hotelling, 1929) , perhaps calibrated to pre-treatment data. Intuitively, validity of the estimator then requires that firm entry is random conditional on the expected profitability in the model. The structural model provides a baseline to enhance the credibility of the quasi-experimental analysis, but does not directly restrict the estimated pattern of competition. 
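A minimal simulated sketch of this differentiated-products reading: prices of products at a given characteristics-space distance from a realized entrant are compared with prices in markets where the candidate entry location was not realized. The market structure, decay pattern, and magnitudes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(5)
n_markets, n_firms = 40, 8

# Hypothetical markets: incumbent products in a 1-D characteristics space, one
# candidate entry location per market, entry realized in half of the markets.
product_loc = rng.uniform(0, 1, (n_markets, n_firms))
entry_loc = rng.uniform(0.3, 0.7, n_markets)
entered = rng.permutation(np.repeat([True, False], n_markets // 2))

dist = np.abs(product_loc - entry_loc[:, None])            # distance to the (candidate) entrant
price_effect = np.where(entered[:, None], -0.8 * np.exp(-5 * dist), 0.0)
price = 10.0 + rng.normal(0, 0.3, (n_markets, n_firms)) + price_effect

# tau(d) analog: price difference at characteristics-distance ~0.1 between
# markets with a realized entrant and markets with only a candidate location.
in_bin = np.abs(dist - 0.1) <= 0.05
tau_hat = price[in_bin & entered[:, None]].mean() - price[in_bin & ~entered[:, None]].mean()
print(tau_hat)  # negative: competitive pressure on close substitutes (about -0.49 here)
```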
For a second example, suppose the researcher is interested in the causal effects of exogenous shocks to individual industries on employment outcomes in cities based on their industry mixes (cf. Autor et al., 2013) . The framework of this paper is useful in this setting if the claims of causal identification are based on randomness in which industries are shocked, rather than on randomness in industry composition. Importantly, the analysis in this paper reflects that cities with similar industry mixes are shocked similarly, in a way that is difficult to capture accurately with existing clustered standard errors. Adao et al. (2019) and Borusyak et al. (2019) develop alternative approaches based on the same idea, and show how it relates to Bartik (1991) and shift-share instruments more generally. A benefit of the framework of this paper is that its results are not specific to linear (or other) functional form and that it allows for very transparent estimation of aggregate effects. The setting fits into the framework of this paper as follows. Data are available for time periods = 1, . . . , . In some time periods, a single industry ∈ S = {1, . . . , } receives an exogenous shock, potentially with different industries shocked in different time periods. Assume that the time periods are chosen such that the shock only affects outcomes within the same time period. Define the indicator = 1 if an industry in period is shocked, and = 0 otherwise. The researcher observes employment outcomes for cities ∈ I in time period . City has industry shares ∈ R , satisfying , ∈ [0, 1] and ∑︀ =1 , = 1. Here, the distance function captures exposure to the shock. City is heavily exposed to shocks of sector if industry has large share , , such that the "distance" ( , ) = 1 − , is small between industry and city . The estimands ( ) and measure the effects of the exogenous industry shocks. For = 0, the estimand ( ) measures the average effect of shocking an industry on employment in cities with employment only in the shocked industry. For = 0.75, the estimand measures the average effect of the exogenous shock on cities with 25% of their employment in the shocked industry. The estimand measures the aggregate effect of the exogenous shock across all cities, averaged across shocks to different sectors. The estimators and inference procedures of section 3 are valid if it is random in which time periods and sectors an exogenous shock occurs. In principle, one can augment the variance calculations to allow, for instance, dependence structure in the shocked industry across time periods. The results in section 4.2 are relevant for settings where shocks occur to multiple industries in the same time period. While the previous sections presumed that the researcher designed the experiment for assignment of the spatial treatment, much empirical work relies on observational data. The primary challenge to observational studies in this setting is that researchers typically do not observe the exact locations of unrealized candidate treatment locations. To emulate the analysis of the ideal experiment with observational data, researchers need to estimate candidate treatment locations and their treatment probabilities. Estimation is then based on an unconfoundedness assumption stating that among individuals near candidate treatment locations, whether their treatment location is realized is as good as random, conditional on characteristics of the individuals and the neighborhood of the candidate treatment location. 
Suppose that there are multiple regions $r = 1, \ldots, R$, defined such that any treatment location only affects individuals within the same region. Define the location-specific treatment indicator $W_r(s)$ to equal 1 if location $s$ in region $r$ is treated, and $-1$ if the location is not treated, where location $s$ is treated if treatment occurs somewhere in region $r$ and $s$ is the realized treatment location in region $r$, as in section 3. In treatment effect settings with individual-level randomized experiments, unconfoundedness is often written as $W_i \perp (Y_i(0), Y_i(1)) \mid X_i = x$, which is equivalent to an assumption on densities. I similarly define unconfoundedness of spatial treatments at distance $d$ from location $s$ as independence between the location-specific treatment indicator and the potential outcomes of individuals at distance $d$ from the candidate location, conditional on the covariates of those individuals. Here, equality of the conditioning covariates means that the sets of individual covariates are the same up to permutation. In practice, it is rarely feasible to find two candidate locations with an equal number of individuals and equal covariates. Instead, one can assume that treatment is unconfounded conditional on, for instance, average characteristics of individuals in the neighborhoods of candidate locations. Such an assumption greatly simplifies estimation in practice. Alternatively, one can make an individual-level unconfoundedness assumption for spatial treatments as a conditional mean equality, where, for the control potential outcome, there is no conditioning on distance from candidate treatment locations. In other words, individuals with the same covariates, potentially including neighborhood characteristics, in control regions offer a valid comparison for the individuals treated at distance $d$. Such an assumption simplifies estimation, but is not justified by experimental design or arguments that the location of the treatment is as good as random. In this section, I outline a general strategy for finding unrealized candidate treatment locations with observational data. These counterfactual locations for the treatment are necessary for the quasi-experimental methods I propose in this paper. Consider first the example of Linden and Rockoff (2008) given in the introduction, where the choice of candidate locations is relatively straightforward. They argue that the exact houses where sex offenders move in are as good as random due to random availability of houses within neighborhoods. Here, the candidate treatment locations are houses in these neighborhoods. Hence, the candidate locations are known, but their probabilities of treatment need to be estimated. See section 5.3 for propensity score estimation. When there are no (or insufficiently many) known unrealized candidate locations, however, the problem of choosing candidate locations from continuous space is hard. In principle, one could imagine estimating the probability of treatment at any location in a region conditional on all the features of the region. This is akin to estimating the spatial distribution of treatment locations, $s \sim f(\cdot \mid x_r)$, where $x_r$ are the characteristics of region $r$, potentially the relative locations of all individuals in the region as well as moments of their covariates. One could then use the estimated density to inform the treatment probabilities at each point in the region as inputs to the estimators proposed in this paper. In practice, it is typically sufficient to find a finite number of candidate treatment locations that offer a plausible counterfactual to the realized treatment locations.
Computationally, it is often infeasible to use a continuous distribution f, since the weight of each individual i when estimating effects at distance d would depend on the integral of f̂ along a ring with radius d around her location. Instead, I recommend finding a finite number of candidate locations. The average across these finitely many candidate locations approximates the strategy based on the complete distribution f. I propose taking draws s̃ ∼ f̂(· | X_r) to obtain candidate treatment locations. Perhaps surprisingly, recent machine learning methods achieve good results at this task, despite the difficulty of estimating f itself. Specifically, I recommend a formulation similar to generative adversarial networks (Goodfellow et al., 2014); see Liang (2018) and Singh et al. (2018) on the relationship between generative adversarial networks and density estimation. Most closely related to this paper, generative adversarial networks have been used to draw artificial observations from the distribution that generated the (real) sample, for use in Monte Carlo simulations. Generative adversarial methods for drawing s̃ ∼ f(· | X_r) are based on iteration between two steps. First, a generator generates draws s̃ ∼ f̃(· | X_r), where f̃ is an implicit estimate of the density maintained by the generator in the current iteration. Second, a discriminator receives as input either counterfactual locations proposed by the generator, s̃ | X_r, or real treatment locations, S_r | X_r, and guesses whether its input is real. Both the generator and the discriminator are highly flexible parametric models for their given tasks. The discriminator is trained by taking (stochastic) gradient descent steps in the direction that improves discrimination between real and counterfactual locations. The generator is trained by taking (stochastic) gradient descent steps in the direction that leads to fooling the discriminator into classifying counterfactual locations as real. Effectively, the output of such models is a set of counterfactual candidate treatment locations s̃ | X_r that are indistinguishable (to the discriminator) from real treatment locations S_r | X_r. With a sufficiently flexible discriminator, the process is therefore similar to matching. 14 If a proposed candidate location s̃ is noticeably different from all real treatment locations, a flexible discriminator will learn to reject s̃. In contrast, synthetic control-type methods (Abadie et al., 2010) would average multiple candidate locations, for instance s̃ and s̃′, to create a synthetic counterfactual for a real treatment location s. If s̃ and s̃′ individually differ from all real treatment locations, the discriminator will reject them despite their average resembling s. The goal therefore is to find "false positives": occasions when the classifier suspects a missing realized location even though there is no such missing realized location. Typical classifier networks do not directly make binary predictions, but give a continuous activation score that indicates how likely each location (or the "no missing location" category, see below) is. 15 In practice, I recommend looking for high activation scores for a particular location and a low activation score for "no missing location." Alternatively, one can look directly for activation scores resembling the activation scores of real treatment locations. Such locations are likely to be decent matches for the real treatment locations, since they must share features of realized locations in order to achieve high activation scores.
In the remainder of this section, I discuss how to tune generic machine learning methods to find suitable candidate treatment locations in social science applications. I recommend four high-level implementation decisions in adapting these methods. First, discretization of geographic space into a fine grid for tractability. Second, convolutional neural networks capture the idea that spatial neighborhoods matter in a parsimonious way. Third, incorporating the adversarial task of the discriminator into a classification task for the generator greatly simplifies training. Fourth, data augmentation (rotation, mirroring, shifting) for settings where absolute locations and orientation are irrelevant. Discretization To tractably summarize the relative spatial locations of individuals and treatment locations, I recommend discretizing geographic space into a fine grid. Discretization provides an approximation that is particularly tractable for the convolutional neural networks recommended below. In principle, future improvements to, for instance, Capsule Neural Networks (Hinton et al., 2011) or other novel methods, may replace this as the preferred architecture and eliminate the need for discretization. For each grid cell, one can include a count of individuals with residence in the cell, potentially separately for individuals with different values of covariates, as well as average covariate values of the individuals in the cell or other moments of their covariates. If the grid is very fine, this discretization retains almost all meaningful information about relative locations. For instance, in the application of this paper, each grid cell has size 0.025mi × 0.025mi (approximately 40m × 40m). The discretized grid creates a three-dimensional array: The first two dimensions determine spatial location, and the third dimension enumerates the different covariates that are summarized. Rather than taking the spatial dimensions to be entire regions, I recommend using square cutouts of regions such that the probability of treatment in the center of the cutout is only affected by individuals and covariates within the cutout. Convolutional neural networks Convolutional neural networks (cf. Krizhevsky et al., 2012) have been particularly successful at image recognition tasks. In image recognition tasks, the input is a 3D array: a 2D grid of pixels, with multiple layers corresponding to the RGB color channels. For spatial treatments, the input also is a 3D array: the 2D spatial grid with layers corresponding to different covariates as described above. Convolutional steps in neural networks generally retain the shape of the 2D grid, but the value of each neuron is a function of the covariates (or neurons) of the previous step not just at the same grid cell, but also the covariates (or neurons) at neighboring grid cells. Figure 7 illustrates this aspect of the convolution operation. However, the particular way in which the neighborhood of a grid cell is averaged is the same for any point in the grid. This makes convolutional layers substantially more parsimonious than fully connected layers, and allows the neural network to capture neighborhood patterns appearing in different parts of a region in a unified way. In particular, I recommend using at least two convolutions with reasonably large spatial reach. Consider the application in this paper, where grocery stores are spatial treatments and restaurants are individuals with foot-traffic to the restaurants as the outcome. 
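Before turning to the convolution intuition, here is a minimal sketch of the discretization step described above, assuming hypothetical business coordinates (in miles), the 0.025-mile grid from the application, and three illustrative industry channels; the result is the 3D array (grid × grid × covariate channels) that the convolutional network discussed next takes as input:

```python
import numpy as np

cell = 0.025                      # grid cell size in miles (as in the application)
width = 2.0                       # side length of the square cutout in miles
n = int(width / cell)             # number of cells per side (80)

# Hypothetical data: (x, y) coordinates in miles and an industry code per business.
rng = np.random.default_rng(0)
xy = rng.uniform(0, width, size=(500, 2))
industry = rng.integers(0, 3, size=500)       # 3 illustrative industry channels

# 3D array: two spatial dimensions, one channel of per-cell counts per industry.
grid = np.zeros((n, n, 3))
ix = np.minimum((xy[:, 0] / cell).astype(int), n - 1)
iy = np.minimum((xy[:, 1] / cell).astype(int), n - 1)
np.add.at(grid, (ix, iy, industry), 1.0)      # accumulate counts by industry

print(grid.shape)                 # (80, 80, 3)
print(grid.sum())                 # 500.0 -- every business lands in exactly one cell
```

Average covariate values per cell can be added as further channels in the same way, dividing accumulated sums by the per-cell counts.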
The first convolution allows each grid cell to see what other cells are around it. In the example, the output of the first convolution for a particular grid cell may be: "There are 3 grocery stores nearby, 4 competing restaurants very close, and 10 restaurants within walking distance." The second convolution then uses the information on such neighborhoods to determine whether treatment is likely in a grid cell: "If there are many grid cells nearby (in all directions) containing restaurants or grocery stores facing much competition, this location is probably in the center of a shopping area and reasonably likely to contain another grocery store." Intuitively, the first convolution may measure what is important to the restaurants, while the second convolution translates how that is important for the treatment location choice, mirroring the unconfoundedness assumption (equation 13) of the previous section.
Figure 7: Convolutions in a neural network allow the prediction of a candidate location in a grid cell to depend on the characteristics of neighboring grid cells (up to a user-specified distance). These models remain parsimonious by requiring the same "neighborhood scan" to be performed for each grid cell.
Adversarial Classification Generative adversarial networks are oftentimes difficult to train despite recent advances such as Wasserstein-type criterion functions. The difficulty arises because the training of generator and discriminator needs to be sufficiently balanced such that both improve. In contrast, convolutional neural networks for image classification are much easier to train. I therefore recommend setting up the problem of finding candidate treatment locations as a classification task. Specifically, the convolutional neural network takes a given input array and "classifies" it into, say, 100 categories, where each category corresponds to a grid cell and signifies that there should be an additional treatment location at that point in the grid. To retain the adversarial nature of the task, train the classification on three sets of data: First, regions with at least one real treatment location, but with one treatment location removed. The correct classification of such region data is into the category corresponding to the grid cell from which the treatment location was removed. Second, regions with at least one real treatment location, but without any treatment location removed. The correct classification of such region data is into a specially added category signifying no missing treatment location. Third, regions without treatment locations. These are also classified as not missing any treatment location. The neural network then balances two tasks: a generative task of picking the correct location if a treatment location is missing, and a discriminatory task of deciding whether a treatment location is missing at all. This structure retains the attractive interpretation of generative adversarial networks, but is substantially easier to train. Technically, it resembles denoising autoencoders (cf. Vincent et al., 2008).
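A minimal sketch of such a classifier, assuming the discretized arrays from the earlier sketch (an 80 × 80 grid with 3 covariate channels). Layer sizes, the use of one class per fine grid cell plus a "no missing location" class, and all names are illustrative choices, not the paper's exact architecture (the text mentions on the order of 100 categories, which corresponds to a coarser target grid):

```python
import torch
import torch.nn as nn

class CandidateLocationClassifier(nn.Module):
    """Classify a region cutout into one of n*n grid cells (where a treatment
    location appears to be 'missing') or an extra class meaning 'no missing
    treatment location'."""

    def __init__(self, n=80, channels=3):
        super().__init__()
        self.features = nn.Sequential(
            # two convolutions with reasonably large spatial reach
            nn.Conv2d(channels, 16, kernel_size=9, padding=4), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=9, padding=4), nn.ReLU(),
            nn.AdaptiveAvgPool2d(20),   # coarsen before the dense layer
            nn.Flatten(),
        )
        self.head = nn.Linear(32 * 20 * 20, n * n + 1)  # +1: "no missing location"

    def forward(self, x):            # x: (batch, channels, n, n)
        return self.head(self.features(x))

model = CandidateLocationClassifier()
x = torch.randn(4, 3, 80, 80)        # a batch of discretized region cutouts
logits = model(x)                    # activation scores per grid cell + "none"
# labels: 0 and 6399 are grid-cell classes; 6400 means "no missing location"
loss = nn.CrossEntropyLoss()(logits, torch.tensor([0, 6399, 6400, 6400]))
```

Counterfactual candidate locations are then grid cells with high activation scores in regions where no treatment location was removed, as described above.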
Data Augmentation Data augmentation serves two closely related purposes. First, rotating, mirroring, and shifting regions, while maintaining relative distances, produces additional, albeit dependent, observations. This is helpful since training neural networks requires large numbers of training samples. Second, these transformations effectively regularize the parameters of the estimated model. One can choose transformations that induce equivariance to rotation, mirroring, and shifts, as appropriate for the particular setting. For instance, in many applications in the social sciences, North-South and East-West orientation is irrelevant on a small scale; only the relative distances matter. 16 In particular, suppose an individual visits a business to the North of her home because it is on the way to her workplace, which is also to the North. If the whole space were rotated, the individual would equally visit the same business, now to the West, as it is still on the way to work, which is now also rotated to be to the West of her home. In image classification, the use of similar data augmentation is common and often associated with greater generalizability of the learned models. Shifting the entire grid has two further desirable effects: First, if one imposes a continuous shift of relative coordinates in combination with a fixed grid, the exact discretization becomes less relevant. The average (across draws from the shift distribution) distance in grid cells between two observations becomes directly proportional to their actual distance. Second, the location of an observation within a grid cell is no longer fixed. This is attractive because the classification is not actually informative about whether the candidate treatment location is at the center or towards the edge of a grid cell. With a continuous shift of the observations, the center of the grid cell points to different absolute locations depending on the shift. One can then average over several realizations of the shift to reduce the influence of the particular translation of grid cells to absolute locations. There are at least two notable alternatives or complements to data augmentation in the machine learning literature. First, spatial transformer networks (Jaderberg et al., 2015) attempt to estimate a rotation or other transformation that makes the subsequent classification task as easy as possible. Second, some recent work considers imposing the desired in- and equivariance properties on the convolution kernel. Similarly, penalization of deviations from in- or equivariance serves as a less strict regularization of the model parameters. Ultimately, current implementations of these methods are less computationally efficient than data augmentation and standard convolutional neural networks. Furthermore, simulation evidence suggests that data augmentation achieves the first-order gains implied by these properties. One can also inspect the models to assess the implied degree of invariance, and consider averaging parameters as implied by invariance. Suppose candidate treatment locations S_r are known (in all regions), for instance as output of the convolutional neural network classification task described in the previous section. The remaining challenge in implementing the methods proposed in this paper is the estimation of the "propensity score" Pr(s ∈ T), the probability that treatment is realized at candidate location s. I briefly sketch propensity score estimation in two canonical settings: a fixed number of realized treatment locations per treated region (often just one realized location), and independent Bernoulli trials determining realization of treatment at candidate locations. Fixed Number of Realized Treatment Locations Suppose there are a fixed number of realized treatment locations per treated region. Then the problem of propensity score estimation resembles discrete choice modeling: There are |S_r| discrete alternatives in region r, a fixed number of which is realized.
See, for instance, Greene (2009) for an overview of estimation methods. When treatment assignment is independent across locations, propensity score estimation for spatial treatments is similar to propensity score estimation for individual-level treatments. Logistic regression is a simple option. Each candidate treatment location s ∈ S_r is a separate observation. With logistic regression, regress the indicator 1{s ∈ T} on covariates X(s) that describe the neighborhood of candidate location s, as well as (moments of) the characteristics of the individuals i near location s, with d(i, s) = d, for all distances of interest d. Adjusting for the true propensity score is likely sufficient for unconfoundedness in equation 13, similar to the setting with individual-level treatments (cf. Rosenbaum and Rubin, 1983). Using Estimated Propensity Scores In observational studies, the propensity score is typically estimated by the methods above rather than known. Even when the propensity score is known, there may be benefits from using estimated propensity scores for parts of the analysis, as in experiments with individual-level treatments (cf. Hahn, 1998; Hirano et al., 2003; Frölich, 2004b). When estimated propensity scores are close to 0 or 1, the inverse propensity score weighting estimators proposed in this paper may perform poorly (cf. Frölich, 2004a; Busso et al., 2014) because small estimation errors in the propensity scores have large effects on the weights when denominators are close to zero. To reduce the effect of estimation error from this first-stage estimation, I also use cross-fitting and a doubly-robust moment condition (e.g. Chernozhukov et al., 2018) in the application of this paper. While existing results assuming i.i.d. data are not directly applicable to the spatial treatment setting, doubly-robust moments likely still substantively reduce the effect of error due to propensity score (and outcome model) estimation. In the application of this paper, grocery stores are the spatial treatments, restaurants are the (outcome) individuals, and foot-traffic (the number of customers) is the outcome of interest. I argue that the inner ring vs. outer ring comparison used in many recent empirical studies is unattractive in this setting: Its identifying assumption is not credible, and it requires discarding the majority of the sample for practical reasons. I show how to implement the methods proposed in this paper, and argue that the control groups these methods are based on are preferable to outer ring control groups. The average treatment effect of interest is identified by an ideal experiment where some grocery store locations are randomly closed during COVID-19 lockdowns. Specifically, take a restaurant i near a grocery store at location s. What is the difference between the number of customers of restaurant i during the COVID-19 lockdown when there is a grocery store at location s, and the number of customers of restaurant i if there were no grocery store at location s, holding fixed the locations of all other businesses and grocery stores? In the notation of this paper, if T denotes the locations of the other grocery stores, the treatment effect of interest is τ_i(s) = Y_i(T ∪ {s}) − Y_i(T). This effect is distinct from fixing a spatial location near a grocery store, and considering the difference in the outcome (during COVID-19) of the business that exists at this point in space when there is a grocery store nearby, and the outcome (also during COVID-19) of the, possibly different, business that would have been at the same location, had there never been a grocery store nearby.
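Returning briefly to the propensity score step described earlier in this section: a minimal sketch of the logistic-regression estimation under independent treatment assignment, assuming a hypothetical data set with one row per candidate location, an indicator for whether it was realized, and a few illustrative neighborhood covariates (all column names are made up for the example):

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Hypothetical data: one row per candidate treatment location.
rng = np.random.default_rng(1)
df = pd.DataFrame({
    "n_businesses_100m": rng.poisson(8, 500),    # neighborhood covariates X(s)
    "n_restaurants_100m": rng.poisson(3, 500),
    "n_grocery_200m": rng.poisson(1, 500),
})
df["realized"] = rng.binomial(1, 0.2, 500)       # 1{s is a real treatment location}

X = df[["n_businesses_100m", "n_restaurants_100m", "n_grocery_200m"]]
fit = LogisticRegression(max_iter=1000).fit(X, df["realized"])
df["pscore"] = fit.predict_proba(X)[:, 1]        # estimated Pr(treatment realized at s)

# Inverse propensity score weights: realized locations receive 1/e(s),
# unrealized candidate locations receive 1/(1 - e(s)).
df["ipw"] = np.where(df["realized"] == 1, 1 / df["pscore"], 1 / (1 - df["pscore"]))
```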
Grocery stores may have causal effects on the number of customers to nearby restaurants if they draw customers into the shopping and business area. In particular during the first few weeks of the COVID-19 lockdowns, when individual mobility was greatly reduced, getting groceries may have been one of the few trips still made. If grocery store customers are more likely to stop by coffee shops or restaurants for pick-up orders right before or after getting groceries, restaurants and similar businesses may receive more foot-traffic if there is a grocery store nearby. Large department stores serving as "anchor stores" of shopping malls may play a similar role in normal times. Relatedly, Jia (2008) studies the effects of new Wal-Mart stores on existing businesses, and related work studies the effect of restaurant closings on nearby restaurants. The effects of grocery stores on nearby restaurants are informative about several questions. Do grocery stores have (positive) externalities on other businesses? If so, should mall operators subsidize grocery stores through lower rent such that they internalize these externalities, to support other businesses in the mall? In the context of pandemics, are grocery stores likely choke points leading to bunching of customers at nearby restaurants instead of spreading out across all restaurants, increasing the risk of infections? Alternatively, grocery stores may resolve a coordination problem: Suppose that the overall reduced number of restaurant customers is insufficient to operate restaurants profitably, or with reduced loss, when spread across all restaurants. Grocery stores may then help to resolve a coordination problem between restaurants, by focusing potential restaurant customers on the nearby restaurants. I use Safegraph 17 data on the number of customers of each business in the week starting April 13, 2020. I restrict the sample to businesses in the area between San Francisco and San Jose in the San Francisco Bay Area, as highlighted in figure 8. I further restrict the sample to businesses with a minimum number of Safegraph-tracked customers, so that the grocery stores in the sample are plausibly open. 18 The outcome of interest is the inverse hyperbolic sine of visits to restaurants, with visits as measured by Safegraph. 19 To interpret the percentage point effect on the number of Safegraph-tracked customers as the overall effect, assume that Safegraph's sample selection is orthogonal to the presence and absence of grocery stores. Otherwise, the estimates retain internal validity as the effects on the number of Safegraph-tracked customers to these restaurants. The inverse hyperbolic sine allows for zero visits, and effects on it can be transformed into elasticity estimates similar to log(y) or log(y + 1) specifications (see Bellemare and Wichman, 2020, for a discussion). 20
18 Businesses with fewer customers may also be open. However, grocery stores with few if any customers tracked by Safegraph are unlikely to have effects on the number of Safegraph-tracked customers to nearby restaurants.
19 Safegraph (2019) describes the algorithm used for attributing visits to businesses. Generally, pick-up orders as well as outside dining are likely picked up by the algorithm as long as a customer's smartphone sends location data at the point of interest for more than one minute. For errors in attribution to matter in the application of this paper, they need to correlate with the presence or absence of nearby grocery stores.
20 The inverse hyperbolic sine is defined as arcsinh(y) ≡ ln(y + √(y² + 1)). Hence arcsinh(0) = 0, arcsinh(1) ≈ 0.9, arcsinh(2) ≈ 1.4, and arcsinh(y) ≈ ln(y) + 0.7 if y ≥ 3.
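A quick numerical check of the approximations in footnote 20, and of the rule of thumb (used later when interpreting effect sizes) that, away from zero, differences in the inverse hyperbolic sine behave like log differences:

```python
import numpy as np

for y in [0, 1, 2, 3, 10]:
    print(y, np.arcsinh(y), np.log(y) + np.log(2) if y >= 3 else None)
# arcsinh(0)=0.00, arcsinh(1)=0.88, arcsinh(2)=1.44;
# for y >= 3, arcsinh(y) is close to ln(y) + 0.69.

# Away from zero, a difference of 0.5 in arcsinh therefore corresponds to
# roughly a 65% increase in the level, as for a shifted log:
print(np.exp(0.5) - 1)   # about 0.649
```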
Figure 9 : The comparison of businesses on an inner vs. outer ring around a particular grocery. The grocery store is marked by an orange triangle in the center of the figure. Other businesses are small blue circles. Businesses on the gray inner ring, at a distance of 0.1 ± 0.025 miles, are primarily in strip malls, while businesses on the gray outer ring, at a distance of 0.25 ± 0.025 miles, are away from these main shopping areas. Figure 9 illustrates why comparisons between observations on an inner ring and observations on an outer ring around a strategically chosen location are often not attractive. Here, businesses (blue circles) on the inner ring are at a distance of 0.1 ± 0.025 miles from the grocery store (orange triangle), while businesses on the outer ring are at a distance of 0.25 ± 0.025 miles from the same grocery store. While inner ring businesses are part of the same strip mall, outer ring businesses are outside of the primary shopping areas. Interpreting differences in outcomes for these two groups of businesses as causal effects requires assuming that outer ring businesses are unaffected by treatment and have similar outcomes as inner ring businesses in the absence of treatment. Generally, distance from treatment often correlates with many other variables (Kelly, 2019) . With small numbers of grocery stores (see below), the mode of average treatment effect estimates may not be close to the true average effect, even if the locations of grocery stores were random. This arises due to spatial correlations in outcomes even in the absence of treatments (cf. Lee and Ogburn, 2020 , in a network setting). While panel data can in principle relax one of the underlying assumptions, the common (visual) test for the absence of pre-trends carries little information about the validity of the identifying assumption in this setting. With panel data, the assumption of comparability of inner and outer ring businesses is relaxed slightly to an assumption of parallel trends. Businesses on inner and outer rings are allowed to have different average levels of customers, but trends in the inverse hyperbolic sine of the number of customers must be parallel. However, even if panel data suggested that trends between inner and outer ring businesses were indeed parallel pre COVID-19, one may question whether this is informative about changes in (potential) outcomes during COVID-19 lockdowns. Given the dramatic decrease in customers for all businesses, it is questionable that this decrease would have occurred in parallel with only an additive shift (in the inverse hyperbolic sine) for inner and outer ring businesses in the absence of treatment. Additionally, the estimand of a difference in differences estimator in this setting is the additional effect of grocery stores on nearby businesses during COVID-19 on top of any effects that may have already existed pre COVID-19. Even if the parallel trends assumption was credible, this estimand differs from the estimand of interest described above. The difference in differences estimand can be negative even though the effect of grocery stores on nearby businesses is positive during COVID-19 if the effect of grocery stores pre COVID-19 was also positive but larger in magnitude, for instance due to overall difference in the scale of the number of customers. Finally, in most instances, businesses on the outer ring around a grocery store are not actually far away from grocery stores ("untreated"), as illustrated by panel (a) of figure 10 . 
Here, some of the businesses on the outer ring centered around the grocery store in the center of the figure are very close to a second grocery store to the North. Applying the inner vs. outer ring estimator in this setting therefore requires restricting the sample to the neighborhoods of the few grocery stores that are sufficiently far away from other grocery stores. Specifically, to guarantee the absence of interfering grocery stores for an outer ring "no effect" distance of 0.25 miles, only grocery stores with no other grocery store within 2 × 0.25 miles can be used. Panel (b) of figure 10 shows the locations of the remaining 23 grocery stores. Compared to figure 8, these grocery stores are in more remote, less (sub-) urban neighborhoods. While the average treatment effect of grocery stores in such locations may continue to be of interest, it is plausibly distinct from the treatment effect in areas with higher population or business density.
Figure 10: Panel (a) shows an example of a grocery store (triangle in the center) with a second "interfering" grocery store (triangle towards the top) nearby. Some businesses on the outer ring are close to (treated by) this second grocery store and are therefore not a valid control group. Panel (b) shows that restricting the sample to the 23 (out of 199) grocery stores without interference leads to a sample selected heavily towards less business-dense areas compared to the overall sample shown in figure 8.
Figure 9 shows the comparison of means resulting from the inner vs. outer ring estimation. The average outcome at any distance to treatment (blue curve) is differenced with the average outcome of the outer ring (horizontal gray line), here chosen to be businesses between 0.15 and 0.25 miles from real grocery store locations. Using grocery store fixed effects improves upon these estimates slightly by allowing the weights on the outer ring of each grocery store to vary by distance from treatment. Intuitively, if 10% of all inner ring businesses are at distance d₁ from grocery store A, then the outer ring businesses around grocery store A should receive 10% of the aggregate weight of all outer ring businesses for estimating the effect at distance d₁. If the fraction of inner ring businesses that are near grocery store A is different at distance d₂, then the businesses on the outer ring of grocery store A should on aggregate also receive that different weight. Estimates from this fixed effect specification are shown in row 1 of table 1. For row 2, the aggregate weight for businesses near each grocery store is constant at 1/19 (weighting each of the 19 grocery store locations equally), irrespective of the number of businesses near each grocery store, resembling the weighting of the estimand τ_att-eq(d) and facilitating a comparison of the effect at different distances from treatment. Note that, for the inner ring vs. outer ring estimation, I cannot estimate the effect at distances larger than 0.15 miles because I have to assume that there is no treatment effect at those distances to be able to define an outer ring that is not near any grocery store. The spatial experiment estimator based on the ideas proposed in this paper, also shown in table 1, suggests that there indeed likely is no treatment effect past that distance. However, the inner ring vs. outer ring estimator additionally requires that the average outcome at those longer distances is informative about the average outcome at shorter distances.
As argued above, figure 9 suggests this assumption is not a particularly good approximation. This application is covered by the framework of section 4.2 for a single contiguous region with independent treatment assignment. The key idea behind identification for the proposed methods is that the location of a grocery store is as good as random between candidate locations with similar numbers and industries of nearby businesses. Figure 12 shows an example of an ideal comparison where the only difference between the (parts of the) regions is the absence of the bottom-most grocery store, and all other relative distances are the same. The approach I propose for observational data proceeds in two steps: First, it finds good "matches" for each grocery store; that is, locations without a grocery store that are similar in terms of the number, types, and relative locations of other businesses and grocery stores. Second, assume the matched data resemble the ideal experiment of randomizing grocery stores between the real and counterfactual candidate treatment locations. I recommend inverse propensity score weighting estimators based on the results of sections 3 and 4.2. Conceptually similar combinations of matching or stratification and propensity score weighting or regression adjustments have been advocated for by Abadie and Imbens (2011), Imbens and Rubin (2015, ch. 17), and Kellogg et al. (2020), among others. The grocery store location prediction following section 5.2 discretizes the South Bay region into a fine grid and aggregates characteristics of businesses in each grid cell. Figure 13 illustrates the discretization for the surroundings of an example grocery store; see panel (a). For each grid cell, record the number of grocery stores, as in panel (b). Other characteristics of each grid cell, for instance the number of businesses by industry, are recorded in similar grids, as in panel (c). Based on this discretization, I use the method as described in section 5.2 to find counterfactual candidate grocery store locations that are indistinguishable from the real grocery store locations. Since the method can find a very large number of counterfactual grocery store locations, I use propensity score matching to narrow the sample down to a smaller but more balanced sample of real and counterfactual grocery store locations. Panel (a) of figure 14 shows the limited overlap in propensity scores before this second matching step, while panel (b) shows good overlap for the final set of candidate locations.
Figure 14: The propensity score model can still distinguish between some of the false positives / counterfactual locations and real grocery store locations, resulting in many candidate locations with low propensity score. After a propensity score matching step and re-estimation of the propensity score, overlap is better. Real and counterfactual grocery store locations have similar (estimated) propensity scores.
To estimate propensity scores in this setting, I assume that grocery store openings are independent decisions at each location (assumption 7). In practice, this assumption is primarily relevant at the margin of opening (or closing) additional grocery stores relative to the existing grocery stores. Since there are neighborhoods similar in other businesses but differing in the number of grocery stores, this assumption may offer a reasonable approximation.
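A minimal sketch of the propensity score matching step used to trim the large set of counterfactual locations to a more balanced sample, assuming estimated scores for real and counterfactual locations are already available (all names and data are hypothetical; the sketch matches with replacement for simplicity):

```python
import numpy as np
import pandas as pd
from sklearn.neighbors import NearestNeighbors

# Hypothetical input: candidate locations with estimated propensity scores.
rng = np.random.default_rng(2)
real = pd.DataFrame({"pscore": rng.beta(4, 6, 200)})               # real grocery stores
counterfactual = pd.DataFrame({"pscore": rng.beta(2, 10, 5000)})   # classifier output

# For each real location, keep the counterfactual location with the closest
# estimated propensity score.
nn = NearestNeighbors(n_neighbors=1).fit(counterfactual[["pscore"]])
_, idx = nn.kneighbors(real[["pscore"]])
matched = counterfactual.iloc[idx.ravel()].reset_index(drop=True)

# The sample of real plus matched counterfactual locations should have much
# better propensity score overlap; re-estimate the propensity score on it.
print(real["pscore"].mean(), matched["pscore"].mean())
```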
The inverse probability weighted real and counterfactual grocery store locations are similar in everything except their exposure to real grocery stores, which differs by one additional grocery store. Figure 15 shows that the exposure to treatment is as intended: The number of grocery stores at distances between 0.15 and 0.175 miles from a business is the same for businesses at any distance from real and counterfactual grocery store locations, except for businesses at that distance from a candidate grocery store location. Businesses at distance d from a real grocery store have, on average, exactly one additional real grocery store at distance d, compared to businesses at distance d from counterfactual grocery store locations. Treated and control businesses are thus alike, except for a single additional grocery store at the intended distance. Furthermore, the composition of nearby businesses is similar between real and counterfactual grocery store locations at any distance. Figure 16 shows that the fraction of restaurants among businesses at distance d from counterfactual grocery store locations is comparable to the fraction of restaurants among businesses at distance d from real grocery store locations. This lends credibility to the treatment effect estimates below.
Figure 16: The composition of businesses near real and counterfactual grocery store locations is similar. It is encouraging that counterfactual grocery store locations mimic the business composition pattern across distance of real grocery store locations.
Since the fraction of restaurants decreases meaningfully from short distances to longer distances, inner vs. outer ring comparisons of all businesses would compare businesses in different industries. Inner ring vs. outer ring comparisons of restaurants would compare restaurants in different (business) neighborhoods.
Figure 17: Weighted mean of the inverse hyperbolic sine of visits for businesses near real grocery store locations (blue line) and for businesses near counterfactual grocery store locations (red line). The difference between the two lines at a given distance is the estimate of the average treatment effect at that distance. Panel (a) includes all businesses, while panel (b) restricts the sample to restaurants. There is a substantial estimated treatment effect at very short distances of up to 0.1 miles, and no meaningful difference between treated and control businesses at larger distances.
Given candidate treatment locations and propensity scores, I estimate treatment effects with the estimators of section 4.2. To interpret the estimated effect as the average effect of opening single grocery stores, rather than the marginal effect of adding a grocery store to existing exposure, one can make the additivity assumption 5. Additivity may be plausible if each additional grocery store brings new customers into an area. During COVID-19, customers may reduce the number of different grocery stores they shop at to limit their exposure. Furthermore, there is differentiation in the grocery store market: The customers at discount grocery outlets may be distinct from the customers at Whole Foods. Figure 17 shows the average outcome of all businesses (panel a) and restricted to restaurants (panel b) by distance from candidate treatment location, contrasting real grocery store locations (blue line) and counterfactual grocery store locations (red line).
At very short distances, businesses (including restaurants) on average have more customers if a (real) grocery store is nearby. If the grocery store is 0.1 or more miles away, it no longer has a discernible effect on the businesses. Table 1 shows the spatial experiment estimator, which is the same as the difference between the curves at each distance for restaurants (corresponding to panel b of figure 17). I also report estimates for the alternative estimator τ̂_att-eq(d), which holds the aggregate weight placed on each grocery store constant across distances. I recommend this estimator for comparisons of effects across distances. Since the grocery stores causing the effects are heterogeneous in their numbers of customers, their effects on foot-traffic to nearby restaurants are likely to be heterogeneous as well. I also estimate the ATT using a doubly-robust moment (e.g. Chernozhukov et al., 2018). The natural extension of the ATT moment to the spatial treatment setting with interference averages over all combinations of candidate grocery store locations s and individuals i satisfying d(i, s) ≈ d.
Table 1: Estimated effects on the inverse hyperbolic sine of the number of visits to restaurants using different estimators. The first panel uses the inner vs. outer ring comparison. The second panel uses the inverse probability weighting estimators for spatial experiments proposed in this paper. The third and final panel uses a doubly-robust version of the spatial experiment estimator. For each method, I implement two estimators: the average effect of the treatment on the treated (τ̂(d)), and the equal-weighted ATT estimator (τ̂_att-eq(d)) that has a more attractive interpretation for comparing the effect at different distances. Standard errors for the inner vs. outer ring estimators are clustered by grocery store. Standard errors for the spatial experiment estimators will be reported in a future version. Note that the inner ring vs. outer ring comparison uses substantially fewer treatment locations because it requires restricting the sample to isolated grocery stores.
In the doubly-robust moment, 1{s ∈ T} plays the role of the "treatment indicator." The function μ(x, T) gives the expected outcome (inverse hyperbolic sine of the number of visits) for a business with covariates x, including neighborhood characteristics, when there are grocery stores at locations T. For a business near a real grocery store s, the conditional mean function is evaluated in the absence of the nearby grocery store, μ(x, T ∖ {s}), with the parameter of interest, τ(d), capturing the difference between the actual outcome and the expected outcome in the absence of the nearby grocery store. For businesses near an unrealized candidate location s, the conditional mean function is evaluated at the background treatment exposure level T. The propensity score e(s) gives the probability that there is a real grocery store at candidate location s, conditional on characteristics of the neighborhood of s. This moment function satisfies the Neyman orthogonality condition of Chernozhukov et al. (2018). Relative to the spatial experiment estimator, which treats the propensity score as known, this estimator has the advantage of reducing the impact of small errors in the estimated propensity score through orthogonalization. Overall, the inverse propensity score weighting estimator and the doubly-robust estimator yield similar results, as shown in table 1 above. Grocery stores have an economically large positive effect during COVID-19 lockdowns only at short distances of less than 0.1 miles. Intuitively, grocery store customers do visit nearby restaurants and coffee shops, but are unlikely to walk for more than a couple of minutes from the grocery store location. For instance, at the (control) average inverse hyperbolic sine of visits of approximately 2.4, an increase of 0.5 points implies a 66% increase in the number of customers. 21
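For reference, and only as a hedged sketch: in the familiar individual-level notation (treatment indicator D, covariates X, control-outcome regression μ₀, propensity score e), a doubly-robust ATT moment of the kind referenced above can be written as

\[
\tau_{\text{att}}
  \;=\;
  \frac{\mathbb{E}\!\left[\, D\,\{Y - \mu_0(X)\} \,\right]}{\mathbb{E}[D]}
  \;-\;
  \frac{\mathbb{E}\!\left[\, (1-D)\,\dfrac{e(X)}{1-e(X)}\,\{Y - \mu_0(X)\} \,\right]}{\mathbb{E}[D]} ,
\]

which is valid if either μ₀ or e is correctly specified. In the spatial version described in the text, D is replaced by the indicator 1{s ∈ T} that candidate location s is realized, μ₀ by the conditional mean μ(x, T ∖ {s}) (or μ(x, T) for unrealized candidates), e(X) by the location-level propensity score e(s), and the outer average runs over location–individual pairs with d(i, s) ≈ d; the exact form used in the paper may differ in details not recoverable here.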
The aim of this paper is to argue that leveraging quasi-random variation in the location of spatial treatments is both conceptually attractive and feasible in many settings in practice. I propose a framework and experimental approach for estimating the effects of spatial treatments. This approach uses random variation in the realized locations of the spatial treatments for causal identification. I argue that an alternative estimator commonly used in practice is not justified by the same random variation, but instead identifies causal effects only under sometimes questionable functional form assumptions. To operationalize the (quasi-) experimental approach with observational data, I propose a machine learning method to find counterfactual locations where the treatment could have occurred but did not. The proposed method specifically leverages that neighborhood characteristics are predictive of both the location of treatments and the outcomes of individuals. Convolutional neural networks learn this rich spatial dependence structure encoding relevant institutional features from the data. I incorporate the appealing properties of generative adversarial networks in a classification problem that leads to much simpler training in practice, similar to denoising autoencoders. I illustrate the proposed methods in an application studying the causal effects of grocery stores on foot-traffic to nearby restaurants during COVID-19 lockdowns. Several key questions remain for future research. In some settings, the spatial treatment is endogenous, but geographic characteristics which are continuous in space are available as plausibly exogenous instruments (cf. Feyrer et al., 2017, 2020; James and Smith, 2020). It is unclear how to construct powerful instruments from such geographic characteristics and incorporate them in the causal framework of this paper. In this paper, I also assume that there is no migration response to the treatment. To allow for migration, one could either focus on outcomes at fixed geographic locations instead of outcomes of fixed individuals, or embrace a local average treatment effect (Angrist et al., 1996) with a large number of compliance types if individuals move to different distances from treatment. The analysis in this paper is focused on estimating (potentially weighted) average treatment effects. In practice, decision makers may often be more interested in the optimal location for the spatial treatment.
Here, consider the expectation of the term in the numerator corresponding to individual i. The first step uses that the realized outcome is the potential outcome corresponding to the realized treatment. The second step rewrites the potential outcome and distance bin indicator function in terms of non-stochastic candidate locations by summing over all possible treatment locations in the region, ∑_{s ∈ S_r} Y_i(s) 1{S_r = s}. The third step moves the expectation into the summation, and the non-stochastic distance bin indicator function and potential outcome out. The final step resolves the expectation in terms of the probabilities determined by the experimental design, defined in section 2.
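To connect the derivations that follow to computation, a minimal sketch (all variable and column names hypothetical, and simplified to a normalized-weight version) of the comparison underlying the feasible estimator: a weighted average of outcomes for individuals at distance d from realized candidate locations minus the corresponding average around unrealized candidate locations, with inverse probability weights:

```python
import numpy as np
import pandas as pd

def spatial_att(pairs: pd.DataFrame, d: float, h: float = 0.0125) -> float:
    """pairs: one row per (candidate location, individual) pair with columns
    'dist'      -- distance between the pair,
    'realized'  -- 1 if the candidate location received treatment, else 0,
    'pscore'    -- probability that the candidate location is realized,
    'y'         -- the individual's observed outcome."""
    near = pairs[np.abs(pairs["dist"] - d) <= h]        # distance bin d +/- h
    treated = near[near["realized"] == 1]
    control = near[near["realized"] == 0]
    w_t = 1.0 / treated["pscore"]                        # weights near realized locations
    w_c = 1.0 / (1.0 - control["pscore"])                # weights near unrealized locations
    return np.average(treated["y"], weights=w_t) - np.average(control["y"], weights=w_c)
```

The estimator analyzed below differs in details (it keeps the user-specified weight function and fixed denominators explicit), but the basic treated-versus-unrealized comparison is the same.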
The general estimator of interest in the setting without interference can be written aŝ where index denotes regions, = 1 if region is treated at some location, and is the single treatment location chosen in region (if any). The weight function ( , ) is chosen by the user to weight individuals and treatment locations as desired and primarily place weight on pairs that are distance apart. For instance, for the ATT estimator with distance bin, choose ( , ) = ( )1{ ( , ) ≤ ℎ}. The probabilities of treatment in regions and locations are given by ≡ Pr( = 1) and ( ) ≡ Pr( = { }| = 1). The first term averages over individuals at distance from a realized treatment location. The second term averages over individuals at distance from unrealized candidate treatment locations. The estimator estimates the weighted average treatment effect: with user-specified weights . The experiment considered here is a completely randomized experiment at the region level, where a fixed number of regions receive treatment at exactly one location each, and treatment in a region is assumed to have no effect on outcomes in other regions -regions are "far apart." The estimator^( ) is hard to analyze (in finite samples) because the denominators are random. This arises because, depending on treatment assignment, there may be more or fewer individuals near realized / unrealized locations. The same problem exists in standard randomized experiments when the treatment is randomized by an independent coin flip for each individual, such that the number of treated varies from assignment to assignment. In that setting, we can instead analyze the experiment with number of treated fixed at the value observed in the realized sample. Conditioning on the number of realized treatment locations is not sufficient in the spatial setting because the number of individuals would still vary since some locations have more individuals near them than other locations. Conditioning on the number of individuals restricts the assignment distribution asymmetrically -inverting an assignment generally changes the number of individuals near treatment -such that standard estimators are no longer unbiased by design. The theoretical analysis of^( ) therefore relies on an approximate estimator that fixes the denominators (at their expected values), and centers the numerators in a way that minimizes the difference between^( ) and its approximation. The approximate estimator is where ( ) and ( ) are average potential outcomes: Here, I show that the estimators^( ) and˜( ) are very close in large enough samples. This motivates the use of exact finite sample results for the mean and variance of the infeasible estimator˜( ) for inference with the feasible estimator^( ). The analysis uses the mean value theorem to derive the difference^( ) −˜( ) and argues that this difference is small in large enough samples. As a practical matter, a sample is large enough if the number of individuals near treatment and control are close to their expected values. The approximation of^( ) by˜( ) is particular close when also the average outcomes are close to their expected values. To simplify notation, define the following shorthands: where˜, ( ) and˜, ( ) replicate^, ( ) and^, ( ) but with expected values rather than sample averages in the denominators. The sample average denominators are^, ( ) and , ( ) (scaled such that they converge under suitable conditions when grows), and the expected value of the denominators is ( ) (similarly scaled). 
Without loss of generality, I fix the distance and weighting of interest and suppress the dependence on and in the following derivations for ease of presentation. The feasible estimator written in terms of the shorthand notation iŝ where˙is some convex combination of^and . It is straightforward to see thatΔ( , , , ) = 0. Hence the left-hand-side of the equality above is justΔ(^,^,˜,˜), such that the right-hand-side is an expression for^−˜. Hencê Each of the four terms is a product with each factor close to zero under appropriate asymptotics. For instance, with independent regions and bounded outcomes and number of individuals per region, one can get √ (^− ) → 0. That is, the difference between the estimators^( ) and˜( ) is negligible under standard asymptotic frameworks. Since the difference between estimators is very small for large samples, exact finite sample results for ( ) likely provide decent approximations for^( ) in smaller samples. Consider the expected value of the estimator˜( ). To show: (˜( )) = ( ). Since ( ) is the first term of˜( ), I proceed by showing that (˜( ) − ( )) = 0. Since the denominators are non-stochastic, it suffices to show that the expectations of the numerators are equal to zero. The "first term" and "second term" designations below therefore refer to the first and second term of˜( ) − ( ). The expectation of the numerator of the first term is: The first equality rewrites the observed outcome = ( ( ) ) in terms of potential outcome ( ) = ( ) for = . The second equality moves all non-stochastic terms out of the expectation. The third equality rewrites the expectation of indicators as probabilities. The fourth equality distributes the difference ( ) − , ( ) and replaces , ( ) by its definition. For the second term, the factor multiplying the ratio cancels with the denominator, and the numerator is equal to the first term, such that the difference is equal to zero. Analogously, the expectation of the numerator of the second term is: Hence (˜( )) = ( ). The approximate estimator˜( ) in equation 14 is the sum of three terms. Since the first term, ( ) is fixed, the variance only depends on the last two terms. For the third equality, distribute out the −1 term of −(1 − ( ∑︀ ( ))) . . ., which is non-stochastic and hence does not contribute to the variance, such that only +( ∑︀ ( )) . . . remains of the second term. The fourth and final equality above distributes out the ( ) of the second term and then combines the first and second term by factoring out ( ). For ease of notation, definē The only stochastic terms left are the ( ); they represent the design-based variation that is due to random treatment assignment. The average¯+ , ( , ) consists only of a sum of potential outcomes, which are non-stochastic in the design-based perspective, in the numerator and the expected number of individuals near treatment, which is also non-stochastic, in the denominator. The where Pr( ′ = 1| = 1) is determined by the completely randomized design. Let be the (fixed) number of treated regions in a completely randomized design. Then Pr( ′ = 1| = 1) = − 1 − 1 78 since, under assumption 2, if region receives treatment, − 1 of the remaining − 1 regions receive treatment, each with equal probability. So )︂ 2 The first equality combines the added term with the first summation and the subtracted term with the second summation. 
The second equality simplifies the factor of the first term, factors the second term into and ′ , and notices that both summations are the same, yielding the square in the second term. Here, the third summation is "missing" the terms where = ′ . Adding and subtracting by combining the added = ′ term into the second term and the subtracted = ′ term into the third term. Note that dropping the (unidentified) variances of treatment effects (terms four and five) unambiguously leads to a conservative estimator of the variance. The absolute value of the factor in the fourth term, , is larger than the factor in the fifth term, 2 − , and the numerator of the ratio in the fourth term is larger than the numerator of the ratio in the fifth term by Jensen's inequality (while the denominators are identical). Hence the absolute value of the fourth term is larger than the fifth terms, such that dropping both terms increases the expression, leading to a conservative estimator of the variance. To estimate the first term, takê To estimate the second term, takê 1{ ̸ ∈ } Pr( = 1) Pr( ̸ ∈ | = 1) The estimator^a dditive ( ) here generalizes the estimator in section the main text to allow for an arbitrary number of candidate locations in each region, |S |. The formula is specific to completely randomized designs within treated regions with fixed number of realized locations, and equal probability for each of the When should you adjust standard errors for clustering? Sampling-based vs. design-based uncertainty in regression analysis Synthetic control methods for comparative case studies: Estimating the effect of california's tobacco control program Bias-corrected matching estimators for average treatment effects Shift-share designs: Theory and inference Blowing it up and knocking it down: The local and city-wide effects of demolishing high concentration public housing on crime Cross-section regression with common shocks Identification of causal effects using instrumental variables Spatial Econometrics: Methods and Models. Studies in Operational Regional Science Advances in Spatial Econometrics: Methodology, Tools and Applications. New Directions in Spatial Econometrics Perspectives on Spatial Data Analysis. Advances in Spatial Science A Primer for Spatial Econometrics. Palgrave Texts in Econometrics Towards principled methods for training generative adversarial networks Estimating average causal effects under general interference, with application to a social network experiment Design-based inference for spatial experiments with interference The Impact of Machine Learning on Economics Exact p-values for network interference Experienced segregation Machine learning methods that economists should know about Using wasserstein generative adversarial networks for the design of monte carlo simulations Approximate residual balancing: debiased inference of average treatment effects in high dimensions The china syndrome: Local labor market effects of import competition in the united states Inference in experiments with matched pairs Clustering, spatial correlations, and randomization inference Who benefits from state and local economic development policies? 
Randomization tests of causal effects under interference Place of work and place of residence: Informal hiring networks and labor market outcomes Elasticities and the inverse hyperbolic sine transformation Program evaluation and causal inference with high-dimensional data Inference on treatment effects after selection among high-dimensional controls Inference with dependent data using cluster covariance estimators The geography of unemployment Non-random exposure to exogenous shocks: Theory and applications Quasi-experimental shift-share research designs New evidence on the finite sample properties of propensity score reweighting and matching estimators A practitioner's guide to cluster-robust inference Spatial patterns in household demand Reducing crime through environmental design: Evidence from a randomized experiment of street lighting in new york city Double/debiased machine learning for treatment and structural parameters The impacts of neighborhoods on intergenerational mobility i: Childhood exposure effects Where is the land of opportunity? the geography of intergenerational mobility in the united states Free distribution or cost-sharing? evidence from a randomized malaria prevention experiment Gmm estimation with cross sectional dependence Statistics for Spatio-Temporal Data Statistics for Spatial Data (Rev Environmental health risks and housing values: evidence from 1,600 toxic plant openings and closings Difference-in-differences techniques for spatial data: Local autocorrelation and spatial interaction The development effects of the extractive colonial economy: The dutch cultivation system in java Who wants affordable housing in their backyard? an equilibrium analysis of low-income property development Inference with difference-in-differences and other panel data. The Review of The view from above: Applications of satellite data in economics Accounting for unobservable heterogeneity in cross section using spatial first differences Schooling and labor market consequences of school construction in indonesia: Evidence from an unusual policy experiment Poverty from space: using high-resolution satellite imagery for estimating economic well-being Robust inference on average treatment effects with possibly more covariates than observations Geographic dispersion of economic shocks: Evidence from the fracking revolution: Reply Geographic dispersion of economic shocks: Evidence from the fracking revolution Sources of geographic variation in health care: Evidence from patient migration Place-based drivers of mortality: Evidence from migration Pre-event trends in the panel event-study design Finite-sample properties of propensity-score matching and weighting estimators A note on the role of the propensity score for estimating average treatment effects Text as data Big data and big cities: The promises and limitations of improved measures of urban life Bartik instruments: What, when, why, and how Generative adversarial nets Discrete Choice Modeling Identifying agglomeration spillovers: Evidence from winners and losers of large plant openings Bidding for industrial plants: Does winning a 'million dollar plant' increase welfare? 
Take the q train: Value capture of public infrastructure projects On the role of the propensity score in efficient semiparametric estimation of average treatment effects Generalized least squares inference in panel and multilevel models with serial correlation and fixed effects The Elements of Statistical Learning: Data Mining, Inference and Prediction Transforming auto-encoders Efficient estimation of average treatment effects using the estimated propensity score Latent space approaches to social network analysis Stability in competition Toward causal inference with interference Estimating spatial treatment effects: An application to base closures and aid delivery in afghanistan Nonparametric estimation of average treatment effects under exogeneity: A review Causal Inference for Statistics, Social, and Biomedical Sciences: An Introduction Spatial transformer networks. Advances in Neural Information Processing Systems Geographic dispersion of economic shocks: Evidence from the fracking revolution: Comment Combining satellite imagery and machine learning to predict poverty What happens when wal-mart comes to town: An empirical analysis of the discount retailing industry An adversarial approach to structural estimation Consequences of the clean water act and the demand for water quality Hac estimation in a spatial framework Combining matching and synthetic controls to trade off biases from extrapolation and interpolation The standard errors of persistence Imagenet classification with deep convolutional neural networks Dynamic spatial panel models: Networks, common shocks, and sequential exogeneity On asymptotic distribution and asymptotic efficiency of least squares estimators of spatial variogram parameters Central limit theorems for long range dependent spatial linear processes Asymptotic distributions of quasi-maximum likelihood estimators for spatial autoregressive models Network dependence can lead to spurious associations and invalid inference On how well generative adversarial networks learn densities: Nonparametric and parametric results Estimates of the impact of crime risk on property values from Megan's laws How do right-to-carry laws affect crime rates? coping with ambiguity using bounded-variation assumptions Estimating treatment effects from spatial policy experiments: an application to ugandan microfinance Worms: identifying impacts on education and health in the presence of treatment externalities Random group effects and the precision of regression estimates An illustration of a pitfall in estimating the effects of aggregate variables on micro units Alternative tests of the error components model Machine learning: an applied econometric approach Population intervention causal effects based on stochastic interventions On the application of probability theory to agricultural experiments. essay on principles. section 9 On the application of probability theory to agricultural experiments. essay on principles. 
The average outcome of treated individuals at distance $\pm h$ from realized treatment locations can be written as a ratio of sums of potential outcomes across regions. The term inside the square equals zero. The first equality substitutes the definition of the potential outcome sum $\bar{Y}^{+}$. The second equality splits the ratio into two separate sums, one of treated and one of control potential outcomes. For the first term, the third equality factors out the inverse of the treatment probability, which is constant across regions by assumption 2, and cancels each conditional probability against its inverse. For the second term, the third equality factors out the inverse of one minus the treatment probability, which is constant across regions by assumption 2, and uses that the conditional probabilities sum to one within each region. Both terms are therefore equal to zero by the definitions of the treated and control averages. Hence, under assumption 2 (completely randomized experiment), equation 15 simplifies, using that the treatment probability is the same in all regions.

Bernoulli trial. Under assumption 3 (Bernoulli trial), equation 15 takes an analogous form. To treat the variances under assumptions 2 and 3 jointly, define notation that nests both cases. Both under assumption 2 and under assumption 3, the variance of $\tilde{\tau}(h)$ depends on squares of (sums of) the potential outcome sums $\bar{Y}^{+}$.

Consider the first square of potential outcomes above. By applying the binomial theorem twice, rewrite the squared sum of potential outcomes as the difference between estimable marginal variances and an inestimable (approximate) treatment effect variance. Dropping the inestimable variance of treatment effects then yields a conservative estimate of the variance. Since $(a+b)^2 = a^2 + 2ab + b^2$ and $(a-b)^2 = a^2 - 2ab + b^2$, it follows that $(a+b)^2 = 2a^2 + 2b^2 - (a-b)^2$. The second square of potential outcomes above can be rewritten in the same way. Substituting these expressions into equation 17 yields equation 18.

The variance in equation 18 consists of five terms. The first and third terms resemble a variance of outcomes of treated individuals. The second term resembles a variance of outcomes of control individuals. The fourth and fifth terms resemble variances of treatment effects.
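To make the conservative-variance logic concrete, the following is a minimal Python sketch, not the paper's code: it computes the standard Neyman-style upper bound on the variance of a difference in means under complete randomization, which drops the inestimable treatment-effect variance term exactly as described above. In the spatial setting the analogous calculation runs over potential outcome sums per region rather than individual outcomes, so the function below, including its name and arguments, is only an illustrative assumption about how the mechanism would be coded.

```python
import numpy as np

def neyman_conservative_variance(y_treated, y_control):
    """Conservative (upward-biased) variance estimate for a
    difference-in-means estimator under complete randomization.

    The term involving the variance of unit-level treatment effects is
    not identified from the data; dropping it can only increase the
    estimate, so the result is a valid conservative variance bound.
    """
    y_treated = np.asarray(y_treated, dtype=float)
    y_control = np.asarray(y_control, dtype=float)
    n_t, n_c = len(y_treated), len(y_control)
    # Sample variances of observed outcomes in each arm.
    s2_t = y_treated.var(ddof=1)
    s2_c = y_control.var(ddof=1)
    # Conservative bound: marginal variances only, treatment-effect
    # variance dropped.
    return s2_t / n_t + s2_c / n_c
```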
It is only possible to identify the effect of a candidate treatment location $s$ on individual $i$, $\tau_i(s)$, if $s$ is the closest realized treatment location to $i$ with positive probability. One can take
\[
\hat{\tau}^{\text{nearest}}_i(s) =
\begin{cases}
\dfrac{Y_i}{\Pr(s \text{ is the closest realized location to } i)} & \text{if } s \text{ is the closest realized location to } i, \\
0 & \text{if the region of } i \text{ is treated but } s \text{ is not the closest realized location to } i, \\
-\dfrac{Y_i}{\Pr(\text{the region of } i \text{ is not treated})} & \text{if the region of } i \text{ is not treated.}
\end{cases}
\]
The estimator $\hat{\tau}^{\text{nearest}}_i(s)$ is equal to 0 if the region of individual $i$ is treated but location $s$ is not the closest realized treatment location to $i$. This happens both when $s$ is not realized itself and when another realized treatment location $s'$ is closer to $i$. If $s$ is the closest realized location to $i$, $\hat{\tau}^{\text{nearest}}_i(s)$ is equal to the outcome of $i$ scaled by the inverse of the probability of this event. If the region is not treated, $\hat{\tau}^{\text{nearest}}_i(s)$ is equal to the negative of the outcome of $i$ scaled by the inverse of the probability of the region not being treated. Clearly, $\hat{\tau}^{\text{nearest}}_i(s)$ is an unbiased inverse probability weighting estimator of $\tau_i(s) \equiv Y_i(s) - Y_i(0)$ under the assumption that only the nearest realized treatment matters.
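As a concrete illustration of this case-by-case definition, here is a minimal Python sketch; the function name, argument names, and the way the two probabilities are supplied are assumptions for illustration rather than the paper's implementation.

```python
def tau_nearest_hat(y_i, region_treated, s_is_nearest_realized,
                    p_s_nearest, p_region_untreated):
    """Inverse probability weighting estimate of the effect of candidate
    location s on individual i, assuming only the nearest realized
    treatment location matters.

    y_i                   : observed outcome of individual i
    region_treated        : True if the region of i received a treatment
    s_is_nearest_realized : True if s is the closest realized location to i
    p_s_nearest           : probability that s is the closest realized
                            treatment location to i
    p_region_untreated    : probability that the region of i is untreated
    """
    if region_treated and not s_is_nearest_realized:
        # s was not realized, or another realized location is closer to i.
        return 0.0
    if s_is_nearest_realized:
        # i reveals the treated potential outcome Y_i(s).
        return y_i / p_s_nearest
    # Region untreated: i reveals Y_i(0), which enters with a negative sign.
    return -y_i / p_region_untreated
```

Averaging this estimate over draws of the treatment assignment recovers $Y_i(s) - Y_i(0)$ in expectation, which is what makes the inverse probability weights in the two non-zero cases the natural choice.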