key: cord-0146896-flsci81n
authors: Li, Wenrui; Sussman, Daniel L.; Kolaczyk, Eric D.
title: Causal Inference under Network Interference with Noise
date: 2021-05-10
journal: nan
DOI: nan
sha: ee22ac97ced8b07d2cf5337f8f94ac2c6448700b
doc_id: 146896
cord_uid: flsci81n

Increasingly, there is a marked interest in estimating causal effects under network interference due to the fact that interference manifests naturally in networked experiments. However, network information generally is available only up to some level of error. We study the propagation of such errors to estimators of average causal effects under network interference. Specifically, assuming a four-level exposure model and Bernoulli random assignment of treatment, we characterize the impact of network noise on the bias and variance of standard estimators in homogeneous and inhomogeneous networks. In addition, we propose method-of-moments estimators for bias reduction. We illustrate the practical performance of our estimators through simulation studies in British secondary school contact networks.

In recent years, there has been enormous interest in the assessment of treatment effects within networked systems. Naturally, interference (Cox and Cox (1958)) cannot realistically be assumed away when doing experiments on networks. The outcome of one individual may be affected by the treatment assigned to other individuals, which violates the 'stable unit treatment value assumption' (SUTVA) (Neyman (1923), Rubin (1990)).

As a result, much of what is considered standard in the traditional design of randomized experiments and the corresponding analysis for causal inference does not apply directly in this context. Moreover, network information is generally available only up to some level of error, also known as network noise. For example, there is often measurement error associated with network construction, where by 'measurement error' we mean true edges being observed as non-edges, and vice versa. Such edge noise occurs in self-reported contact networks, where participants may not perceive and recall all contacts correctly (Smieszek et al. (2012)). It can also be found in biological networks (e.g., of gene regulatory relationships), which are often based on notions of association (e.g., correlation or partial correlation) among experimental measurements of gene activity levels, as determined by some form of statistical inference. We investigate how network noise impacts estimators of average causal effects under network interference and how to account for the noise.

We assume the observed graph is a noisy version of a true graph. Let $G = (V, E)$ be an undirected graph and $G^{obs} = (V, E^{obs})$ be the observed graph, where we assume that the vertex set $V$ is known. Denote the adjacency matrix of $G$ by $A = (A_{i,j})_{N_v \times N_v}$ and that of $G^{obs}$ by $\tilde A = (\tilde A_{i,j})_{N_v \times N_v}$. Hence $A_{i,j} = 1$ if there is a true edge between the $i$-th vertex and the $j$-th vertex, and 0 otherwise, while $\tilde A_{i,j} = 1$ if an edge is observed between the $i$-th vertex and the $j$-th vertex, and 0 otherwise. We assume throughout that $G$ and $G^{obs}$ are simple.

We express the marginal distributions of the $\tilde A_{i,j}$ in the form (Balachandran et al. (2017))

$$\tilde A_{i,j} \sim \begin{cases} \mathrm{Bernoulli}(\alpha_{i,j}), & \text{if } \{i,j\} \in E^c, \\ \mathrm{Bernoulli}(1 - \beta_{i,j}), & \text{if } \{i,j\} \in E, \end{cases} \qquad (1.1)$$

where $E^c = \{\{i,j\} : i, j \in V;\ i < j\} \setminus E$. Drawing an analogy with network construction based on hypothesis testing, $\alpha_{i,j}$ can be interpreted as the probability of a Type-I error on the (non)edge status of a vertex pair $\{i,j\} \in E^c$, while $\beta_{i,j}$ is the probability of a Type-II error for a vertex pair $\{i,j\} \in E$. Our interest is in characterizing how the uncertainty in the $\tilde A_{i,j}$ propagates to estimators of average causal effects.

Let $z_i = 1$ indicate that individual $i \in V$ received a given treatment. We will refer to $z = (z_1, \dots, z_{N_v}) \in \{0, 1\}^{N_v}$ as the treatment assignment vector. Let $p_z = P(Z = z)$ be the probability that treatment assignment $z$ is generated by the experimental design.

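Additionally, let $y_i(z)$ denote the outcome for individual $i$ under treatment assignment $z$. In the worst case, there will be $2^{N_v}$ possible exposures for each of the $N_v$ individuals, making causal inference impossible. To avoid this situation, we adopt the notion of exposure mappings, introduced by Aronow and Samii (2017). We say that $i$ is exposed to condition $k = 1, \dots, K$ if $f(z, x_i) = c_k$, where $f$ is the exposure mapping, $z$ is the treatment assignment vector, and $x_i$ is a vector of additional information specific to individual $i$. Under interference, these authors offer a simple, four-level categorization of exposure ($K = 4$) that we revisit throughout this paper. Taking the vector $x_i$ to be the $i$-th column of the adjacency matrix $A$ (i.e., $x_i = A_{\cdot i}$), they define

$$f(z, A_{\cdot i}) = \begin{cases} c_{11} \ (\text{direct + indirect exposure}), & \text{if } z_i\, I(z^\top A_{\cdot i} > 0) = 1, \\ c_{10} \ (\text{isolated direct exposure}), & \text{if } z_i\, I(z^\top A_{\cdot i} = 0) = 1, \\ c_{01} \ (\text{indirect exposure}), & \text{if } (1 - z_i)\, I(z^\top A_{\cdot i} > 0) = 1, \\ c_{00} \ (\text{no exposure}), & \text{if } (1 - z_i)\, I(z^\top A_{\cdot i} = 0) = 1, \end{cases} \qquad (1.2)$$

where the inner product $z^\top A_{\cdot i}$ is the number of treated neighbors of individual $i$.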
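The four-level mapping (1.2) can be computed for all individuals at once, because $(Az)_i = z^\top A_{\cdot i}$ for a symmetric $A$. A minimal sketch; `exposure_levels` is a hypothetical helper name, not part of any library.

```python
import numpy as np


def exposure_levels(z, A):
    """Four-level exposure mapping (1.2) for every individual.

    z: 0/1 treatment vector of length N_v; A: symmetric 0/1 adjacency matrix.
    Returns an array of labels 'c11', 'c10', 'c01', 'c00'.
    """
    z = np.asarray(z, dtype=int)
    treated_nbrs = np.asarray(A, dtype=int) @ z    # (A z)_i = z^T A_{.i} since A is symmetric
    some = treated_nbrs > 0
    return np.where(z == 1,
                    np.where(some, "c11", "c10"),
                    np.where(some, "c01", "c00"))


# Path graph 0 - 1 - 2 - 3 with only vertex 0 treated.
A_path = np.array([[0, 1, 0, 0],
                   [1, 0, 1, 0],
                   [0, 1, 0, 1],
                   [0, 0, 1, 0]])
print(exposure_levels([1, 0, 0, 0], A_path))   # ['c10' 'c01' 'c00' 'c00']
```

Vertex 0 is treated but has no treated neighbor (isolated direct exposure), vertex 1 is untreated next to a treated vertex (indirect exposure), and vertices 2 and 3 have no exposure.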

In the general exposure mapping framework of Aronow and Samii (2017), potential outcomes depend only on the exposure condition of each unit. Suppose each individual $i$ has $K$ potential outcomes $y_i(c_1), \dots, y_i(c_K)$ and is exposed to one and only one condition. Then, define

$$\tau(c_k, c_l) = \frac{1}{N_v}\sum_{i=1}^{N_v} y_i(c_k) - \frac{1}{N_v}\sum_{i=1}^{N_v} y_i(c_l) = y(c_k) - y(c_l) \qquad (1.3)$$

to be the average causal contrast between exposure condition $k$ versus $l$. Consider again, for example, the exposure mapping function defined in (1.2). A natural set of contrasts is $\tau(c_{01}, c_{00})$, $\tau(c_{10}, c_{00})$, and $\tau(c_{11}, c_{00})$, which capture the average indirect treatment effect, the average direct treatment effect, and the average total treatment effect, respectively.

Now consider the problem of inference for causal effects under network interference. The Horvitz-Thompson framework accounts for unequal-probability sampling through inverse probability weighting (Horvitz and Thompson (1952)) and is adapted to exposure mappings by Aronow and Samii (2017). In noise-free networks, assuming all individuals have nonzero exposure probabilities for all exposure conditions, the estimator

$$\hat y(c_k) = \frac{1}{N_v}\sum_{i=1}^{N_v} I\{f(z, x_i) = c_k\}\,\frac{y_i(c_k)}{p^e_i(c_k)} \qquad (1.4)$$

is well-defined and unbiased for $y(c_k)$, where the exposure probabilities are $p^e_i(c_k) = \sum_z p_z\, I\{f(z, x_i) = c_k\}$. In turn, $\hat\tau(c_k, c_l) = \hat y(c_k) - \hat y(c_l)$ is an unbiased estimator of $\tau(c_k, c_l)$.
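Under Bernoulli($p$) assignment with the mapping (1.2), the sum defining $p^e_i(c_k)$ has a closed form in the degree $d_i$, since individual $i$ lacks a treated neighbor exactly when all $d_i$ neighbors are untreated. The sketch below (our own derivation; function names are hypothetical) computes these probabilities and the estimator (1.4).

```python
import numpy as np

LEVELS = ("c11", "c10", "c01", "c00")


def exposure_probabilities(degrees, p):
    """p^e_i(c_k) = sum_z p_z I{f(z, x_i) = c_k} under Bernoulli(p) assignment.

    For the four-level mapping (1.2) the sum has a closed form in the degree d_i,
    because i is exposed to a treated neighbor unless all d_i neighbors are untreated.
    """
    d = np.asarray(degrees, dtype=float)
    none = (1.0 - p) ** d                       # P(no neighbor of i is treated)
    return {
        "c11": p * (1.0 - none),
        "c10": p * none,
        "c01": (1.0 - p) * (1.0 - none),
        "c00": (1.0 - p) * none,
    }


def horvitz_thompson(y_obs, labels, probs, level):
    """Estimator (1.4): (1/N_v) * sum_i I{label_i = c_k} * y_i / p^e_i(c_k)."""
    y_obs = np.asarray(y_obs, dtype=float)
    hit = np.asarray(labels) == level
    pe = np.asarray(probs[level], dtype=float)
    if np.any(pe <= 0):
        raise ValueError("every p^e_i(c_k) must be positive for (1.4) to be defined")
    return float(np.sum(y_obs[hit] / pe[hit]) / y_obs.size)
```

For instance, each vertex of a triangle ($d_i = 2$) with $p = 0.3$ has $p^e_i(c_{00}) = (1 - p)^{d_i + 1} = 0.7^3 \approx 0.343$. Note that a vertex with $d_i = 0$ has $p^e_i(c_{11}) = 0$, which is why (1.4) needs the nonzero-probability condition.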

However, in noisy networks, exposure levels will be misclassified. For example, in the four-level exposure model, for a node $i$, the expected confusion matrix for observed (rows) versus true (columns) exposures has the following form

$$P_i = \begin{pmatrix} P_i(\tilde c_{11}, c_{11}) & P_i(\tilde c_{11}, c_{10}) & 0 & 0 \\ P_i(\tilde c_{10}, c_{11}) & P_i(\tilde c_{10}, c_{10}) & 0 & 0 \\ 0 & 0 & P_i(\tilde c_{01}, c_{01}) & P_i(\tilde c_{01}, c_{00}) \\ 0 & 0 & P_i(\tilde c_{00}, c_{01}) & P_i(\tilde c_{00}, c_{00}) \end{pmatrix}, \qquad (1.5)$$

where $\tilde c_k$ represents the exposure level in observed networks and $P_i(\tilde c_k, c_l) =$ …

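The two off-diagonal blocks are equal to 0, since network noise does not affect treatment status. The four entries $P_i(\tilde c_k, c_l)$ with $k \neq l$ correspond to misclassified exposure levels. In the general exposure mapping framework, the estimators (1.4) for $y(c_k)$, when computed on the observed network, are in fact

$$\tilde y_{A\&S}(c_k) = \frac{1}{N_v}\sum_{i=1}^{N_v} I\{f(z, \tilde X_i) = c_k\}\,\frac{y_i}{\tilde p^e_i(c_k)}, \qquad (1.6)$$

where $\tilde X_i$ is a noisy version of $x_i$, $y_i$ is the observed outcome of individual $i$, and $\tilde p^e_i(c_k) = \sum_z p_z\, I\{f(z, \tilde X_i) = c_k\}$. From (1.6), we can see that the errors come from two sources: incorrect exposure probabilities and misclassified exposure levels.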
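The block structure of (1.5) can be checked empirically. The following Monte Carlo sketch tabulates joint frequencies of (observed, true) exposure for one focal vertex with $d$ true neighbors and $m$ true non-neighbors, assuming Bernoulli($p$) assignment and independent edge noise with constant α and β; all names are hypothetical. Both labels share the vertex's own treatment status, so the off-diagonal blocks are exactly zero.

```python
import numpy as np

LEVELS = ("c11", "c10", "c01", "c00")


def exposure_level(treated, has_treated_neighbor):
    """Four-level mapping (1.2) for a single individual."""
    if treated:
        return "c11" if has_treated_neighbor else "c10"
    return "c01" if has_treated_neighbor else "c00"


def empirical_confusion(d, m, p, alpha, beta, n_trials, rng):
    """Monte Carlo joint frequencies of (observed, true) exposure for one vertex.

    The vertex has d true neighbors and m true non-neighbors. Rows index the
    observed level and columns the true level, both in the order LEVELS.
    """
    idx = {c: k for k, c in enumerate(LEVELS)}
    counts = np.zeros((4, 4))
    for _ in range(n_trials):
        z_i = rng.random() < p
        nbr_treated = rng.random(d) < p          # treatment of true neighbors
        kept = rng.random(d) >= beta             # true edge survives the noise
        spurious = rng.random(m) < alpha         # false edge to a non-neighbor
        other_treated = rng.random(m) < p        # treatment of non-neighbors
        true_level = exposure_level(z_i, nbr_treated.any())
        obs_level = exposure_level(
            z_i, (nbr_treated & kept).any() or (spurious & other_treated).any())
        counts[idx[obs_level], idx[true_level]] += 1
    return counts / n_trials
```

Dividing each row of the result by its sum gives an estimate of the probability of each true level given the observed one; with `alpha = beta = 0` the matrix is diagonal.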

In this paper, we will address the following important questions. First, what is the impact of ignoring network noise? Second, how can we account for network noise?

Awareness of interference goes back at least 100 years (e.g., Ross (1916)), and its impact on standard theory and methods has been studied previously in certain specific contexts, including interference localized to an individual across different rounds of treatment in clinical trials with crossover designs (Grizzle (1965)), interference based on spatial proximity of treated units (Kempton and Lockwood (1984)), and interference within blocks (Hudgens and Halloran (2008)). For network interference, an assumption that has gained traction is that causal effects can be passed along edges in the network. A widely studied version of this assumption is that unit outcomes are affected only by their neighbors in the network (Manski …).

Extensive work on uncertainty analysis has been done in causal inference without network structure or interference. Many studies have explored the effects of uncertainty in propensity scores on causal inference. For instance, there have been efforts to develop Bayesian propensity score estimators that incorporate such uncertainty into causal inference (e.g., An (2010), Alvarez and Levin (2014)), and some studies examine the properties of particular matching estimators for average causal effects (e.g., Abadie and Imbens (2006), Schafer and Kang (2008)). However, to the best of our knowledge, little attention has been given to date to uncertainty analysis of estimators for average causal effects under network interference. Exceptions include a Bayesian procedure that accounts for network uncertainty and relies on a linear response assumption to increase estimation precision (Toulis and Kao (2013)), and structure learning techniques that estimate causal effects under data dependence induced by a network represented by a chain graph model, when the structure of this dependence is not known a priori (Bhattacharya et al. (2019)).

As remarked above, there appears to be little in the way of a formal and general treatment of the error propagation problem in estimators of average causal effects under network interference. However, there are several areas in which the probabilistic or statistical treatment of uncertainty enters prominently in network analysis. Model-based approaches include statistical methodology for predicting network topology or attributes with models that explicitly include a component for network noise (Jiang et al. (2011), Jiang and Kolaczyk (2012)), the 'denoising' of noisy networks (Chatterjee et al. (2015)), the adaptation of methods for vertex classification to networks observed with errors (Priebe et al. (2015)), a regression model on network-linked data that is based on a flexible network effect assumption and is robust to errors in the network structure (Le and Li (2020)), and a general Bayesian framework for reconstructing networks from observational data (Young et al. (2020)). The other common approach to network noise is based on a 'signal plus noise' perspective. For example, Balachandran et al. (2017) introduced a simple model for noisy networks that, conditional on some true underlying network, assumes we observe a version of that network corrupted by independent random noise that effectively flips the status of (non)edges. Later, Chang et al. (2020) developed method-of-moments estimators for the underlying rates of error when replicates of the observed network are available. In a somewhat different direction, uncertainty in network construction due to sampling has also been studied in some depth; see, for example, Kolaczyk (2009, Chapter 5) or Ahmed et al. (2014) for surveys of this area. However, in that setting, the uncertainty arises only from sampling: the subset of vertices and edges obtained through sampling is typically assumed to be observed without error.

Our contribution in this paper is to quantify how network errors propagate to standard estimators of average causal effects under network interference, and to provide new estimators for average causal effects when replicates of the observed network are available. Adopting the noise model proposed by Balachandran et al. (2017) , we characterize the impact of network noise on the bias and variance of standard estimators (Aronow and Samii (2017)) under a four-level exposure model and Bernoulli random assignment of treatment, and we illustrate the asymptotic behaviors on networks for varying degree distributions. Additionally, we propose method-of-moments estimators of average causal effects, when replicates of the observed network are available. Numerical simulation in the context of social contact networks in British secondary schools suggests that high accuracy is possible for networks of even modest size.

The organization of this paper is as follows. In Section 2 we present the bias and variance of standard estimators in noisy networks under a four-level exposure model and Bernoulli random assignment of treatment. Section 3 contains our proposed method-of-moments estimators for the true average causal effects. Numerical illustrations are reported in Section 4. Finally, we conclude in Section 5 with a discussion of future directions for this work. All proofs are relegated to supplementary materials.

In this section, we characterize the impact of network noise on the biases and variances of standard estimators under a four-level exposure model and Bernoulli random assignment of treatment. Specifically, we show results for two typical classes of networks: homogeneous and inhomogeneous. By homogeneous we mean that the degrees follow a zero-truncated Poisson distribution, and by inhomogeneous, that the degrees follow a Pareto distribution with an exponential cutoff (Clauset et al. (2009)). Note that many real networks exhibit bounded scale-free behavior with a connectivity cutoff, due to the finite size of the network or to constraints limiting the addition of new links in an otherwise infinite network (Amaral et al. (2000)). The exponential cutoff is the most widely used form of such a cutoff.

We consider two typical classes of networks: homogeneous and inhomogeneous. The formal definitions are as follows.

Homogeneous network setting The degree distribution of $G$ is a zero-truncated Poisson distribution with mean $\bar d$.

Inhomogeneous network setting The degree distribution of $G$ is a Pareto distribution with an exponential cutoff with rate $\lambda$, shape $\zeta$, lower bound $d_L$, upper bound $N_v - 1$ and mean $\bar d$.

The degree distribution is the probability distribution of the degrees over the whole network.
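In the homogeneous setting, "mean $\bar d$" refers to the truncated distribution, whose mean is $\mu / (1 - e^{-\mu})$ for an underlying Poisson rate $\mu$. A short sketch (hypothetical helper names) that recovers $\mu$ from $\bar d$ by bisection and samples degrees by redrawing zeros:

```python
import math

import numpy as np


def ztp_mean(mu):
    """Mean of a zero-truncated Poisson whose untruncated rate is mu."""
    return mu / -math.expm1(-mu)                # mu / (1 - exp(-mu)), computed stably


def ztp_rate_for_mean(dbar, tol=1e-12):
    """Solve ztp_mean(mu) = dbar by bisection; requires dbar > 1."""
    if dbar <= 1.0:
        raise ValueError("a zero-truncated Poisson always has mean > 1")
    lo, hi = 1e-12, dbar                        # ztp_mean(mu) > mu, so the root is below dbar
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if ztp_mean(mid) < dbar:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def sample_ztp_degrees(n, dbar, rng):
    """Draw n degrees with mean dbar, redrawing zeros until none remain."""
    mu = ztp_rate_for_mean(dbar)
    deg = rng.poisson(mu, size=n)
    zeros = deg == 0
    while zeros.any():
        deg[zeros] = rng.poisson(mu, size=int(zeros.sum()))
        zeros = deg == 0
    return deg
```

Redrawing only the zero entries is rejection sampling, so the retained values follow the Poisson law conditioned on being positive.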

Remark 2 Note that $\bar d$, $\lambda$ and $d_L$ depend on $N_v$. For notational simplicity, we omit this dependence.

Remark 3 In the inhomogeneous network setting, by the definition of the Pareto distribution with an exponential cutoff, $\lambda$, $\zeta$, $d_L$, $\bar d$ and $N_v$ satisfy the equation
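The relation among $\lambda$, $\zeta$, $d_L$, $\bar d$ and $N_v$ can be evaluated numerically. The sketch below assumes the continuous density proportional to $x^{-\zeta} e^{-\lambda x}$ on $[d_L, N_v - 1]$ (our reading, consistent with the integrals used in the appendix; the exact equation of Remark 3 is not reproduced here), computes its mean by Simpson's rule, and solves for $d_L$ given $\bar d$. All names are hypothetical.

```python
import math


def simpson(f, a, b, n=10000):
    """Composite Simpson's rule for the integral of f over [a, b] (n made even)."""
    n += n % 2
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3.0


def pareto_cutoff_mean(zeta, lam, d_low, d_high):
    """Mean of the density proportional to x**(-zeta) * exp(-lam * x) on [d_low, d_high]."""
    num = simpson(lambda x: x ** (1.0 - zeta) * math.exp(-lam * x), d_low, d_high)
    den = simpson(lambda x: x ** (-zeta) * math.exp(-lam * x), d_low, d_high)
    return num / den


def lower_bound_for_mean(dbar, zeta, lam, d_high, tol=1e-6):
    """Find d_L in [1, dbar] with pareto_cutoff_mean(zeta, lam, d_L, d_high) = dbar.

    Bisection is valid because raising the lower truncation point raises the mean.
    """
    lo, hi = 1.0, dbar
    if pareto_cutoff_mean(zeta, lam, lo, d_high) > dbar:
        raise ValueError("mean already exceeds dbar at d_L = 1")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if pareto_cutoff_mean(zeta, lam, mid, d_high) < dbar:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

As a sanity check, with $\zeta = 0$ the density is a truncated exponential, whose mean on $[1, 200]$ with $\lambda = 0.5$ is essentially $1 + 1/\lambda = 3$.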

Here we focus on a general formulation of the problem in which we make the following assumptions on networks and the treatment assignment.

Assumption 1 (Constant marginal error probabilities) Assume that $\alpha_{i,j} = \alpha$ and $\beta_{i,j} = \beta$ for all $i < j$, so the marginal error probabilities are $P(\tilde A_{i,j} = 0 \mid A_{i,j} = 1) = \beta$ and $P(\tilde A_{i,j} = 1 \mid A_{i,j} = 0) = \alpha$.

Assumption 2 (Independent noise) The random variables $\tilde A_{i,j}$, for all $i < j$, are conditionally independent given the $A_{i,j}$.

Assumption 3 (Large Graphs) The number of vertices N v → ∞.

In Assumption 1, we assume that both α and β remain constant across vertex pairs.

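Under Assumptions 1 and 2, the observed degree $\tilde d_i$ of the $i$-th vertex is distributed as

$$\tilde d_i \sim \mathrm{Binomial}(d_i,\, 1 - \beta) + \mathrm{Binomial}(N_v - 1 - d_i,\, \alpha),$$

where the two binomial random variables are independent and $d_i$ is the degree of the $i$-th vertex in $G$.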
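The observed-degree distribution is a convolution of two binomials: retained true edges plus spurious edges. A small sketch (hypothetical names) that builds the exact probability mass function with `numpy.convolve` and checks its mean $(1 - \beta)d_i + \alpha(N_v - 1 - d_i)$:

```python
import math

import numpy as np


def binom_pmf(n, q):
    """Binomial(n, q) probabilities for k = 0, ..., n (fine for n up to about 1000)."""
    return np.array([math.comb(n, k) * q**k * (1 - q) ** (n - k) for k in range(n + 1)])


def observed_degree_pmf(d_i, n_v, alpha, beta):
    """Distribution of the observed degree under Assumptions 1 and 2: the sum of
    independent Binomial(d_i, 1 - beta) retained edges and
    Binomial(n_v - 1 - d_i, alpha) spurious edges."""
    return np.convolve(binom_pmf(d_i, 1 - beta), binom_pmf(n_v - 1 - d_i, alpha))


pmf = observed_degree_pmf(d_i=10, n_v=500, alpha=0.01, beta=0.10)
mean_obs = np.arange(pmf.size) @ pmf     # equals 10 * 0.9 + 489 * 0.01 = 13.89
```

Even a small α adds several spurious neighbors here (about 4.89 on average), which is why α is tied to $1/(pN_v)$ in Assumption 5 below.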

Assumption 3 reflects both the fact that the study of large graphs is a hallmark of modern applied work in complex networks and, accordingly, our desire to understand asymptotic behaviors of estimators for average causal effects and provide concise descriptions in terms of biases and variances for large graphs.

Assumption 4 The treatment probability $p$ satisfies $p = o(1)$, $p = \omega(1/N_v)$, and $\bar d = \Theta(1/p)$.

Letting $C_{ij}$ denote the number of common neighbors between vertices $i$ and $j$ in $G$,

Finally, the potential outcomes are bounded, $|y_i(c_k)| \le c < \infty$ for all $i$ and $c_k$, where $c$ is a constant.

Assumption 4 entails that, as $N_v$ grows, the expected number of treated individuals also grows but is dominated by $N_v$ asymptotically, and that the average number of treated neighbors is bounded. The number of vertex pairs having common neighbors is also limited in scope as $N_v$ grows, which ensures a sufficiently large set of independent exposures. Assumption 4 is used to prove the consistency of $\hat\tau(c_k, c_l)$ in noise-free homogeneous and inhomogeneous networks; see Appendix 8.1 for details.

Assumption 5 $1 - \beta = \Omega(1)$, $\alpha = \Theta(1/(pN_v))$, and $\alpha = o(p)$.

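Remark 4 Note that β can be a constant or $o(1)$ as $N_v \to \infty$, whereas Assumption 5 together with $p = \omega(1/N_v)$ forces $\alpha = o(1)$. For notational simplicity, we omit the dependence on $N_v$.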

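Under Assumption 5, a vertex with $d_i = \Theta(1/p)$ has expected observed degree $(1 - \beta)d_i + \alpha(N_v - 1 - d_i) = \Theta(1/p)$, which is consistent with Assumption 4.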

Under these assumptions on the underlying error rates α and β, Appendix 8.2 shows that regularity conditions similar to those for noise-free networks hold for noisy homogeneous and inhomogeneous networks.

Assuming a four-level exposure model and Bernoulli random assignment of treatment, we quantify the biases of standard estimators in homogeneous and inhomogeneous network settings. We begin with the following general result.

Theorem 1 Assume a four-level exposure model and Bernoulli random assignment of treatment with probability $p$. Under Assumptions 1–3 and 5, $p = o(1)$, $p = \omega(1/N_v)$, and bounded potential outcomes, we have

and $d_i$ is the degree of the $i$-th vertex in the noise-free network $G$.

Theorem 1 then directly leads to the following corollary in homogeneous and inhomogeneous network settings.

Corollary 1 (Homogeneous and inhomogeneous) Assume a four-level exposure model and Bernoulli random assignment of treatment with probability $p$. In both homogeneous and inhomogeneous network settings, under Assumptions 1–3 and 5, $p = o(1)$, $p = \omega(1/N_v)$, and bounded potential outcomes, the bias statement in Theorem 1 holds.

The proof of Theorem 1 is in supplementary material C. Corollary 1 directly follows from Theorem 1.

The biases of standard estimators have the same expressions in homogeneous and inhomogeneous network settings. The biases of $\tilde y_{A\&S}(c_{11})$ and $\tilde y_{A\&S}(c_{01})$ depend on both α and β, while the biases of $\tilde y_{A\&S}(c_{10})$ and $\tilde y_{A\&S}(c_{00})$ depend only on β. The biases of $\tilde y_{A\&S}(c_{11})$ and $\tilde y_{A\&S}(c_{10})$ are related to $\tau(c_{11}, c_{10})$, and the biases of $\tilde y_{A\&S}(c_{01})$ and $\tilde y_{A\&S}(c_{00})$ are related to $\tau(c_{01}, c_{00})$. These relationships follow because network noise affects observed edges but not treatment status.

Let $\tilde y_{A\&S,i}(c_k)$ denote the Aronow and Samii estimator for $y_i(c_k)$ in noisy networks, which corresponds to the $i$-th summand of $\tilde y_{A\&S}(c_k)$ in (1.6). We summarize in the following table the asymptotic biases of $\tilde y_{A\&S,i}(c_k)$ for high-degree (top row) and low-degree (bottom row) nodes.

We see that there are four cases where $\tilde y_{A\&S,i}(c_k)$ is asymptotically unbiased, because the corresponding entries in the expected confusion matrix (1.5) go to 0. In the other four cases, the corresponding entries in the expected confusion matrix approach 1, which leads to nontrivial biases. Note that the asymptotic bias of $\tilde y_{A\&S,i}(c_k)$ lies between 0 and the corresponding $\pm\tau_i(c_k, c_l)$ when $d_i = \Theta(1/p)$.

We analyze the variances of standard estimators in homogeneous and inhomogeneous network settings.

Theorem 2 (Homogeneous) Assume a four-level exposure model and Bernoulli random assignment of treatment with probability p. In the homogeneous network setting, under

Theorem 3 (Inhomogeneous) Assume a four-level exposure model and Bernoulli random assignment of treatment with probability p. In the inhomogeneous network setting, under

Note that the variances go to zero as the number of nodes tends to infinity in both cases. Therefore, in noisy networks, bias appears to be the primary concern when estimating average causal effects.

As we saw in Section 2, standard estimators are biased in both homogeneous and inhomogeneous network settings. Thus, it is important to have new estimators for bias reduction. We present method-of-moments estimators in Section 3.1, and show unbiasedness and consistency under a four-level exposure model and Bernoulli random assignment of treatment in Section 3.2. The method-of-moments estimators require either knowledge or consistent estimators of α and β. We adopt the estimators in Chang et al. (2020), which require at least three replicates of the observed network.

We construct method-of-moments estimators (MME) by reweighting the observed outcomes based on the expected confusion matrix. For convenience, we denote

We then combine the observed outcome $1(x_i)\, y_i$ and the observed exposure level into a vector, denoted by $\tilde{\mathbf{y}}_i$:

(3.1)

By taking the expectation with respect to treatment and network noise, we obtain

Note that $P_i$ depends on $d_i$, α and β; we therefore write $P(d_i, \alpha, \beta)$ for explicitness.

The method of moments estimator for y i is defined as

The values $\hat\alpha$ and $\hat\beta$ are consistent estimators of α and β; an example of such estimators is given later in this section. If α and β are known, we replace $\hat\alpha$ and $\hat\beta$ in (3.3) and (3.4) with the known values; this does not change the asymptotic behavior stated in Section 3.2.
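The moment idea behind (3.3) can be illustrated for a single vertex. The sketch below rests on our own derivation, not a reproduction of (3.2) and (3.3): it takes the confusion matrix to hold joint probabilities $J[k, l] = P(\text{observed level} = c_k, \text{true level} = c_l)$ under Bernoulli($p$) treatment and Assumptions 1 and 2, for a vertex with $d$ true neighbors and $m$ true non-neighbors. With $v_k = I\{\text{observed level} = c_k\}\, y_i$ we have $E[v] = J y_i$, so solving $J \hat y_i = v$ is unbiased when α and β are known. The paper's $P_i$ may be normalized differently, and the degree restriction of (3.5) to (3.7) is ignored here; names are hypothetical.

```python
import numpy as np

LEVELS = ("c11", "c10", "c01", "c00")   # row order: observed level; column order: true level


def joint_confusion(d, m, p, alpha, beta):
    """J[k, l] = P(observed level = c_k, true level = c_l) for a vertex with d true
    neighbors and m true non-neighbors (our derivation under Bernoulli(p)
    treatment and independent edge noise with constant alpha, beta)."""
    t_none = (1 - p) ** d                  # no true neighbor is treated
    kept_none = (1 - p * (1 - beta)) ** d  # no treated true neighbor is observed
    s_none = (1 - alpha * p) ** m          # no treated non-neighbor appears as a neighbor
    # rows: observed (some, none) treated neighbor; columns: true (some, none)
    B = np.array([
        [1 - kept_none * s_none - t_none * (1 - s_none), t_none * (1 - s_none)],
        [(kept_none - t_none) * s_none, t_none * s_none],
    ])
    J = np.zeros((4, 4))
    J[:2, :2] = p * B                      # vertex treated: levels c11, c10
    J[2:, 2:] = (1 - p) * B                # vertex untreated: levels c01, c00
    return J


def mme_vertex(observed_level, y_observed, J):
    """Moment estimate of (y_i(c11), y_i(c10), y_i(c01), y_i(c00)) for one vertex:
    with v_k = I{observed level = c_k} * y_i we have E[v] = J y, so solve J y = v."""
    v = np.zeros(4)
    v[LEVELS.index(observed_level)] = y_observed
    return np.linalg.solve(J, v)
```

When α = β = 0 the matrix is diagonal with the exposure probabilities on the diagonal, and the estimate reduces to the inverse-probability weighting of (1.4).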

We define the method-of-moments estimator for $\sum_{i=1}^{N_v} y_i / N_v$ as

where $\tilde y_{A\&S,i}$ is the Aronow and Samii estimator for node $i$ in the noisy network. Recall from the bias statements in Table 2.1 that $\tilde y_{A\&S,i}(c_{11})$ and $\tilde y_{A\&S,i}(c_{01})$ are asymptotically unbiased for high-degree nodes, and that $\tilde y_{A\&S,i}(c_{10})$ and $\tilde y_{A\&S,i}(c_{00})$ are asymptotically unbiased for low-degree nodes. Therefore, we do not need to correct biases in those cases. We will show in Theorems 4 and 5 that $\tilde y_{MME,i}$ is asymptotically unbiased with small variance for nodes with degree on the order of $1/p$. Otherwise, asymptotically unbiased estimators with small variances may not exist, due to the structure of this specific four-level exposure model. …

This means that we lose almost all information about $y_i(c_{11})$ and $y_i(c_{01})$ for low-degree nodes, and about $y_i(c_{10})$ and $y_i(c_{00})$ for high-degree nodes.

In general, we suggest using terms of the same order of magnitude in (3.5) to approximate $\Theta(\cdot)$. That is, writing $1/p = a \times 10^b$, where $1/\sqrt{10} \le a < \sqrt{10}$, we represent the order of magnitude by $b$. Next, we rewrite $\tilde y_{MME}$ as

where $C_1 = 10^b/\sqrt{10}$ and $C_2 = \sqrt{10} \cdot 10^b$. For sparse and inhomogeneous networks with small sample sizes, $C_1$ may be close to the average degree, and thus we recommend using

(3.7)
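The order-of-magnitude rule reduces to a rounding of $\log_{10}(1/p)$: the condition $1/\sqrt{10} \le a < \sqrt{10}$ means $\log_{10} a \in [-1/2, 1/2)$. A short sketch (hypothetical name `magnitude_window`) that returns $b$, $C_1$ and $C_2$; we read $[C_1, C_2)$ as the degree window treated as "of order $1/p$", which is our interpretation since (3.6) is not reproduced here.

```python
import math


def magnitude_window(p):
    """Write 1/p = a * 10**b with 1/sqrt(10) <= a < sqrt(10) and return
    (b, C1, C2), where C1 = 10**b / sqrt(10) and C2 = sqrt(10) * 10**b."""
    b = math.floor(math.log10(1.0 / p) + 0.5)   # log10(a) lies in [-1/2, 1/2)
    root10 = math.sqrt(10.0)
    return b, 10.0**b / root10, root10 * 10.0**b


b, C1, C2 = magnitude_window(0.1)   # the treatment probability used in Section 4
# b = 1, C1 ≈ 3.16, C2 ≈ 31.6
```

With $p = 0.1$, $C_1 \approx 3.16$ is indeed close to the average degree of sparse contact networks, which motivates the recommendation of (3.7).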

Remark 6 As we will see later, in this specific four-level exposure model,ỹ MME is asymptotically unbiased and consistent in both homogeneous and inhomogeneous network settings.

Our estimators require knowledge or, more realistically, consistent estimates of the parameters α and β governing the noise. Here, we adopt the consistent MME estimators in Chang et al. (2020) , which require at least three replicates of the observed network. Define relevant quantities as follows:

where δ is the edge density in the true network G, u 1 is the expected edge density in one observed network, u 2 is the expected density of edge differences in two observed networks, and u 3 is the average probability of having an edge between two arbitrary nodes in one observed network but no edge between the same nodes in the other two observed networks.
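For reference, here is a sketch of the moment equations these quantities satisfy, assuming that replicates are independent given $G$, Assumptions 1 and 2, and reading $u_2$ as the probability that a vertex pair is an edge in the first replicate but not in the second (our interpretation). Under this reading, the β and δ updates of Algorithm 1 invert the first two equations exactly for a given α, and the α update solves the third:

```latex
\begin{aligned}
u_1 &= \delta(1-\beta) + (1-\delta)\,\alpha, \\
u_2 &= \delta\,\beta(1-\beta) + (1-\delta)\,\alpha(1-\alpha), \\
u_3 &= \delta\,\beta^2(1-\beta) + (1-\delta)\,\alpha(1-\alpha)^2, \\[4pt]
u_1 - \alpha &= \delta\,(1-\alpha-\beta), \qquad
u_2 - \alpha + u_1\alpha = \delta\,\beta\,(1-\alpha-\beta), \\
u_1 - u_2 - 2u_1\alpha + \alpha^2 &= \delta\,(1-\alpha-\beta)^2 .
\end{aligned}
```

Dividing the second identity by the first gives $\beta = (u_2 - \alpha + u_1\alpha)/(u_1 - \alpha)$, and dividing the square of the first by the third gives $\delta = (u_1 - \alpha)^2/(u_1 - u_2 - 2u_1\alpha + \alpha^2)$.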

The method-of-moments estimators for $u_1$, $u_2$ and $u_3$ are

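Algorithm 1 (estimation of α and β)
Input: $\tilde A = (\tilde A_{i,j})_{N_v \times N_v}$, $\tilde A^* = (\tilde A^*_{i,j})_{N_v \times N_v}$, $\tilde A^{**} = (\tilde A^{**}_{i,j})_{N_v \times N_v}$, $\alpha_0$, $\varepsilon$
Output: $\hat\alpha$, $\hat\beta$
Compute $\hat u_1$, $\hat u_2$, $\hat u_3$ defined in (3.8);
Initialize $\hat\alpha = \alpha_0$, $\alpha_0 = \hat\alpha + 10\varepsilon$;
while $|\hat\alpha - \alpha_0| > \varepsilon$ do
  $\alpha_0 \leftarrow \hat\alpha$;
  $\hat\beta \leftarrow \dfrac{\hat u_2 - \alpha_0 + \hat u_1 \alpha_0}{\hat u_1 - \alpha_0}$;
  $\hat\delta \leftarrow \dfrac{(\hat u_1 - \alpha_0)^2}{\hat u_1 - \hat u_2 - 2\hat u_1 \alpha_0 + \alpha_0^2}$;
  $\hat\alpha \leftarrow \dfrac{\hat u_3 - \hat\delta \hat\beta^2 (1 - \hat\beta)}{(1 - \hat\delta)(1 - \alpha_0)^2}$;
end while
return $\hat\alpha$, $\hat\beta$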
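A minimal Python sketch of Algorithm 1 follows. It assumes simple empirical forms for $\hat u_1$, $\hat u_2$ and $\hat u_3$ over vertex pairs $i < j$ (consistent stand-ins under independent replicates; the exact definitions in (3.8) are not reproduced here) and adds a `max_iter` guard that the algorithm itself does not state. Function names are hypothetical.

```python
import numpy as np


def moment_statistics(A1, A2, A3):
    """Empirical analogues of u1, u2, u3 over vertex pairs i < j (assumed forms):
    u1: edge density of one replicate; u2: fraction of pairs with an edge in the
    first replicate but not the second; u3: edge in the first, absent in the others."""
    iu = np.triu_indices(A1.shape[0], k=1)
    a, b, c = A1[iu], A2[iu], A3[iu]
    u1 = a.mean()
    u2 = (a * (1 - b)).mean()
    u3 = (a * (1 - b) * (1 - c)).mean()
    return u1, u2, u3


def estimate_noise_rates(A1, A2, A3, alpha0=0.01, eps=1e-10, max_iter=1000):
    """Fixed-point iteration of Algorithm 1; returns (alpha_hat, beta_hat)."""
    u1, u2, u3 = moment_statistics(A1, A2, A3)
    alpha_hat, alpha_prev = alpha0, alpha0 + 10 * eps   # forces at least one pass
    for _ in range(max_iter):
        if abs(alpha_hat - alpha_prev) <= eps:
            break
        alpha_prev = alpha_hat
        beta_hat = (u2 - alpha_prev + u1 * alpha_prev) / (u1 - alpha_prev)
        delta_hat = (u1 - alpha_prev) ** 2 / (u1 - u2 - 2 * u1 * alpha_prev + alpha_prev ** 2)
        alpha_hat = ((u3 - delta_hat * beta_hat ** 2 * (1 - beta_hat))
                     / ((1 - delta_hat) * (1 - alpha_prev) ** 2))
    return alpha_hat, beta_hat
```

Using only one replicate for $\hat u_1$ keeps the sketch short; averaging over all three replicates would reduce variance without changing consistency.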

We consider the asymptotic behavior of the method-of-moments estimators $\tilde y_{MME}(c_k)$ as $N_v \to \infty$.

Theorem 4 (Homogeneous) Assume a four-level exposure model and Bernoulli random assignment of treatment with probability $p$. In the homogeneous network setting, under Assumptions 1–5, $\tilde y_{MME}(c_k)$ is an asymptotically unbiased and consistent estimator of $y(c_k)$ for all $c_k$.

Theorem 5 (Inhomogeneous) Assume a four-level exposure model and Bernoulli random assignment of treatment with probability $p$. In the inhomogeneous network setting, under Assumptions 1–5, $\lambda = \Theta(p)$ and $\lambda > p$, $\tilde y_{MME}(c_k)$ is an asymptotically unbiased and consistent estimator of $y(c_k)$ for all $c_k$.

Thus, $\tilde y_{MME}(c_k)$ is an asymptotically unbiased and consistent estimator of $y(c_k)$ in both homogeneous and inhomogeneous network settings. Proofs of Theorems 4 and 5 appear in supplementary material C.

We conduct simulations to illustrate the finite-sample properties of the proposed estimation methods. We consider the data and network construction described in Kucharski et al. (2018). These data were collected from 460 unique participants across four rounds of data collection conducted between January and June 2015 in year 7 groups in four UK secondary schools, with 7,315 identifiable contacts reported in total. Kucharski et al. used peer nomination for data collection: students were asked, via the research questionnaire, to list the six other students in year 7 at their school with whom they spend the most time. For each pair of participants in a specific round of data collection, a single link was defined if either participant reported a contact with the other (i.e., there was at least one unidirectional link, in either direction). Our analysis focuses on the single-link contact network.

For each school, we construct a 'true' adjacency matrix $A$: if an edge occurs between a pair of vertices more than once in the four rounds, we regard that pair as having a true edge. The noisy observed adjacency matrices $\tilde A$, $\tilde A^*$, $\tilde A^{**}$ are generated according to (1.1). We set α = 0.005 or 0.010, and β = 0.05, 0.10, or 0.15. We assume that both α and β are unknown.

For treatment effects, we adopt a simple model in the spirit of the 'dilated effects' model of Rosenbaum (1999) and suppose $y_i(c_{11}) = 10$, $y_i(c_{10}) = 7$, $y_i(c_{01}) = 5$, $y_i(c_{00}) = 1$. We set $p = 0.1$ and explore the effect of α and β on the performance of the estimators $\tilde y(\cdot)$.

We run a Monte Carlo simulation of 10,000 trials and compute three kinds of estimators: Aronow and Samii estimators in noise-free networks, Aronow and Samii estimators in noisy networks, and method-of-moments estimators in noisy networks. For the method-of-moments estimators, we first obtain the estimates $\hat\alpha$ and $\hat\beta$ by Algorithm 1. The networks are sparse, with inhomogeneous degree distributions and small sizes, so we compute $\tilde y_{MME}$ by (3.7). We also use 1,000 bootstrap resamples of the estimators to obtain 95% confidence intervals for the biases and standard deviations of the estimators. Biases and standard deviations are shown in Figures 4.1 and 4.2; error bars are 95% confidence intervals.

From the plots, we see that the method-of-moments estimators outperform the Aronow and Samii estimators in noisy networks, and perform essentially the same on noisy networks as the Aronow and Samii estimators do on noise-free networks.

Here we have quantified the biases and variances of standard estimators in noisy networks and developed a general framework for estimating true average causal effects when one observes noisy networks. Our approach requires knowledge or consistent estimates of the corresponding noise parameters, the latter of which can be obtained from as few as three replicates of the network. We employ method-of-moments estimators for both the noise parameters and the average causal effects. Simulations in British secondary school contact networks demonstrate that method-of-moments estimators can achieve substantial inferential accuracy in networks of even modest size when nontrivial noise is present.

We have pursued a frequentist approach to the problem of uncertainty quantification for estimating average causal effects. If the replicates necessary for our approach are unavailable in a given setting, a Bayesian approach is a natural alternative. For example, posterior-predictive checks for goodness-of-fit based on a handful of network summary measures are common practice (e.g., Bloem-Reddy and Orbanz (2018)). Note, however, that the Bayesian approach requires careful modeling of the generative process underlying $G$ and typically does not distinguish between signal and noise components. Our analysis is conditional on $G$, and hence does not require that $G$ be modeled. It is effectively a 'signal plus noise' model, with the signal taken to be fixed but unknown. Related work has been done in the context of graphon modeling, with the goal of estimating network motif frequencies (e.g., Latouche and Robin (2016)). However, again, one typically does not distinguish between signal and noise components in this setting. Additionally, we note that practical graphon estimation is itself still a developing area of research.

Our work here sets the stage for extensions to other potential outcome frameworks and exposure models. Here we sketch the key elements of one such extension. For example, consider the exposure mapping $f$ defined as follows: … Infectious diseases are more likely to spread between people who are closely connected in a social network, and being in contact with more infected people means that, in theory, an individual is more likely to contract the disease.

As an illustration, suppose that treatment is assigned to the $N_v$ individuals in a network … (Table 6).

Then, we can construct regularity conditions under which the average causal effect estimators are consistent. Similarly, one can quantify the biases and variances of standard estimators in noisy networks and develop a general framework for estimating true average causal effects.

These extensions require additional work because the formulas for the exposure probabilities are more complex.

Our choice to work with independent network noise is both natural and motivated by convenience. A precise characterization of dependent noise is typically problem-specific and hence a topic for further investigation. Data and code accessibility: No primary data are used in this paper. The secondary data source is Kucharski et al. (2018). These data and the code necessary to reproduce the results in this paper are available at https://github.com/KolaczykResearch/CausInfNoisyNet.

This work was supported in part by ARO award W911NF1810237. This work was also supported by the Air Force Research Laboratory and DARPA under agreement number FA8750-18-2-0066 and by a grant from MIT Lincoln Labs.

In this appendix, we provide arguments for the consistency of contrast estimates in noise-free networks and for regularity conditions in noisy networks. We also present the exposure probabilities in the generalized four-level exposure model.

We first establish conditions for the estimatorτ(c k , c l ) to converge to τ(c k , c l ) as N v → ∞.

We will show that, under two regularity conditions, $\hat\tau(c_k, c_l) \xrightarrow{P} \tau(c_k, c_l)$ as $N_v \to \infty$. Note that these conditions are similar to, but slightly more general than, the conditions in Aronow and Samii (2017).

We will also make an assumption about the amount of dependence among exposure conditions in the population.

Condition 2 implies that the amount of pairwise clustering in exposure conditions is limited in scope as N v grows. Condition 2 can be relaxed, though Condition 1 would likely need to be strengthened accordingly.

Proposition 1 Given Conditions 1 and 2, $\hat\tau(c_k, c_l) \xrightarrow{P} \tau(c_k, c_l)$ as $N_v \to \infty$.

Assuming the four-level exposure model in (1.2) and Bernoulli random assignment of treatment with probability p, we consider the consistency of the estimatorτ(c k , c l ) in two typical classes of networks: homogeneous and inhomogeneous.

Proposition 2 (Homogeneous) Assume a four-level exposure model and Bernoulli random assignment of treatment with probability $p$. In the homogeneous network setting, under Assumption 4, we have $\hat\tau(c_k, c_l) \xrightarrow{P} \tau(c_k, c_l)$ as $N_v \to \infty$.

Proposition 3 (Inhomogeneous) Assume a four-level exposure model and Bernoulli random assignment of treatment with probability $p$. In the inhomogeneous network setting, under Assumption 4, $\lambda = \Theta(p)$ and $\lambda > p$, we have $\hat\tau(c_k, c_l) \xrightarrow{P} \tau(c_k, c_l)$ as $N_v \to \infty$.

Proofs for Propositions 1 -3 appear in the supplementary material A.

Note that, under Assumption 4, Condition 1 does not hold for levels $c_{10}$ and $c_{00}$ when the degrees follow a Pareto distribution with shape $\zeta > 1$. This is because there are more high-degree nodes, and $1/p^e_i(c_{10})$ and $1/p^e_i(c_{00})$ increase exponentially as the degree $d_i$ increases. See supplementary material E for the proof.

Recall that under Conditions 1 and 2, $\hat\tau(c_k, c_l) \xrightarrow{P} \tau(c_k, c_l)$ as $N_v \to \infty$. By making assumptions on the underlying error rates α and β, we will show that similar regularity conditions hold for noisy homogeneous and inhomogeneous networks. These conditions will then be used in our characterization of bias and variance in Sections 2.2 and 2.3. Define …

Proposition 4 (Homogeneous) Assume a four-level exposure model and Bernoulli random assignment of treatment with probability $p$. In the homogeneous network setting, under Assumptions 1–5, for all $i$ and $c_k$,


A Proofs of propositions for noise-free networks

Notice that

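Hence, $\operatorname{Var}[\hat{\tau}(c_k, c_l)] = o(1)$. Since $\mathbb{E}[\hat{\tau}(c_k, c_l)] = \tau(c_k, c_l)$, we have $\hat{\tau}(c_k, c_l) \xrightarrow{L_2} \tau(c_k, c_l)$, which implies $\hat{\tau}(c_k, c_l) \xrightarrow{P} \tau(c_k, c_l)$ as $N_v \to \infty$.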
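The argument above rests on unbiasedness plus a vanishing variance. The sketch below checks the unbiasedness empirically for a Horvitz-Thompson-type contrast in the spirit of Aronow and Samii (2017), in the form we assume here: $\hat{\tau}(c_k, c_l) = N_v^{-1}\sum_i [\mathbb{I}\{D_i = c_k\} y_i(c_k)/p^e_i(c_k) - \mathbb{I}\{D_i = c_l\} y_i(c_l)/p^e_i(c_l)]$. The cycle graph, the potential outcomes, and all function names are hypothetical choices for illustration.

```python
import random
import statistics

LEVELS = ("c11", "c10", "c01", "c00")


def exposure_level(i, z, neighbors):
    """Exposure of unit i under the assumed four-level mapping."""
    treated_nbr = any(z[j] for j in neighbors[i])
    if z[i]:
        return "c11" if treated_nbr else "c10"
    return "c01" if treated_nbr else "c00"


def exposure_prob(level, degree, p):
    """p^e_i(level) under Bernoulli(p) assignment, given the degree of i."""
    none = (1.0 - p) ** degree  # probability that no neighbor is treated
    probs = {
        "c11": p * (1.0 - none),
        "c10": p * none,
        "c01": (1.0 - p) * (1.0 - none),
        "c00": (1.0 - p) * none,
    }
    return probs[level]


def ht_contrast(z, y, neighbors, p, level_k="c11", level_l="c00"):
    """Horvitz-Thompson estimate of tau(level_k, level_l) for one assignment z."""
    total = 0.0
    for i in range(len(z)):
        level = exposure_level(i, z, neighbors)
        deg = len(neighbors[i])
        if level == level_k:
            total += y[i][level_k] / exposure_prob(level_k, deg, p)
        elif level == level_l:
            total -= y[i][level_l] / exposure_prob(level_l, deg, p)
    return total / len(z)


def simulate(n=2000, p=0.3, reps=200, seed=1):
    """Return the true contrast and `reps` estimates on a toy cycle graph."""
    rng = random.Random(seed)
    # Hypothetical homogeneous network: a cycle, so every unit has degree 2.
    neighbors = [((i - 1) % n, (i + 1) % n) for i in range(n)]
    # Hypothetical potential outcomes: unit baseline plus a constant level effect.
    effect = {"c11": 2.0, "c10": 1.5, "c01": 0.5, "c00": 0.0}
    base = [rng.gauss(0.0, 1.0) for _ in range(n)]
    y = [{c: b + effect[c] for c in LEVELS} for b in base]
    true_tau = statistics.fmean(yi["c11"] - yi["c00"] for yi in y)
    estimates = [
        ht_contrast([rng.random() < p for _ in range(n)], y, neighbors, p)
        for _ in range(reps)
    ]
    return true_tau, estimates


true_tau, estimates = simulate()
print(f"true tau = {true_tau:.3f}, mean estimate = {statistics.fmean(estimates):.3f}")
```

Rerunning `simulate` with larger `n` should also show the spread of `estimates` shrinking, which is the variance half of the argument.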

By Proposition 1, it suffices to show that Conditions 1 and 2 are satisfied. For notational simplicity, we define $\mathbb{E}[1/p^e(c_k)] := \frac{1}{N_v}\sum_{i=1}^{N_v} 1/p^e_i(c_k)$, i.e., the average of the inverse exposure probability over the finite population of $N_v$ units.

Under Assumption 4, we obtain

By the continuous mapping theorem, we have

Therefore, Condition 1 is satisfied.
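The average $\mathbb{E}[1/p^e(c_k)]$ is straightforward to compute for a given degree sequence. The sketch below assumes the standard four-level labelling and a hypothetical homogeneous network in which every degree equals $2/p$ (degrees of order $1/p$); it illustrates that the average is of order $1/p$ for the treated levels and $O(1)$ for the untreated levels as $p$ shrinks. Function names are illustrative.

```python
def exposure_probs(degree, p):
    """p^e_i(c) for the four assumed levels under Bernoulli(p) assignment."""
    none = (1.0 - p) ** degree  # P(no neighbor is treated)
    return {
        "c11": p * (1.0 - none),
        "c10": p * none,
        "c01": (1.0 - p) * (1.0 - none),
        "c00": (1.0 - p) * none,
    }


def mean_inverse_exposure(degrees, p):
    """(1/N_v) * sum_i 1/p^e_i(c_k) for each level (inf if some p^e_i(c_k) = 0)."""
    out = {}
    for level in ("c11", "c10", "c01", "c00"):
        total = 0.0
        for d in degrees:
            prob = exposure_probs(d, p)[level]
            total += float("inf") if prob == 0.0 else 1.0 / prob
        out[level] = total / len(degrees)
    return out


# Hypothetical homogeneous network with N_v = 1000 units of degree 2 / p.
for p in (0.1, 0.01, 0.001):
    avg = mean_inverse_exposure([round(2 / p)] * 1000, p)
    scaled = {k: v * p if k in ("c11", "c10") else v for k, v in avg.items()}
    print(p, {k: round(v, 3) for k, v in scaled.items()})
```

Multiplying the treated-level averages by $p$ makes all four columns stay bounded as $p \to 0$, since $(1-p)^{2/p} \to e^{-2}$.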

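Next, we show that Condition 2 is also satisfied. Define a pairwise dependency indicator $g_{ij}$ such that if $g_{ij} = 0$, then $f(Z, A_{\cdot i}) \perp\!\!\!\perp f(Z, A_{\cdot j})$; otherwise, let $g_{ij} = 1$.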

For the level $c_{00}$, we have

Similarly, we can show (A.1) for the other exposure levels. Therefore, Condition 2 is satisfied.
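Condition 2 limits how many pairs of units have dependent exposures. One natural choice of $g_{ij}$, assumed here, follows from the fact that under Bernoulli assignment a unit's exposure depends only on its own treatment and its neighbors' treatments: set $g_{ij} = 1$ exactly when the closed neighborhoods $\{i\} \cup N(i)$ and $\{j\} \cup N(j)$ intersect, i.e., when $i$ and $j$ are adjacent or share a common neighbor. The function name `dependent_pairs` is hypothetical.

```python
from itertools import combinations


def dependent_pairs(neighbors):
    """Pairs (i, j), i < j, whose closed neighborhoods {i} | N(i) and {j} | N(j) meet.

    Assumes an undirected graph given as adjacency lists; these are the pairs
    that receive g_ij = 1 under the choice described above.
    """
    closed = [set(nbrs) | {i} for i, nbrs in enumerate(neighbors)]
    pairs = set()
    # In an undirected graph, w lies in both closed neighborhoods of i and j
    # iff i and j both lie in the closed neighborhood of w.
    for members in closed:
        pairs.update(combinations(sorted(members), 2))
    return pairs


cycle = [[(i - 1) % 100, (i + 1) % 100] for i in range(100)]
total_pairs = 100 * 99 // 2
print(len(dependent_pairs(cycle)), "of", total_pairs, "pairs have g_ij = 1")
```

On a cycle the count grows linearly in $N_v$ (pairs at distance one or two), a vanishing fraction of all $N_v(N_v - 1)/2$ pairs, which is the kind of limited pairwise clustering Condition 2 is meant to capture.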

First, we compute the normalization constant of the Pareto distribution with an exponential cutoff. By definition, we have

Note that, by integration by parts, we have

Therefore, we obtain

Next, we show that $d_L = \Theta(\bar{d})$. By the definitions of $d_L$ and $\bar{d}$, we have $d_L = O(\bar{d})$. Therefore, it suffices to show that $d_L = \Omega(\bar{d})$. We prove this by contradiction. Assume $d_L = o(\bar{d})$; then, by

On the other hand, note that

where $\Gamma(\cdot, \cdot)$ is the upper incomplete gamma function. Recall the following properties of the upper incomplete gamma function: $\Gamma(s, x) \sim -x^{s}/s$ as $x \to 0$ for $s < 0$, and $\Gamma(s, x) \sim x^{s-1} e^{-x}$ as $x \to \infty$ (Jameson (2016), Olver (1997), Temme (2011)). Then, by (A.5), we obtain

which contradicts (A.4). Thus, we have $d_L = \Theta(\bar{d})$.
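The two limiting forms of the upper incomplete gamma function quoted above (small $x$ with $s < 0$, and large $x$) can be checked numerically. Because Python's standard library has no incomplete gamma function, the sketch below uses its own Simpson-rule quadrature after the substitution $t = x e^{u}$; the helper name, the truncation point, and the illustrative value $s = -1.5$ (i.e., $\zeta = 1.5$) are assumptions of this sketch.

```python
import math


def upper_incomplete_gamma(s, x, n=20000):
    """Gamma(s, x) = int_x^inf t^(s-1) e^(-t) dt, via Simpson's rule.

    Substituting t = x * e^u turns the integral into int_0^inf t^s e^(-t) du,
    which is smooth for any real s. The range is truncated where t = x + 60,
    beyond which the remaining mass is negligible relative to e^(-x).
    """
    if x <= 0:
        raise ValueError("x must be positive")
    if n % 2:
        n += 1  # Simpson's rule needs an even number of intervals
    upper = math.log((x + 60.0) / x)
    h = upper / n

    def integrand(u):
        t = x * math.exp(u)
        return t ** s * math.exp(-t)

    total = integrand(0.0) + integrand(upper)
    for k in range(1, n):
        total += (4.0 if k % 2 else 2.0) * integrand(k * h)
    return total * h / 3.0


s = -1.5  # s = -zeta for an illustrative shape zeta = 1.5
for x in (1e-2, 1e-4):  # ratio to -x^s / s should approach 1 as x -> 0
    print(x, upper_incomplete_gamma(s, x) / (-(x ** s) / s))
for x in (50.0, 200.0):  # ratio to x^(s-1) e^(-x) should approach 1 as x -> inf
    print(x, upper_incomplete_gamma(s, x) / (x ** (s - 1) * math.exp(-x)))
```

Both ratios approach 1 only slowly (the next correction terms are of relative order $x$ and $1/x$, respectively), which is why the properties are stated as asymptotic equivalences.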

Then, we prove Proposition 3. By Proposition 1, it suffices to show that Conditions 1 and 2 are satisfied. Under Assumption 4 and $p < \lambda$, we obtain

Similarly, we have $\mathbb{E}[1/p^e(c_{00})] = O(1)$.

Next, we show that $\mathbb{E}[1/p^e(c_{11})] = O(1/p)$ and $\mathbb{E}[1/p^e(c_{01})] = O(1)$. Note that

Similarly, we have $\mathbb{E}[1/p^e(c_{01})] = O(1)$.

Analogously to the proof of Proposition 2, we can show that Condition 2 is also satisfied.

In the observed network, the four exposure probabilities become:

where $\tilde{d}_i$ is the degree of vertex $i$ in the observed network.

(i) For all values of $i$ and $c_k$, $P(\tilde{p}^e_i(c_k) > 0) \to 1$ as $N_v \to \infty$.

Note that $P(\tilde{p}^e_i(c_{10}) > 0) = P(\tilde{p}^e_i(c_{00}) > 0) = 1$ and $P(\tilde{p}^e_i(c_{11}) > 0) = P(\tilde{p}^e_i(c_{01}) > 0) = P(\tilde{d}_i > 0)$. Under Assumptions 1-5, we have
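The event $\tilde{d}_i > 0$ has a simple probability under an edge-flip noise model. The sketch below assumes (as a labelled assumption, consistent with the degree correction $(\tilde{d}_i - (N_v - 1)\alpha)/(1 - \alpha - \beta)$ in the proof of Theorem 4) that each true edge is missed with probability $\beta$ and each non-edge appears with probability $\alpha$, independently, so that $\tilde{d}_i \sim \mathrm{Bin}(d_i, 1-\beta) + \mathrm{Bin}(N_v - 1 - d_i, \alpha)$. Function names and parameter values are hypothetical.

```python
import random


def prob_observed_degree_positive(d, n_v, alpha, beta):
    """P(observed degree > 0) for a vertex of true degree d among n_v vertices.

    Under the assumed independent flips, the observed degree is zero only if
    every true edge is missed and no spurious edge appears.
    """
    return 1.0 - beta ** d * (1.0 - alpha) ** (n_v - 1 - d)


def sample_observed_degree(d, n_v, alpha, beta, rng):
    """One draw of the observed degree under the assumed noise model."""
    kept = sum(rng.random() >= beta for _ in range(d))  # true edges still observed
    spurious = sum(rng.random() < alpha for _ in range(n_v - 1 - d))  # false edges
    return kept + spurious


rng = random.Random(0)
d, n_v, alpha, beta = 2, 50, 0.01, 0.3
reps = 20000
mc = sum(sample_observed_degree(d, n_v, alpha, beta, rng) > 0 for _ in range(reps)) / reps
print("closed form:", prob_observed_degree_positive(d, n_v, alpha, beta), "Monte Carlo:", mc)
for n in (50, 500, 5000):
    print(n, prob_observed_degree_positive(d, n, alpha, beta))
```

With a fixed false-positive rate $\alpha > 0$, the factor $(1-\alpha)^{N_v - 1 - d_i}$ drives $P(\tilde{d}_i = 0)$ to zero as $N_v$ grows, even for a vertex of small true degree.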

Thus, for all values of $i$ and $c_k$, we have $P(\tilde{p}^e_i(c_k) > 0) \to 1$ as $N_v \to \infty$.

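Define a pairwise dependency indicator $\tilde{g}_{ij}$ in the observed network such that if $\tilde{g}_{ij} = 0$, then $f(Z, \tilde{A}_{\cdot i}) \perp\!\!\!\perp f(Z, \tilde{A}_{\cdot j})$; otherwise, let $\tilde{g}_{ij} = 1$. Then, we have $\tilde{p}^e_{ij}(c_k) = \tilde{p}^e_i(c_k)\,\tilde{p}^e_j(c_k)$ if $\tilde{g}_{ij} = 0$. Thus, we obtain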

By the triangle inequality, we have

Let $\tilde{C}_{ij}$ denote the number of common neighbors of vertices $i$ and $j$ in the observed network. Note that $\tilde{g}_{ij} = 1$ if $\tilde{C}_{ij} \ge 1$ or

, and the three binomial random variables are independent. Then, we have

By Young's inequality and Hölder's inequality, we have

For the level $c_{00}$, we have

The proof of Proposition 5 is the same as that of Proposition 4.

First, we prove the following lemmas, which will be used in the proofs of the theorems for noisy networks.

Lemma 1. Let $X_1, X_2, \ldots$ and $Y_1, Y_2, \ldots$ be two sequences of nonnegative random variables.

Similarly, by the continuous mapping theorem,

Thus, we obtain

This completes the proof.

Lemma 2. Let $T_1, T_2, \ldots$, $X_1, X_2, \ldots$, and $Y_1, Y_2, \ldots$ be three sequences of random variables.

and $|Y_n - b| \le c_2 h(n)$, where $P(|X_n - a| > c_1 g(n)) = o(1)$, $P(|Y_n - b| > c_2 h(n)) = o(1)$, $c_1 = \omega(1)$, and $c_2 = \omega(1)$. Then, we have $f_n(X_n, Y_n, T_n) \xrightarrow{P} c$ as $n \to \infty$.

Proof. By the triangle inequality, we have

Then, for any $\varepsilon_n > 0$, we have

where the last step follows from the pigeonhole principle and the subadditivity of the probability measure. Note that $\lim_{n\to\infty} P(|f_n(X_n, Y_n, T_n) - f_n(a, b, T_n)| > \varepsilon_n/2) = 0$ and $\lim_{n\to\infty} P(|f_n(a, b, T_n) - c| > \varepsilon_n/2) = 0$. Thus, we obtain $\lim_{n\to\infty} P(|f_n(X_n, Y_n, T_n) - c| > \varepsilon_n) = 0$.

Lemma 3. Let $T_1, T_2, \ldots$, $X_1, X_2, \ldots$, and $Y_1, Y_2, \ldots$ be three sequences of random variables.

Proof. By the definition of the expectation, we have

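$$
\begin{aligned}
\mathbb{E} f_n(X_n, Y_n, T_n) - \mathbb{E} f_n(a, b, T_n)
&= \mathbb{E}\Big[\big(f_n(X_n, Y_n, T_n) - f_n(a, b, T_n)\big) \cdot \mathbb{I}\{|X_n - a| > c_1 g(n) \text{ or } |Y_n - b| > c_2 h(n)\}\Big] \\
&\quad + \mathbb{E}\Big[\big(f_n(X_n, Y_n, T_n) - f_n(a, b, T_n)\big) \cdot \mathbb{I}\{|X_n - a| \le c_1 g(n) \text{ and } |Y_n - b| \le c_2 h(n)\}\Big].
\end{aligned}
$$

Since $f_n$ is bounded, we have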

as n → ∞.

In the noisy network, the Aronow and Samii estimators of $y(c_k)$ become

The conditional biases for these estimators are

Based on the noisy network model, we have

For exposure levels $c_{10}$ and $c_{00}$, by taking the expectation with respect to $\tilde{A}$, we obtain (1) and

For exposure levels $c_{11}$ and $c_{01}$, by Lemma 1, we have

For the exposure level $c_{00}$, we have

$$p^e_{ij}(c_{00}, c_{k'}, c_{00}, c_{k''}) - p^e_i(c_{00}, c_{k'})\,p^e_j(c_{00}, c_{k''})$$

for $k_1, k_2, k_3, k_4 = 1, 2, \ldots, K$ and $i, j = 1, 2, \ldots, N_v$.

Note that $\sum_{k' \in \{00, 01\}} p^e_i(c_{00}, c_{k'}) = p^e_i(c_{00})$ and $\sum_{k' \in \{00, 01\}} \sum_{k'' \in \{00, 01\}} p^e_{ij}(c_{00}, c_{k'}, c_{00}, c_{k''}) = p^e_{ij}(c_{00})$. Then, (C.1) leads to

The proof of Theorem 3 is the same as that of Theorem 2.

(i) Unbiasedness.

Note that $P(c_1/p \le \tilde{d}_i \le c_2/p) \to 1$, where $c_1$ and $c_2$ are constants, and that $\mathbb{E}[\tilde{y}_{A\&S,i} \mid \tilde{A}]$ is bounded.

By the definition of $\tilde{y}_i$ in (4.1), we have

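Then, $\tilde{y}_{MME,i}$ in (4.3) can be written as

Remark 1. Note that $\hat{\alpha}$ and $\hat{\beta}$ are method-of-moments estimators based on three observed networks $\tilde{A}$, $\tilde{A}^{*}$, and $\tilde{A}^{**}$. Thus, the expectations in (C.2) are actually conditional on $\tilde{A}$, $\tilde{A}^{*}$, and $\tilde{A}^{**}$. For notational simplicity, we omit $\tilde{A}^{*}$ and $\tilde{A}^{**}$.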

Next, we compute $\mathbb{E}[\tilde{y}_{MME,i}(c_k) \cdot \mathbb{I}\{\tilde{d}_i = \Theta(1/p)\} \mid \tilde{A}]$. For the exposure level $c_{11}$, notice that

Similarly, we can show

Then, direct computations yield

Thus, we obtain $\mathbb{E}[\tilde{y}_{MME,i}(c_{11}) \cdot \mathbb{I}\{\tilde{d}_i = \Theta(1/p)\}] = y_i(c_{11}) + o(1)$. Analogously, we can show that $\mathbb{E}[\tilde{y}_{MME,i}(c_k) \cdot \mathbb{I}\{\tilde{d}_i = \Theta(1/p)\}] = y_i(c_k) + o(1)$ for the other exposure levels.

(ii) Consistency.

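Since $\tilde{y}_{MME}(c_k)$ is an asymptotically unbiased estimator of $y(c_k)$, it suffices to show that $\operatorname{Var}(\tilde{y}_{MME}(c_k)) = o(1)$. Because $\hat{\alpha} \xrightarrow{P} \alpha$, $\hat{\beta} \xrightarrow{P} \beta$, and both $\operatorname{Var}(\tilde{y}_{MME}(c_k) \mid \tilde{A})$ and $\mathbb{E}(\tilde{y}_{MME}(c_k) \mid \tilde{A})$ are bounded, Lemma 3 implies that the variance of $\tilde{y}_{MME}(c_k)$ with estimated error rates differs from its variance with known error rates by $o(1)$. Next, we compute $\operatorname{Var}(\tilde{y}_{MME}(c_k))$ when $\alpha$ and $\beta$ are known,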

i.e., $\hat{d}_i = \frac{\tilde{d}_i - (N_v - 1)\alpha}{1 - \alpha - \beta}$. By the Cauchy-Schwarz inequality and the inequality $2uv \le u^2 + v^2$, we have

Following the proof of Theorem 2, we can show

This completes the proof.
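The corrected degree $(\tilde{d}_i - (N_v - 1)\alpha)/(1 - \alpha - \beta)$ used when the error rates are known inverts the mean of the observed degree. The sketch below assumes independent edge flips (a true edge is missed with probability $\beta$, a non-edge appears with probability $\alpha$), under which $\mathbb{E}[\tilde{d}_i] = (N_v - 1)\alpha + d_i(1 - \alpha - \beta)$, and checks by simulation that the correction removes the bias. Function names and parameter values are hypothetical.

```python
import random
import statistics


def corrected_degree(observed_degree, n_v, alpha, beta):
    """Method-of-moments correction: solve E[observed] = (n_v - 1) alpha + d (1 - alpha - beta) for d."""
    return (observed_degree - (n_v - 1) * alpha) / (1.0 - alpha - beta)


def noisy_degree(d, n_v, alpha, beta, rng):
    """Observed degree under the assumed independent edge-flip noise model."""
    kept = sum(rng.random() >= beta for _ in range(d))  # true edges still observed
    spurious = sum(rng.random() < alpha for _ in range(n_v - 1 - d))  # false edges
    return kept + spurious


rng = random.Random(7)
d, n_v, alpha, beta = 30, 500, 0.02, 0.1
draws = [noisy_degree(d, n_v, alpha, beta, rng) for _ in range(4000)]
print("mean observed degree:", statistics.fmean(draws))  # biased away from d
print("mean corrected degree:", statistics.fmean(corrected_degree(x, n_v, alpha, beta) for x in draws))
```

In this setting the observed degree is inflated by spurious edges (its mean is about $36.4$ rather than $30$), while the corrected degree is unbiased for $d_i$ by construction; when $\alpha$ and $\beta$ are replaced by estimates, Lemma 3 is what controls the resulting difference.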

The proof of Theorem 5 is the same as that of Theorem 4. For $m_i = 1$, direct computations lead to the results in Theorem 8. Here, we show the case $m_i \ge 2$. Note that, for all $1 \le x \le d_i$, we have (Das (2016))

where $C_3$ and $C_4$ are positive constants. Together with $p^e_i(c_{01}) = 1 - p - p^e_i(c_{00})$, we obtain

Note that $p^e_i(c_{11}) = p/(1-p) \cdot p^e_i(c_{01})$ and $p^e_i(c_{10}) = p/(1-p) \cdot p^e_i(c_{00})$. The results for exposure levels $c_{11}$ and $c_{10}$ follow.

In this section, we show the following. Assume a four-level exposure model and Bernoulli random assignment of treatment with probability $p$. In an inhomogeneous graph whose asymptotic degree distribution is the Pareto distribution with shape $\zeta > 1$, lower bound $d_L$, upper bound $N_v - 1$, and mean $\bar{d}$, Condition 1 does not hold for level $c_{10}$ under Assumption 4.

As $N_v \to \infty$, we have

Note that $\lim_{x\to\infty} (1-p)^{-x} x^{-(\zeta+1)} = \lim_{x\to\infty} p^{\zeta+1} e^{px}$. By (E.1), we have

$$\mathbb{E}[1/p^e(c_{10})] = \Omega\big(N_v \cdot (d_L p)^{\zeta} \cdot e^{pN_v}\big). \tag{E.2}$$

Next, we compute the order of d L . By the definition of expectation, we have

Therefore, as $N_v \to \infty$, we obtain $d_L = \Theta(\bar{d})$ when $\zeta > 1$. By (E.2), we have $\mathbb{E}[1/p^e(c_{10})] = \Omega(N_v)$. Therefore, Condition 1 does not hold.

Matrix estimation by universal singular value thresholding

Power-law distributions in empirical data

Planning of experiments

The two-period change-over design and its use in clinical trials

A generalization of sampling without replacement from a finite universe

Toward causal inference with interference

Network-based auto-probit modeling for protein function prediction

A latent eigenprobit model with link uncertainty for prediction of protein-protein interactions

Inter-plot competition in variety trials of field beans (Vicia faba L.)

Statistical Analysis of Network Data

Structure and consistency of self-reported social contact networks in British secondary schools

Variational Bayes model averaging for graphon functions and motif frequencies inference in W-graph models

Linear regression and its inference on noisy network-linked data

Identification of treatment response with social interactions

Sur les applications de la théorie des probabilités aux expériences agricoles: essai des principes (master's thesis); Justification of applications of the calculus of probabilities to the solutions of certain questions in agricultural experimentation. Excerpts, English translation (reprinted)

Statistical inference on errorfully observed graphs

Reduced sensitivity to hidden bias at upper quantiles in observational studies with dilated treatment effects

An application of the theory of probabilities to the study of a priori pathometry. Part I

A brief note on estimates of binomial coefficients

The incomplete gamma functions. The Mathematical Gazette

Asymptotics and special functions. AK Peters

Special functions: An introduction to the classical functions of mathematical physics