key: cord-0474710-mxgcwhrs
authors: Stensrud, Mats J.; Smith, Louisa H.
title: Identification of vaccine effects when exposure status is unknown
date: 2021-11-22
journal: nan
DOI: nan
sha: 557c9d338eba26f344c01f0795f756743d4dd4a6
doc_id: 474710
cord_uid: mxgcwhrs

Results from randomized controlled trials (RCTs) help determine vaccination strategies and related public health policies. However, defining and identifying estimands that can guide policies in infectious disease settings is difficult, even in an RCT. The effects of vaccination critically depend on characteristics of the population of interest, such as the prevalence of infection, the number of vaccinated, and social behaviors. To mitigate the dependence on such characteristics, estimands (and study designs) that require conditioning or intervening on exposure to the infectious agent have been advocated. But a fundamental problem for both RCTs and observational studies is that exposure status is often unavailable or difficult to measure, which has made it impossible to apply existing methodology to study vaccine effects that account for exposure status. In this work, we present new results on this type of vaccine effects. Under plausible conditions, we show that point identification of certain relative effects is possible even when the exposure status is unknown. Furthermore, we derive sharp bounds on the corresponding absolute effects. We apply these results to estimate the effects of the ChAdOx1 nCoV-19 vaccine on SARS-CoV-2 disease (COVID-19) conditional on post-vaccine exposure to the virus, using data from a large RCT.

Vaccines are one of the most important inventions in modern medicine [1]. Justification for real-life vaccination strategies relies heavily on results from large-scale vaccine randomized controlled trials (RCTs). However, the nature of communicable disease means that defining and evaluating vaccine effects requires consideration of population characteristics such as the prevalence of current and prior infection, mixing patterns, and concurrent public health measures.

Policy-relevant estimands for vaccine trials have been discussed extensively (see Halloran et al. [2] for an overview), in particular in the context of the SARS-CoV-2 disease (COVID-19) pandemic [3] [4] [5] [6] [7] [8] [9]. However, as of yet, methods to study outcomes conditional on (or under interventions on) exposure to the infectious agent are rarely used. A key problem is that exposure status is often difficult, or even impossible, to measure in practice [2, 10]. For example, Halloran and Struchiner [11] write that measuring susceptibility to infection "might not be easy in practice and might indeed require considerable assumptions regarding who is infectious and when, how infectious the persons are, and who is exposing whom." Challenge trials, in which participants are intentionally exposed, are one option for controlling exposure status but involve serious ethical issues [12] [13] [14].

This article specifically targets effects that account for exposure status, even when it is unmeasured. We provide new results on the interpretation and identification of causal effects of vaccines from RCTs and observational studies. The results include identification results for the causal effect of a vaccine on clinical outcomes, conditional on an unmeasured exposure to the infectious agent. Specifically, we show that, under a plausible no effect on exposure assumption, the relative effect (though not the absolute effect) of the vaccine can be point identified in an RCT.

Furthermore, under the same assumption, we derive sharp bounds for the absolute effect. We clarify how these effects are related to existing estimands, and we give identification results on per-exposure effects [10, 15], a type of controlled direct effect, even when the exposure is unmeasured, as is often the case in practice.

The article is organized as follows. Section 2 presents the data structure and the notation. Section 3 provides definitions and interpretation of causal estimands. Section 4 contains new results on the identification of causal estimands, including point identification results for the relative causal effect conditional on exposure, and partial identification results for the absolute causal effect conditional on exposure.

Section 5 presents results for point identification of absolute causal effects conditional on exposure when external data on exposure risk are available, and suggests a sensitivity analysis when external data are unavailable. Section 6 extends the results to time-to-event outcomes, in a setting in which individuals can be censored due to loss to follow-up. Section 7 describes how our new parameters can be estimated using existing estimators, even when the exposure is unmeasured. Section 8 implements the new results in a study of the ChAdOx1 nCoV-19 (Oxford) vaccine against COVID-19.

Suppose that we have data from a randomized experiment with n individuals who are assigned a binary treatment A ∈ {0, 1} at baseline (A = 1 indicates receiving vaccine, A = 0 indicates placebo or other control). As is common in vaccine trial settings [16, 17], we consider inference in a much larger population from which the trial participants are drawn, so that interactions among patients in the trial are negligible; thus, we suppose the individuals are iid and omit the i subscript.

Let L be a vector of baseline covariates. To simplify the presentation, we suppose L is discrete, but the results generalize to continuous L. Let E ∈ {0, 1} be an indicator of whether an individual is exposed to the infectious agent (e.g., being in close contact with an actively contagious individual), which may be unobserved in the study. We first consider Y ∈ R ≥0 to be the outcome of interest (e.g., disease severity or hospitalization) measured at a given time after randomization, where we define Y = 0 when an individual does not have the outcome. In Section 6, we extend the results to censored time-to-event outcomes.

We use superscripts to denote counterfactuals [18, 19]. For example, $Y^{a=1}$ and $Y^{a=0}$ are the outcomes of interest when the treatment is, possibly contrary to fact, fixed to active vaccine (a = 1) or control (a = 0).

3.1. The average treatment effect (ATE). To motivate the new contributions in this manuscript, we first review the conventional average treatment effect (ATE) of A on the outcome Y,

$$E(Y^{a=1}) \text{ vs. } E(Y^{a=0}), \tag{1}$$

which compares the average outcome in the trial population had everyone been treated (a = 1) versus not treated (a = 0). This contrast can be identified without additional assumptions when the trial is perfectly executed (perfect randomization and no losses to follow-up). However, as with any trial, the magnitude of (1) depends on the specific setting in which the RCT was conducted; in a vaccine trial, crucial characteristics include the number of currently infected in the population, the number of previously infected, the mixing pattern, and additional public health measures that may be simultaneously implemented. To generalize the results from the RCT to a policy-relevant setting, we must account for these characteristics, which is far from straightforward.

To mitigate some of the concerns that are raised about the ATE in vaccine trials, we could attempt to adjust for exposure to the infectious agent [2, 11]. However, defining causal effects conditional on exposure status is not straightforward because exposure status is a post-treatment variable. In particular, a naive contrast of counterfactual outcomes conditional on exposure status,

$$E(Y^{a=1} \mid E^{a=1} = 1) \text{ vs. } E(Y^{a=0} \mid E^{a=0} = 1), \tag{2}$$

is not a causal effect when the treatment affects the post-treatment event; it compares counterfactual outcomes in different subpopulations of individuals. This is illustrated by the path A → E in the causal directed acyclic graph (DAG) in Figure 1a, which leads to an indirect effect of vaccination on the outcome Y through the path A → E → Y. This indirect effect is plausible if participants know their treatment status; for example, one may expect that vaccinated individuals show a reduction in protective behaviours, which increases the risk of being exposed.

3.2. The principal stratum effect (PSE). A principal stratum effect (PSE) [18, 20] compares counterfactual outcomes among individuals with the same counterfactual exposure status. We can define a particular PSE among those individuals who would be exposed to the infectious agent regardless of treatment assignment,

$$E(Y^{a=1} \mid E^{a=1} = E^{a=0} = 1) \text{ vs. } E(Y^{a=0} \mid E^{a=1} = E^{a=0} = 1). \tag{3}$$

Unlike (2), the PSE (3) is a contrast of counterfactual outcomes in the same (sub)population of individuals, and it is therefore a causal effect. However, the conditioning set in (3) is defined by exposures in the same individual under two different treatments and, without further assumptions, it is impossible to observe the individuals in this subpopulation [18], even when E is measured. Thus, the PSE is defined in an unknown subpopulation, and the practical relevance of the PSE has been seriously questioned [21] [22] [23] [24]. In the next subsection, we will motivate an additional assumption, which ensures that the PSE is equivalent to another conditional estimand that can be identified and that arguably has subject-matter interest.

3.3. The causal effect conditional on exposure (CECE). As an alternative to the PSE, consider a contrast of counterfactual outcomes conditional on exposure status in the observed data,

$$E(Y^{a=1} \mid E = 1) \text{ vs. } E(Y^{a=0} \mid E = 1). \tag{4}$$

Like (3), the contrast in (4) is a causal effect as it compares the same subpopulation of individuals under different treatments. Unlike (3), the conditioning set in (4) is observable when E is measured. Without additional assumptions, however, the interpretation of (4) is not straightforward, because an individual's exposure status in the observed world (E) is not guaranteed to be equal to the exposure status under an intervention that fixes the treatment to be a ($E^a$). Thus, in general we cannot interpret (4) as a direct effect of treatment A on the outcome Y outside of the treatment effects on exposure status.

But there is at least one setting in which differences in exposure status would not be expected between treatment groups: a blinded RCT, which is the context of many vaccine efficacy studies. The following mechanistic assumption formalizes the notion that receiving the vaccine does not exert effects on exposure status E.

Assumption (No effect on exposure).

$$E^{a=0} = E^{a=1}. \tag{5}$$

That is, an individual's exposure status is the same regardless of the treatment that was assigned (i.e., vaccine or placebo). The DAG in Figure 1b describes the causal structure of a blinded RCT, in which this assumption would be expected to be met, as there is no path A → E and therefore no indirect effect of vaccination on the outcome through the path A → E → Y.

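Under assumption (5), the contrasts (2)-(4) are equal, that is,

$$E(Y^{a} \mid E^{a} = 1) = E(Y^{a} \mid E^{a=1} = E^{a=0} = 1) = E(Y^{a} \mid E = 1), \quad a \in \{0, 1\}.$$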

Halloran and Struchiner [11] also advocated contrasts of (counterfactual) outcomes in exposed individuals, under the assumption that "people did not change their behavior after randomization" [11, p. 147]. Condition (5) formalizes when such contrasts are unambiguous causal effects, i.e., contrasts of outcomes in the same (sub)population of individuals.

Because we focus on blinded RCTs in this work, we will use assumption (5) extensively, and under (5) we will denote the contrasts (3)-(4) collectively as the causal effect conditional on exposure (CECE), which is also equal to (2). The CECE mitigates some of the concerns that are raised about the generalizability of the ATE (1), because the CECE is confined to those individuals who are exposed to the infectious agent (in the observed data, and regardless of treatment assignment).

Thus, assumption (5) ensures that the CECE has a mechanistic interpretation as an average causal effect given exposure to the infectious agent. However, the CECE is defined among those who would be exposed in a given study, and the subset who is exposed is context-dependent.

3.4. The controlled direct effect (CDE). A special case of a controlled direct effect (CDE) [27], also called a per-exposure effect or a challenge effect [10, 11], is defined with respect to an intervention on the treatment A and the exposure E,

$$E(Y^{a=1, e=1}) \text{ vs. } E(Y^{a=0, e=1}). \tag{6}$$

This CDE corresponds to the effect that is identified by a challenge trial [28]; that is, a study where the participants are subject to an intervention where they are guaranteed to be physically exposed to the infectious agent. Unlike the ATE (1), the CDE is defined in a controlled setting, in which all individuals are exposed to the infectious agent. Thus, this effect is insensitive to the risk of exposure in the observed population.

Finally, the estimands considered in Sections 3.1-3.4 can be defined conditional on any baseline covariate L. The distinction between estimands conditional on L and marginal estimands will be of interest when we study identification in Section 4.

To motivate the new identification results in this work, we first review three standard identifiability conditions for the ATE.

Assumption (Treatment exchangeability).

which, e.g., holds in the Single World Intervention Graph (SWIG) [19] in Figure 1c, even if L is unmeasured.

Assumption (Positivity).

Conditions (7)-(9) hold by design in an RCT where treatment is (unconditionally) randomly assigned. These three conditions allow us to identify the ATE (1) as

$$E(Y \mid A = 1) \text{ vs. } E(Y \mid A = 0).$$

However, our focus is on estimands (4) and (6), which are defined with respect to (counterfactual) statuses of the exposure E and therefore require additional assumptions.

Under the no effect on exposure assumption (5) and conditions (7)-(9), it is straightforward to express the CECE as a functional of factual variables,

$$E(Y^{a=1} \mid E = 1) \text{ vs. } E(Y^{a=0} \mid E = 1) = E(Y \mid E = 1, A = 1) \text{ vs. } E(Y \mid E = 1, A = 0),$$

but the CECE, defined as an arbitrary contrast ("vs."), is not point identified when E is unmeasured. For example, the absolute CECE,

$$E(Y \mid E = 1, A = 0) - E(Y \mid E = 1, A = 1),$$

cannot be estimated from the observed data.

To identify the CECE, we therefore introduce an additional assumption, which relates the unmeasured E to Y , and which is plausible in infectious disease settings.

Assumption (Exposure necessity).

The exposure necessity assumption states that only individuals who were exposed to the infectious agent (say, virus) can experience the outcome (say, symptomatic disease). Thus, the exposure is a necessary condition for experiencing the outcome.

Many exposures of interest meet this criterion (e.g., contact with some amount of live virus), though sometimes researchers may be interested in other exposures that do not necessarily satisfy this criterion (e.g., sharing a home or classroom with an infected individual).

Our first theorem shows that the relative CECE is identified under the conditions we have introduced so far, which are expected to hold in a blinded RCT.

Theorem 1 (Relative CECE). Under the no effect on exposure assumption (5), standard identifiability conditions (7)-(9) and exposure necessity (10), the relative CECE is equal to

$$\frac{E(Y^{a=1} \mid E = 1)}{E(Y^{a=0} \mid E = 1)} = \frac{E(Y \mid A = 1)}{E(Y \mid A = 0)}.$$

The proof is given in Appendix A. From our considerations in Section 3.3 and our derivations in Section 4.1, it follows that Theorem 1 also gives an identification result for the relative principal stratum effect. Interestingly, Theorem 1 shows that the relative CECE is equal to the conventional ATE on the relative risk scale,

$$\frac{E(Y^{a=1})}{E(Y^{a=0})},$$

which is routinely reported in RCTs (see footnote 6). Thus, we give plausible conditions for a new interpretation of this estimand.
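The equality between the relative effect among the exposed and the marginal risk ratio can be checked with a few lines of exact arithmetic. The sketch below is only an illustration under the stated conditions (exposure necessity and an exposure probability that does not depend on the arm); the function name and all numerical values are hypothetical and are not taken from any trial.

```python
from fractions import Fraction


def marginal_risk(p_exposed, risk_given_exposed):
    """Marginal risk P(Y=1 | A=a) when unexposed individuals cannot have the outcome."""
    # Exposure necessity: the unexposed contribute zero risk.
    return p_exposed * risk_given_exposed + (1 - p_exposed) * 0


# Hypothetical inputs (assumptions for illustration only).
p_exposed = Fraction(3, 10)             # P(E=1), identical in both arms (no effect on exposure)
risk_exposed_vaccine = Fraction(1, 20)  # P(Y=1 | E=1, A=1)
risk_exposed_placebo = Fraction(1, 5)   # P(Y=1 | E=1, A=0)

# Relative effect among the exposed (not observable when E is unmeasured).
relative_cece = risk_exposed_vaccine / risk_exposed_placebo

# Marginal risk ratio (observable in the trial).
relative_ate = (marginal_risk(p_exposed, risk_exposed_vaccine)
                / marginal_risk(p_exposed, risk_exposed_placebo))

print(relative_cece, relative_ate)
```

The exposure probability cancels in the ratio, which is why the two quantities agree whatever value `p_exposed` takes.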

Whereas the absolute CECE is not point identified, our next theorem gives partial identification of the absolute CECE for a binary Y in terms of sharp bounds (see footnote 7). To simplify the presentation of the subsequent results we suppose, without loss of

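Theorem 2 (Absolute CECE). Under the no effect on exposure assumption (5) and conditions (7)-(10), the absolute CECE is partially identified by the sharp bounds

$$E(Y \mid A = 0) - E(Y \mid A = 1) \le E(Y^{a=0} \mid E = 1) - E(Y^{a=1} \mid E = 1) \le 1 - \frac{E(Y \mid A = 1)}{E(Y \mid A = 0)}.$$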

The proof is given in Appendix A.
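As a rough sketch of how the bounds in Theorem 2 could be computed from trial counts, the snippet below takes the lower bound to be the risk difference and the upper bound to be one minus the risk ratio. It assumes a binary outcome, a sign convention of placebo minus vaccine, and a placebo risk at least as large as the vaccine risk; the function name and the counts are hypothetical.

```python
def cece_bounds(cases_vaccine, n_vaccine, cases_placebo, n_placebo):
    """Plug-in relative CECE and bounds on the absolute CECE for a binary outcome."""
    risk_vaccine = cases_vaccine / n_vaccine
    risk_placebo = cases_placebo / n_placebo
    if risk_placebo <= 0 or risk_vaccine > risk_placebo:
        # Assumed convention: the vaccine does not increase the marginal risk.
        raise ValueError("expected 0 < placebo risk and vaccine risk <= placebo risk")
    return {
        "relative_cece": risk_vaccine / risk_placebo,
        "absolute_lower": risk_placebo - risk_vaccine,      # attained if everybody is exposed
        "absolute_upper": 1 - risk_vaccine / risk_placebo,  # attained if exposure always causes
                                                            # the outcome among the unvaccinated
    }


# Hypothetical counts, for illustration only.
bounds = cece_bounds(cases_vaccine=30, n_vaccine=1000, cases_placebo=100, n_placebo=1000)
print(bounds)
```

The width of the interval shrinks as the placebo risk grows, since the upper bound equals the lower bound divided by the placebo risk.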

Remark on Theorem 2. The lower bound on the absolute CECE is equal to the absolute ATE, which is routinely calculated in randomized controlled trials.

Thus, Theorem 2 gives us a new interpretation of a standard risk difference -as a lower bound on the absolute CECE. Furthermore, this lower bound is equal to the absolute CECE if everybody is exposed.

The upper bound is 1 minus the relative ATE, which is a quantity that is often reported as the vaccine efficacy in randomized controlled trials [2], e.g. during the COVID-19 pandemic [30]. The absolute CECE is equal to this bound if an unvaccinated individual (A = 0) will experience the outcome (Y = 1) if and only if she is exposed (E = 1).

Footnote 6. The fact that the relative CECE is identified by the same functional as the relative ATE is related to the known result in epidemiology that diagnostic tests that have perfect specificity will give unbiased estimates of risk ratios, even if these tests do mis-classify disease cases. We discuss this in Appendix C.

Footnote 7. Zhao et al. [29] studied another interesting setting where relative, but not absolute, risks could be point identified. Their causal question, which concerned racial discrimination in policing, was studied in a setting where the treatment (equivalent to our A) was unmeasured, but the mediator (equivalent to our E) was measured. Their estimand of interest was the conventional ATE.

It follows from Theorem 2 that the larger E(Y | A = 0), the more informative are the bounds. In particular, the lower bound is equal to the upper bound when E(Y | A = 0) = 1.

Assumption (Exposure exchangeability).

Condition (11) is a classical exchangeability condition, analogous to assumptions that are typically implemented to identify per protocol effects in trials, causal effects from observational data, and mediation effects. This assumption is stronger than exchangeability assumption (7). In particular, (11) does not hold unless we measure common causes of Y and E, as illustrated in the SWIG in Figure 1d.

Assumption (Exposure positivity).

Assumption (Exposure consistency).

When we impose conditions (11)-(13), the CDE can be expressed as a functional of factual variables,

However, because we do not measure E, it is not possible to identify the CDE from our observed data; we cannot identify the term E(Y | E = 1, A = a, L) without measuring E. In particular, the absolute CDE cannot be identified unless we measure E. Nevertheless, our next theorem shows that the relative CDE conditional on L is point identified.

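Theorem 3 (CDE conditional on L). Under conditions (5), (10) and (11)-(13), the relative CDE conditional on the baseline covariate L is

$$\frac{E(Y^{a=1, e=1} \mid L = l)}{E(Y^{a=0, e=1} \mid L = l)} = \frac{E(Y \mid A = 1, L = l)}{E(Y \mid A = 0, L = l)}.$$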

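The proof is given in Appendix A. The following corollary relates the CECE within a subpopulation defined by L and the CDE.

Corollary 1. Under conditions (5), (7)-(10) and (11)-(13), the relative CECE given L = l and the relative CDE conditional on the baseline covariate L = l are equal, that is,

$$\frac{E(Y^{a=1} \mid E = 1, L = l)}{E(Y^{a=0} \mid E = 1, L = l)} = \frac{E(Y^{a=1, e=1} \mid L = l)}{E(Y^{a=0, e=1} \mid L = l)}.$$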

It is crucial that the covariate vector L in Theorem 3 and Corollary 1 is sufficient to adjust for confounding, i.e. to ensure that exposure exchangeability (11) holds. Thus, identification of the conditional CDEs requires stronger assumptions compared to identification of the CECE. Although the conditional CECE can be defined and estimated within any set of baseline covariates, it is only interpretable as a conditional CDE when that set of covariates consists of those sufficient to adjust for confounding.

The marginal CDE is not point identified. Whereas the conditional relative CDE can be point identified under (5), (10) and (11) 

Using laws of probability and (11)-(13), $P(L = l \mid Y^{a=0,e=1} = 1)$ can be written as

, which depends on probabilities conditional on E = 1 that are not estimable from observed data. However, we can point-identify the marginal CDE under the additional strong assumption that $E(Y^{a=0,e=1}) = 1$, that is, the exposure deterministically causes the outcome if untreated. Then, $P(L = l \mid Y^{a=0,e=1} = 1) = P(L = l)$, and thus the marginal relative CDE is point identified by

The marginal absolute PPE is point identified as

Consider a binary outcome Y ∈ {0, 1} (e.g., an indicator of symptomatic disease).

Suppose that the investigator has external knowledge about the risk of experiencing the outcome given exposure among the unvaccinated, that is, P (Y = 1 | E = 1, A = 0). Alternatively, suppose that the investigator has external knowledge about the risk of being exposed among the unvaccinated, that is, P (E = 1 | A = 0).

Knowledge of either of these probabilities could have been collected among trial eligible individuals who did not participate in the randomized experiment, or among a subset of the trial participants.

Our next corollary shows that knowledge of either P (Y = 1 | E = 1, A = 0) or P (E = 1 | A = 0) allows point identification of the absolute CECE, when we also assume the same identification conditions as in Theorem 2.

Corollary 2 (Point identification of the absolute CECE). Under the no effect on exposure assumption (5) and conditions (7)- (10),

.

The proof of Corollary 2 is given in Appendix D. Besides giving point identification results in settings with knowledge from external data, Corollary 2 motivates sensitivity analyses for the magnitude of the absolute CECE using sensitivity parameters that are justified by subject-matter reasoning; that is, the investigator can evaluate (15) and (16) under different values of the marginal sensitivity parameters P (Y = 1 | E = 1, A = 0) and P (E = 1 | A = 0), respectively.
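The sensitivity analysis described here can be sketched numerically. The snippet below does not reproduce the displayed expressions (15) and (16); instead it uses relations that follow from the stated conditions for a binary outcome, namely that the marginal risk equals the risk among the exposed times the exposure probability, and that the exposure probability is the same in both arms. Function names and input values are hypothetical, and the sign convention (placebo minus vaccine) is an assumption.

```python
def absolute_cece_from_exposure_risk(risk_vaccine, risk_placebo, p_exposed):
    """Absolute CECE implied by an assumed exposure probability P(E=1 | A=0)."""
    # Exposure necessity implies P(Y=1 | A=a) <= P(E=1 | A=a) in both arms.
    if not (max(risk_vaccine, risk_placebo) <= p_exposed <= 1):
        raise ValueError("p_exposed must lie between the largest marginal risk and 1")
    return (risk_placebo - risk_vaccine) / p_exposed


def absolute_cece_from_risk_given_exposure(risk_vaccine, risk_placebo,
                                           risk_given_exposed_placebo):
    """Absolute CECE implied by an assumed P(Y=1 | E=1, A=0)."""
    # Under exposure necessity, P(E=1 | A=0) = P(Y=1 | A=0) / P(Y=1 | E=1, A=0).
    p_exposed = risk_placebo / risk_given_exposed_placebo
    return absolute_cece_from_exposure_risk(risk_vaccine, risk_placebo, p_exposed)


# Hypothetical marginal risks; sweep the sensitivity parameter over a grid.
risk_v, risk_p = 0.03, 0.10
for p in (0.10, 0.25, 0.50, 1.00):
    print(p, absolute_cece_from_exposure_risk(risk_v, risk_p, p))
```

At the two ends of the admissible grid the result coincides with the bounds of Theorem 2: an exposure probability of 1 gives the risk difference, and an exposure probability equal to the placebo risk gives one minus the risk ratio.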

In both RCTs and observational studies, it is common to evaluate vaccine effects on time-to-event outcomes. Our results generalize to settings where the exposure status and the outcome of interest are both time-to-event variables, which possibly are censored due to losses to follow-up.

Suppose that Y k and E k are time-to-event variables indicating whether an individual has experienced the event by time k (i.e., Y k = 1) and has been exposed by time k (i.e., E k = 1 means exposure has occurred at least once), respectively. Let C k indicate loss to follow-up (censoring) by time k > 0 (see the SWIG in Figure 2 ).

To align with the established causal inference literature [18, 19, 32], suppose that we are interested in outcomes in discrete time intervals k = 0, ..., K, and define the temporal (and topological) order $(C_k, E_k, Y_k)$ in each interval k > 0. This setting will converge to a continuous time setting when we let the time intervals become small. We continue to use superscripts to denote counterfactuals, and we formally consider a counterfactual estimand under interventions on the baseline treatment A and the censoring variable $C_k$ [33, 34]. For example, $Y_k^{a,\bar{c}=0}$ is the counterfactual outcome of interest by time k when treatment is assigned to a and there is no loss to follow-up ($\bar{c} = 0$, using an overbar to denote the variable's entire history). In Appendix B, we give more details on the time-to-event notation, and we state generalizations of the exchangeability, positivity, consistency, exposure necessity and the no effect on exposure conditions to settings with time-to-event outcomes (see conditions (20)-(26)). Under these conditions, we can identify the relative CECE as a ratio of cumulative incidences, as described in the next theorem.

Theorem 4 (Relative and absolute CECE for time to event outcomes). Under exchangeability, positivity, consistency, exposure necessity and the no effect on exposure assumption for time-to-event outcomes (conditions (20)- (26) in Appendix B), the relative CECE at time k, 0 ≤ k ≤ K, is identified by the ratio of cumulative incidences,

and

Under the same conditions, the absolute CECE is partially identified by the sharp bounds

Excess and etiologic fractions. Following Greenland and Robins [36], the excess (prevented) fraction quantifies the excess of outcomes under treatment vs. control.

When the no effect on exposure assumption (5) and conditions (7)-(10) hold, the excess fraction among the exposed in interval k is

which quantifies the proportionate increase in caseload under no treatment [36, 37].

Footnote 9. We restrict all our discussion to results on risks, not rates such as hazards. Despite the fact that hazards are sometimes reported as "efficacy parameters" in infectious disease settings, there are well-known limitations of considering causal estimands on the hazard scale, see e.g. [18, 35], because of the conditioning on a post-treatment event (here, previous outcomes Y) that is affected by treatment.

In particular, the excess fraction conditional on exposure is equal to the unconditional excess fraction. Furthermore, (17) is often what is reported as the vaccine efficacy in clinical studies [2] .
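For time-to-event data, one common way to obtain cumulative incidences under independent censoring is a discrete-time product-limit calculation; the ratio of the two arms and one minus that ratio (the quantity usually reported as vaccine efficacy) then follow directly. The sketch below is an assumption about a reasonable estimator, not a transcription of the identifying functional in Theorem 4, and the interval counts are hypothetical.

```python
def cumulative_incidence(events, at_risk):
    """Discrete-time product-limit cumulative incidence.

    events[j]  : number of outcome events in interval j
    at_risk[j] : number event-free and uncensored when interval j is evaluated
                 (censoring is taken to precede the event within an interval)
    """
    survival = 1.0
    incidence = []
    for d, n in zip(events, at_risk):
        hazard = d / n if n > 0 else 0.0  # discrete hazard in this interval
        survival *= 1.0 - hazard
        incidence.append(1.0 - survival)
    return incidence


# Hypothetical two-interval data for each arm (illustration only).
ci_placebo = cumulative_incidence(events=[10, 20], at_risk=[1000, 900])
ci_vaccine = cumulative_incidence(events=[5, 4], at_risk=[1000, 800])

risk_ratio_k = ci_vaccine[-1] / ci_placebo[-1]  # plug-in ratio of cumulative incidences
efficacy_k = 1.0 - risk_ratio_k                 # one minus the ratio
print(ci_placebo, ci_vaccine, risk_ratio_k, efficacy_k)
```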

The excess fraction should not be confused with the etiologic fraction, which is the fraction caused (or prevented) by treatment.

where we can compute confidence intervals using standard estimators for risk ratios. The estimator of the lower bound on the absolute CECE is $\widehat{\mathrm{aCECE}}_L = \hat{\mu}(0) - \hat{\mu}(1)$, which is simply a difference in means estimator. The estimator for the relative conditional CDE is defined analogously to $\widehat{\mathrm{rCECE}}$, where we also include L in the conditioning set, that is, $\widehat{\mathrm{rCDE}} = \hat{\mu}(1, l)/\hat{\mu}(0, l)$.
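One standard way to attach a confidence interval to a risk ratio from a two-arm trial is a Wald interval on the log scale. The sketch below uses that textbook method as an example of a "standard estimator"; it is not claimed to be the procedure used in this work, and the function name and counts are hypothetical.

```python
import math


def risk_ratio_ci(cases_vaccine, n_vaccine, cases_placebo, n_placebo, z=1.959963984540054):
    """Risk ratio with a log-scale Wald interval (z defaults to the 97.5% normal quantile)."""
    rr = (cases_vaccine / n_vaccine) / (cases_placebo / n_placebo)
    # Large-sample standard error of log(RR) for two independent binomial arms.
    se_log_rr = math.sqrt(1 / cases_vaccine - 1 / n_vaccine
                          + 1 / cases_placebo - 1 / n_placebo)
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper


# Hypothetical counts, for illustration only.
rr, lower, upper = risk_ratio_ci(30, 1000, 100, 1000)
print(rr, lower, upper)
```

Under the conditions of Theorem 1, the same interval can be read as an interval for the relative CECE, because the point estimator is the same ratio of arm-specific means.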

For the identifying functionals in Section 6, which are cumulative incidences, let $\hat{\mu}_k(a)$ and $\hat{\mu}_k(a, l)$ be estimators of $\mu_k(a)$ and $\mu_k(a, l)$, respectively. Then, we can follow standard procedures for calculating ratios of cumulative incidence functions with confidence intervals, see e.g. [

ing PPE. To parameterize P (E = 1 | A = 0), we might propose that 60% of the trial participants were exposed to such an amount of virus particles at some point during the 80-day period. Given the observed data, this would imply that P (Y = 1 | E = 1, A = 0) = 0.052; that is, E, here denoting a given amount of virus particles, was sufficient to cause symptomatic COVID-19 in just over 5% of unvaccinated participants during 80 days of follow-up. In this setting, we would estimate aCECE = 0.037 (Figure 3). Suppose now that we rather set the sensitivity

Note that estimating the CDE requires data on covariates L (such as comorbidities, smoking, work occupation and age, to name a few) to justify condition (11), so we have not attempted to estimate it, because this information is unavailable.

Our results are perhaps surprising: it has been suggested that estimating vaccine effects conditional on exposure status "requires information on who is infectious and when, and whom they contact and how" [2, 43] .

We have required that the exposure (say, close contact with an infectious individual) is necessary for the outcome of interest to occur (say, symptomatic disease), as stated in our exposure necessity condition (10) . An alternative approach would involve adapting the definition of exposure to something that is possible to measure.

For example, one might define exposure as close contact with infected people who present overt disease. However, such definitions have explicitly been discouraged, precisely because they would lead to an underestimate of the exposure in settings where some infections are inapparent [2] . In the case of the CECE, we have considered the exposure to be any event such that the exposure necessity condition holds.

The CDE, however, generally requires more detailed specification of E in order to justify the exchangeability condition, consistent with the idea that it must be a well-defined intervention. In future work, we will formally consider generalizability and transportability of the CECE.

When a necessary exposure is unmeasured, we have shown that relative effects can be point identified under plausible conditions, but absolute effects can only be bounded under the same conditions. Often both relative and absolute effects are of interest. However, the most commonly reported and publicized results are relative effects, as in major studies on different COVID-19 vaccines [41, 44, 45]. For researchers in these fields, the results presented in this work give valuable, new interpretations to the numbers they compute.

Figure 1. The DAG in (a) describes a study where A is randomly assigned. The DAG in (b) further encodes the no effect on exposure assumption, which is supposed to hold in a blinded RCT. The graph in (c) is a SWIG where we have fixed the treatment to a. This SWIG can be used to study identifiability conditions for the CECE, which is identified even if L is unmeasured. The SWIG in (d) describes interventions on both A and E (e is fixed to 1), which allows us to study identifiability conditions for the CDE. Unlike the CECE, the CDE would require measurement of L.

Proof of Theorem 1. For any a, a′ ∈ {0, 1}, use laws of probability and conditions (5) and (7)-(10) to express

Thus,

where the second and third line again follow due to assumption (5) and (7)- (10) .
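The chain of equalities behind this proof can be sketched as follows. This is a reconstruction from the stated conditions rather than a transcription of the displayed equations, and it assumes that treatment exchangeability applies to the counterfactual exposure $E^{a}$ as well as to $Y^{a}$.

```latex
\begin{align*}
E(Y^{a} \mid E = 1)
  &= E(Y^{a} \mid E^{a} = 1)
     && \text{by (5) and consistency} \\
  &= E(Y \mid E = 1, A = a)
     && \text{by exchangeability, positivity and consistency} \\
  &= \frac{E(Y \mid A = a)}{P(E = 1 \mid A = a)}
     && \text{by (10), since } E(Y \mid E = 0, A = a) = 0 \\
  &= \frac{E(Y \mid A = a)}{P(E = 1 \mid A = a')}
     && \text{by (5) and exchangeability.}
\end{align*}
% Taking the ratio for a = 1 and a = 0 (with a' fixed), the common
% denominator P(E = 1 | A = a') cancels:
\[
\frac{E(Y^{a=1} \mid E = 1)}{E(Y^{a=0} \mid E = 1)}
  = \frac{E(Y \mid A = 1)}{E(Y \mid A = 0)} .
\]
```

The cancellation of the exposure probability in the last step is what makes the relative effect identifiable while the absolute effect, which retains the denominator, is only bounded.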

Proof of Theorem 2. We first derive an upper bound.

due to (10) and laws of prob.

The last line is an equality when (E = 1 ⇐⇒ Y = 1) | A = 0.

The last line in (19) is an equality when P (E = 1) = 1.

Proof of Theorem 3.

where the last equality follows from

using (10) and (5), similarly to the proof of Theorem 1.

Proof of Corollary 1. The result follows from including L in the conditioning set in all the derivations of Theorem 1, which then gives the same identification result as in Theorem 3.

We re-introduce the terminology from Section 6. Let $Y_k$ and $E_k$ be time-to-event variables indicating whether an individual has experienced the event by time k ($Y_k = 1$) and has been exposed by time k ($E_k = 1$), respectively. Let $C_k$ denote loss to follow-up (censoring) by interval k > 0, and we define the temporal (and topological) order $(C_k, E_k, Y_k)$ in each interval k > 0. Suppose we are interested in outcomes in time intervals k = 0, ..., K. We adopt the convention that random variables with a negative subscript are equal to 0 (e.g., $Y_{-1} \equiv 0$).

Let the history of a random variable be denoted by an overbar, e.g. $\bar{Y}_k = (Y_0, Y_1, \ldots, Y_k)$ is the history of the event of interest through interval k. Further, let the future of a random variable through K be denoted by an underline, e.g.

Consider now classical identifiability conditions for causal effects in time-to-event settings, which are just extensions of (7)- (9) .

Condition (20) holds when A is randomly assigned. Condition (21) requires that losses to follow-up are independent of future counterfactual events, given the measured past; this assumption, which corresponds to classical independent censoring assumptions, does not hold by design in a randomised trial, as losses to follow-up are not randomly assigned in practice. The treatment exchangeability conditions are satisfied in the SWIG in Figure 2 .

Assumption (Positivity).

for k < K. The positivity conditions require that for any possible history of treatment assignment and covariates among those who are event-free and uncensored at k, some subjects will remain uncensored at the next time k + 1.

Assumption (Consistency).

if A = a and C k = 0,

Consistency holds if any individual who has data history consistent with the intervention under a counterfactual scenario, would have observed outcomes that are equal to the counterfactual outcomes.

Besides the classical identifiability conditions, we introduce the following conditions, which generalize exposure necessity (10) and the no effect on exposure assumption (5) from the main text.

Assumption (Time-varying exposure necessity). 

This assumption says that the risk of exposure by any time k is the same among treated and untreated. Consider a situation in which a vaccine A prevents or delays the outcome Y . Under blinding, condition (26) would still hold because prior infection would be the only thing preventing future exposure, but under (25) , anyone with the outcome would have already been exposed. However, we must assume that blinding continues to be successful; that is, this assumption would be violated if over time individuals notice that they are not getting infected after the same level of exposure as people around them, and therefore conclude that they have been vaccinated and change behavior.

Under these conditions we sketch a proof for Theorem 4.

Sketch of proof of Theorem 4. We can invoke (25)-(26) to find that

where we used exposure necessity in the first equality and laws of probability in the second equality, and the last equality follows because $E(E_k^{a=0,c=0}) = E(E_k^{a=1,c=0})$ under (26).
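One way to write out the three equalities described here is sketched below. This is a reconstruction under stated assumptions, not necessarily the authors' exact display: it assumes the target is the ratio of counterfactual risks by time $k$ under elimination of censoring, that time-varying exposure necessity (25) gives $Y_k^{a,c=0} = Y_k^{a,c=0} E_k^{a,c=0}$, and that (26) gives equal exposure risks in both arms. The notation $E_k^{a,c=0}$ for counterfactual exposure by time $k$ is likewise an assumption.

```latex
\begin{align*}
\frac{E(Y_k^{a=1,c=0})}{E(Y_k^{a=0,c=0})}
  &= \frac{E(Y_k^{a=1,c=0}\, E_k^{a=1,c=0})}{E(Y_k^{a=0,c=0}\, E_k^{a=0,c=0})}
     && \text{(exposure necessity)} \\
  &= \frac{E(Y_k^{a=1,c=0} \mid E_k^{a=1,c=0} = 1)\, E(E_k^{a=1,c=0})}
          {E(Y_k^{a=0,c=0} \mid E_k^{a=0,c=0} = 1)\, E(E_k^{a=0,c=0})}
     && \text{(laws of probability)} \\
  &= \frac{E(Y_k^{a=1,c=0} \mid E_k^{a=1,c=0} = 1)}
          {E(Y_k^{a=0,c=0} \mid E_k^{a=0,c=0} = 1)}
     && \text{(no effect on exposure)}
\end{align*}
```

Read from bottom to top, the chain says that the ratio of risks among the exposed equals the ratio of marginal counterfactual risks, which is the quantity the remaining identifiability conditions address.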

Then, using treatment exchangeability, consistency and positivity, it follows that $E(Y_k^{a,c=0})$ can be expressed in terms of the cumulative incidence function at $k$, $\mu_k(a)$.

The proof for the additive CECE follows the same structure as the proof of Theorem 2.
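As a rough illustration of how the relative effect at time k could be computed from trial data, the sketch below estimates the cumulative incidence in each arm with a discrete-time Kaplan-Meier-type estimator and takes their ratio. It assumes independent censoring as in (21) and that, as the proof sketch suggests, the relative effect reduces to a ratio of cumulative incidences. The function names, the data layout and the toy data are hypothetical and are not taken from the paper.

```python
def cumulative_incidence(times, events, k):
    """Kaplan-Meier-type estimate of the cumulative incidence by discrete time k.

    times[i]  : last interval (1, 2, ...) in which subject i was observed
    events[i] : 1 if the outcome occurred in that interval, 0 if censored then
    Assumes censoring is independent of future events (a modelling assumption).
    """
    survival = 1.0
    for t in range(1, k + 1):
        at_risk = sum(1 for ti in times if ti >= t)
        if at_risk == 0:
            break  # nobody left under observation; stop updating the estimate
        failures = sum(1 for ti, ei in zip(times, events) if ti == t and ei == 1)
        survival *= 1.0 - failures / at_risk
    return 1.0 - survival


def cumulative_incidence_ratio(arm1, arm0, k):
    """Ratio of estimated cumulative incidences by time k (arm1 over arm0)."""
    mu1 = cumulative_incidence(arm1[0], arm1[1], k)
    mu0 = cumulative_incidence(arm0[0], arm0[1], k)
    if mu0 == 0.0:
        raise ValueError("no events in the comparison arm by time k")
    return mu1 / mu0


# Hypothetical toy data: (times, events) per arm, ten subjects each.
vaccine = ([2, 3, 3, 3, 3, 3, 3, 3, 3, 3], [1, 0, 0, 0, 0, 0, 0, 0, 0, 0])
placebo = ([1, 2, 2, 3, 3, 3, 3, 3, 3, 3], [1, 1, 0, 1, 0, 0, 0, 0, 0, 0])
ratio_k3 = cumulative_incidence_ratio(vaccine, placebo, 3)
```

The estimator is written in plain Python to keep the example self-contained; in practice a survival analysis library would typically be used, together with a method for confidence intervals.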

Appendix C. Parallel to risk ratio under perfect specificity

A well-known result in epidemiology is that, under so-called non-differential misclassification of the outcome with perfect specificity, the exposure-outcome risk ratio is unbiased, although the risk difference is not. For example, in the setting of possibly incomplete disease ascertainment in exposed and unexposed cohorts, Lawrence and Greenwald described how a screening program could be implemented to remove false-positive cases, resulting in an unbiased risk ratio [46]. The requirement of perfect specificity parallels our exposure necessity assumption, and that of non-differential misclassification parallels our assumption of no effect on exposure.
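The classical result can be checked numerically with a small sketch. The risks, sensitivity and specificity below are made-up illustration values, and `observed_risk` is a hypothetical helper, not something defined in the paper.

```python
def observed_risk(true_risk, sensitivity, specificity=1.0):
    """P(Y* = 1) for a misclassified outcome Y*, given the true risk P(Y = 1)."""
    return sensitivity * true_risk + (1.0 - specificity) * (1.0 - true_risk)


# Hypothetical true risks in the two groups defined by A.
risk_a1, risk_a0 = 0.10, 0.20
sensitivity = 0.6  # identical in both groups: non-differential misclassification

# Perfect specificity (the default): no false positives.
obs_a1 = observed_risk(risk_a1, sensitivity)
obs_a0 = observed_risk(risk_a0, sensitivity)

true_rr, obs_rr = risk_a1 / risk_a0, obs_a1 / obs_a0
true_rd, obs_rd = risk_a1 - risk_a0, obs_a1 - obs_a0

# With imperfect specificity the ratio is no longer preserved.
obs_rr_imperfect = (observed_risk(risk_a1, sensitivity, 0.95)
                    / observed_risk(risk_a0, sensitivity, 0.95))
```

Under perfect specificity the sensitivity cancels in the ratio, so `obs_rr` matches `true_rr`, while `obs_rd` is the true difference scaled by the sensitivity.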

We demonstrate these parallels with the DAGs in Figure 4. Each has one partially deterministic arrow and one independence assumption, though the causal structures differ. The partially deterministic arrow and the independence assumption allow in each case for an unbiased ratio measure, as we demonstrate in the following derivation. Take Y to be a binary outcome and A any exposure of interest (also binary for simplicity). We denote a misclassified version of the outcome with Y*. Then we have for the misclassification setting that

where the second equality uses the appropriate partially deterministic arrow assumption and the third equality the appropriate independence assumption.

Figure 4. Simplified DAGs demonstrating the parallels described in Appendix C. (a) Non-differential misclassification of the outcome. The assumption that outcome misclassification doesn't depend on exposure results in A ⊥⊥ Y* | Y. The heavier arrow from Y to Y* represents the perfect specificity assumption: Y = 0 ⇒ Y* = 0. (b) The setting from the main text (simplified to remove common causes of E and Y). The no effect on exposure assumption results in A ⊥⊥ E. The heavier arrow from E to Y represents the exposure necessity assumption: E = 0 ⇒ Y = 0.
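The two parallel ratio derivations described in this appendix can be sketched as follows. This is a reconstruction from the stated assumptions (law of total probability, then the partially deterministic arrow, then the independence assumption) rather than the authors' exact display.

```latex
% (a) Non-differential misclassification with perfect specificity
\begin{align*}
\frac{P(Y^* = 1 \mid A = 1)}{P(Y^* = 1 \mid A = 0)}
  &= \frac{\sum_{y} P(Y^* = 1 \mid Y = y, A = 1)\, P(Y = y \mid A = 1)}
          {\sum_{y} P(Y^* = 1 \mid Y = y, A = 0)\, P(Y = y \mid A = 0)} \\
  &= \frac{P(Y^* = 1 \mid Y = 1, A = 1)\, P(Y = 1 \mid A = 1)}
          {P(Y^* = 1 \mid Y = 1, A = 0)\, P(Y = 1 \mid A = 0)}
     && (Y = 0 \Rightarrow Y^* = 0) \\
  &= \frac{P(Y = 1 \mid A = 1)}{P(Y = 1 \mid A = 0)}
     && (A \perp\!\!\!\perp Y^* \mid Y)
\end{align*}

% (b) Setting of the main text
\begin{align*}
\frac{P(Y = 1 \mid A = 1)}{P(Y = 1 \mid A = 0)}
  &= \frac{\sum_{e} P(Y = 1 \mid E = e, A = 1)\, P(E = e \mid A = 1)}
          {\sum_{e} P(Y = 1 \mid E = e, A = 0)\, P(E = e \mid A = 0)} \\
  &= \frac{P(Y = 1 \mid E = 1, A = 1)\, P(E = 1 \mid A = 1)}
          {P(Y = 1 \mid E = 1, A = 0)\, P(E = 1 \mid A = 0)}
     && (E = 0 \Rightarrow Y = 0) \\
  &= \frac{P(Y = 1 \mid E = 1, A = 1)}{P(Y = 1 \mid E = 1, A = 0)}
     && (A \perp\!\!\!\perp E)
\end{align*}
```

In (a) the observed ratio recovers the true ratio; in (b) the marginal ratio recovers the ratio among the exposed. In both cases the term that cancels is the one governed by the independence assumption.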

Proof of Corollary 2 from Section 5. We follow the same strategy as for the lower bound (18) in the main text.

where we used exposure necessity (10) in the 5th equality, which implies that P(Y =

The 4th line of the proof of Corollary 2 motivates an alternative sensitivity analysis: the investigator can specify the marginal risk of being exposed to the infectious agent given no treatment, that is, P(E = 1 | A = 0), and then point identify the risk difference.
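A minimal sketch of such a sensitivity analysis is given below. It assumes exposure necessity and no effect on exposure as stated in the main text, so that the risk among the exposed in arm a equals P(Y = 1 | A = a) / P(E = 1 | A = 0). The exact expression in the proof of Corollary 2 may differ if that corollary relaxes either assumption. The function name and the numerical inputs are hypothetical.

```python
def conditional_risk_difference(risk_treated, risk_untreated, p_exposed_untreated):
    """Risk difference among the exposed for a user-specified P(E = 1 | A = 0).

    Assumes exposure necessity (no outcome without exposure) and no effect of
    treatment on exposure, so each marginal risk is divided by the same
    exposure probability.
    """
    if not 0.0 < p_exposed_untreated <= 1.0:
        raise ValueError("P(E = 1 | A = 0) must lie in (0, 1]")
    if p_exposed_untreated < max(risk_treated, risk_untreated):
        # Under exposure necessity, nobody has the outcome without exposure,
        # so the exposure probability cannot be below either marginal risk.
        raise ValueError("P(E = 1 | A = 0) is incompatible with the observed risks")
    return (risk_treated - risk_untreated) / p_exposed_untreated


# Hypothetical marginal risks and a grid of exposure probabilities to sweep over.
risk_vaccine, risk_placebo = 0.01, 0.03
grid = [0.05, 0.10, 0.50, 1.00]
sweep = {pi: conditional_risk_difference(risk_vaccine, risk_placebo, pi) for pi in grid}
```

Sweeping over a grid makes the dependence explicit: the smaller the assumed exposure probability, the larger the implied absolute effect among the exposed, with the marginal risk difference recovered when the probability is 1.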

[1] The contribution of vaccination to global health: past, present and future

[2] Design and analysis of vaccine studies

[3] Clinical endpoints for evaluating efficacy in COVID-19 vaccine trials

[4] Assessment of immune correlates of protection via controlled vaccine efficacy and controlled risk

[5] Understanding COVID-19 vaccine efficacy

[6] Interpreting vaccine efficacy trial results for infection and transmission

[7] Estimands and inference in cluster-randomized vaccine trials

[8] Vaccine efficacy at a point in time. medRxiv

[9] Evaluation of post-introduction COVID-19 vaccine effectiveness: Summary of interim guidance of the World Health Organization

[10] Estimating the per-exposure effect of infectious disease interventions

[11] Causal inference in infectious diseases

[12] COVID-19 human challenge studies: ethical issues. The Lancet Infectious Diseases

[13] A strategic approach to COVID-19 vaccine R&D

[14] Studies that intentionally infect people with disease-causing bugs are on the rise

[15] Randomization and baseline transmission in vaccine field trials

[16] Estimating vaccine efficacy over time after a randomized study is unblinded

[17] Estimability and interpretation of vaccine efficacy using frailty mixing models

[18] A new approach to causal inference in mortality studies with a sustained exposure period-application to control of the healthy worker survivor effect

[19] Single world intervention graphs (SWIGs): A unification of the counterfactual and graphical approaches to causality

[20] Principal stratification in causal inference

[21] Principal stratification designs to estimate input data missing due to death-discussion

[22] Principal stratification and attribution prohibition: good ideas taken too far

[23] Imagine a can opener. the magic of principal stratum analysis

[24] Principal stratification-uses and limitations

[25] Principal stratification designs to estimate input data missing due to death

[26] Conditional separable effects

[27] Identifiability and exchangeability for direct and indirect effects

[28] Assessing vaccine effects in repeated low-dose challenge experiments

[29] A note on post-treatment selection in studying racial discrimination in policing

[30] Leveraging pathogen sequence and contact tracing data to enhance vaccine trials in emerging epidemics

[31] On the collapsibility of measures of effect in the counterfactual causal framework

[32] Causal inference. CRC

[33] Correcting for noncompliance and dependent censoring in an AIDS clinical trial with inverse probability of censoring weighted (IPCW) logrank tests

[34] A causal framework for classical statistical estimands in failure-time settings with competing events

[35] The hazards of hazard ratios

[36] Conceptual problems in the definition and interpretation of attributable fractions

[37] Estimability and estimation of excess and etiologic fractions

[38] Some problems in interval estimation

[39] Fieller's theorem vs. the delta method for significance intervals for ratios

[40] Summarizing differences in cumulative incidence functions

[41] Safety and efficacy of the ChAdOx1 nCoV-19 vaccine (AZD1222) against SARS-CoV-2: an interim analysis of four randomised controlled trials in Brazil, South Africa, and the UK

[42] High SARS-CoV-2 attack rate following exposure at a choir practice-Skagit County, Washington

[43] Study designs for evaluating different efficacy and effectiveness aspects of vaccines

[44] Efficacy and safety of the mRNA-1273 SARS-CoV-2 vaccine

[45] Safety and efficacy of the BNT162b2 mRNA COVID-19 vaccine through 6 months

[46] Epidemiologic screening: A method to add efficiency to epidemiologic research