key: cord-0591774-5ocf2vxx
authors: Sun, Haoqi; Leone, Michael J.; Liu, Lin; Mukerji, Shabani S.; Robbins, Gregory K.; Westover, M. Brandon
title: Clinically Relevant Mediation Analysis using Controlled Indirect Effect
date: 2020-06-21
journal: nan
DOI: nan
sha: c5b0d69333093077973ba163bf7a83e64328b058
doc_id: 591774
cord_uid: 5ocf2vxx

Mediation analysis allows one to use observational data to estimate the importance of each potential mediating pathway involved in the causal effect of an exposure on an outcome. However, current approaches to mediation analysis with multiple mediators either involve assumptions that are not verifiable by experiments, or estimate the effect when mediators are manipulated jointly, which precludes the practical design of experiments due to the curse of dimensionality, or are difficult to interpret when arbitrary causal dependencies are present. We propose a method for mediation analysis for multiple manipulable mediators with arbitrary causal dependencies. The proposed method is clinically relevant because the decomposition of the total effect does not involve effects under cross-world assumptions and focuses on the effects after manipulating (i.e. treating) a single mediator, which is more relevant in a clinical scenario. We illustrate the approach using simulated data, the "framing" dataset from political science, and the HIV-Brain Age dataset from a clinical retrospective cohort study. Our results provide potential guidance for clinical practitioners to make justified choices to manipulate one of the mediators to optimize the outcome.

Inferring causal effects and the mediating pathways from observational and/or experimental data is one of the most important problems in healthcare and artificial intelligence (1). In animal and some human studies, it is possible to conduct a randomized controlled trial (RCT) to infer the causal effect of a particular intervention on an outcome. RCTs are considered the gold standard of causal inference given their ability to reduce multiple sources of bias (2). However, an RCT may not be feasible or ethical for certain interventions. In these cases, researchers must conduct observational studies instead and adjust for potential biases using statistical methods. Advances in statistical methods for causal inference (3; 4; 5; 6) have made it possible to study causal effects in a mathematically principled way using observational data to guide healthcare practice. These methods often allow estimating causal effects in settings where subjects were assigned to an exposure non-randomly based on their characteristics, such as age or disease severity at admission (4). Note that throughout this paper we use the word "exposure" instead of "intervention" or "treatment": "exposure" is more general, covering interventions or treatments as well as observational factors such as a disease. We use the terminology of the potential outcomes framework developed by Neyman, Rubin, and Robins (7; 8; 9): when the assignment equals the observed exposure, the outcome is called the "factual outcome"; otherwise it is called the "counterfactual outcome"; either is called a "potential outcome".

Mediation analysis is an important sub-field of causal inference. It aims to measure the relative importance of each mediating pathway by decomposing the total effect (TE) into parts, including mediation due to mediators and interactions due to the co-existence of exposure and mediators (10; 11). Mediators are defined as variables that are causally affected by the exposure and that also causally affect the outcome. Whether a variable is categorized as a mediator or a confounder is determined by domain knowledge or by temporal ordering, if available. The total effect can be decomposed in various ways, including:

(1) controlled direct effect and eliminated effect;

(2) natural direct effect and natural indirect effect (12); and

(3) 4-way decomposition: controlled direct effect, reference interaction, mediated interaction, and pure indirect effect (10).

Extending these approaches to multiple mediators with arbitrary causal dependencies is challenging. For decomposition (1), the eliminated effect represents all effects other than the controlled direct effect (13), which cannot be attributed to individual mediators. For decomposition (2), although the division into natural direct and indirect effects is simple and can be done even in the presence of interaction, this decomposition involves cross-world effects (nested counterfactuals with different exposures), which do not correspond to any randomized experiment performed via interventions on the exposure and/or mediators (13); the identification of these effects also requires a strong "sequential ignorability" assumption, which rules out assessing each mediator when the mediators are causally dependent (14). For decomposition (3), the pure indirect effect in the case of multiple mediators requires estimating the joint potential outcome, i.e. the potential outcome when the exposure and all mediators would have been set to particular values, which is not clinically practical and suffers from the curse of dimensionality when the number of mediators is large.

In this work, we propose a "clinically relevant" mediation analysis approach that decomposes the total effect for multiple manipulable mediators with arbitrary causal dependencies and overcomes the above limitations. Note that we limit the scope to a binary exposure and binary mediators. Here, clinical relevance means that (1) the mediators are manipulable, such as a co-morbidity (mediator) of a disease (exposure) that can be treated; (2) the decomposition involves terms related to the effect of treating a single mediator, which mimics clinical practice, where a physician might focus on treating one mediator at a time rather than all mediators jointly. Intervening on a single mediator makes it possible to rank mediators (comorbidities) by their controlled indirect effect (CIE), i.e. the change in the outcome if everyone's k-th comorbidity were treated, so that priority for resources (a physician's attention, medication, research) can be given to the top-ranked mediators. Since the CIE can be viewed as the total effect of a mediator on the outcome, in the presence of multiple mediators and confounding CIE_k includes all effects passing through the mediators downstream of the k-th mediator; in other words, CIE_k is the total effect of treating the k-th mediator on the outcome; and (3) there are no cross-world effects, so all quantities in our decomposition can in principle be experimentally validated, for example using the parallel (encouragement) experimental design (15). In particular, we propose decomposing the total effect into two components for each mediator: the "controlled direct effect" and the "scaled controlled indirect effect", which is a function of the CIE.

Causal inference and mediation analysis make up an under-represented but scientifically valuable field in artificial intelligence and machine learning applied to healthcare. They help healthcare practitioners and researchers understand the underlying data-generating mechanisms by prospectively or retrospectively observing patients. In general, machine learning algorithms that take causality into account have great potential to guide decision-making in healthcare based on causation rather than association, potentially improving algorithm performance and transferability to different settings, since causal mechanisms are stable (16). The approach developed in this paper provides a new method for mediation analysis that, when applied to a clinical problem, can provide insight into the consequences of preventing or treating a co-morbidity that mediates the effect of a particular disease on a particular outcome. This is a core problem in medicine; much of medicine is devoted to mitigating the effects of a disease by treating a resultant co-morbidity. Our method provides an improved way to quantify the possible effect, or clinical benefit, of such a mitigation strategy on downstream clinical outcomes. Additionally, by allowing arbitrary causal dependencies among multiple mediators, it gives clinicians the flexibility to consider the mediator of interest in clinically realistic scenarios.

Existing works mostly focus on extending decomposition (2), i.e. the natural direct and indirect effect approach. (17) and (18) extend it to multiple mediators by treating all mediators jointly as one vector-valued mediator, so that the "sequential ignorability" assumption (no exposure-induced mediator-outcome confounder) still holds. (19) estimates the indirect effect of each mediator even though the "sequential ignorability" assumption is violated, and uses sensitivity analysis to assess the robustness of the results to this violation. As mentioned above, this approach is not clinically relevant since natural effects cannot be verified by any experiment. Other works extend decomposition (3). (20) extends the 4-way decomposition to the finest decomposition that unifies multiple mediators and interactions for causally independent mediators. With more mediators, it becomes increasingly difficult to define, identify, and estimate these components.

Our approach is closer to the interventional effect approach in (18; 21; 22). The interventional indirect effect is defined as the contrast in the outcome when the exposure is fixed while the mediator is changed from a value sampled from the distribution of the mediator among subjects with one exposure level to a value sampled from its distribution among subjects with the other exposure level. However, the sum of the interventional direct and indirect effects is not, in general, equal to the total effect. In contrast, in our approach, the controlled direct effect and the scaled controlled indirect effect add up to the total effect. Moreover, we fix the mediator to 0 or 1, which matches clinical practice.

We use Y to denote the outcome, e.g. mortality, a cognitive test score, or a physiological measurement. A denotes the exposure (e.g. taking a pill, infection with HIV or coronavirus, or developing a disease such as Alzheimer's). M_k denotes the k-th mediator, e.g. a co-morbid medical condition that worsens the outcome. L denotes the set of covariates, e.g. a patient's age, gender, race, smoking status, and years of education. Here we limit the scope to binary A and M_k; Y is discrete or continuous; and L is a vector of variables of any type. There are K mediators.

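In general, given a causal DAG, the total effect (TE) can be decomposed for each mediator k = 1, ..., K as (proof in Appendix A)

$$\mathrm{TE} = \mathrm{CDE}_k(0) + \mathrm{sCIE}_k, \tag{1}$$

where

$$\mathrm{CDE}_k(0) = Y_k(1, 0) - Y_k(0, 0), \tag{2}$$

$$\mathrm{sCIE}_k = M_k(1)\,\mathrm{CIE}_k(1) - M_k(0)\,\mathrm{CIE}_k(0), \tag{3}$$

$$\mathrm{CIE}_k(a) = Y_k(a, 1) - Y_k(a, 0), \tag{4}$$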

$$Y_k(a, m) = Y\bigl(a, M_1(a), \ldots, M_{k-1}(a), m, M_{k+1}(a), \ldots, M_K(a)\bigr), \tag{5}$$

$$M_k(a) = M_k\bigl(a, \mathrm{Pa}\{M_k\}(a)\bigr). \tag{6}$$

Here we denote by Y(A = a, M_k = m), or simply Y_k(a, m), the potential outcome of Y when A would have been a, the k-th mediator would have been m, and the other mediators behave as if A were a. CDE_k(0) is the controlled direct effect for the k-th mediator, defined as the contrast in the potential outcome when the exposure changes from 0 to 1 while the k-th mediator is fixed at 0 and the other mediators behave as if A took the corresponding value. sCIE_k is the scaled controlled indirect effect for the k-th mediator, defined as the controlled indirect effect under A = 1 scaled by the potential value of the k-th mediator under A = 1, minus the same quantity under A = 0. CIE_k(a) is the controlled indirect effect of the k-th mediator, defined as the contrast in the potential outcome when the k-th mediator changes from 0 to 1, while the exposure is fixed at a and the other mediators behave as if A were a. Pa{M_k}(a) = {M_j(a)}, j ∈ parents of M_k, is the set of mediators that are causal parents of the k-th mediator in the given DAG, each at its potential value under A = a.

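Note that there is no cross-world potential outcome such as M_k(1, Pa{M_k}(0)). Also note that Equation (6) is recursive: for example, with M_1 → M_2, M_1 → M_3, and M_2 → M_3 (as in Figure 2), M_2(a) = M_2(a, M_1(a)) and M_3(a) = M_3(a, M_1(a), M_2(a, M_1(a))); and so forth. If the k-th mediator is not causally affected by the exposure, the a in the parentheses can be dropped.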

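We have the following corollary (Corollary 3.0.1; proof in Appendix B):

$$\mathrm{TE} = \frac{1}{K}\sum_{k=1}^{K} \mathrm{CDE}_k(0) + \frac{1}{K}\sum_{k=1}^{K} \mathrm{sCIE}_k,$$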

which shows that the total effect can also be decomposed into the average of the CDEs over all mediators plus the average of the sCIEs over all mediators, reflecting the average share of direct and indirect effects across mediators. This corollary also provides an alternative way to estimate the total effect: averaging the K per-mediator estimates may give a less biased estimate, because the model mis-specification biases for individual mediators may partially cancel. This trades mediator-specific information for accuracy: the estimate of the average sCIE may improve, but the contribution of any particular mediator is lost.
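Equation (1) and the corollary can be checked numerically on a toy model where every potential value is computable. The Python sketch below assumes a hypothetical structural model that is not the paper's simulation: two causally independent binary mediators as in Figure 1, no covariates, and an A × M_1 interaction in the outcome. For each simulated unit it computes M_k(a) and Y_k(a, m) by direct intervention, asserts TE = CDE_k(0) + sCIE_k, and then checks that averaging over the two mediators recovers the same TE.

```python
import math
import random


def toy_unit(rng):
    """Hypothetical toy SCM for one unit: binary exposure A, two causally
    independent binary mediators M_1 and M_2, and a continuous outcome Y."""
    u1, u2, uy = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)

    def m1(a):  # potential value M_1(a)
        return int(-0.5 + 1.2 * a + u1 > 0)

    def m2(a):  # potential value M_2(a)
        return int(-0.2 + 0.6 * a + u2 > 0)

    def y(a, v1, v2):  # potential outcome Y(a, m_1, m_2), with an A x M_1 term
        return 1.0 + 0.8 * a + 1.5 * v1 + 0.7 * v2 + 0.9 * a * v1 + uy

    return m1, m2, y


def unit_effects(m_k, y_k):
    """Return (TE, CDE_k(0), sCIE_k) for one unit, given M_k(a) and Y_k(a, m)."""
    te = y_k(1, m_k(1)) - y_k(0, m_k(0))      # Y(1) - Y(0)
    cde0 = y_k(1, 0) - y_k(0, 0)              # CDE_k(0)
    cie1 = y_k(1, 1) - y_k(1, 0)              # CIE_k(1)
    cie0 = y_k(0, 1) - y_k(0, 0)              # CIE_k(0)
    scie = m_k(1) * cie1 - m_k(0) * cie0      # sCIE_k
    return te, cde0, scie


def average_effects(n=10_000, seed=0):
    """Average (TE, CDE_k(0), sCIE_k) over n simulated units, for k = 1, 2."""
    rng = random.Random(seed)
    totals = {1: [0.0, 0.0, 0.0], 2: [0.0, 0.0, 0.0]}
    for _ in range(n):
        m1, m2, y = toy_unit(rng)
        # Y_k(a, m): fix M_k = m; the other mediator behaves as if A = a.
        per_k = {
            1: unit_effects(m1, lambda a, m: y(a, m, m2(a))),
            2: unit_effects(m2, lambda a, m: y(a, m1(a), m)),
        }
        for k, vals in per_k.items():
            te, cde0, scie = vals
            assert math.isclose(te, cde0 + scie, abs_tol=1e-9)  # Equation (1), per unit
            for i, v in enumerate(vals):
                totals[k][i] += v
    return {k: [v / n for v in vals] for k, vals in totals.items()}


means = average_effects()
K = len(means)
avg_cde = sum(means[k][1] for k in means) / K
avg_scie = sum(means[k][2] for k in means) / K
print("per-mediator (TE, CDE_k(0), sCIE_k):", means)
print("averaged CDE + averaged sCIE:", avg_cde + avg_scie)
```

The identity holds per unit because M_k(a) is binary, so Y(a) = Y_k(a, 0) + M_k(a)·CIE_k(a) exactly; averaging the K per-mediator identities then gives the corollary.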

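Suppose (omitting subscript k)

$$\Delta M = M(1) - M(0), \qquad \Delta C = \mathrm{CIE}(1) - \mathrm{CIE}(0),$$

so that

$$\mathrm{sCIE} = M(1)\,\mathrm{CIE}(1) - M(0)\,\mathrm{CIE}(0) = \Delta M \cdot \mathrm{CIE}(1) + \Delta C \cdot M(0).$$

We can look at the extreme cases

$$\Delta C = 0: \quad \mathrm{sCIE} = \Delta M \cdot \mathrm{CIE}(1) = \Delta M \cdot \mathrm{CIE}(0),$$

$$\Delta M = 0: \quad \mathrm{sCIE} = \Delta C \cdot M(0) = \Delta C \cdot M(1).$$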

When ∆C = 0, i.e. CIE(0) = CIE(1), there is no interaction between the mediator and the exposure, and sCIE contains only the mediated effect: the difference in the outcome when that mediator changes from 0 to 1, scaled by the increase in the probability of the mediator. When ∆M = 0, i.e. M(0) = M(1), there is no mediation, and sCIE contains only the interaction between the mediator and the exposure, scaled by the constant probability of the mediator. Therefore, when ∆M ≠ 0 and ∆C ≠ 0, sCIE is a mixture of mediation and interaction effects. In contrast, CIE is the total effect of the mediator on the outcome.
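The split of sCIE into a mediation part and an interaction part is a small algebraic identity, sketched below in Python. The helper `scie_parts` is hypothetical (not from the paper); it treats M(0), M(1), CIE(0), and CIE(1) as plain numbers for one mediator, and the input values in the usage lines are illustrative only, not estimates from any dataset here.

```python
import math


def scie_parts(m0, m1, cie0, cie1):
    """Split sCIE = M(1)*CIE(1) - M(0)*CIE(0) into a mediation part
    (Delta M * CIE(1)) and an interaction part (Delta C * M(0))."""
    d_m = m1 - m0            # Delta M: change in the mediator due to the exposure
    d_c = cie1 - cie0        # Delta C: exposure-mediator interaction
    scie = m1 * cie1 - m0 * cie0
    mediation = d_m * cie1
    interaction = d_c * m0
    assert math.isclose(scie, mediation + interaction, abs_tol=1e-12)
    return scie, mediation, interaction


# Illustrative inputs only.
print(scie_parts(m0=0.2, m1=0.5, cie0=1.0, cie1=1.4))  # both parts nonzero
print(scie_parts(m0=0.2, m1=0.5, cie0=1.2, cie1=1.2))  # Delta C = 0: mediation only
print(scie_parts(m0=0.3, m1=0.3, cie0=1.0, cie1=1.4))  # Delta M = 0: interaction only
```

Read this way, lowering ∆M by some amount lowers the mediation part by that amount times CIE(1), while the interaction part ∆C · M(0) is unchanged.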

There are three assumptions needed to identify M_k(a) and Y_k(a, m), and hence CDE_k(0), CIE_k(a), and sCIE_k, from observational data.

1. Consistency assumption: an individual's potential outcome under the observed exposure is equal to the observed outcome.

Consistency may be violated if there are multiple versions of the exposure (23). This is unlikely to be the case in the "framing" dataset. In the HIV-BA dataset, although there are multiple ways to contract HIV-1, we consider HIV-1 infection status as a single exposure because the viral processes in the body following infection are generally similar across patients.

2. Positivity assumption: every combination of exposure level and level of the mediator of interest has a positive probability of occurring for every combination of values of the confounding variables in the population. In practice, a large sample size can alleviate finite-sample violations of this assumption. Positivity is especially important for weighting-based estimation methods such as inverse propensity weighting and doubly robust estimation.

3. Ignorability assumption: the exposed and unexposed subjects have equal distributions of potential outcomes when conditioned on the confounding variables. This is sometimes referred to as the exchangeability assumption. We need two ignorability assumptions:

$$M_k(a) \perp\!\!\!\perp A \mid L, \tag{14}$$

$$Y_k(a, m) \perp\!\!\!\perp A, M_k \mid L. \tag{15}$$

These assumptions can equivalently be expressed as the assumption that the causal DAG is correct. Hence, we can prove the above conditional independencies for multiple mediators with arbitrary causal dependencies using d-separation in the single world intervention graph (SWIG) (24). The proof is given in Appendix C. Note that because we do not use natural direct or indirect effects, the much stronger sequential ignorability assumption is not needed (14).

CDE, CIE, and sCIE are defined as functions of M_k(·) and Y(·), which need to be estimated from data. Therefore, the unbiasedness of the effect estimates (here meaning statistical consistency, i.e. zero bias in the limit of infinite data, not to be confused with the consistency assumption for causal inference in Section 3.3) depends in part on the unbiasedness of the estimates of M_k(·) and Y(·), in addition to other sources of bias such as selection bias or measurement error.

To this end, we can use doubly robust estimation (25), which tends to reduce bias. The doubly robust property is characterized by a class of models that admit a doubly robust first-order influence function (26), for which the bias of the resulting estimator involves a product of the errors of two models. For example, suppose Y and A are univariate random variables that depend on observed data X; the expected product of two conditional expectations,

$$\psi = E\bigl[E[Y \mid X]\, E[A \mid X]\bigr],$$

admits a doubly robust estimator (27). Another well-known example is the doubly robust estimator of the total effect (average treatment effect), which is unbiased if at least one of the outcome model (the f function below) or the propensity model (the g function below) is correctly specified.

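The doubly robust estimator is written as

$$\widehat{\mathrm{TE}} = \frac{1}{N}\sum_{i=1}^{N}\left[f(1, L_i) - f(0, L_i) + \frac{A_i\,\bigl(Y_i - f(1, L_i)\bigr)}{g(L_i)} - \frac{(1 - A_i)\,\bigl(Y_i - f(0, L_i)\bigr)}{1 - g(L_i)}\right],$$

where

$$f(a, L) = E[Y \mid A = a, L], \qquad g(L) = P(A = 1 \mid L).$$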

Here we used the principled approach introduced in (27). In estimating either the TE of the exposure on the outcome or the CIE (the TE of the mediator on the outcome), we want to minimize the bias E[ψ̂ − ψ], where ψ is the ground-truth TE and ψ̂ is the estimated TE. Directly minimizing this bias is impossible because ψ is unknown. Instead, we minimize a pseudo-risk over different choices of models, where the optimal model choice is the one least sensitive to perturbations due to model mis-specification. We used the mixed-minimax solution, which is proved to have a doubly robust property, i.e. zero bias if at least one candidate estimation model is correctly specified. We choose from (1) ℓ2-norm penalized linear regression or logistic regression; (2) ℓ2-norm penalized support vector machine (SVM) classifier; (3) random forest; and (4) XGBoost, a type of gradient boosting tree. For the ordinal outcome in the framing dataset introduced later, we used the pairwise approach (28) to convert the ordinal regression problem into binary classification, which was then solved using the above models.
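A minimal Python sketch of the augmented inverse probability weighting (AIPW) form of the doubly robust total-effect estimator follows, assuming the predictions f(1, L), f(0, L), and g(L) have already been produced by some fitted models. The data-generating process, its true total effect of 2.0, and the deliberately wrong models are hypothetical; they only illustrate that the estimate stays close to the truth when either the outcome model or the propensity model is correct. The pseudo-risk model selection from (27) is not shown.

```python
import random


def aipw_total_effect(y, a, f1, f0, g):
    """Doubly robust (AIPW) estimate of E[Y(1)] - E[Y(0)].

    y, a   : observed outcomes and binary exposures
    f1, f0 : outcome-model predictions f(1, L_i) and f(0, L_i)
    g      : propensity-model predictions g(L_i) = P(A = 1 | L_i)
    """
    total = 0.0
    for yi, ai, f1i, f0i, gi in zip(y, a, f1, f0, g):
        total += (f1i - f0i
                  + ai * (yi - f1i) / gi
                  - (1 - ai) * (yi - f0i) / (1 - gi))
    return total / len(y)


def simulate(n, seed=0):
    """Hypothetical data with one binary covariate L and true total effect 2.0."""
    rng = random.Random(seed)
    cov, expo, out = [], [], []
    for _ in range(n):
        l_i = int(rng.random() < 0.5)
        a_i = int(rng.random() < 0.3 + 0.4 * l_i)     # true g(L) = 0.3 + 0.4 L
        y_i = 1.0 + 2.0 * a_i + 3.0 * l_i + rng.gauss(0, 1)
        cov.append(l_i)
        expo.append(a_i)
        out.append(y_i)
    return cov, expo, out


cov, a, y = simulate(200_000)
n = len(y)
true_g = [0.3 + 0.4 * l_i for l_i in cov]
true_f1 = [3.0 + 3.0 * l_i for l_i in cov]            # E[Y | A = 1, L]
true_f0 = [1.0 + 3.0 * l_i for l_i in cov]            # E[Y | A = 0, L]

# Wrong outcome model (always predicts 0), correct propensity model.
est_bad_f = aipw_total_effect(y, a, [0.0] * n, [0.0] * n, true_g)
# Correct outcome model, wrong propensity model (constant 0.5).
est_bad_g = aipw_total_effect(y, a, true_f1, true_f0, [0.5] * n)
print(est_bad_f, est_bad_g)   # both should be close to the true value 2.0
```

With a zero outcome model the formula reduces to inverse propensity weighting, so the first estimate is noisier than the second.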

We used nested cross-validation to fit the models. Nested cross-validation consists of an inner loop and an outer loop. The purpose of the outer loop is to obtain an unbiased estimate on data that are not part of the training set. The purpose of the inner loop is to find the best hyper-parameter C, which controls the strength of the ℓ2-norm penalty, to avoid overfitting. The outer loop divided the data into multiple folds. Each fold was used in turn as the testing set, while the other folds were combined and further divided into inner folds. Each inner fold was used in turn as the validation set, while the other inner folds were combined as the training set. The model was trained with a particular C on the training set and evaluated on the validation set. The C with the best average validation performance was chosen, and the model was re-fit using the combined training and validation sets. The model was then used to estimate the causal effects on the testing set. The final reported effects were the averages of the effects on the testing sets from the outer loop. The confidence intervals were obtained using 1,000 bootstrap resamples.
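The nested loop described above can be sketched without any library. Everything model-specific here is a hypothetical stand-in: a one-feature ridge regression without intercept, the penalty grid, the 5 × 5 fold counts, and an `estimate` callback that plays the role of the causal-effect computation on a testing fold. `lam` plays the role of the penalty-strength hyper-parameter C; the bootstrap step is omitted.

```python
import random


def kfold(n, k, rng):
    """Randomly split indices 0..n-1 into k folds."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_ridge(data, lam):
    """One-feature ridge regression without intercept: sum(xy) / (sum(x^2) + lam)."""
    sxy = sum(x * y for x, y in data)
    sxx = sum(x * x for x, _ in data)
    return sxy / (sxx + lam)


def mse(beta, data):
    return sum((y - beta * x) ** 2 for x, y in data) / len(data)


def nested_cv(data, lams, estimate, n_outer=5, n_inner=5, seed=0):
    rng = random.Random(seed)
    results = []
    for test_idx in kfold(len(data), n_outer, rng):
        held_out = set(test_idx)
        train_val = [d for i, d in enumerate(data) if i not in held_out]
        inner_folds = kfold(len(train_val), n_inner, rng)

        def val_error(lam):
            # Average validation MSE of models trained on the other inner folds.
            errs = []
            for val_idx in inner_folds:
                val_set = set(val_idx)
                train = [d for i, d in enumerate(train_val) if i not in val_set]
                val = [train_val[i] for i in val_idx]
                errs.append(mse(fit_ridge(train, lam), val))
            return sum(errs) / len(errs)

        best_lam = min(lams, key=val_error)
        beta = fit_ridge(train_val, best_lam)      # re-fit on training + validation folds
        test = [data[i] for i in test_idx]
        results.append(estimate(beta, test))       # effect estimated on the testing fold
    return sum(results) / len(results)             # average over the outer folds


# Hypothetical data: binary exposure x, outcome y = 2x + noise.
rng = random.Random(1)
data = []
for _ in range(500):
    x = int(rng.random() < 0.5)
    data.append((x, 2.0 * x + rng.gauss(0, 1)))

# For this linear model, f(1) - f(0) = beta for every test unit.
effect = nested_cv(data, lams=[0.0, 1.0, 10.0, 100.0],
                   estimate=lambda beta, test: beta)
print(effect)
```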

The simulated data are generated according to the causal ordering implied by the DAG, i.e. L → A → M → Y. Each variable is generated as a generalized linear function of its causal parents plus noise. We first randomly generate the coefficients and take the inner product between the coefficients and the causal parents, plus an intercept. The intercept is manually chosen to make the average of the inner product zero. We then add Gaussian noise with standard deviation 1. For binary variables such as A and M, we further apply the sigmoid transformation and binarize the result using a threshold of 0.5. The sample size N is 1,000; the number of covariates in L is 2; and the number of mediators in M is 2 or 3, depending on the DAG studied.
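A Python sketch of this generator for the Figure 1 DAG is given below. Several details are assumptions not stated in the text: coefficients and covariates are drawn from N(0, 1), the intercept is computed from the sample so that the mean linear predictor is exactly zero, and L and A are taken as parents of every mediator. Ground-truth effects would additionally require storing the coefficients and noise so that A and M can be set by intervention; that part is not shown.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def make_node(parents, binary, rng):
    """Generate one variable as a generalized linear function of its parents.

    parents: one parent vector per sample. Coefficients are drawn from N(0, 1)
    (an assumption); the intercept centres the linear predictor at zero.
    """
    n = len(parents)
    coef = [rng.gauss(0, 1) for _ in parents[0]]
    lin = [sum(c * p for c, p in zip(coef, row)) for row in parents]
    intercept = -sum(lin) / n
    values = []
    for v in lin:
        z = v + intercept + rng.gauss(0, 1)        # Gaussian noise with sd 1
        # Sigmoid then threshold 0.5 (equivalent to z > 0) for binary nodes.
        values.append(int(sigmoid(z) > 0.5) if binary else z)
    return values


def generate(n=1000, n_cov=2, n_med=2, seed=0):
    """Data for the Figure 1 DAG: L -> A; (L, A) -> each M_k; (L, A, M) -> Y."""
    rng = random.Random(seed)
    L = [[rng.gauss(0, 1) for _ in range(n_cov)] for _ in range(n)]  # assumed N(0, 1)
    A = make_node(L, binary=True, rng=rng)
    LA = [row + [a] for row, a in zip(L, A)]
    M = [make_node(LA, binary=True, rng=rng) for _ in range(n_med)]
    LAM = [LA[i] + [m[i] for m in M] for i in range(n)]
    Y = make_node(LAM, binary=False, rng=rng)
    return L, A, M, Y


L, A, M, Y = generate()
print(len(Y), sum(A) / len(A), [sum(m) / len(m) for m in M])
```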

We also used the public "framing" dataset from the R package "mediation" (14). A detailed description of the framing data can be found in (29). It comes from a randomized experiment in which subjects were shown immigration stories with different framings. The exposure is whether the story is framed positively and features a European immigrant. The covariates include age, gender, education level, and income. The mediators are negative emotion and perceived harm. Emotion measures the subjects' negative feelings and is dichotomized to 1 if it is greater than or equal to 8. Perceived harm refers to the harm of increased immigration and is dichotomized to 1 if it is greater than or equal to 7. The outcome is a four-point scale measuring attitudes toward increased immigration. There are 265 subjects in this dataset. Because the exposure is randomized, we used outcome regression instead of doubly robust estimation for this dataset.

The "HIV-Brain Age" (HIV-BA) dataset comes from a retrospective cohort study investigating the effect of HIV-1 infection (exposure) on the brain age index (BAI; outcome), predicted from the sleep electroencephalogram (EEG) (30), through multiple mediating comorbidities and side effects. The cohort is composed of participants with a possible sleep disorder who underwent a full-night diagnostic sleep study at a hospital's sleep lab. The HIV+ subset consists of those who were diagnosed with HIV infection prior to their sleep study and are currently on antiretroviral therapy based on clinical chart review. The HIV− subset never had HIV infection. The exposure is HIV infection (binary). The covariates are age, gender, race, alcoholism, and smoking history. The mediators are hyperlipidemia, heart valve disorders, and insomnia (all binary). The outcome, BAI, is a continuous number in units of years; a larger value represents an older brain age, hence a worse outcome. There are 43 HIV+ and 3,048 HIV− subjects.

We assume two causally independent mediators, as shown in Figure 1. This represents the case where A affects Y through two independent mechanisms, M_1 and M_2. Note that here we use the example of two mediators, but in general there can be any number of mediators.

The results are shown in Table 1. The model selection method described in Section 3.5 correctly selected linear models (logistic regression and linear regression) for the propensity and outcome models when estimating M_k(a) and Y_k(a, m). By "correct" we mean that the data are generated using a generalized linear model. Since this is simulated data, we can obtain the ground-truth effects by directly manipulating A and the M's. All true effects except the CDE for M_1 and M_2 and the total effect for M_1 are within the 95% confidence intervals. The bias in estimating the CDE, and hence the total effect, is due to the bias in the estimated coefficients when using ℓ2-penalized linear models. The confidence interval for sCIE is in general wider than that for CIE because sCIE is a function of both CIE(a) and M(a), and thus jointly involves the exposure and the mediator.

For the framing dataset, we used emotion and perceived harm as the two independent mediators. The model selection method selected linear SVM for the outcome models when estimating Y_k(a, m) (outcome regression is used since the exposure is assigned at random). The result is shown in Table 2 and is consistent with the finding that emotion (35.6% sCIE) is a leading mediator compared with perceived harm (18.8% sCIE) when people make decisions about immigration. Interestingly, however, the CIE of perceived harm is higher than that of emotion. In other words, directly reducing perceived harm could be more effective than directly reducing negative emotion (intervening directly on the mediator), but it is more difficult to induce perceived harm than to induce negative emotion through different framings (changing the mediator by intervening on the exposure), due to the scaling of the mediation effect as well as the interaction effect (Equations (10) and (11)). The total effect estimated in (14)

We assume three causally dependent mediators, as shown in Figure 2. This represents the case where A affects Y through three mechanisms, M_1, M_2, and M_3, while M_1 also causes M_2 and M_3, and M_2 also causes M_3. Note that here we use the example of three mediators, but in general there can be any number of mediators with arbitrary causal dependencies, as long as there are no cycles.

Table 3 shows the results. The model selection method described in Section 3.5 again correctly selected linear models (logistic regression and linear regression) for the propensity and outcome models when estimating M_k(a) and Y_k(a, m). Since this is simulated data, we can obtain the ground-truth effects by directly manipulating A and the M's. The true effects are within the 95% confidence intervals for M_1 and M_3. For M_2, the estimates tend to overestimate the indirect effect and underestimate the direct effect.

People with HIV take antiretroviral therapy drugs, some of which, such as Lopinavir, Saquinavir, and Stavudine, are associated with a high cholesterol level in the blood (hyperlipidemia) (31), which in turn increases the risk of heart disorders such as heart valve disorder (32) and eventually leads to sleep disorders such as insomnia. Therefore, in this case M_1 is hyperlipidemia, M_2 is heart valve disorder, and M_3 is insomnia. The model selection method selected linear models (logistic regression for exposure and linear regression for outcome) when estimating M_k(a) and Y_k(a, m). The results indicate that hyperlipidemia is an important comorbidity in HIV+ subjects. The relatively large CIE means that directly treating hyperlipidemia has a substantial effect on the brain age index in HIV+ subjects; the relatively large sCIE means that HIV infection itself can substantially increase the prevalence of hyperlipidemia, which subsequently has both high mediation and interaction effects on the brain age index. Heart valve disorder also has a relatively large CIE but a smaller sCIE, indicating a relatively weaker increase in the prevalence of heart valve disorder due to HIV infection and hyperlipidemia, and a weaker interaction with them. In contrast, insomnia has a limited effect on the brain age index compared with hyperlipidemia and heart valve disorder. Due to the limited number of HIV+ subjects (43), the confidence intervals are very wide for HIV+ subjects, indicating the importance of having enough samples for mediation analysis (see Limitations in Section 5).

We have presented a clinically relevant method of mediation analysis with multiple manipulable mediators and arbitrary causal dependencies, using observational data. Our approach is clinically relevant because it makes observational data useful for doctors reasoning about clinical decision-making, in the following respects. Since the decomposition eliminates cross-world considerations, the effects are directly related to what would happen if a clinician took a particular course of action to treat one comorbidity. The elimination of cross-world considerations also paves the way for confirming a hypothesis in a clinical trial. The approach is also clinically practical since the controlled indirect effect focuses on the effect of manipulating (treating) a single mediator rather than all of them jointly.

The decomposition sCIE = ∆M · CIE(1) + ∆C · M(0) shows the consequences of reducing ∆M. The meaning of reducing ∆M is intuitive: if a certain medication or preventive measure reduces the risk of the mediator by a known percentage, that percentage multiplied by CIE(1) is the amount of outcome prevented, averaged across the exposed population. On the other hand, sCIE can also be viewed as the effect of the mediator of interest on the outcome in the exposed population (∆M · CIE(1)) on top of the baseline level of exposure-mediator interaction in the unexposed population (∆C · M(0)).

Cross validation and model estimation in causal inference. Regularization, with cross validation to select the regularization strength, is in general not advised in effect estimation, since the loss function of a regularized model does not target the causal effect of interest. The idea of using perturbation as a pseudo-risk, as in Section 3.5, represents a possible direction. Other possibilities include choosing the regularization to better satisfy the consistency assumption, for example by minimizing the difference between the factual branch of the model-based potential outcome and the observed value, as in (33).

Extension to path-specific analysis. Path-specific analysis extends mediation analysis by examining the effect mediated by a path (a bundle of nodes and edges) (34). The longitudinal setting is a typical use case for path-specific analysis (35). The idea is

$$\mathrm{TE} = \sum_{\pi} Y_{\pi},$$

where Y_π is the effect specific to path π. The decomposition is analogous to

$$\mathrm{TE} = \bigl[Y(a, M(a)) - Y(a, M(a'))\bigr] + \bigl[Y(a, M(a')) - Y(a', M(a'))\bigr] = \mathrm{NIE} + \mathrm{NDE},$$

which still requires cross-world counterfactuals. In contrast, our approach represents a "controlled" flavor (vs. the "natural" flavor): TE = CDE + sCIE = CDE(0) + f(CIE(1), CIE(0)), and may be

The controlled flavor has the advantage that CIE directly simulates what happens if the mediating path is intervened on, answering the clinically relevant question: what if I intervene on the mediating path? The disadvantage is that CIE is the TE of the mediator on the outcome, which is subject to unmeasured confounding. The natural flavor has the advantage that it deals with unmeasured confounding, but NIE and NDE cannot be interpreted in a clinically relevant way. As important future work, the controlled flavor needs to be extended to path-specific effects.

Limitations. First, our analysis is limited to the case where both A and M are binary (0 or 1), which restricts its applications. Although a binary mediator is a helpful simplification for indicating whether the mediator (comorbidity) is treated, in reality comorbidities can be reduced without being fully treated. Second, we have not considered other types of contrasts. In the present work we have focused on the difference between two potential outcomes, but depending on the data types of Y and M, different decomposition equations would need to be derived and validated.

The analysis of the HIV-BA dataset is limited by the number of HIV+ patients. Mediation analysis requires a relatively large sample size, because it divides the data into multiple strata, i.e. samples with and without each mediator in both the exposed and unexposed groups, and each stratum needs enough samples to yield stable estimates. With nested cross-validation, the sample size should be even larger to ensure that each fold in the inner loop has enough samples. Because our approach deals with the mediators one at a time, the required sample size need not grow with the number of mediators. This helps but does not completely resolve the limitation. A Monte Carlo based power analysis can be done by generating data from models estimated from the actual data and increasing the simulated sample size until significance is shown (36).
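A Monte Carlo power analysis of this kind can be sketched as follows. Both the generative model and the test are simplifying assumptions: two groups with a hypothetical mean difference of 0.5 and unit standard deviation stand in for models fitted to the actual data, and a two-sided z-test on the difference in means (alpha = 0.05) stands in for the test of the mediation effect.

```python
import random


def mean_var(xs):
    """Sample mean and unbiased sample variance."""
    m = sum(xs) / len(xs)
    v = sum((x - m) ** 2 for x in xs) / (len(xs) - 1)
    return m, v


def simulated_power(n_per_group, effect=0.5, sd=1.0, n_sims=1000, seed=0):
    """Fraction of simulated datasets in which a two-sided z-test (alpha = 0.05)
    on the difference in group means is significant."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        # Hypothetical generative model standing in for one fitted to real data.
        m0, v0 = mean_var([rng.gauss(0.0, sd) for _ in range(n_per_group)])
        m1, v1 = mean_var([rng.gauss(effect, sd) for _ in range(n_per_group)])
        se = ((v0 + v1) / n_per_group) ** 0.5
        if abs((m1 - m0) / se) > 1.96:
            hits += 1
    return hits / n_sims


def smallest_sample_size(target=0.8, start=10, step=10, max_n=10_000, **kwargs):
    """Increase the per-group sample size until the simulated power reaches target."""
    n = start
    while n <= max_n:
        if simulated_power(n, **kwargs) >= target:
            return n
        n += step
    return None


print(simulated_power(64))       # near the textbook value of about 0.8
print(smallest_sample_size())    # per-group size reaching 80% simulated power
```

In a real analysis, each simulated dataset would be generated from the fitted exposure, mediator, and outcome models, and significance would be judged with the same procedure used for the reported effects (e.g. the bootstrap confidence interval).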

Other limitations are that, as in all causal inference studies, we did not address all potential biases in the real-data examples, including (1) unmeasured confounding, i.e. an incomplete or incorrect variable list in L, for which we have not performed a sensitivity analysis; (2) selection bias, especially in the HIV-BA dataset, which comes from a hospital sleep lab where the prevalence of sleep disorders is higher than in the general population; and (3) measurement noise, i.e. possible subjectivity in the framing dataset, and noise in the predicted brain age in the HIV-BA dataset, since it is based on a single night of brain activity monitoring rather than multiple nights.

The proposed approach can be used to assess the importance of multiple manipulable mediators with arbitrary causal dependencies. In the case of healthcare problems where the mediators are comorbidities or side-effects of certain exposures, our approach provides principled guidance for choosing which mediator to treat in order to optimize the healthcare outcome.

A Proof of Equation (1)

We have the total effect as

$$
\begin{aligned}
\mathrm{TE} &= Y(a = 1) - Y(a = 0) = Y(1) - Y(0) \\
&= Y\bigl(1, M_1(1, \mathrm{Pa}\{M_1\}(1)), \ldots, M_K(1, \mathrm{Pa}\{M_K\}(1))\bigr) - Y\bigl(0, M_1(0, \mathrm{Pa}\{M_1\}(0)), \ldots, M_K(0, \mathrm{Pa}\{M_K\}(0))\bigr).
\end{aligned}
$$

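Expanding the k-th mediator, which is binary, we have

$$
\begin{aligned}
Y(1) &= Y\bigl(1, M_1(1, \mathrm{Pa}\{M_1\}(1)), \ldots, M_K(1, \mathrm{Pa}\{M_K\}(1))\bigr) \\
&= Y\bigl(1, M_1(1, \mathrm{Pa}\{M_1\}(1)), \ldots, 1, \ldots, M_K(1, \mathrm{Pa}\{M_K\}(1))\bigr)\, M_k(1, \mathrm{Pa}\{M_k\}(1)) \\
&\quad + Y\bigl(1, M_1(1, \mathrm{Pa}\{M_1\}(1)), \ldots, 0, \ldots, M_K(1, \mathrm{Pa}\{M_K\}(1))\bigr)\, \bigl(1 - M_k(1, \mathrm{Pa}\{M_k\}(1))\bigr) \\
&= Y\bigl(1, M_1(1, \mathrm{Pa}\{M_1\}(1)), \ldots, 0, \ldots, M_K(1, \mathrm{Pa}\{M_K\}(1))\bigr) \\
&\quad + M_k(1, \mathrm{Pa}\{M_k\}(1)) \Bigl[Y\bigl(1, M_1(1, \mathrm{Pa}\{M_1\}(1)), \ldots, 1, \ldots, M_K(1, \mathrm{Pa}\{M_K\}(1))\bigr) - Y\bigl(1, M_1(1, \mathrm{Pa}\{M_1\}(1)), \ldots, 0, \ldots, M_K(1, \mathrm{Pa}\{M_K\}(1))\bigr)\Bigr] \\
&= Y\bigl(1, M_1(1, \mathrm{Pa}\{M_1\}(1)), \ldots, 0, \ldots, M_K(1, \mathrm{Pa}\{M_K\}(1))\bigr) + M_k(1, \mathrm{Pa}\{M_k\}(1))\, \mathrm{CIE}_k(1).
\end{aligned}
\tag{24}
$$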

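Similarly, we have

$$
\begin{aligned}
Y(0) &= Y\big(0, M_1(0, \mathrm{Pa}\{M_1\}(0)), \ldots, M_K(0, \mathrm{Pa}\{M_K\}(0))\big) \\
&= Y\big(0, M_1(0, \mathrm{Pa}\{M_1\}(0)), \ldots, 0, \ldots, M_K(0, \mathrm{Pa}\{M_K\}(0))\big) + M_k(0, \mathrm{Pa}\{M_k\}(0))\,\mathrm{CIE}_k(0).
\end{aligned}
$$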

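Therefore,

$$
\begin{aligned}
\mathrm{TE} = Y(1) - Y(0) &= Y\big(1, M_1(1, \mathrm{Pa}\{M_1\}(1)), \ldots, 0, \ldots, M_K(1, \mathrm{Pa}\{M_K\}(1))\big) - Y\big(0, M_1(0, \mathrm{Pa}\{M_1\}(0)), \ldots, 0, \ldots, M_K(0, \mathrm{Pa}\{M_K\}(0))\big) \\
&\quad + M_k(1, \mathrm{Pa}\{M_k\}(1))\,\mathrm{CIE}_k(1) - M_k(0, \mathrm{Pa}\{M_k\}(0))\,\mathrm{CIE}_k(0).
\end{aligned}
$$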
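Because the derivation uses only the fact that $M_k$ is binary, the decomposition holds exactly for every unit, whatever the structural models are. The Python sketch below checks it numerically for $K = 3$ mediators. The mediator and outcome models (a chain $M_1 \to M_2 \to M_3$ with an exposure-mediator interaction), the function names, and the parameter values are all hypothetical. The noise vector `u` plays the role of a single unit, so the same `u` is reused in the $a = 1$ and $a = 0$ worlds, and when $M_k$ is set the other mediators keep their natural values, as in the expressions above.

```python
import random


def mediators(a, u):
    """HYPOTHETICAL binary mediators forming a chain M1 -> M2 -> M3.

    Each mediator also depends on the exposure a; u holds the unit's
    exogenous noise, so fixing u fixes the unit across worlds.
    """
    m1 = int(u[0] < 0.3 + 0.4 * a)
    m2 = int(u[1] < 0.2 + 0.3 * a + 0.3 * m1)
    m3 = int(u[2] < 0.1 + 0.2 * a + 0.5 * m2)
    return [m1, m2, m3]


def outcome(a, m, u):
    """HYPOTHETICAL outcome Y(a, m1, m2, m3) with an exposure-mediator interaction."""
    return 1.0 * a + 0.8 * m[0] - 0.5 * m[1] + 1.2 * m[2] + 0.7 * a * m[2] + u[3]


def y_set(a, m, k, value, u):
    """Y(a, m_1, ..., value, ..., m_K): set only the k-th mediator, keep the others."""
    m = list(m)
    m[k] = value
    return outcome(a, m, u)


def decomposition_gap(u, k):
    """TE minus the right-hand side of the decomposition for mediator k."""
    m_1, m_0 = mediators(1, u), mediators(0, u)  # natural values under a=1 and a=0
    te = outcome(1, m_1, u) - outcome(0, m_0, u)
    cie_1 = y_set(1, m_1, k, 1, u) - y_set(1, m_1, k, 0, u)  # CIE_k(1)
    cie_0 = y_set(0, m_0, k, 1, u) - y_set(0, m_0, k, 0, u)  # CIE_k(0)
    first = y_set(1, m_1, k, 0, u) - y_set(0, m_0, k, 0, u)  # contrast with M_k fixed at 0
    return te - (first + m_1[k] * cie_1 - m_0[k] * cie_0)


def max_gap(n_units=1000, seed=0):
    """Largest absolute gap over random units and all three mediators."""
    rng = random.Random(seed)
    units = [[rng.random() for _ in range(3)] + [rng.gauss(0.0, 1.0)] for _ in range(n_units)]
    return max(abs(decomposition_gap(u, k)) for u in units for k in range(3))


print(max_gap())  # zero up to floating-point rounding
```

The gap is zero up to rounding for any choice of these functions, which is the point: the identity is algebraic, and the causal content enters only when the terms are identified from data.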

B Proof of Corollary 3.0.1

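Equations (24) and (25) are general equations obtained by expanding the $k$-th mediator. Repeating this expansion for each mediator $k = 1, \ldots, K$ gives

$$
\begin{aligned}
\mathrm{TE} &= \mathrm{CDE}_1(0) + \mathrm{sCIE}_1; \\
&\;\;\vdots \\
\mathrm{TE} &= \mathrm{CDE}_K(0) + \mathrm{sCIE}_K.
\end{aligned}
$$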

Therefore,

We can graphically prove Equation (14), $M_k(a) \perp\!\!\!\perp A \mid L$, by constructing the single world intervention graph (SWIG) shown in Figure 3b. The conditional independence holds because every path between $M_k(a)$ and $A$ must go through $L$, where, by d-separation, it is blocked by conditioning on $L$.

We can also graphically prove Equation (15), $Y_k(a, m) \perp\!\!\!\perp \{A, M_k\} \mid L$, by constructing the SWIG shown in Figure 3c. The conditional independence holds because every path between $Y_k(a, m)$ and $\{A, M_k\}$ must go through $L$, where, by d-separation, it is blocked by conditioning on $L$.

Figure 3: (a) A general causal graph in which the mediators in the dashed circle represent multiple mediators with arbitrary causal dependencies. Both $L$ and $A$ causally affect each mediator, and each mediator causally affects the outcome $Y$. Here we study the $k$-th mediator $M_k$, which has $M_1$ and $M_2$ as its parents and $M_3$ and $M_4$ as its children. (b) The SWIG of panel a when intervening on $A$ to set it to $a$, so that the exposure value $a$ and the observed $A$ are separated, and the mediators become potential outcomes under $a$. Arrows pointing into the outcome are omitted. (c) The SWIG of panel a when intervening on $A$ to set it to $a$ and on $M_k$ to set it to $m$. Note that there are three versions of $M_k$: $M_k$ is the observed value when no intervention is applied; $M_k(a, M_1(a), M_2(a))$ is the potential outcome of $M_k$ when intervening on $A$ to set it to $a$; and $m$ is the intervened value of $M_k$.
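The d-separation argument for Equation (14) can be checked mechanically. The sketch below implements the standard moralization criterion (restrict to the ancestral set of the variables involved, connect co-parents, drop edge directions, delete the conditioning set, and test connectivity) and applies it to Figure 3b. The edge lists are our reading of the Figure 3 caption and are assumptions: only the edges the caption names are included (no edges between $M_1$ and $M_2$ or between $M_3$ and $M_4$), the outcome is omitted as in panel b, and the intervention node $a$ is treated as an ordinary root node. The function and variable names are hypothetical.

```python
from collections import deque
from itertools import combinations


def ancestral_set(parents, nodes):
    """Return the given nodes together with all of their ancestors."""
    seen, stack = set(nodes), list(nodes)
    while stack:
        for p in parents.get(stack.pop(), ()):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen


def d_separated(parents, xs, ys, zs):
    """Moralization test: is xs d-separated from ys given zs in the DAG?"""
    xs, ys, zs = set(xs), set(ys), set(zs)
    anc = ancestral_set(parents, xs | ys | zs)
    adj = {v: set() for v in anc}
    for child in anc:
        pas = list(parents.get(child, ()))
        for p in pas:  # keep parent-child edges, now undirected
            adj[child].add(p)
            adj[p].add(child)
        for p, q in combinations(pas, 2):  # "marry" co-parents
            adj[p].add(q)
            adj[q].add(p)
    # Delete the conditioning set, then search for an undirected path xs -> ys.
    frontier = deque(x for x in xs if x not in zs)
    seen = set(frontier)
    while frontier:
        v = frontier.popleft()
        if v in ys:
            return False
        for w in adj[v] - zs - seen:
            seen.add(w)
            frontier.append(w)
    return True


# Figure 3b (assumed edge list): the fixed value a is split from the observed A,
# which keeps only its parent L.
SWIG_3B = {
    "A": ["L"],
    "M1(a)": ["L", "a"],
    "M2(a)": ["L", "a"],
    "Mk(a)": ["L", "a", "M1(a)", "M2(a)"],
    "M3(a)": ["L", "a", "Mk(a)"],
    "M4(a)": ["L", "a", "Mk(a)"],
}

# Figure 3a before splitting (assumed edge list): A is a parent of every mediator.
GRAPH_3A = {
    "A": ["L"],
    "M1": ["L", "A"],
    "M2": ["L", "A"],
    "Mk": ["L", "A", "M1", "M2"],
    "M3": ["L", "A", "Mk"],
    "M4": ["L", "A", "Mk"],
}

print(d_separated(SWIG_3B, {"Mk(a)"}, {"A"}, {"L"}))  # True: Equation (14)
print(d_separated(SWIG_3B, {"Mk(a)"}, {"A"}, set()))  # False: path through L is open
print(d_separated(GRAPH_3A, {"Mk"}, {"A"}, {"L"}))     # False: A -> Mk directly
```

The contrast between the last two calls shows why the SWIG is needed: in the unsplit graph $A$ is a parent of $M_k$, so no conditioning on $L$ can separate them, whereas splitting $A$ from $a$ leaves $L$ as the only connection.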

Causal inference in public health

Understanding and misunderstanding randomized controlled trials

Causal Inference: What If

Using genetic data to strengthen causal inference in observational research

A survey on causal inference

On the application of probability theory to agricultural experiments

Bayesian inference for causal effects: The role of randomization

A new approach to causal inference in mortality studies with a sustained exposure period-application to control of the healthy worker survivor effect

Explanation in causal inference: methods for mediation and interaction

A general approach to causal mediation analysis

Interpretation and identification of causal mediation

Causality and psychopathology: Finding the determinants of disorders and their cures

Mediation: R package for causal mediation analysis

Identification and sensitivity analysis for multiple causal mechanisms: Revisiting evidence from framing experiments

Elements of causal inference: foundations and learning algorithms

Mediation analysis with multiple mediators

Effect decomposition in the presence of an exposure-induced mediator-outcome confounder

Causal mediation analysis with multiple mediators

Decomposition of the total effect in the presence of multiple mediators and interactions

Interventional effects for mediation analysis with multiple mediators

Interventional effect models for multiple mediators

The consistency statement in causal inference: a definition or an assumption?

Single world intervention graphs (SWIGs): A unification of the counterfactual and graphical approaches to causality

Estimation of regression coefficients when some regressors are not always observed

Higher order influence functions and minimax estimation of nonlinear functionals

Selective machine learning of doubly robust functionals

Learning to rank for information retrieval

What triggers public opposition to immigration? Anxiety, group cues, and immigration threat

Brain age from the electroencephalogram of sleep

The open cardiovascular medicine journal

Degenerative aortic stenosis, dyslipidemia and possibilities of medical treatment

MALTS: Matching after learning to stretch

Counterfactual graphical models for longitudinal mediation analysis with unobserved confounding

Estimation of personalized effects associated with causal pathways

Determining power and sample size for simple and complex mediation models

This work was supported by the Developmental Award from the Harvard University Center for AIDS Research (HU CFAR NIH/NIAID fund 5P30AI060354-16).