Causal Mediation Analysis for Stochastic Interventions

Authors: Nima Hejazi and Iván Díaz

What’s medshift?

The medshift R package provides tools for estimating a parameter that arises in a decomposition of the population intervention causal effect into the (in)direct effects under stochastic interventions in the setting of mediation analysis. medshift implements the methodology described in Díaz and Hejazi (2020). Implemented estimators include the classical substitution (G-computation) estimator, an inverse probability weighted (IPW) estimator, an efficient one-step estimator using cross-fitting (Pfanzagl and Wefelmeyer 1985; Zheng and van der Laan 2011; Chernozhukov et al. 2018), and a cross-validated targeted minimum loss (TML) estimator based on the method of universal least favorable submodels (van der Laan and Rose 2011; Zheng and van der Laan 2011; van der Laan and Gruber 2016). medshift integrates with the sl3 R package (Coyle et al. 2020) so that its estimators can use machine learning to fit nuisance parameters, and it implements its TML estimator via the architecture exposed by the tmle3 R package.


Install the most recent version from the master branch on GitHub via remotes:
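A minimal sketch of that installation step, assuming the package repository lives at nhejazi/medshift on GitHub (the repository path is an assumption and should be checked against the project page):

```r
# install the remotes helper package if it is not already available
if (!requireNamespace("remotes", quietly = TRUE)) {
  install.packages("remotes")
}

# assumed repository path; adjust if the package is hosted elsewhere
remotes::install_github("nhejazi/medshift")
```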



To illustrate how medshift may be used to estimate the effect of applying a stochastic intervention to the treatment (A) while keeping the mediator(s) (Z) fixed, consider the following example:


library(data.table)
library(medshift)

# produces a simple data set based on a causal model with mediation
make_simple_mediation_data <- function(n_obs = 1000) {
  # baseline covariate -- simple, binary
  W <- rbinom(n_obs, 1, prob = 0.50)

  # create treatment based on baseline W
  A <- as.numeric(rbinom(n_obs, 1, prob = W / 4 + 0.1))

  # single mediator to affect the outcome
  z1_prob <- 1 - plogis((A^2 + W) / (A + W^3 + 0.5))
  Z <- rbinom(n_obs, 1, prob = z1_prob)

  # create outcome as a linear function of A, W + white noise
  Y <- Z + A - 0.1 * W + rnorm(n_obs, mean = 0, sd = 0.25)

  # full data structure
  data <- as.data.table(cbind(Y, Z, A, W))
  setnames(data, c("Y", "Z", "A", "W"))
  return(data)
}

# set seed and simulate example data
set.seed(75681)  # assumed seed value; any fixed seed makes the simulation reproducible
example_data <- make_simple_mediation_data()

# compute one-step estimate for an incremental propensity score intervention
# that triples (delta = 3) the individual-specific odds of receiving treatment
os_medshift <- medshift(W = example_data$W, A = example_data$A,
                        Z = example_data$Z, Y = example_data$Y,
                        delta = 3, estimator = "onestep",
                        estimator_args = list(cv_folds = 3))
os_medshift
#>       lwr_ci    param_est       upr_ci    param_var     eif_mean    estimator 
#>       0.7401     0.788136     0.836172     0.000601 4.408236e-17      onestep
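The confidence limits in the printed output can be sanity-checked by hand. The sketch below assumes that param_var is the estimated variance of the estimator itself (already scaled by the sample size) and that the interval is a Wald-type 95% interval. Neither assumption is stated in the output, but both are consistent with the printed numbers up to rounding of param_var.

```r
# values copied from the printed one-step output above
param_est <- 0.788136
param_var <- 0.000601

# Wald-type 95% interval: estimate +/- z * standard error
z <- qnorm(0.975)
se <- sqrt(param_var)
ci <- c(lwr_ci = param_est - z * se, upr_ci = param_est + z * se)

print(round(ci, 4))
```

The eif_mean column, which is numerically zero here, is the empirical mean of the estimated efficient influence function. A value near zero is what one expects from an estimator that solves the corresponding estimating equation.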

For details on how to use data-adaptive regression (machine learning) techniques to estimate nuisance parameters, consult the vignette that accompanies this package.
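To make the intervention in the example concrete: an incremental propensity score intervention with parameter delta multiplies each unit's odds of receiving treatment by delta. The sketch below applies this to the two propensity scores implied by the simulation (P(A = 1 | W) = W / 4 + 0.1). The helper names shift_propensity and odds are hypothetical and are not part of medshift.

```r
# Shift a propensity score so that the odds of treatment are multiplied by delta.
# Hypothetical helper for illustration; not a medshift function.
shift_propensity <- function(g, delta) {
  (delta * g) / (delta * g + 1 - g)
}

# odds corresponding to a probability
odds <- function(p) p / (1 - p)

# propensity scores from the simulated example: W = 0 gives 0.10, W = 1 gives 0.35
g_natural <- c(W0 = 0.10, W1 = 0.35)
g_shifted <- shift_propensity(g_natural, delta = 3)

# shifted treatment probabilities and the resulting odds ratios
print(round(g_shifted, 4))
print(odds(g_shifted) / odds(g_natural))
```

The odds ratio equals delta for every unit regardless of the starting propensity, while the shifted probabilities always stay strictly between 0 and 1.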


If you encounter any bugs or have any specific feature requests, please file an issue.


Contributions are very welcome. Interested contributors should consult our contribution guidelines prior to submitting a pull request.


After using the medshift R package, please cite the following:

    @article{diaz2020causal,
      title={Causal mediation analysis for stochastic interventions},
      author={D{\'\i}az, Iv{\'a}n and Hejazi, Nima S},
      year={2020},
      url = {},
      doi = {10.1111/rssb.12362},
      journal={Journal of the Royal Statistical Society: Series B
        (Statistical Methodology)},
      publisher={Wiley Online Library}
    }

    @manual{hejazi2020medshift,
      author = {Hejazi, Nima S and D{\'\i}az, Iv{\'a}n},
      title = {{medshift}: Causal mediation analysis for stochastic
        interventions},
      year  = {2020},
      url = {},
      note = {R package version 0.1.4}
    }


Chernozhukov, Victor, Denis Chetverikov, Mert Demirer, Esther Duflo, Christian Hansen, Whitney Newey, and James Robins. 2018. “Double/Debiased Machine Learning for Treatment and Structural Parameters.” The Econometrics Journal 21 (1).

Coyle, Jeremy R, Nima S Hejazi, Ivana Malenica, and Oleg Sofrygin. 2020. sl3: Modern Pipelines for Machine Learning and Super Learning.

Díaz, Iván, and Nima S Hejazi. 2020. “Causal Mediation Analysis for Stochastic Interventions.” Journal of the Royal Statistical Society: Series B (Statistical Methodology). Wiley Online Library.

Pfanzagl, J, and W Wefelmeyer. 1985. “Contributions to a General Asymptotic Statistical Theory.” Statistics & Risk Modeling 3 (3-4): 379–88.

van der Laan, Mark J, and Susan Gruber. 2016. “One-Step Targeted Minimum Loss-Based Estimation Based on Universal Least Favorable One-Dimensional Submodels.” The International Journal of Biostatistics 12 (1): 351–78.

van der Laan, Mark J, and Sherri Rose. 2011. Targeted Learning: Causal Inference for Observational and Experimental Data. Springer Science & Business Media.

Zheng, Wenjing, and Mark J van der Laan. 2011. “Cross-Validated Targeted Minimum-Loss-Based Estimation.” In Targeted Learning, 459–74. Springer.