## Introduction

For a more general introduction to the targeted maximum likelihood estimator introduced in Díaz and van der Laan (2018) and the corresponding software implementation found in this package, a better resource to consult is the introductory vignette. This document builds on that work, describing an inverse probability of censoring weighted TML estimator (IPCW-TMLE) for the same target parameter when a censoring process is introduced into the observed data structure. In this case the observed data structure may be denoted $$O = (W, \Delta A, Y)$$, where $$W$$ is a set of baseline covariates, $$A$$ is a continuous-valued treatment or intervention, $$Y$$ is an outcome of interest, and $$\Delta$$ is an indicator of censoring ($$1$$ corresponding to being observed, $$0$$ to being missing). Perhaps the best way to appreciate the utilities provided for computing IPCW-TMLEs in this R package is to work through examples, so let’s simply jump to that.

To start, let’s load the packages we’ll use and set a seed for simulation:

library(data.table)
library(haldensify)
## haldensify v0.0.5: Highly Adaptive Lasso Conditional Density Estimation
library(sl3)
library(txshift)
## txshift v0.3.3: Efficient Estimation of the Causal Effects of Stochastic Interventions
set.seed(429153)

## Data and Notation

Consider $$n$$ observed units $$O_1, \ldots, O_n$$, where each random variable $$O = (W, A, Y)$$ corresponds to a single observational unit. Let $$W$$ denote baseline covariates (e.g., age, sex, education level), $$A$$ an intervention variable of interest (e.g., nutritional supplements), $$Y$$ an outcome of interest (e.g., disease status), and $$\Delta$$ a sampling process that masks the full data structure. Though it need not be the case, let $$A$$ be continuous-valued, i.e. $$A \in \mathbb{R}$$. Let $$O_i \sim \mathcal{P} \in \mathcal{M}$$, where $$\mathcal{M}$$ is the nonparametric statistical model defined as the set of continuous densities on $$O$$ with respect to some dominating measure. To formalize the definition of stochastic interventions and their corresponding causal effects, we introduce a nonparametric structural equation model (NPSEM), based on Pearl (2000), to define how the system changes under posited interventions: \begin{align*}\label{eqn:npsem} W &= f_W(U_W) \\ \Delta &= f_{\Delta}(W, U_{\Delta}) \\ A &= f_A(W, U_A) \\ Y &= f_Y(A, W, U_Y), \end{align*} We denote the observed data structure $$O = (W, \Delta, \Delta A, Y)$$, in which some observations are missing — for now, let us consider a simple case-control sampling mechanism wherein the treatment is partially censored. The sampling mechanism $$\Delta \in \{0, 1\}$$ allows units assigned $$1$$ to be observed and forces those assigned $$0$$ to be censored.

Letting $$A$$ denote a continuous-valued treatment, we assume that the distribution of $$A$$ conditional on $$W = w$$ has support in the interval $$(l(w), u(w))$$ – for convenience, let this support be a.e. That is, the minimum natural value of treatment $$A$$ for an individual with covariates $$W = w$$ is $$l(w)$$; similarly, the maximum is $$u(w)$$. Then, a simple stochastic intervention, based on a shift $$\delta$$, may be defined $$$\label{eqn:shift} d(a, w) = \begin{cases} a - \delta & \text{if } a > l(w) + \delta \\ a & \text{if } a \leq l(w) + \delta, \end{cases}$$$ where $$0 \leq \delta \leq u(w)$$ is an arbitrary pre-specified value that defines the degree to which the observed value $$A$$ is to be shifted, where possible. For the purpose of using such a shift in practice, the present software provides estimators for a shift function that assumes that the density of treatment $$A$$, conditional on the covariates $$W$$, has support a.e.

### Simulate Data

# simulate simple data for tmle-shift sketch
n_obs <- 500  # sample size
n_w <- 1  # just one baseline covariate for this example
tx_mult <- 2  # multiplier for the effect of W = 1 on the treatment

# baseline covariate -- simple, binary
W <- as.numeric(replicate(n_w, rbinom(n_obs, 1, 0.5)))

# set and organize treatment based on baseline W
A <- as.numeric(rnorm(n_obs, mean = tx_mult * W, sd = 1))

# create outcome as linear function of A, W + white noise
Y <- A + W + rnorm(n_obs, mean = 0, sd = 1)

# censoring based on covariates
C <- rbinom(n_obs, 1, plogis(W))

# treatment shift parameter
delta <- 0.5

## Methodology

Note: In all examples below, the argument eif_reg_type is set to "glm" (whereas the default is "hal") when invoking an IPCW-TMLE. This is done in an effort to speed along the computation process, as this argument controls the manner in which a nuisance regression associated with the efficient influence function of the IPCW-TMLE is fit. As stated in the documentation, setting this argument to "glm" fits this EIF nuisance regression with a generalized linear model (GLM; an assumption-laden parametric form) while the default value of "hal" fits the nuisance regression with the highly adaptive lasso (HAL; a recently developed nonparametric estimator) via the hal9001 R package (Coyle, Hejazi, and van der Laan 2019). In practice, we recommend leaving this argument to the default and fitting the EIF nuisance regression with HAL; however, this is significantly slower than simple using a GLM.

### Inverse Probability Weighting with Targeted Maximum Likelihood Estimation

tmle_hal_shift_1 <- txshift(W = W, A = A, Y = Y,
C = C, V = c("W", "Y"),
delta = delta,
fluctuation = "standard",
max_iter = 2,
ipcw_fit_args = list(fit_type = "glm"),
g_fit_args = list(fit_type = "hal",
n_bins = 5,
grid_type = "equal_mass",
lambda_seq =
exp(seq(-1, -9,
length = 300))),
Q_fit_args = list(fit_type = "glm",
glm_formula = "Y ~ ."),
eif_reg_type = "glm"
)
summary(tmle_hal_shift_1)
##      lwr_ci   param_est      upr_ci   param_var    eif_mean   estimator
##      1.8917      2.0738      2.2559      0.0086 -1.5522e-04        tmle
##      n_iter
##           1

When computing any such TML estimator, we may, of course, vary the regressions used in fitting the nuisance parameters; however, an even simpler variation is to fit the step for the fluctuation submodels with a weighted method, simply weighting each observation by an auxiliary covariate (often denoted $$H_n$$, and sometimes called the “clever covariate” in the literature) rather than using such a covariate directly in the submodel regression fit.

tmle_hal_shift_2 <- txshift(W = W, A = A, Y = Y,
C = C, V = c("W", "Y"),
delta = delta,
fluctuation = "weighted",
max_iter = 2,
ipcw_fit_args = list(fit_type = "glm"),
g_fit_args = list(fit_type = "hal",
n_bins = 5,
grid_type = "equal_mass",
lambda_seq =
exp(seq(-1, -9,
length = 300))),
Q_fit_args = list(fit_type = "glm",
glm_formula = "Y ~ ."),
eif_reg_type = "glm"
)
summary(tmle_hal_shift_2)
##      lwr_ci   param_est      upr_ci   param_var    eif_mean   estimator
##      1.8925      2.0747      2.2568      0.0086 -1.5527e-04        tmle
##      n_iter
##           1

### Interlude: Constructing Optimal Stacked Regressions with sl3

To easily incorporate ensemble machine learning into the estimation procedure, we rely on the facilities provided in the sl3 R package (Coyle et al. 2020). For a complete guide on using the sl3 R package, consider consulting https://tlverse.org/sl3.

# SL learners to be used for most fits (e.g., IPCW, outcome regression)
lrnr_lib <- make_learner_stack("Lrnr_mean", "Lrnr_glm", "Lrnr_ranger")
sl_learner <- Lrnr_sl$new(learners = lrnr_lib, metalearner = Lrnr_nnls$new())

# SL learners for conditional densities to be used for the propensity score fit
lrnr_dens_lib <- make_learner_stack(list("Lrnr_haldensify", n_bins = 3,
grid_type = "equal_range",
lambda_seq = exp(seq(-1, -7,
length = 100))),
list("Lrnr_haldensify", n_bins = 3,
bin_method = "equal_mass",
lambda_seq = exp(seq(-1, -7,
length = 100)))
)
sl_learner_density <- Lrnr_sl$new(learners = lrnr_dens_lib, metalearner = Lrnr_solnp_density$new())

### Estimating the IPCW-TMLE with Optimal Stacked Regressions

Using the framework provided by the sl3 package, the nuisance parameters of the TML estimator may be fit with ensemble learning, using the cross-validation framework of the Super Learner algorithm of van der Laan, Polley, and Hubbard (2007). In principal, it would be desirable to estimate the parameter using data-adaptive stacked regressions for the sampling mechanism (via ipcw_fit_args, the treatment mechanism (via g_fit_args), and the outcome mechanism (via Q_fit_args). This could be done with a call to the txshift function like the following; however, we avoid running the following code chunk for the sake of saving time in this vignette.

tmle_sl_shift_1 <- txshift(W = W, A = A, Y = Y,
C = C, V = c("W", "Y"),
delta = delta,
fluctuation = "standard",
max_iter = 2,
ipcw_fit_args = list(fit_type = "sl",
sl_learnrs = sl_learner),
g_fit_args = list(fit_type = "sl",
sl_learners_density =
sl_learner_density),
Q_fit_args = list(fit_type = "sl",
sl_learners = sl_learner),
eif_reg_type = "glm"
)
summary(tmle_sl_shift_1)

As before, we may vary the regression for the submodel fluctuation procedure by weighting each observation by the value of the auxiliary covariate rather than using such an auxiliary covariate directly in the regression procedure. Note that we again avoid running the following code chunk for the sake of saving time in this vignette.

tmle_sl_shift_2 <- txshift(W = W, A = A, Y = Y,
C = C, V = c("W", "Y"),
delta = delta,
fluctuation = "weighted",
max_iter = 2,
ipcw_fit_args = list(fit_type = "sl",
sl_learners = sl_learner),
g_fit_args = list(fit_type = "hal",
n_bins = 5,
grid_type = "equal_mass",
lambda_seq =
exp(seq(-1, -9,
length = 300))),
Q_fit_args = list(fit_type = "sl",
sl_learners = sl_learner),
eif_reg_type = "glm"
)
summary(tmle_sl_shift_2)

### Estimating Inefficient IPCW-TMLEs with Optimal Stacked Regressions

Utilizing the sl3 R package in exactly the same manner, it is also possible to estimate inefficient IPCW-TML estimators with txshift. The inefficient IPCW-TMLE performs nearly all of the same computations as the efficient version of the estimator, eschewing only the iterative targeting procedure performed over a more complex efficient influence function; the interested reader is referred to Rose and van der Laan (2011) for further details on the differences between the efficient and inefficient IPCW-TMLEs. By default, an efficient IPCW-TMLE is computed when censoring is detected; however, this may be disable by setting the ipcw_efficiency argument to FALSE like so

tmle_sl_ineffic <- txshift(W = W, A = A, Y = Y,
C = C, V = c("W", "Y"),
delta = delta,
fluctuation = "standard",
ipcw_fit_args = list(fit_type = "sl",
sl_learners = sl_learner),
g_fit_args = list(fit_type = "hal",
n_bins = 5,
grid_type = "equal_mass",
lambda_seq =
exp(seq(-1, -9,
length = 300))),
Q_fit_args = list(fit_type = "sl",
sl_learners = sl_learner),
ipcw_efficiency = FALSE
)
summary(tmle_sl_ineffic)
##      lwr_ci   param_est      upr_ci   param_var    eif_mean   estimator
##      1.8224      2.0958      2.3692      0.0195 -4.5051e-15        tmle
##      n_iter
##           0

Please note that in such cases, the value of n_iter in the resulting output should be exactly zero, as this indicates that the iterative procedure that is used to achieve efficiency has been skipped.

### Statistical Inference for Targeted Maximum Likelihood Estimates

For a discussion of the procedure for obtaining statistical inference for TML estimators, the interested reader is referred to the introductory vignette of this package. Here, we focus on addressing the issue of how censoring impacts the inferential procedure…

(ci_shift <- confint(tmle_hal_shift_1))
##   lwr_ci      est   upr_ci
## 1.891726 2.073822 2.255918

In some special cases it may be useful for the experienced user to compute the censoring mechanism, treatment mechanism, and outcome mechanism regressions separately (i.e., outside of the txshift wrapper function), instead applying this user-facing wrapper only to invoke the targeting steps involved in computing the TML estimator for the treatment shift parameter. In such cases, the optional arguments ipcw_fit_ext, gn_fit_ext and Qn_fit_ext may be utilized. We present a case of using these arguments here:

# compute the censoring mechanism and produce IPC weights externally
pi_mech <- plogis(W)
ipc_weights_out <- (as.numeric(C == 1) / pi_mech)[C == 1]
ipcw_out <- list(pi_mech = pi_mech, ipc_weights = ipc_weights_out)

# compute treatment mechanism (propensity score) externally
## first, produce the down-shifted treatment data
gn_downshift <- dnorm(A - delta, mean = tx_mult * W, sd = 1)
## next, initialize and produce the up-shifted treatment data
gn_upshift <- dnorm(A + delta, mean = tx_mult * W, sd = 1)
## now, initialize and produce the up-up-shifted (2 * delta) treatment data
gn_upupshift <- dnorm(A + 2 * delta, mean = tx_mult * W, sd = 1)
## then, initialize and produce the un-shifted treatment data
gn_noshift <- dnorm(A, mean = tx_mult * W, sd = 1)
## finally, put it all together into an object like what's produced internally
gn_out <- as.data.table(cbind(gn_downshift, gn_noshift, gn_upshift,
gn_upupshift))[C == 1, ]
setnames(gn_out, c("downshift", "noshift", "upshift", "upupshift"))

# compute outcome regression externally
Qn_noshift <- (W + A - min(Y)) / diff(range(Y))
Qn_upshift <- (W + A + delta - min(Y)) / diff(range(Y))
Qn_noshift[Qn_noshift < 0] <- .Machine$double.neg.eps Qn_noshift[Qn_noshift > 1] <- 1 - .Machine$double.neg.eps
Qn_upshift[Qn_upshift < 0] <- .Machine$double.neg.eps Qn_upshift[Qn_upshift > 1] <- 1 - .Machine$double.neg.eps
Qn_out <- as.data.table(cbind(Qn_noshift, Qn_upshift))[C == 1, ]
setnames(Qn_out, c("noshift", "upshift"))

# invoke the wrapper function only to apply the targeting step
tmle_shift_spec <- txshift(W = W, A = A, Y = Y, delta = delta,
C = C, V = c("W", "Y"),
fluctuation = "standard",
max_iter = 2,
ipcw_fit_args = list(fit_type = "external"),
g_fit_args = list(fit_type = "external"),
Q_fit_args = list(fit_type = "external"),
ipcw_fit_ext = ipcw_out,
gn_fit_ext = gn_out,
Qn_fit_ext = Qn_out,
eif_reg_type = "glm")
summary(tmle_shift_spec)
##     lwr_ci  param_est     upr_ci  param_var   eif_mean  estimator     n_iter
##     1.8795      2.073     2.2666     0.0098 6.9825e-05       tmle          1

## References

Coyle, Jeremy R, Nima S Hejazi, Ivana Malenica, and Oleg Sofrygin. 2020. sl3: Modern Pipelines for Machine Learning and Super Learning. https://github.com/tlverse/sl3. https://doi.org/10.5281/zenodo.1342293.

Coyle, Jeremy R, Nima S Hejazi, and Mark J van der Laan. 2019. hal9001: The Scalable Highly Adaptive Lasso. https://github.com/tlverse/hal9001. https://doi.org/10.5281/zenodo.3558313.

Díaz, Iván, and Mark J van der Laan. 2018. “Stochastic Treatment Regimes.” In Targeted Learning in Data Science: Causal Inference for Complex Longitudinal Studies, 167–80. Springer Science & Business Media.

Pearl, Judea. 2000. Causality. Cambridge university press.

Rose, Sherri, and Mark J van der Laan. 2011. “A Targeted Maximum Likelihood Estimator for Two-Stage Designs.” The International Journal of Biostatistics 7 (1): 1–21.

van der Laan, Mark J, Eric C Polley, and Alan E Hubbard. 2007. “Super Learner.” Statistical Applications in Genetics and Molecular Biology 6 (1).