`vignettes/ipcw_txshift.Rmd`

`ipcw_txshift.Rmd`

For a more general introduction to the targeted maximum likelihood estimator introduced in Díaz and van der Laan (2018) and the corresponding software implementation found in this package, a better resource to consult is the introductory vignette. This document builds on that work, describing an inverse probability of censoring weighted TML estimator (IPCW-TMLE) for the same target parameter when a censoring process is introduced into the observed data structure. In this case the observed data structure may be denoted \(O = (W, \Delta A, Y)\), where \(W\) is a set of baseline covariates, \(A\) is a continuous-valued treatment or intervention, \(Y\) is an outcome of interest, and \(\Delta\) is an indicator of censoring (\(1\) corresponding to being observed, \(0\) to being missing). Perhaps the best way to appreciate the utilities provided for computing IPCW-TMLEs in this R package is to work through examples, so let’s simply jump to that.

To start, let’s load the packages we’ll use and set a seed for simulation:

`## haldensify v0.0.5: Highly Adaptive Lasso Conditional Density Estimation`

`## txshift v0.3.3: Efficient Estimation of the Causal Effects of Stochastic Interventions`

set.seed(429153)

Consider \(n\) observed units \(O_1, \ldots, O_n\), where each random variable \(O = (W, A, Y)\) corresponds to a single observational unit. Let \(W\) denote baseline covariates (e.g., age, sex, education level), \(A\) an intervention variable of interest (e.g., nutritional supplements), \(Y\) an outcome of interest (e.g., disease status), and \(\Delta\) a sampling process that masks the full data structure. Though it need not be the case, let \(A\) be continuous-valued, i.e. \(A \in \mathbb{R}\). Let \(O_i \sim \mathcal{P} \in \mathcal{M}\), where \(\mathcal{M}\) is the nonparametric statistical model defined as the set of continuous densities on \(O\) with respect to some dominating measure. To formalize the definition of stochastic interventions and their corresponding causal effects, we introduce a nonparametric structural equation model (NPSEM), based on Pearl (2000), to define how the system changes under posited interventions: \[\begin{align*}\label{eqn:npsem} W &= f_W(U_W) \\ \Delta &= f_{\Delta}(W, U_{\Delta}) \\ A &= f_A(W, U_A) \\ Y &= f_Y(A, W, U_Y), \end{align*}\] We denote the observed data structure \(O = (W, \Delta, \Delta A, Y)\), in which some observations are missing — for now, let us consider a simple case-control sampling mechanism wherein the treatment is partially censored. The sampling mechanism \(\Delta \in \{0, 1\}\) allows units assigned \(1\) to be observed and forces those assigned \(0\) to be censored.

Letting \(A\) denote a continuous-valued treatment, we assume that the distribution of \(A\) conditional on \(W = w\) has support in the interval \((l(w), u(w))\) – for convenience, let this support be *a.e.* That is, the minimum natural value of treatment \(A\) for an individual with covariates \(W = w\) is \(l(w)\); similarly, the maximum is \(u(w)\). Then, a simple stochastic intervention, based on a shift \(\delta\), may be defined \[\begin{equation}\label{eqn:shift}
d(a, w) =
\begin{cases}
a - \delta & \text{if } a > l(w) + \delta \\
a & \text{if } a \leq l(w) + \delta,
\end{cases}
\end{equation}\] where \(0 \leq \delta \leq u(w)\) is an arbitrary pre-specified value that defines the degree to which the observed value \(A\) is to be shifted, where possible. For the purpose of using such a shift in practice, the present software provides estimators for a shift function that assumes that the density of treatment \(A\), conditional on the covariates \(W\), has support *a.e.*

# simulate simple data for tmle-shift sketch n_obs <- 500 # sample size n_w <- 1 # just one baseline covariate for this example tx_mult <- 2 # multiplier for the effect of W = 1 on the treatment # baseline covariate -- simple, binary W <- as.numeric(replicate(n_w, rbinom(n_obs, 1, 0.5))) # set and organize treatment based on baseline W A <- as.numeric(rnorm(n_obs, mean = tx_mult * W, sd = 1)) # create outcome as linear function of A, W + white noise Y <- A + W + rnorm(n_obs, mean = 0, sd = 1) # censoring based on covariates C <- rbinom(n_obs, 1, plogis(W)) # treatment shift parameter delta <- 0.5

**Note:** In all examples below, the argument `eif_reg_type`

is set to `"glm"`

(whereas the default is `"hal"`

) when invoking an IPCW-TMLE. This is done in an effort to speed along the computation process, as this argument controls the manner in which a nuisance regression associated with the efficient influence function of the IPCW-TMLE is fit. As stated in the documentation, setting this argument to `"glm"`

fits this EIF nuisance regression with a generalized linear model (GLM; an assumption-laden parametric form) while the default value of `"hal"`

fits the nuisance regression with the highly adaptive lasso (HAL; a recently developed nonparametric estimator) via the `hal9001`

R package (Coyle, Hejazi, and van der Laan 2019). In practice, we recommend leaving this argument to the default and fitting the EIF nuisance regression with HAL; however, this is significantly slower than simple using a GLM.

tmle_hal_shift_1 <- txshift(W = W, A = A, Y = Y, C = C, V = c("W", "Y"), delta = delta, fluctuation = "standard", max_iter = 2, ipcw_fit_args = list(fit_type = "glm"), g_fit_args = list(fit_type = "hal", n_bins = 5, grid_type = "equal_mass", lambda_seq = exp(seq(-1, -9, length = 300))), Q_fit_args = list(fit_type = "glm", glm_formula = "Y ~ ."), eif_reg_type = "glm" ) summary(tmle_hal_shift_1)

```
## lwr_ci param_est upr_ci param_var eif_mean estimator
## 1.8917 2.0738 2.2559 0.0086 -1.5522e-04 tmle
## n_iter
## 1
```

When computing any such TML estimator, we may, of course, vary the regressions used in fitting the nuisance parameters; however, an even simpler variation is to fit the step for the fluctuation submodels with a *weighted* method, simply weighting each observation by an auxiliary covariate (often denoted \(H_n\), and sometimes called the “clever covariate” in the literature) rather than using such a covariate directly in the submodel regression fit.

tmle_hal_shift_2 <- txshift(W = W, A = A, Y = Y, C = C, V = c("W", "Y"), delta = delta, fluctuation = "weighted", max_iter = 2, ipcw_fit_args = list(fit_type = "glm"), g_fit_args = list(fit_type = "hal", n_bins = 5, grid_type = "equal_mass", lambda_seq = exp(seq(-1, -9, length = 300))), Q_fit_args = list(fit_type = "glm", glm_formula = "Y ~ ."), eif_reg_type = "glm" ) summary(tmle_hal_shift_2)

```
## lwr_ci param_est upr_ci param_var eif_mean estimator
## 1.8925 2.0747 2.2568 0.0086 -1.5527e-04 tmle
## n_iter
## 1
```

`sl3`

To easily incorporate ensemble machine learning into the estimation procedure, we rely on the facilities provided in the `sl3`

R package (Coyle et al. 2020). For a complete guide on using the `sl3`

R package, consider consulting https://tlverse.org/sl3.

# SL learners to be used for most fits (e.g., IPCW, outcome regression) lrnr_lib <- make_learner_stack("Lrnr_mean", "Lrnr_glm", "Lrnr_ranger") sl_learner <- Lrnr_sl$new(learners = lrnr_lib, metalearner = Lrnr_nnls$new()) # SL learners for conditional densities to be used for the propensity score fit lrnr_dens_lib <- make_learner_stack(list("Lrnr_haldensify", n_bins = 3, grid_type = "equal_range", lambda_seq = exp(seq(-1, -7, length = 100))), list("Lrnr_haldensify", n_bins = 3, bin_method = "equal_mass", lambda_seq = exp(seq(-1, -7, length = 100))) ) sl_learner_density <- Lrnr_sl$new(learners = lrnr_dens_lib, metalearner = Lrnr_solnp_density$new())

Using the framework provided by the `sl3`

package, the nuisance parameters of the TML estimator may be fit with ensemble learning, using the cross-validation framework of the Super Learner algorithm of van der Laan, Polley, and Hubbard (2007). In principal, it would be desirable to estimate the parameter using data-adaptive stacked regressions for the sampling mechanism (via `ipcw_fit_args`

, the treatment mechanism (via `g_fit_args`

), and the outcome mechanism (via `Q_fit_args`

). This could be done with a call to the `txshift`

function like the following; however, we avoid running the following code chunk for the sake of saving time in this vignette.

tmle_sl_shift_1 <- txshift(W = W, A = A, Y = Y, C = C, V = c("W", "Y"), delta = delta, fluctuation = "standard", max_iter = 2, ipcw_fit_args = list(fit_type = "sl", sl_learnrs = sl_learner), g_fit_args = list(fit_type = "sl", sl_learners_density = sl_learner_density), Q_fit_args = list(fit_type = "sl", sl_learners = sl_learner), eif_reg_type = "glm" ) summary(tmle_sl_shift_1)

As before, we may vary the regression for the submodel fluctuation procedure by weighting each observation by the value of the auxiliary covariate rather than using such an auxiliary covariate directly in the regression procedure. Note that we again avoid running the following code chunk for the sake of saving time in this vignette.

tmle_sl_shift_2 <- txshift(W = W, A = A, Y = Y, C = C, V = c("W", "Y"), delta = delta, fluctuation = "weighted", max_iter = 2, ipcw_fit_args = list(fit_type = "sl", sl_learners = sl_learner), g_fit_args = list(fit_type = "hal", n_bins = 5, grid_type = "equal_mass", lambda_seq = exp(seq(-1, -9, length = 300))), Q_fit_args = list(fit_type = "sl", sl_learners = sl_learner), eif_reg_type = "glm" ) summary(tmle_sl_shift_2)

Utilizing the `sl3`

R package in exactly the same manner, it is also possible to estimate *inefficient* IPCW-TML estimators with `txshift`

. The inefficient IPCW-TMLE performs nearly all of the same computations as the efficient version of the estimator, eschewing only the iterative targeting procedure performed over a more complex efficient influence function; the interested reader is referred to Rose and van der Laan (2011) for further details on the differences between the efficient and inefficient IPCW-TMLEs. By default, an efficient IPCW-TMLE is computed when censoring is detected; however, this may be disable by setting the `ipcw_efficiency`

argument to `FALSE`

like so

tmle_sl_ineffic <- txshift(W = W, A = A, Y = Y, C = C, V = c("W", "Y"), delta = delta, fluctuation = "standard", ipcw_fit_args = list(fit_type = "sl", sl_learners = sl_learner), g_fit_args = list(fit_type = "hal", n_bins = 5, grid_type = "equal_mass", lambda_seq = exp(seq(-1, -9, length = 300))), Q_fit_args = list(fit_type = "sl", sl_learners = sl_learner), ipcw_efficiency = FALSE ) summary(tmle_sl_ineffic)

```
## lwr_ci param_est upr_ci param_var eif_mean estimator
## 1.8224 2.0958 2.3692 0.0195 -4.5051e-15 tmle
## n_iter
## 0
```

Please note that in such cases, the value of `n_iter`

in the resulting output should be ** exactly zero**, as this indicates that the iterative procedure that is used to achieve efficiency has been skipped.

For a discussion of the procedure for obtaining statistical inference for TML estimators, the interested reader is referred to the introductory vignette of this package. Here, we focus on addressing the issue of how censoring impacts the inferential procedure…

(ci_shift <- confint(tmle_hal_shift_1))

```
## lwr_ci est upr_ci
## 1.891726 2.073822 2.255918
```

In some special cases it may be useful for the experienced user to compute the censoring mechanism, treatment mechanism, and outcome mechanism regressions separately (i.e., outside of the `txshift`

wrapper function), instead applying this user-facing wrapper only to invoke the *targeting* steps involved in computing the TML estimator for the treatment shift parameter. In such cases, the optional arguments `ipcw_fit_ext`

, `gn_fit_ext`

and `Qn_fit_ext`

may be utilized. We present a case of using these arguments here:

# compute the censoring mechanism and produce IPC weights externally pi_mech <- plogis(W) ipc_weights_out <- (as.numeric(C == 1) / pi_mech)[C == 1] ipcw_out <- list(pi_mech = pi_mech, ipc_weights = ipc_weights_out) # compute treatment mechanism (propensity score) externally ## first, produce the down-shifted treatment data gn_downshift <- dnorm(A - delta, mean = tx_mult * W, sd = 1) ## next, initialize and produce the up-shifted treatment data gn_upshift <- dnorm(A + delta, mean = tx_mult * W, sd = 1) ## now, initialize and produce the up-up-shifted (2 * delta) treatment data gn_upupshift <- dnorm(A + 2 * delta, mean = tx_mult * W, sd = 1) ## then, initialize and produce the un-shifted treatment data gn_noshift <- dnorm(A, mean = tx_mult * W, sd = 1) ## finally, put it all together into an object like what's produced internally gn_out <- as.data.table(cbind(gn_downshift, gn_noshift, gn_upshift, gn_upupshift))[C == 1, ] setnames(gn_out, c("downshift", "noshift", "upshift", "upupshift")) # compute outcome regression externally Qn_noshift <- (W + A - min(Y)) / diff(range(Y)) Qn_upshift <- (W + A + delta - min(Y)) / diff(range(Y)) Qn_noshift[Qn_noshift < 0] <- .Machine$double.neg.eps Qn_noshift[Qn_noshift > 1] <- 1 - .Machine$double.neg.eps Qn_upshift[Qn_upshift < 0] <- .Machine$double.neg.eps Qn_upshift[Qn_upshift > 1] <- 1 - .Machine$double.neg.eps Qn_out <- as.data.table(cbind(Qn_noshift, Qn_upshift))[C == 1, ] setnames(Qn_out, c("noshift", "upshift")) # invoke the wrapper function only to apply the targeting step tmle_shift_spec <- txshift(W = W, A = A, Y = Y, delta = delta, C = C, V = c("W", "Y"), fluctuation = "standard", max_iter = 2, ipcw_fit_args = list(fit_type = "external"), g_fit_args = list(fit_type = "external"), Q_fit_args = list(fit_type = "external"), ipcw_fit_ext = ipcw_out, gn_fit_ext = gn_out, Qn_fit_ext = Qn_out, eif_reg_type = "glm") summary(tmle_shift_spec)

```
## lwr_ci param_est upr_ci param_var eif_mean estimator n_iter
## 1.8795 2.073 2.2666 0.0098 6.9825e-05 tmle 1
```

Coyle, Jeremy R, Nima S Hejazi, Ivana Malenica, and Oleg Sofrygin. 2020. *sl3: Modern Pipelines for Machine Learning and Super Learning*. https://github.com/tlverse/sl3. https://doi.org/10.5281/zenodo.1342293.

Coyle, Jeremy R, Nima S Hejazi, and Mark J van der Laan. 2019. *hal9001: The Scalable Highly Adaptive Lasso*. https://github.com/tlverse/hal9001. https://doi.org/10.5281/zenodo.3558313.

Díaz, Iván, and Mark J van der Laan. 2018. “Stochastic Treatment Regimes.” In *Targeted Learning in Data Science: Causal Inference for Complex Longitudinal Studies*, 167–80. Springer Science & Business Media.

Pearl, Judea. 2000. *Causality*. Cambridge university press.

Rose, Sherri, and Mark J van der Laan. 2011. “A Targeted Maximum Likelihood Estimator for Two-Stage Designs.” *The International Journal of Biostatistics* 7 (1): 1–21.

van der Laan, Mark J, Eric C Polley, and Alan E Hubbard. 2007. “Super Learner.” *Statistical Applications in Genetics and Molecular Biology* 6 (1).