Estimate Counterfactual Mean Under Stochastic Shift in Exposure

txshift(
  W,
  A,
  Y,
  C = rep(1, length(Y)),
  V = NULL,
  delta = 0,
  estimator = c("tmle", "onestep"),
  fluctuation = c("standard", "weighted"),
  max_iter = 10,
  ipcw_fit_args = list(fit_type = c("glm", "sl", "external"), sl_learners = NULL),
  g_fit_args = list(fit_type = c("hal", "sl", "external"), n_bins = c(10, 25),
    grid_type = c("equal_range", "equal_mass"),
    lambda_seq = exp(seq(-1, -13, length = 300)), use_future = FALSE,
    sl_learners_density = NULL),
  Q_fit_args = list(fit_type = c("glm", "sl", "external"), glm_formula = "Y ~ .",
    sl_learners = NULL),
  eif_reg_type = c("hal", "glm"),
  ipcw_efficiency = TRUE,
  ipcw_fit_ext = NULL,
  gn_fit_ext = NULL,
  Qn_fit_ext = NULL
)

## Arguments

W
A matrix, data.frame, or similar containing a set of baseline covariates.

A
A numeric vector corresponding to a treatment variable. The parameter of interest is defined as a location shift of this quantity.

Y
A numeric vector of the observed outcomes.

C
A numeric indicator of censoring for each observation, used to compute an IPC-weighted estimator in cases where two-stage sampling is performed. The default, a vector of 1s, assumes no censoring.

V
The covariates that are used in determining the sampling procedure that gives rise to censoring. The default is NULL and corresponds to scenarios in which there is no censoring (in which case all values of the preceding argument C must be 1). To specify this, pass a character vector identifying the variables among W, A, Y thought to have played a role in defining the sampling/censoring mechanism (C). This argument also accepts a data.table (or similar) object composed of combinations of the variables W, A, Y; use of this option is NOT recommended.

delta
A numeric value indicating the shift in the treatment to be used in defining the target parameter. This is defined with respect to the scale of the treatment (A).

estimator
The type of estimator to be fit, either "tmle" for targeted maximum likelihood or "onestep" for a one-step estimator.

fluctuation
The method to be used in the submodel fluctuation step (targeting step) to compute the TML estimator. The choices, "standard" and "weighted", determine where the auxiliary covariate is placed in the logistic tilting regression.

max_iter
An integer giving the maximum number of iterations allowed when solving the efficient influence function estimating equation.

ipcw_fit_args
A list of arguments, all but one of which are passed to est_ipcw. For details, consult the documentation of est_ipcw. The first element (fit_type) determines how this regression is fit: "glm" for a generalized linear model, "sl" for a Super Learner, and "external" for user-specified input of the form produced by est_ipcw. NOTE that this first element is not passed to est_ipcw.

g_fit_args
A list of arguments, all but one of which are passed to est_g. For details, consult the documentation of est_g. The first element (fit_type) determines how this regression is fit: "hal" to estimate conditional densities via the highly adaptive lasso (via haldensify), "sl" for sl3 learners used to fit a Super Learner to densities via Lrnr_haldensify or similar, and "external" for user-specified input of the form produced by est_g. NOTE that this first element is not passed to est_g.

Q_fit_args
A list of arguments, all but one of which are passed to est_Q. For details, consult the documentation of est_Q. The first element (fit_type) determines how this regression is fit: "glm" for a generalized linear model for the outcome regression, "sl" for sl3 learners used to fit a Super Learner for the outcome regression, and "external" for user-specified input of the form produced by est_Q. NOTE that this first element is not passed to est_Q.

eif_reg_type
Whether a flexible nonparametric function ought to be used in the dimension-reduced nuisance regression of the targeting step for the censored data case. By default, the method used is a nonparametric regression based on the Highly Adaptive Lasso (from hal9001). Set this to "glm" to instead use a simple linear regression model. In this step, the efficient influence function (EIF) is regressed against the covariates contributing to the censoring mechanism (i.e., EIF ~ V | C = 1).

ipcw_efficiency
Whether to invoke an augmentation of the IPCW-TMLE procedure that performs an iterative process to ensure efficiency of the resulting estimate. The default is TRUE; set to FALSE only if possible inefficiency of the IPCW-TMLE is not a concern.

ipcw_fit_ext
The results of an external fitting procedure used to estimate the two-phase censoring mechanism, to be used in constructing the inverse probability of censoring weighted TML or one-step estimator. The input provided must match the output of est_ipcw exactly; thus, use of this argument is only recommended for power users.

gn_fit_ext
The results of an external fitting procedure used to estimate the exposure mechanism (generalized propensity score), to be used in constructing the TML or one-step estimator. The input provided must match the output of est_g exactly; thus, use of this argument is only recommended for power users.

Qn_fit_ext
The results of an external fitting procedure used to estimate the outcome mechanism, to be used in constructing the TML or one-step estimator. The input provided must match the output of est_Q exactly; thus, use of this argument is only recommended for power users.
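
To make the role of delta concrete, the following base R sketch computes a naive plug-in (substitution) estimate of the mean outcome when every observed exposure is shifted by delta. It is only an illustration of the target quantity: it is not the TML or one-step estimator that txshift computes, it uses no package functions, and the simulated data, the logistic outcome model, and all object names (dat, Q_fit, psi_shift, and so on) are assumptions made for this example.

```r
# Hypothetical simulated data (not taken from the package)
set.seed(1)
n <- 500
W <- rbinom(n, 1, 0.5)
A <- rnorm(n, mean = 2 * W, sd = 1)
Y <- rbinom(n, 1, plogis(A + W - 2))
delta <- 0.5
dat <- data.frame(W = W, A = A, Y = Y)

# Outcome regression E[Y | A, W]; a plain logistic GLM is assumed here
Q_fit <- glm(Y ~ A + W, family = binomial(), data = dat)

# Predictions at the observed exposure and at the exposure shifted by delta
Q_obs <- predict(Q_fit, newdata = dat, type = "response")
dat_shift <- dat
dat_shift$A <- dat$A + delta
Q_shift <- predict(Q_fit, newdata = dat_shift, type = "response")

# Plug-in estimates: average the predictions over the sample
psi_obs <- mean(Q_obs)
psi_shift <- mean(Q_shift)
```

The shift is applied on the scale of A, which is why delta must be chosen in the units of the treatment. Setting delta to 0 returns the mean of the fitted values at the observed exposures.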

## Value

S3 object of class txshift containing the results of the procedure to compute a TML or one-step estimate of the counterfactual mean under a modified treatment policy that shifts a continuous-valued exposure by a scalar amount delta. These estimates can be augmented to be consistent and efficient when two-phase sampling is performed.
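
For orientation, the counterfactual mean under a shift of size delta is commonly written as below under the usual identification assumptions. The symbols $\psi_\delta$ and $\bar{Q}$ are notation introduced for this note and do not appear elsewhere on this page.

```latex
\psi_\delta = \mathbb{E}\left[ \bar{Q}(A + \delta, W) \right],
\qquad
\bar{Q}(a, w) = \mathbb{E}\left[ Y \mid A = a, W = w \right]
```

Here $\bar{Q}$ is the outcome mechanism and the outer expectation is taken over the joint distribution of the exposure A and the baseline covariates W.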

## Details

Construct a one-step estimate or targeted minimum loss estimate of the counterfactual mean under a modified treatment policy, automatically making adjustments for two-phase sampling when a censoring indicator is included. Ensemble machine learning may be used to construct the initial estimates of nuisance functions using sl3.
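
The adjustment for two-phase sampling rests on inverse probability of censoring (IPC) weights. The base R sketch below illustrates that idea only, assuming that C = 1 marks an observation retained in the second-phase sample and that a logistic GLM of C on V is adequate. It does not call est_ipcw or reproduce its output format, and the simulated data and object names (samp_fit, pi_hat, ipc_weights) are hypothetical.

```r
# Hypothetical two-phase design: A is only "measured" when C == 1
set.seed(2)
n <- 2000
W <- rbinom(n, 1, 0.5)
A <- rnorm(n, mean = 2 * W, sd = 1)
Y <- rbinom(n, 1, plogis(A + W - 2))
# Selection into the second phase depends on V = (W, Y)
C <- rbinom(n, 1, plogis(-1 + W + Y))

# Estimate the sampling mechanism P(C = 1 | V) with a logistic GLM
samp_fit <- glm(C ~ W + Y, family = binomial())
pi_hat <- predict(samp_fit, type = "response")

# IPC weights: zero for unsampled units, 1 / pi_hat for sampled units
ipc_weights <- C / pi_hat

# Compare the unweighted and IPC-weighted means of A among sampled units
# with the full-data mean (available here only because the data are simulated)
mean_naive <- mean(A[C == 1])
mean_ipcw <- sum(ipc_weights * A) / sum(ipc_weights)
mean_full <- mean(A)
```

Because selection depends on W and Y, the unweighted mean among sampled units is generally biased for the full-data mean, while the weighted mean targets it when the model for the sampling mechanism is correct.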

## Examples

set.seed(429153)
n_obs <- 100
W <- replicate(2, rbinom(n_obs, 1, 0.5))
A <- rnorm(n_obs, mean = 2 * W, sd = 1)
Y <- rbinom(n_obs, 1, plogis(A + W + rnorm(n_obs, mean = 0, sd = 1)))
C <- rbinom(n_obs, 1, plogis(W + Y)) # two-phase sampling

# construct a TML estimate (set estimator = "onestep" for the one-step)
tmle <- txshift(
  W = W, A = A, Y = Y, delta = 0.5,
  estimator = "tmle",
  g_fit_args = list(
    fit_type = "hal", n_bins = 5,
    grid_type = "equal_mass",
    lambda_seq = exp(-1:-9)
  ),
  Q_fit_args = list(
    fit_type = "glm",
    glm_formula = "Y ~ ."
  )
)

# construct a TML estimate under two-phase sampling
ipcwtmle <- txshift(
  W = W, A = A, Y = Y, delta = 0.5,
  C = C, V = c("W", "Y"),
  estimator = "tmle", max_iter = 5,
  ipcw_fit_args = list(fit_type = "glm"),
  g_fit_args = list(
    fit_type = "hal", n_bins = 5,
    grid_type = "equal_mass",
    lambda_seq = exp(-1:-9)
  ),
  Q_fit_args = list(
    fit_type = "glm",
    glm_formula = "Y ~ ."
  ),
  eif_reg_type = "glm"
)