Targeted Data-Adaptive Estimation and Inference for Differential Methylation Analysis

Author: Nima Hejazi

## What’s methyvim?

methyvim is an R package that provides facilities for differential methylation analysis based on variable importance measures (VIMs), a class of statistically estimable target parameters that arise in causal inference.

The statistical methodology implemented computes targeted minimum loss-based estimates of several well-characterized variable importance measures:

For discrete-valued treatments or exposures:

• The average treatment effect (ATE): The effect of a binary exposure or treatment on the observed methylation at a target CpG site is estimated, controlling for the observed methylation at all other CpG sites in the same neighborhood as the target site, based on an additive form. In particular, the parameter estimate represents the additive difference in methylation that would have been observed at the target site had all observations received the treatment versus the scenario in which none received the treatment.

• The relative risk (RR): The effect of a binary exposure or treatment on the observed methylation at a target CpG site is estimated, controlling for the observed methylation at all other CpG sites in the same neighborhood as the target site, based on a geometric form. In particular, the parameter estimate represents the multiplicative change (ratio) in methylation that would have been observed at the target site had all observations received the treatment versus the scenario in which none received the treatment.
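
In symbols, the two parameters above can be sketched as follows. The notation is an assumption introduced here for illustration and is not taken from the package: $A \in \{0, 1\}$ denotes the binary exposure, $Y$ the methylation measured at the target CpG site, and $W$ the methylation at the neighboring CpG sites. Under the usual causal identifiability assumptions (no unmeasured confounding, positivity), both parameters are functionals of the observed data distribution:

$$
\psi_{\text{ATE}} = \mathbb{E}_W\left[\mathbb{E}(Y \mid A = 1, W) - \mathbb{E}(Y \mid A = 0, W)\right],
\qquad
\psi_{\text{RR}} = \frac{\mathbb{E}_W\left[\mathbb{E}(Y \mid A = 1, W)\right]}{\mathbb{E}_W\left[\mathbb{E}(Y \mid A = 0, W)\right]}.
$$

In this notation, $\psi_{\text{ATE}} = 0$ and $\psi_{\text{RR}} = 1$ both correspond to no effect of the exposure on methylation at the target site.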

For continuous-valued treatments or exposures:

• A nonparametric variable importance measure (NPVI): The effect of a continuous-valued exposure or treatment (the observed methylation at a target CpG site) on an outcome of interest is estimated, controlling for the observed methylation at all other CpG sites in the same neighborhood as the target (treatment) site, based on a parameter that compares values of the treatment against a reference value taken to be the null. In particular, the implementation provided is designed to assess the effect of differential methylation at the target CpG site on a (typically) phenotype-level outcome of interest (e.g., survival), in effect providing a nonparametric evaluation of the impact of methylation at the target site on said outcome.
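
One way to formalize such a parameter, and to our understanding the form studied in the literature behind tmle.npvi, is sketched below. The notation is an assumption introduced for illustration: $X$ is the continuous exposure (methylation at the target site), $Y$ the outcome, $W$ the methylation at neighboring sites, and $x_0$ the reference value.

$$
\psi_{\text{NPVI}} = \frac{\mathbb{E}\left[(X - x_0)\,\{\theta(X, W) - \theta(x_0, W)\}\right]}{\mathbb{E}\left[(X - x_0)^2\right]},
\qquad
\theta(x, w) = \mathbb{E}(Y \mid X = x, W = w).
$$

Read this way, $\psi_{\text{NPVI}}$ is the slope of the best linear approximation, in $(X - x_0)$, to the change in the conditional mean outcome $\theta(X, W) - \theta(x_0, W)$, so no linear model for the outcome itself is assumed.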

In all cases, an estimator of the target parameter is constructed via targeted minimum loss-based estimation.

These methods allow differential methylation effects to be quantified in a manner that is largely free of assumptions, especially of the variety exploited in parametric models. The statistical algorithm consists of several major steps:

1. Pre-screening of genomic sites is used to isolate a subset of sites for which there is cursory evidence of differential methylation. For the sake of computational feasibility, targeted minimum loss-based estimates of VIMs are computed only for this subset of sites. Several screening approaches are available, adapting core routines from the following R packages: limma, tmle.npvi.
2. Nonparametric VIMs are estimated for the specified parameter, currently adapting routines from the tmle.npvi and tmle R packages.
3. Because pre-screening is performed before VIMs are estimated, the multiple testing correction must account for the multi-stage nature of the analysis. Specifically, we apply the modified marginal Benjamini & Hochberg step-up False Discovery Rate (FDR) controlling procedure for multi-stage analyses (FDR-MSA).
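
The correction in step 3 can be illustrated with a minimal sketch in base R. It assumes that the multi-stage adjustment treats every site removed by pre-screening as if it had been tested with a p-value of 1, and then applies the standard Benjamini & Hochberg adjustment across all sites. The function name `fdr_msa_sketch` and its arguments are hypothetical and are not the package's API.

```r
# Hypothetical helper (not the package's API): BH adjustment that accounts
# for sites removed by pre-screening by assigning them a p-value of 1.
fdr_msa_sketch <- function(pvals_tested, n_sites_total) {
  stopifnot(n_sites_total >= length(pvals_tested))
  # Sites filtered out in the screening stage are assigned p = 1.
  pvals_untested <- rep(1, n_sites_total - length(pvals_tested))
  pvals_all <- c(pvals_tested, pvals_untested)
  # Standard Benjamini-Hochberg step-up adjustment over all measured sites.
  adjusted <- stats::p.adjust(pvals_all, method = "BH")
  # Return adjusted p-values only for the sites that were actually tested.
  adjusted[seq_along(pvals_tested)]
}

# Toy example: 4 sites pass screening out of 100 measured sites.
pvals <- c(0.0001, 0.004, 0.03, 0.2)
adj_msa <- fdr_msa_sketch(pvals, n_sites_total = 100)
adj_naive <- stats::p.adjust(pvals, method = "BH")
print(adj_msa)
print(adj_naive)
```

Comparing `adj_msa` with `adj_naive` shows why the adjustment matters: applying Benjamini & Hochberg only to the screened subset ignores the sites discarded in the first stage and yields smaller, overly optimistic adjusted p-values.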

## Installation

Install the most recent stable release from GitHub via devtools:

    devtools::install_github("nhejazi/methyvim")

## Issues

If you encounter any bugs or have any specific feature requests, please file an issue.

## Contributions

It is our hope that methyvim will grow to be widely adopted as a tool for the nonparametric assessment of variable importance in studies of differential methylation. To that end, contributions are very welcome, though we ask that interested contributors consult our contribution guidelines prior to submitting a pull request.

## Citation

After using the methyvim R package, please cite the following:

    @article{hejazi2017methyvim,
      doi = {},
      url = {},
      year  = {2017},
      month = {},
      publisher = {},
      volume = {},
      author = {Hejazi, Nima S and Hubbard, Alan E and {van der Laan}, Mark J},
      title = {methyvim: Targeted and model-free differential methylation analysis in R},
      journal = {}
    }

## License

The contents of this repository are distributed under the MIT license. See file LICENSE for details.