Targeted Learning with Moderated Statistics for Biomarker Discovery

Author: Nima Hejazi

## What’s biotmle?

biotmle is an R package that facilitates biomarker discovery by generalizing the moderated t-statistic (Smyth 2004) to target parameters whose estimators have asymptotically linear representations (van der Laan and Rose 2011). The methods implemented in the package use targeted minimum loss-based estimation (TMLE) to transform biological expression data (e.g., microarray, RNA-seq) according to the influence curve representation of a chosen causal target parameter (e.g., the average treatment effect). The transformed data (rotated into influence curve space) can then be subjected to a moderated test of the difference between the estimate of the target parameter and a hypothesized value of that parameter (usually a null value defined in relation to the parameter itself). This approach yields a valid hypothesis test of a statistically estimable causal parameter while stabilizing the variance estimate, so that the error rate of the test is controlled more strongly than under testing procedures that do not moderate the variance estimate (Hejazi et al., n.d.).
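To make the idea of "rotating into influence curve space" concrete, the following is a minimal sketch on simulated data. It is not the package's implementation and does not use the biotmle API: the TMLE step is replaced by a simpler augmented inverse probability weighted (AIPW) estimator with parametric regressions, and every name in it (`W`, `A`, `Y`, `ic_transform`, `Y_ic`, and so on) is hypothetical. The moderated test is run with limma only if that package happens to be installed.

```r
# Sketch only: an AIPW estimator stands in for TMLE, on simulated data.
set.seed(1)
n <- 100  # subjects
p <- 50   # biomarkers

W <- rnorm(n)                       # baseline covariate (hypothetical)
A <- rbinom(n, 1, plogis(0.3 * W))  # binary exposure (hypothetical)

# Expression matrix with biomarkers in rows and subjects in columns.
# Only the first 5 biomarkers have a true exposure effect (equal to 1).
effect <- c(rep(1, 5), rep(0, p - 5))
Y <- t(sapply(effect, function(b) b * A + 0.5 * W + rnorm(n)))

# Propensity score g(W) = P(A = 1 | W), estimated by logistic regression.
g <- fitted(glm(A ~ W, family = binomial()))

# For one biomarker, return the subject-level values IC_i + psi, where psi is
# the estimated average treatment effect and IC_i the estimated influence curve.
ic_transform <- function(y) {
  fit <- lm(y ~ A + W)
  q_a <- fitted(fit)                                       # Q(A, W)
  q_1 <- predict(fit, newdata = data.frame(A = 1, W = W))  # Q(1, W)
  q_0 <- predict(fit, newdata = data.frame(A = 0, W = W))  # Q(0, W)
  (A / g - (1 - A) / (1 - g)) * (y - q_a) + q_1 - q_0
}

# p x n matrix of transformed data.
Y_ic <- t(apply(Y, 1, ic_transform))

# Row means are the effect estimates; row standard deviations give the
# influence-curve-based standard errors.
psi_hat <- rowMeans(Y_ic)
se_hat <- apply(Y_ic, 1, sd) / sqrt(n)
t_ord <- psi_hat / se_hat  # ordinary (unmoderated) t-statistics

# Moderated t-statistics via limma, if available. With no design matrix,
# lmFit() fits an intercept-only model to each row.
if (requireNamespace("limma", quietly = TRUE)) {
  mod_fit <- limma::eBayes(limma::lmFit(Y_ic))
  t_mod <- mod_fit$t[, 1]
}
```

Because each row of `Y_ic` averages to the effect estimate for that biomarker, testing whether the row mean differs from zero is a test of the null hypothesis of no average treatment effect; the moderated version shrinks the row-wise variance estimates toward a common value before forming the statistic.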

## Installation

For standard use, install from Bioconductor using BiocManager:

    # Install BiocManager first if it is not already available
    if (!requireNamespace("BiocManager", quietly = TRUE)) {
      install.packages("BiocManager")
    }
    BiocManager::install("biotmle")

To contribute, install the bleeding-edge development version from GitHub via devtools:

    devtools::install_github("nhejazi/biotmle")

Current and prior Bioconductor releases are available on branches whose names begin with “RELEASE_”. For example, to install the version of this package released with Bioconductor 3.6, use:

    devtools::install_github("nhejazi/biotmle", ref = "RELEASE_3_6")

## Example

For details on how to best use the biotmle R package, please consult the most recent package vignette available through the Bioconductor project.
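Once the package is installed, its vignettes can also be listed from within R. The snippet below is a small sketch using the standard `vignette()` function from base R's utils package; the `has_biotmle` variable is only an illustrative name, and the guard simply avoids an error when biotmle is not installed.

```r
# Check whether biotmle is installed before asking for its vignettes.
has_biotmle <- requireNamespace("biotmle", quietly = TRUE)

if (has_biotmle) {
  # Lists the vignettes shipped with the installed version of the package.
  print(vignette(package = "biotmle"))
} else {
  message("biotmle is not installed; see the Installation section.")
}
```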

## Issues

If you encounter any bugs or have any specific feature requests, please file an issue.

## Contributions

Contributions are very welcome. Interested contributors should consult our contribution guidelines prior to submitting a pull request.

## Citation

After using the biotmle R package, please cite it:

    @article{hejazi2017biotmle,
      author = {Hejazi, Nima S and Cai, Weixin and Hubbard, Alan E},
      title = {biotmle: Targeted Learning for Biomarker Discovery},
      journal = {The Journal of Open Source Software},
      volume = {2},
      number = {15},
      month = {July},
      year = {2017},
      publisher = {The Open Journal},
      doi = {10.21105/joss.00295},
      url = {https://doi.org/10.21105/joss.00295}
    }

## Funding

The development of this software was supported in part through grants from the National Institutes of Health: P42 ES004705-29 and R01 ES021369-05.

## License

© 2016-2018 Nima S. Hejazi

The contents of this repository are distributed under the MIT license. See file LICENSE for details.

## References

Hejazi, Nima S., Sara Kherad-Pajouh, Mark J. van der Laan, and Alan E. Hubbard. n.d. “Variance Stabilization of Targeted Estimators of Causal Parameters in High-Dimensional Settings.” https://arxiv.org/abs/1710.05451.

Smyth, Gordon K. 2004. “Linear Models and Empirical Bayes Methods for Assessing Differential Expression in Microarray Experiments.” Statistical Applications in Genetics and Molecular Biology 3 (1). Walter de Gruyter: 1–25. https://doi.org/10.2202/1544-6115.1027.

van der Laan, Mark J., and Sherri Rose. 2011. Targeted Learning: Causal Inference for Observational and Experimental Data. Springer Science & Business Media.