WGBSage: Guess ages using various Horvath-style (Genome Biology, 2012)...

Description

Note: prediction accuracy depends on how the user sets the hyperparameters. This is NOT a hands-off procedure, and the defaults are only a starting point for exploration. Expect to tune padding, minCovg, and minSamp for each WGBS or RRBS experiment (and the latter may be affected by whether duplicate reads are removed before importing data). Consider yourself forewarned. In the near future we may add support for arbitrary region-coefficient inputs and result transformation functions, which of course will just make the problems worse.

Usage

WGBSage(x, model = c("horvath", "horvathshrunk", "hannum",
  "skinandblood"), padding = 15, useENSR = FALSE, useHMMI = FALSE,
  minCovg = 5, impute = FALSE, minSamp = 5, genome = NULL,
  dropBad = FALSE, ...)

Arguments

x

a BSseq object (must have assays named M and Cov)

model

which model ("horvath","horvathshrunk","hannum","skinandblood")

padding

how many bases +/- to pad the target CpG by (default is 15)

useENSR

use ENSEMBL regulatory region bounds instead of CpGs (FALSE)

useHMMI

use HMM CpG island boundaries instead of padded CpGs (FALSE)

minCovg

minimum regional read coverage desired to estimate 5mC (5)

impute

use k-NN imputation to fill in low-coverage regions? (FALSE)

minSamp

minimum number of non-NA samples to perform imputation (5)

genome

genome to use as reference, if no genome(x) is set (NULL)

dropBad

drop rows/cols with > half missing pre-imputation? (FALSE)

...

arguments to be passed to impute.knn, such as rng.seed
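
To make the interplay of padding and minCovg concrete, here is a minimal base-R sketch for a single sample and a single target CpG. It assumes that a region's methylation is the pooled methylated reads divided by the pooled coverage inside the padded window, and that regions below minCovg become NA (to be imputed or dropped later). The helper regional_meth and all of the numbers are hypothetical; this is not the package's internal code.

```r
# Hypothetical helper (not part of the package): regional methylation for one
# sample around one target CpG, from per-CpG methylated (M) and total (Cov)
# read counts.
regional_meth <- function(pos, M, Cov, target, padding = 15, minCovg = 5) {
  # CpGs that fall inside the padded window around the target
  in_window <- pos >= target - padding & pos <= target + padding
  covg <- sum(Cov[in_window])
  # Assumption: too few reads means no estimate for this region
  if (covg < minCovg) return(NA_real_)
  sum(M[in_window]) / covg
}

# Made-up counts at three CpGs
pos <- c(1000L, 1010L, 1040L)
M   <- c(2, 1, 9)
Cov <- c(3, 1, 10)

# Default padding: only 4 reads in the window, so the estimate is NA
regional_meth(pos, M, Cov, target = 1005L)
# Wider padding pulls in the third CpG: 12 methylated of 14 reads
regional_meth(pos, M, Cov, target = 1005L, padding = 40)
# Lower coverage threshold keeps the narrow window: 3 of 4 reads
regional_meth(pos, M, Cov, target = 1005L, minCovg = 4)
```

Under these assumptions, wider padding or a lower minCovg trades spatial specificity or read depth for fewer missing regions, which is why both usually need tuning per experiment.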

Details

Also, please cite the appropriate papers for the Epigenetic Clock(s) you use:

For the 'horvath' or 'horvathshrunk' clocks, cite Horvath, Genome Biology 2012. For the 'hannum' clock, cite Hannum et al, Molecular Cell 2013. For the 'skinandblood' clock, cite Horvath et al, Aging 2018.

Last but not least, keep track of the parameters YOU used for YOUR estimates. The call element in the returned list of results is for this exact purpose. If you need to recover the GRanges object used to average (or impute) DNAme values for the model, try as.character(rownames(result$meth)) on a result. The coefficients for each of these regions are stored in result$coefs, and the age estimates are stored in result$age (named, in case dropBad == TRUE).
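
As a rough illustration of how region coefficients and methylation estimates can combine into an age, the sketch below applies the log-linear inverse transform published with the Horvath (2012) clock to a weighted sum. The intercept, coefficients, region names, and methylation values are invented, and whether WGBSage stores an intercept in coefs or applies exactly this transform internally is an assumption that this page does not confirm.

```r
# Inverse of the age transform described in Horvath (2012);
# adult_age = 20 is the value used in that paper.
anti_trafo <- function(x, adult_age = 20) {
  ifelse(x < 0,
         (1 + adult_age) * exp(x) - 1,
         (1 + adult_age) * x + adult_age)
}

# Made-up numbers, for illustration only
intercept <- 0.1
coefs <- c(regionA = 0.5, regionB = -0.8)
meth  <- rbind(regionA = c(s1 = 0.9, s2 = 0.2),
               regionB = c(s1 = 0.1, s2 = 0.7))

# Weighted sum per sample (columns), matching coefficients to rows by name
linear_predictor <- intercept + colSums(coefs[rownames(meth)] * meth)

# Named vector of ages, one entry per sample
age <- anti_trafo(linear_predictor)
age
```

Because the result is named by sample, it stays interpretable even when some samples have been dropped upstream.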

Value

a list: call, methylation estimates, coefs, age (estimates)

ttriche/biscuitEater documentation built on May 15, 2019, 4:18 p.m.