WGBSage: Guess ages using Horvath-style 'clock' models
In biscuiteer: Convenience Functions for Biscuit


See Horvath, Genome Biology, 2013 for more information

WGBSage(
  bsseq,
  model = c("horvath", "horvathshrunk", "hannum", "skinandblood"),
  padding = 15,
  useENSR = FALSE,
  useHMMI = FALSE,
  minCovg = 5,
  impute = FALSE,
  minSamp = 5,
  genome = NULL,
  dropBad = FALSE,
  ...
)

`bsseq`	A bsseq object (must have assays named `M` and `Cov`)
`model`	Which model ("horvath", "horvathshrunk", "hannum", "skinandblood")
`padding`	How many bases +/- to pad the target CpG by (DEFAULT: 15)
`useENSR`	Use ENSEMBL regulatory region bounds instead of CpGs (DEFAULT: FALSE)
`useHMMI`	Use HMM CpG island boundaries instead of padded CpGs (DEFAULT: FALSE)
`minCovg`	Minimum regional read coverage desired to estimate 5mC (DEFAULT: 5)
`impute`	Use k-NN imputation to fill in low-coverage regions? (DEFAULT: FALSE)
`minSamp`	Minimum number of non-NA samples to perform imputation (DEFAULT: 5)
`genome`	Genome to use as reference, if no genome(bsseq) is set (DEFAULT: NULL)
`dropBad`	Drop rows/cols with > half missing pre-imputation? (DEFAULT: FALSE)
`...`	Arguments to be passed to impute.knn, such as rng.seed
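To make the interplay of `padding` and `minCovg` concrete, here is a minimal base-R sketch using made-up counts. It assumes that reads from all CpGs inside the padded window are pooled and that a region is treated as missing when the pooled coverage falls below `minCovg`; the helper `regional_meth` and the toy data are hypothetical and are not part of biscuiteer, whose internal implementation may differ.

```r
# Toy per-CpG counts for one sample (all values hypothetical)
cpgs <- data.frame(
  pos = c(100, 108, 112, 130, 160),  # CpG coordinates on one chromosome
  M   = c(3, 1, 2, 4, 0),            # methylated read counts
  Cov = c(4, 2, 2, 5, 1)             # total read coverage
)

# Hypothetical helper: pool reads over target +/- padding and
# return NA when the pooled coverage is below minCovg
regional_meth <- function(cpgs, target, padding = 15, minCovg = 5) {
  in_window <- cpgs$pos >= target - padding & cpgs$pos <= target + padding
  covg <- sum(cpgs$Cov[in_window])
  if (covg < minCovg) return(NA_real_)
  sum(cpgs$M[in_window]) / covg
}

regional_meth(cpgs, target = 110)               # window 95..125 pools three CpGs
regional_meth(cpgs, target = 110, padding = 1)  # no CpG in 109..111, so NA
regional_meth(cpgs, target = 160)               # pooled coverage 1 < 5, so NA
```

Under these assumptions, a wider `padding` trades positional specificity for coverage, while a higher `minCovg` trades completeness for more reliable regional estimates; regions left as NA are the ones that imputation would have to fill.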

Note: the accuracy of the prediction will increase or decrease depending on how various hyper-parameters are set by the user. This is NOT a hands-off procedure, and the defaults are only a starting point for exploration. It will not be uncommon to tune padding, minCovg, and minSamp for each WGBS or RRBS experiment (and the latter may be impacted by whether dupes are removed prior to importing data). Consider yourself forewarned. In the near future we may add support for arbitrary region-coefficient inputs and result transformation functions, which of course will just make the problems worse.

Also, please cite the appropriate papers for the Epigenetic Clock(s) you use:

For the 'horvath' or 'horvathshrunk' clocks, cite Horvath, Genome Biology 2013. For the 'hannum' clock, cite Hannum et al, Molecular Cell 2013. For the 'skinandblood' clock, cite Horvath et al, Aging 2018.

Last but not least, keep track of the parameters YOU used for YOUR estimates. The call element in the returned list of results is for this exact purpose. If you need to recover the GRanges object used to average (or impute) DNAme values for the model, try granges(result$methcoefs) on a result. The methylation fraction and coefficients for each region can be found in the GRanges object, result$methcoefs, where each sample has a corresponding column with the methylation fraction and the coefficients have their own column titled "coefs". Additionally, the age estimates are stored in result$age (named, in case dropBad == TRUE).
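As a rough illustration of how per-region methylation fractions and coefficients could combine into an age estimate, the base-R sketch below computes a coefficient-weighted sum and then applies the inverse of the log-linear age transformation described in Horvath (2013), with adult age set to 20. All numbers, the `intercept` value, and the helper name `anti_trafo` are hypothetical; whether and how WGBSage applies an intercept or a transformation for each model is not stated here and is an assumption.

```r
# Toy values only: these are NOT real clock coefficients
meth  <- c(region1 = 0.80, region2 = 0.25, region3 = 0.60)  # methylation fractions, one sample
coefs <- c(region1 = 0.5,  region2 = -1.2, region3 = 0.3)   # hypothetical coefficients
intercept <- 0.1                                            # hypothetical intercept

# Linear predictor: intercept plus coefficient-weighted methylation
linpred <- intercept + sum(coefs * meth)

# Inverse of the Horvath (2013) age transformation:
# exponential below zero (pre-adult range), linear at or above zero
anti_trafo <- function(x, adult_age = 20) {
  ifelse(x < 0,
         (1 + adult_age) * exp(x) - 1,
         (1 + adult_age) * x + adult_age)
}

age <- anti_trafo(linpred)
age
```

A purely linear clock would skip the final transformation step and report the linear predictor directly.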

A list with call, methylation estimates, coefs, age estimates

  shuf_bed <- system.file("extdata", "MCF7_Cunha_chr11p15_shuffled.bed.gz",
                          package="biscuiteer")
  orig_bed <- system.file("extdata", "MCF7_Cunha_chr11p15.bed.gz",
                          package="biscuiteer")
  shuf_vcf <- system.file("extdata",
                          "MCF7_Cunha_shuffled_header_only.vcf.gz",
                          package="biscuiteer")
  orig_vcf <- system.file("extdata",
                          "MCF7_Cunha_header_only.vcf.gz",
                          package="biscuiteer")
  bisc1 <- readBiscuit(BEDfile = shuf_bed, VCFfile = shuf_vcf,
                       merged = FALSE)
  bisc2 <- readBiscuit(BEDfile = orig_bed, VCFfile = orig_vcf,
                       merged = FALSE)

  comb <- unionize(bisc1, bisc2)
  ages <- WGBSage(comb, "horvath")