WGBSage: Guess ages using Horvath-style 'clock' models


View source: R/WGBSage.R

Description

Estimates sample ages from a bsseq object using Horvath-style epigenetic 'clock' models. See Horvath, Genome Biology, 2013 for more information.

Usage

WGBSage(
  bsseq,
  model = c("horvath", "horvathshrunk", "hannum", "skinandblood"),
  padding = 15,
  useENSR = FALSE,
  useHMMI = FALSE,
  minCovg = 5,
  impute = FALSE,
  minSamp = 5,
  genome = NULL,
  dropBad = FALSE,
  ...
)

Arguments

bsseq

A bsseq object (must have assays named M and Cov)

model

Which model ("horvath", "horvathshrunk", "hannum", "skinandblood")

padding

How many bases +/- to pad the target CpG by (DEFAULT: 15)

useENSR

Use ENSEMBL regulatory region bounds instead of CpGs (DEFAULT: FALSE)

useHMMI

Use HMM CpG island boundaries instead of padded CpGs (DEFAULT: FALSE)

minCovg

Minimum regional read coverage desired to estimate 5mC (DEFAULT: 5)

impute

Use k-NN imputation to fill in low-coverage regions? (DEFAULT: FALSE)

minSamp

Minimum number of non-NA samples to perform imputation (DEFAULT: 5)

genome

Genome to use as reference, if no genome(bsseq) is set (DEFAULT: NULL)

dropBad

Drop rows/cols with > half missing pre-imputation? (DEFAULT: FALSE)

...

Arguments to be passed to impute.knn, such as rng.seed

Details

Note: the accuracy of the prediction will increase or decrease depending on how various hyper-parameters are set by the user. This is NOT a hands-off procedure, and the defaults are only a starting point for exploration. It will not be uncommon to tune padding, minCovg, and minSamp for each WGBS or RRBS experiment (and the latter may be impacted by whether dupes are removed prior to importing data). Consider yourself forewarned. In the near future we may add support for arbitrary region-coefficient inputs and result transformation functions, which of course will just make the problems worse.
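As a rough sketch of what such exploration might look like, the helper below re-runs WGBSage over a small grid of padding and minCovg values and collects the age estimates from each run. The function name sweep_wgbsage and the grid values are hypothetical and chosen purely for illustration; they are not part of biscuiteer.

```r
library(biscuiteer)

# Hypothetical helper (not part of biscuiteer): run WGBSage once per row of
# a parameter grid and collect the age estimates for side-by-side comparison.
sweep_wgbsage <- function(bs, grid, model = "horvath") {
  runs <- lapply(seq_len(nrow(grid)), function(i) {
    # A setting that errors is recorded as NULL instead of stopping the sweep.
    tryCatch(
      WGBSage(bs, model = model,
              padding = grid$padding[i],
              minCovg = grid$minCovg[i])$age,
      error = function(e) NULL
    )
  })
  names(runs) <- paste0("padding=", grid$padding, ", minCovg=", grid$minCovg)
  runs
}

# Illustrative grid only; sensible ranges depend on the experiment.
grid <- expand.grid(padding = c(15, 50), minCovg = c(1, 5))
```

Given a bsseq object such as comb from the Examples section, sweep_wgbsage(comb, grid) returns one list entry per grid row, which makes it easier to see how sensitive the estimates are to each setting.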

Also, please cite the appropriate papers for the Epigenetic Clock(s) you use:

For the 'horvath' or 'horvathshrunk' clocks, cite Horvath, Genome Biology 2013. For the 'hannum' clock, cite Hannum et al, Molecular Cell 2013. For the 'skinandblood' clock, cite Horvath et al, Aging 2018.

Last but not least, keep track of the parameters YOU used for YOUR estimates. The call element in the returned list of results is for this exact purpose. If you need to recover the GRanges object used to average (or impute) DNAme values for the model, try granges(result$methcoefs) on a result. The methylation fractions and coefficients for each region can be found in the GRanges object result$methcoefs, where each sample has its own column of methylation fractions and the coefficients are in a column titled "coefs". Additionally, the age estimates are stored in result$age (named, in case dropBad == TRUE).
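The elements described above could be inspected roughly as follows. This is a minimal sketch using the example files bundled with biscuiteer; the local helper ext is hypothetical and exists only to shorten the file lookups.

```r
library(biscuiteer)

# Small local helper (hypothetical) to locate biscuiteer's bundled example files.
ext <- function(f) system.file("extdata", f, package = "biscuiteer")

bisc1 <- readBiscuit(BEDfile = ext("MCF7_Cunha_chr11p15_shuffled.bed.gz"),
                     VCFfile = ext("MCF7_Cunha_shuffled_header_only.vcf.gz"),
                     merged = FALSE)
bisc2 <- readBiscuit(BEDfile = ext("MCF7_Cunha_chr11p15.bed.gz"),
                     VCFfile = ext("MCF7_Cunha_header_only.vcf.gz"),
                     merged = FALSE)
ages <- WGBSage(unionize(bisc1, bisc2), "horvath")

# The call records the parameters used for this run.
ages$call

# Regions that were averaged (or imputed) for the model.
regions <- GenomicRanges::granges(ages$methcoefs)

# Model coefficients, stored in the "coefs" column of methcoefs.
coefs <- ages$methcoefs$coefs

# Age estimates, named by sample.
ages$age
```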

Value

    A list with call, methylation estimates, coefs, age estimates

Examples

  library(biscuiteer)

  shuf_bed <- system.file("extdata", "MCF7_Cunha_chr11p15_shuffled.bed.gz",
                          package="biscuiteer")
  orig_bed <- system.file("extdata", "MCF7_Cunha_chr11p15.bed.gz",
                          package="biscuiteer")
  shuf_vcf <- system.file("extdata",
                          "MCF7_Cunha_shuffled_header_only.vcf.gz",
                          package="biscuiteer")
  orig_vcf <- system.file("extdata",
                          "MCF7_Cunha_header_only.vcf.gz",
                          package="biscuiteer")
  bisc1 <- readBiscuit(BEDfile = shuf_bed, VCFfile = shuf_vcf,
                       merged = FALSE)
  bisc2 <- readBiscuit(BEDfile = orig_bed, VCFfile = orig_vcf,
                       merged = FALSE)

  comb <- unionize(bisc1, bisc2)
  ages <- WGBSage(comb, "horvath")

biscuiteer documentation built on Nov. 8, 2020, 8:28 p.m.