methylation_hooks: Read methylation dataset


View source: R/read_data_hooks.R

Description

Read methylation dataset

Usage

methylation_hooks(..., RESET = FALSE, READ.ONLY = NULL, LOCAL = FALSE)

Arguments

...

please ignore; see the 'Details' section.

RESET

remove all hooks

READ.ONLY

please ignore

LOCAL

please ignore

Details

Methylation datasets from whole-genome bisulfite sequencing are usually very large, and it does not make sense to read them entirely into memory. Normally, the methylation dataset is stored by chromosome, and this hook function can be set to read methylation data in a per-chromosome manner. Many functions in the package use it internally to read methylation datasets.

Generally, a methylation dataset consists of methylation rates (ranging from 0 to 1), CpG coverage, and genomic positions of the CpG sites. Sometimes there is also a smoothed methylation rate. All of these can be provided by defining a proper methylation_hooks$get_by_chr. The value of methylation_hooks$get_by_chr is a function with a single argument, the chromosome name. This function defines how to read the methylation dataset for a single chromosome. It must return a list that contains the following mandatory elements:

gr

a GRanges object which contains genomic positions for CpG sites. Positions should be sorted.

meth

a matrix which contains methylation rates. This is the main methylation dataset the epik package uses, so it should be the smoothed methylation rate if the CpG coverage is not high. Note that this matrix must have column names, which are the sample names and are used to match other datasets (e.g. RNASeq).

cov

a matrix which contains CpG coverage.

It can also contain some optional elements, which are not needed for the core analysis:

raw

a matrix which contains the unsmoothed methylation rate (the original methylation rate, calculated as the fraction of methylated CpG in a CpG site)

Note that each row in the above datasets should correspond to the same CpG site.
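The requirements above can be checked before the list is handed to the package. The following is a minimal sketch in base R, not epik's own validation: check_meth_list is a hypothetical helper, and the toy list uses a plain integer vector in place of a real GRanges object purely so that the example runs without Bioconductor installed.

```r
# Hypothetical helper (not part of epik): check the structure of the list
# returned by a get_by_chr function.
check_meth_list = function(lt) {
    mandatory = c("gr", "meth", "cov")
    missing_el = setdiff(mandatory, names(lt))
    if (length(missing_el) > 0) {
        stop("missing mandatory elements: ", paste(missing_el, collapse = ", "))
    }
    n = length(lt$gr)  # number of CpG sites; length() also works on GRanges
    # every matrix (including the optional raw) needs one row per CpG site
    for (nm in intersect(c("meth", "cov", "raw"), names(lt))) {
        m = lt[[nm]]
        if (!is.matrix(m)) stop(nm, " must be a matrix")
        if (nrow(m) != n) stop(nm, " must have one row per CpG site in gr")
    }
    if (is.null(colnames(lt$meth))) {
        stop("meth must have column names (sample names)")
    }
    invisible(TRUE)
}

# Toy data: 3 CpG sites, 2 samples. An integer vector of positions stands in
# for the GRanges object here (an assumption made for illustration only).
toy = list(
    gr   = c(100L, 250L, 400L),
    meth = matrix(c(0.1, 0.8, 0.5, 0.2, 0.9, 0.4), nrow = 3,
                  dimnames = list(NULL, c("sample1", "sample2"))),
    cov  = matrix(c(10L, 12L, 8L, 9L, 15L, 7L), nrow = 3,
                  dimnames = list(NULL, c("sample1", "sample2")))
)
check_meth_list(toy)
```

Calling such a check on the result of get_by_chr for one chromosome is a cheap way to catch a missing element or mismatched row counts early.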

In the following example code, assume the methylation data has been processed by the bsseq package and saved as path/bsseq_$chr.rds. The definition of methylation_hooks$get_by_chr is then:

  library(bsseq)  # provides granges(), getMeth() and getCoverage() for BSseq objects

  methylation_hooks$get_by_chr = function(chr) {
      # load the BSseq object saved for this chromosome
      obj = readRDS(paste0("path/bsseq_", chr, ".rds"))
      lt = list(gr   = granges(obj),
                raw  = getMeth(obj, type = "raw"),
                cov  = getCoverage(obj, type = "Cov"),
                meth = getMeth(obj, type = "smooth"))
      return(lt)
  }

After methylation_hooks$get_by_chr is properly set, the "current chromosome" for the methylation dataset can be set by methylation_hooks$set_chr(chr), where chr is the name of the chromosome to switch to. Once the dataset has been validated, its variables can be used directly.

methylation_hooks$set_chr(chr) tries to reload the data only when the current chromosome changes.
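The reload-only-on-change behaviour can be illustrated with a small closure in base R. This is a sketch of the general caching idea, not epik's actual implementation; make_chr_loader, n_reads and fake_reader are hypothetical names introduced for the example.

```r
# Illustrative only: a closure that caches the data of the current chromosome.
# make_chr_loader and n_reads are hypothetical names, not part of epik.
make_chr_loader = function(get_by_chr) {
    current_chr = NULL
    data = NULL
    n_reads = 0
    set_chr = function(chr) {
        if (is.null(current_chr) || !identical(current_chr, chr)) {
            data <<- get_by_chr(chr)   # reload only when the chromosome changes
            current_chr <<- chr
            n_reads <<- n_reads + 1
        }
        invisible(data)
    }
    list(set_chr = set_chr, n_reads = function() n_reads)
}

# A stand-in reader that returns a small list instead of reading a file
fake_reader = function(chr) {
    list(chr = chr, meth = matrix(0.5, nrow = 2, ncol = 1))
}

loader = make_chr_loader(fake_reader)
loader$set_chr("chr1")
loader$set_chr("chr1")   # same chromosome: no second read
loader$set_chr("chr2")   # different chromosome: data is read again
loader$n_reads()         # 2
```

Keeping the loaded data in the enclosing environment means that repeated calls for the same chromosome cost nothing, which matters when each read pulls a large per-chromosome file from disk.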

Value

Hook functions

Author(s)

Zuguang Gu <z.gu@dkfz.de>

Examples

# There is no example
NULL

jokergoo/epik documentation built on Sept. 28, 2019, 9:20 a.m.