empBayes: Function to calculate prior parameters using empirical Bayes.
In markrobinsonuzh/Repitools: Epigenomic tools

empBayes

R Documentation

Function to calculate prior parameters using empirical Bayes.

Description

Under the empirical Bayes approach (and assuming a uniform prior for the methylation level), the shape and scale parameters of the gamma prior for the region-specific read density are derived. The parameters are determined in a CpG-density-dependent manner.

Usage

empBayes(x, ngroups = 100, ncomp = 1, maxBins=50000, method="beta", controlMethod=list(mode="full", weights=c(0.1, 0.8, 0.1), param=c(1,1)), ncpu = NULL, verbose = FALSE)

Arguments

`x`	Object of class `BayMethList`.
`ngroups`	Number of CpG density groups to consider. Each bin is classified, based on its CpG density, into one of `ngroups` classes, and a set of prior parameters is determined separately for each class.
`ncomp`	Number of components of beta distributions in the prior distribution for the methylation level when method is equal to `beta`.
`maxBins`	Maximum number of bins in one CpG density group used to derive the parameter estimates. If a group contains more than `maxBins` bins, `maxBins` bins are sampled with replacement.
`method`	Either `DBD` for a Dirac-Beta-Dirac mixture, that is, a mixture of a point mass at zero, a beta distribution and a point mass at one, or `beta` for a beta mixture with `ncomp` components.
`controlMethod`	List defining settings used when the Dirac-Beta-Dirac mixture is chosen.
- `mode` Either `full`, `fixedWeights` or `fixedBeta`. In mode `full` both the mixture weights and the beta parameters are estimated. In mode `fixedWeights` the weights are fixed to the values given in `weights` and only the parameters of the beta component are estimated. In mode `fixedBeta` the parameters of the beta component are fixed to the values specified in `param`. The default mode is `full`.
- `weights` Numeric vector of length three specifying the weights for the Dirac-Beta-Dirac mixture when mode is equal to `fixedWeights`. The first element specifies the weight for the point mass at zero, the second for the beta component and the third for the point mass at one. The three values must sum to one. The default is c(0.1, 0.8, 0.1).
- `param` Numeric vector of length two specifying the (positive) parameters of the beta component when mode is equal to `fixedBeta`. The default is c(1,1).
`ncpu`	Number of CPUs on your machine you would like to use in parallel. If `ncpu` is set to NULL, half of the CPUs will be used on machines with a maximum of four CPUs, and 2/3 will be used if more CPUs are available.
`verbose`	Logical indicating whether the empirical Bayes function should run in verbose mode (default `FALSE`).
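
The Dirac-Beta-Dirac mixture and the constraints on `controlMethod` described above can be illustrated with a small base R sketch. The helpers `checkControl` and `rDBD` are hypothetical names introduced here for illustration only; they are not part of Repitools and do not reproduce how `empBayes` validates its input or estimates the mixture.

```r
# Hypothetical helper (not a Repitools function): check a controlMethod-style
# list against the constraints stated in the argument description.
checkControl <- function(ctrl) {
    stopifnot(ctrl$mode %in% c("full", "fixedWeights", "fixedBeta"))
    stopifnot(length(ctrl$weights) == 3, all(ctrl$weights >= 0),
              isTRUE(all.equal(sum(ctrl$weights), 1)))
    stopifnot(length(ctrl$param) == 2, all(ctrl$param > 0))
    invisible(TRUE)
}

# Hypothetical helper: draw methylation levels from a Dirac-Beta-Dirac mixture
# with weights (point mass at 0, beta component, point mass at 1).
rDBD <- function(n, weights, param) {
    comp <- sample(1:3, n, replace = TRUE, prob = weights)
    out <- numeric(n)                     # component 1 stays at exactly 0
    isBeta <- comp == 2
    out[isBeta] <- rbeta(sum(isBeta), param[1], param[2])
    out[comp == 3] <- 1                   # component 3 is exactly 1
    out
}

# Settings as they might be passed via controlMethod (defaults from Usage,
# with the mode switched to fixedWeights).
ctrl <- list(mode = "fixedWeights", weights = c(0.1, 0.8, 0.1), param = c(1, 1))
checkControl(ctrl)

set.seed(1)
mu <- rDBD(10000, ctrl$weights, ctrl$param)
mean(mu == 0)   # empirical share of the point mass at zero
mean(mu == 1)   # empirical share of the point mass at one
```

With `param = c(1, 1)` the beta component is uniform on (0, 1), so the draws consist of exact zeros, exact ones, and uniformly spread values in between.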

Details

BayMeth takes advantage of the relationship between CpG density and read depth to formulate a CpG-density-dependent gamma prior distribution for the region-specific read density. By taking CpG density into account, the prior should stabilise the methylation estimation procedure for low counts and in the presence of sampling variability. The shape and scale parameters of the gamma prior distribution are determined in a CpG-density-dependent manner using empirical Bayes. For each genomic bin the CpG density is provided in the BayMethList object. Each bin is classified, based on its CpG density, into one of ngroups non-overlapping CpG density intervals. For each class separately, the values of the shape and scale parameters are derived under an empirical Bayes framework using maximum likelihood. For CpG classes that contain more than maxBins bins, a random sample of size maxBins, drawn with replacement, is used to derive these prior parameters. Note that both read depths, from the SssI control and from the sample of interest, are taken into account. We end up with ngroups parameter sets for shape and rate.
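
The grouping, subsampling and per-group maximum likelihood steps can be sketched in base R on simulated data. This is a deliberately simplified illustration, not the BayMeth likelihood: it assumes quantile-based CpG density groups and a plain Poisson-gamma model for a single count vector (whose marginal is negative binomial), and it ignores the SssI control, the offset and the methylation level. All variable and function names are hypothetical. The gamma prior is parameterised here by shape and rate (scale = 1/rate).

```r
set.seed(42)
nBins <- 5000
cpgDens <- rexp(nBins, rate = 0.2)                 # simulated CpG densities
lambda  <- rgamma(nBins, shape = 2, rate = 2 / (1 + cpgDens))
counts  <- rpois(nBins, lambda)                    # simulated read counts

ngroups <- 5
maxBins <- 500

# Assumption: groups are formed from quantiles of the CpG density, giving
# non-overlapping intervals with roughly equal numbers of bins.
breaks <- quantile(cpgDens, probs = seq(0, 1, length.out = ngroups + 1))
group  <- cut(cpgDens, breaks = breaks, include.lowest = TRUE, labels = FALSE)

# Negative log marginal likelihood of Poisson counts under a
# Gamma(shape, rate) prior; parameters are optimised on the log scale
# so that they stay positive.
negLogLik <- function(logPar, y) {
    shape <- exp(logPar[1])
    rate  <- exp(logPar[2])
    -sum(dnbinom(y, size = shape, prob = rate / (1 + rate), log = TRUE))
}

# Fit one group; groups larger than maxBins are subsampled with replacement.
fitGroup <- function(y, maxBins) {
    if (length(y) > maxBins) y <- sample(y, maxBins, replace = TRUE)
    fit <- optim(c(0, 0), negLogLik, y = y)
    exp(fit$par)
}

# One column of (shape, rate) per CpG density group.
prior <- sapply(split(counts, group), fitGroup, maxBins = maxBins)
rownames(prior) <- c("shape", "rate")
prior
```

The result has one parameter set per group, mirroring the "ngroups parameter sets" mentioned above. Because the simulated read density grows with CpG density, the implied prior mean `shape / rate` should be larger in the high-density groups.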

Value

A BayMethList object in which the slot priorTab is filled. priorTab is a list. The first list entry contains the CpG group each bin is assigned to. The second entry contains the number of components used for the prior (at the moment 1). Each of the following list entries corresponds to one sample of interest and contains a matrix with the optimal shape and scale parameters for all CpG classes. The first row contains the optimal shape parameter and the second row the optimal scale parameter. The number of columns corresponds to the number of CpG classes specified in ngroups.
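
To make the described layout concrete, the following base R sketch builds a mock list with the same structure and looks up the parameters for one bin. The element names and all numeric values are made up for illustration; the real priorTab may use different names and may hold additional rows depending on `method`.

```r
ngroups <- 4
nBins   <- 10

set.seed(7)
# Mock-up of the layout described above (names and values are hypothetical).
priorTabMock <- list(
    CpG.groups = sample(seq_len(ngroups), nBins, replace = TRUE),  # group per bin
    ncomp      = 1,                                                # prior components
    sample1    = matrix(c(1.5, 2.0, 2.5, 3.0,    # row 1: shape per CpG class
                          0.5, 0.4, 0.3, 0.2),   # row 2: scale per CpG class
                        nrow = 2, byrow = TRUE)
)

# Prior parameters that apply to bin 3 for the first sample of interest:
# take the bin's CpG group, then the matching column of the sample's matrix.
g <- priorTabMock[[1]][3]
binPrior <- priorTabMock[[3]][, g]
binPrior
```

Indexing by position (`[[1]]`, `[[3]]`) follows the ordering given in the description, so the lookup does not depend on the hypothetical element names.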

Author(s)

Andrea Riebler

Examples

    if(require(BSgenome.Hsapiens.UCSC.hg18)){
        windows <- genomeBlocks(Hsapiens, chrs="chr21", width=100, spacing=100)
        cpgdens <- cpgDensityCalc(windows, organism=Hsapiens, 
            w.function="linear", window=700)  
        co <- matrix(rnbinom(length(windows), mu=10, size=2), ncol=1)
        sI <- matrix(rnbinom(2*length(windows), mu=5, size=2), ncol=2)
        bm <- BayMethList(windows=windows, control=co, 
            sampleInterest=sI, cpgDens=cpgdens)
        bm <- determineOffset(bm)
 
        # mask out unannotated high copy number regions
        # see Pickrell et al. (2011), Bioinformatics 27: 2144-2146.

        # should take about 3 minutes for both samples of interest with 2 CPUs.
        bm <- empBayes(bm, ngroups=20) 
   }

markrobinsonuzh/Repitools documentation built on March 20, 2024, 6:04 a.m.