estimateBayesianDivergence | R Documentation |
The Information divergence of methylation levels is estimated using the direct estimation or a Bayesian approach of the methylation levels. if 'meth.level = FALSE', Hellinger divergence is computed as given in reference (1). The Bayesian approach followed is described in reference (2).
estimateBayesianDivergence(
x,
Bayesian = FALSE,
init.pars = NULL,
via.optim = TRUE,
JD = FALSE,
jd.stat = FALSE,
columns = c(mC1 = 1, uC1 = 2, mC2 = 3, uC2 = 4),
meth.level = FALSE,
preserve.gr = FALSE,
loss.fun = c("linear", "huber", "smooth", "cauchy", "arctg"),
logbase = 2,
verbose = FALSE,
num.cores = 1L,
tasks = 0L,
...
)
x |
A matrix of counts or GRanges object with the table of counts in the meta-columns (methylated mC and unmethylated uC cytosines). Unless specified in the parameter 'columns', the methylation counts must be given in the first four columns: 'mC1' and 'uC1' methylated and unmethylated counts for control sample, and 'mC2' and 'uC2' methylated and unmethylated counts for treatment sample, respectively. |
Bayesian |
logical. Whether to perform the estimations based on posterior estimations of methylation levels. |
init.pars |
initial parameter values. Defaults is NULL and an initial
guess is estimated using |
via.optim |
Logical. Whether to estimate beta distribution parameters
via |
JD |
Logic (Default:FALSE). Option on whether to add a column with
values of J-information divergence (see |
jd.stat |
logical(1). Whether to compute the |
columns |
Vector of integer numbers of the columns where the counts
'mC1', 'uC1', 'mC2', and 'uC2' are in the matrix (default 1 to 4). That is,
the input could have more than 4 columns, but only 4 columns with the counts
are used. if 'meth.level == TRUE', then columns = |
meth.level |
logical(1). Methylation levels can be provided in place of counts. |
preserve.gr |
logical(1). Option of whether to preserve all the metadata from the original GRanges object. |
loss.fun |
Loss function(s) used in the estimation of the best fitted
model to beta distribution (only applied when Bayesian=TRUE; see
(Loss function)). This
fitting uses the approach followed in in the R package
usefr. After
Loss 'linear' function works well for most of the methylation datasets with acceptable quality. |
logbase |
Logarithm base used to compute the JD (if JD = TRUE). Logarithm base 2 is used as default (bit unit). Use logbase = exp(1) for natural logarithm. |
verbose |
if TRUE, prints the function log to stdout |
num.cores |
The number of cores to use, i.e. at most how many child processes will be run simultaneously (see 'bplapply' function from BiocParallel package). |
tasks |
integer(1). The number of tasks per job. value must be a scalar
integer >= 0L. In this documentation a job is defined as a single call to a
function, such as bplapply, bpmapply etc. A task is the division of the
|
... |
Optional parameter values for: maxiter, ftol, ptol, and gradtol
from |
For the current version, the Information divergence of methylation levels is estimated based on Hellinger divergence. If read counts are provided, then Hellinger divergence is computed as given in the first formula from Theorem 1 from reference 1. In the present case,
H = 2 (n_1 + 1) (n_2 + 1)*((sqrt(p_1) - sqrt(p_2))^2 +
(sqrt(1-p_2) - sqrt(1-p_2))^2)/(n_1 + n_2 + 2)
where n_1
and n_2
are the coverage for the control and
treatment, respectively. Notice that each row from the matrix of counts
correspond to a single cytosine position and has four values corresponding to
'mC1' and 'uC1' (control), and mC2' and 'uC2' for treatment.
If the methylation levels are provided in place of counts, then Hellinger divergence is computed as:
H = (sqrt(p_1) - sqrt(p_2))^2 + (sqrt(1 - p_1) - sqrt(1 - p_2))^2
This formula assumes that the probability vectors derived from the
methylation levels p_i = c(p_{i2}, 1 - p_{i2})
(see function
estimateHellingerDiv
) are an unbiased estimation of the
expected one. The Bayesian approach followed here is described in reference
(2).
The input matrix or GRanges object with the four columns of counts and additional columns. If Bayessian = TRUE, the results are based on the posterior estimations of methylation levels. The basic additional columns are:
The original matrix of methylated c_{ij}
and unmethylated
t_{ij}
read counts from control j=1
and treatment j=2
samples at positions i
.
'p1' and 'p2': methylation levels for control and treatment, respectively. If 'meth.level = FALSE' and 'Bayesian = TRUE' (recommended), 'p1' and 'p2' are estimated following the Bayesian approach described in reference (1).
'bay.TV': total variation TV = p2 - p1
'TV': total variation based on simple counts:
TV=c_1/(c_1+t_1)-c_2/(c_2+t_2)
, where c_i
and t_i
denote
methylated and unmethylated read counts, respectively.
'hdiv': Hellinger divergence. If Bayesian = TRUE, the results are based on the posterior estimations of methylation levels. if 'meth.level = FALSE', then 'hdiv' is computed as given in reference (1), otherwise as:
hdiv = (sqrt(p_1) - sqrt(p_2))^2 + (sqrt(1 -p_1) - sqrt(1 -
p_2))^2
Robersy Sanchez (https://genomaths.com)
Basu A., Mandal A., Pardo L. Hypothesis testing for two discrete populations based on the Hellinger distance. Stat. Probab. Lett. 2010, 80: 206-214.
Sanchez R, Yang X, Maher T, Mackenzie S. Discrimination of DNA Methylation Signal from Background Variation for Clinical Diagnostics. Int. J. Mol Sci, 2019, 20:5343.
estimateDivergence
## The read count data are created
gr <- data.frame(chr = 'chr1', start = 1:10, end = 1:10,
strand = '*',
mC1 = rnbinom(size = 10, mu = 4, n = 500),
uC1 = rnbinom(size = 10, mu = 4, n = 500),
mC2 = rnbinom(size = 10, mu = 4, n = 500),
uC2 = rnbinom(size = 10, mu = 4, n = 500))
gr <- makeGRangesFromDataFrame(gr, keep.extra.columns = TRUE)
## Estimation of the information divergences
hd <- estimateBayesianDivergence(gr, JD = TRUE)
## Keep in mind that Hellinger and J divergences are, in general,
## correlated
cor.test(x = as.numeric(hd$hdiv), y = as.numeric(hd$jdiv),
method = 'kendall')
## An example with methylation levels
set.seed(123)
sites = 10
dat = data.frame(chr = 'chr1', start = 1:sites, end = 1:sites,strand = '*',
m1 = runif(n = sites), m2 = runif(n = sites))
dat = makeGRangesFromDataFrame(dat, keep.extra.columns = TRUE)
## Transforming the list of data frames into a GRanges object
hd <- estimateBayesianDivergence(x = dat, columns = c(p1 = 1, p2 = 2),
meth.level = TRUE, preserve.gr = TRUE)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.