dataSim | R Documentation
The function simulates DNA methylation data for multiple samples. See the references for a detailed explanation of the underlying statistics.
dataSim(
replicates,
sites,
treatment,
percentage = 10,
effect = 25,
alpha = 0.4,
beta = 0.5,
theta = 10,
covariates = NULL,
sample.ids = NULL,
assembly = "hg18",
context = "CpG",
add.info = FALSE
)
replicates: the number of samples to simulate.
sites: the number of CpG sites per sample.
treatment: a vector containing treatment information, with one entry per sample (e.g. c(1,1,0,0) for two treated samples followed by two controls).
percentage: the percentage of sites that should be affected by the treatment (e.g. 10 means 10% of the sites).
effect: a number between 0 and 100 specifying the effect size of the treatment. This is essentially the average percent methylation difference between differentially methylated bases. See 'Examples' and 'Details'.
alpha: shape1 parameter of the Beta distribution (used for the initial sampling of methylation proportions).
beta: shape2 parameter of the Beta distribution (used for the initial sampling of methylation proportions).
theta: dispersion parameter of the Beta distribution (used for the initial sampling of methylation proportions).
covariates: a data.frame containing covariates (optional).
sample.ids: will be generated automatically from
assembly: the assembly description (e.g. "hg18"). Only needed for bookkeeping.
context: the experimental context of the data (e.g. "CpG"). Only needed for bookkeeping.
add.info: if set to TRUE, the output will be a list whose first element is the methylBase object and whose second element is a vector of indices indicating which CpGs are differentially methylated. This vector can be used to subset the simulated methylBase object, or a methylDiff object computed from it, to the differentially methylated bases.
a methylBase object containing simulated methylation data or, if add.info=TRUE, a list whose first element is the methylBase object and whose second element holds the indices of all treated sites (differentially methylated bases or regions).
The function uses a Beta distribution to simulate the background methylation proportions across all samples. The parameters alpha and beta are used in a Beta distribution to draw methylation proportions, $\mu$, from a typical bimodal distribution. For each initial methylation proportion drawn in this way, a range of methylation proportions is distributed around the original $\mu$ with overdispersion parameter $\theta$, using an alternative parameterization of the Beta distribution: $\mathrm{Beta}(\mu, \theta)$.
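The two-stage sampling described above can be sketched in base R as follows. This is a minimal illustration, not methylKit's own code: the helper name simulate_background is hypothetical, and mapping $\mathrm{Beta}(\mu, \theta)$ to shape1 = $\mu\theta$ and shape2 = $(1-\mu)\theta$ is an assumption (the common mean/precision form), under which each per-sample draw has expected value $\mu$.

```r
# Hypothetical helper (not methylKit's source code).
simulate_background <- function(sites, replicates, alpha = 0.4, beta = 0.5,
                                theta = 10) {
  # Stage 1: one background proportion mu per site, drawn from
  # Beta(alpha, beta). With alpha, beta < 1 the density is U-shaped,
  # i.e. most sites are close to unmethylated or fully methylated.
  mu <- rbeta(sites, shape1 = alpha, shape2 = beta)
  # Stage 2 (assumed parameterization): per-sample proportions drawn from
  # Beta(shape1 = mu * theta, shape2 = (1 - mu) * theta), whose mean is mu.
  # Parameters are repeated once per sample, so column j holds sample j.
  props <- matrix(
    rbeta(sites * replicates,
          shape1 = rep(mu * theta, times = replicates),
          shape2 = rep((1 - mu) * theta, times = replicates)),
    nrow = sites, ncol = replicates)
  list(mu = mu, props = props)
}

set.seed(1)
sim <- simulate_background(sites = 5000, replicates = 4)
```

Each column of sim$props scatters around the same per-site mean sim$mu; in this parameterization a larger theta concentrates the draws more tightly around $\mu$.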
The parameters percentage and effect determine, respectively, the proportion of sites affected by the treatment (i.e. the differential sites) and the strength of this influence. effect is added on top of $\mu$ for the CpGs that are affected by the treatment, so for each such CpG the methylation proportions of the affected group of samples are distributed as $\mathrm{Beta}(\mu + \mathit{effect}, \theta)$.
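A hedged base-R sketch of the treatment step follows. The helper apply_treatment is hypothetical; it assumes that treated samples are marked with 1 in treatment, that effect is converted from percent to a proportion before being added to $\mu$, and that the shifted mean is capped just below 1 so the Beta shape parameters stay valid. None of these details is confirmed as methylKit's implementation.

```r
# Hypothetical helper (not methylKit's source code).
apply_treatment <- function(mu, treatment, percentage = 10, effect = 25,
                            theta = 10) {
  sites <- length(mu)
  # Number of differential sites, e.g. 10% of 2000 sites = 200.
  n_dm <- round(sites * percentage / 100)
  dm_idx <- sort(sample.int(sites, n_dm))
  # Per-sample mean matrix: every sample starts from the background mu.
  means <- matrix(mu, nrow = sites, ncol = length(treatment))
  treated <- which(treatment == 1)
  # Assumption: effect is given in percent, so it is divided by 100, and the
  # shifted mean is capped just below 1 to keep shape2 positive.
  shifted <- pmin(mu[dm_idx] + effect / 100, 1 - 1e-6)
  means[dm_idx, treated] <- shifted
  # Draw every sample's proportions from Beta(mean * theta, (1 - mean) * theta).
  props <- matrix(rbeta(length(means), means * theta, (1 - means) * theta),
                  nrow = sites, ncol = length(treatment))
  list(props = props, dm_idx = dm_idx)
}

set.seed(2)
mu <- rbeta(2000, 0.4, 0.5)
res <- apply_treatment(mu, treatment = c(1, 1, 0, 0))
```

In this sketch, res$dm_idx plays a role similar to the index vector returned when add.info=TRUE. Because of the cap, sites whose background $\mu$ is already close to 1 receive a smaller shift than effect.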
The coverage is modeled with a negative binomial distribution, using the rnbinom function with size=1 and prob=0.01.
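The coverage setting can be checked with a short worked example. For R's rnbinom, a negative binomial with size $n$ and probability $p$ has mean $n(1-p)/p$ and variance $n(1-p)/p^2$; with size=1 and prob=0.01 this gives a mean of about 99 reads and a variance of about 9900, and with size=1 the distribution reduces to a geometric distribution. The number of draws and the seed below are arbitrary choices for illustration.

```r
# Negative binomial coverage as described above: size = 1, prob = 0.01.
size <- 1
prob <- 0.01
# Closed-form moments of R's negative binomial parameterization.
nb_mean <- size * (1 - prob) / prob    # approximately 99
nb_var  <- size * (1 - prob) / prob^2  # approximately 9900
# Empirical check on simulated coverage values.
set.seed(3)
coverage <- rnbinom(20000, size = size, prob = prob)
summary(coverage)
```

The large variance relative to the mean reflects the heavy right tail of the coverage values produced by these settings.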
The additional information needed for a valid methylBase object, such as
CpG start, end and strand, is generated as "dummy values",
but can be overwritten as needed.
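library(methylKit)
data(methylKit)
# Simulate data for 4 samples with 2000 sites each
# (samples 1 and 2 are treated, samples 3 and 4 are controls).
# Methylation at 10% of the sites is elevated by 25 percentage points
# in the treated samples.
my.methylBase <- dataSim(replicates=4, sites=2000, treatment=c(1,1,0,0),
                         percentage=10, effect=25)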