dataSim-methods: Simulate DNA methylation data

dataSim R Documentation

Simulate DNA methylation data

Description

The function simulates DNA methylation data for multiple samples. See the references for a detailed explanation of the statistics.

Usage

dataSim(
  replicates,
  sites,
  treatment,
  percentage = 10,
  effect = 25,
  alpha = 0.4,
  beta = 0.5,
  theta = 10,
  covariates = NULL,
  sample.ids = NULL,
  assembly = "hg18",
  context = "CpG",
  add.info = FALSE
)

Arguments

replicates

the number of samples that should be simulated.

sites

the number of CpG sites per sample.

treatment

a vector containing treatment information, one entry per sample (e.g. c(1,1,0,0) for two treated and two control samples).

percentage

the percentage of sites (0 to 100) that should be affected by the treatment.

effect

a number between 0 and 100 specifying the effect size of the treatment. It describes the average percent methylation difference between differentially methylated bases. See 'Examples' and 'Details'.

alpha

shape1 parameter for beta distribution (used for initial sampling of methylation proportions)

beta

shape2 parameter for beta distribution (used for initial sampling of methylation proportions)

theta

dispersion parameter of the Beta(\mu,\theta) distribution used to spread the per-sample methylation proportions around each initially sampled proportion \mu (see 'Details').

covariates

a data.frame containing covariates (optional)

sample.ids

generated automatically from treatment by default, but can be overridden with a character vector of sample names.

assembly

the assembly description (e.g. "hg18"). Only needed for bookkeeping.

context

the experimental context of the data (e.g. "CpG"). Only needed for bookkeeping.

add.info

if set to TRUE, the output is a list whose first element is the methylBase object and whose second element is a vector of indices of the CpGs that should be differentially methylated. This vector can be used to subset the simulated methylBase object, or a methylDiff object derived from it, to the differentially methylated bases.

Value

a methylBase object containing simulated methylation data or, if add.info=TRUE, a list containing the methylBase object as the first element and the indices of all treated sites (differentially methylated bases or regions) as the second element.

Details

The function uses a Beta distribution to simulate the background methylation proportions across all samples. The parameters alpha and beta define a Beta distribution from which methylation proportions, \mu, are drawn; with the defaults this distribution is bimodal, as is typical for methylation data. For each initial methylation proportion \mu, the per-sample methylation proportions are distributed around \mu with overdispersion parameter \theta, using an alternative parameterization of the Beta distribution: Beta(\mu,\theta).

The parameters percentage and effect determine, respectively, the proportion of sites that are affected by the treatment (the differential sites) and the strength of this influence. For the affected CpGs, effect is added on top of \mu, so the affected group of samples at that CpG is distributed as Beta(\mu+effect,\theta).

The coverage is modeled with a negative binomial distribution, using the rnbinom function with size=1 and prob=0.01. The additional information needed for a valid methylBase object, such as CpG start, end and strand, is generated as dummy values but can be overwritten as needed.
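The model described above can be summarized as a small hierarchical model. This is a sketch under stated assumptions, not a transcription of the package code: the mapping of Beta(\mu,\theta) to shape1 = \mu\theta and shape2 = (1-\mu)\theta is the common mean/precision parameterization and is assumed here, and reading effect as percentage points (effect/100 added to \mu) is also an assumption. Index i denotes a site and j a sample.

```latex
\begin{aligned}
\mu_i &\sim \mathrm{Beta}(\alpha, \beta)
  && \text{background proportion of site } i\\
p_{ij} &\sim \mathrm{Beta}(\mu_i, \theta)
  = \mathrm{Beta}\big(\mu_i\theta,\ (1-\mu_i)\theta\big)
  && \text{proportion in sample } j\\
\mathbb{E}[p_{ij}] &= \mu_i, \qquad
\operatorname{Var}(p_{ij}) = \frac{\mu_i(1-\mu_i)}{\theta + 1}\\
\mu_i^{\text{affected}} &= \mu_i + \frac{\text{effect}}{100}
  && \text{affected sites, affected samples}\\
c_{ij} &\sim \mathrm{NB}(\text{size} = 1,\ \text{prob} = 0.01), \qquad
\mathbb{E}[c_{ij}] = \frac{1 \cdot (1 - 0.01)}{0.01} = 99, \qquad
\operatorname{Var}(c_{ij}) = \frac{1 \cdot (1 - 0.01)}{0.01^2} = 9900
\end{aligned}
```

Under this parameterization a larger \theta gives a smaller variance, so \theta acts as a precision: raising it tightens the samples around \mu. With size=1 the negative binomial reduces to a geometric distribution, so coverage is highly variable around its mean of 99.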

Examples


library(methylKit)
data(methylKit)

# Simulate data for 4 samples with 2000 sites each.
# Methylation at 10% of the sites is elevated by 25%.
my.methylBase=dataSim(replicates=4,sites=2000,treatment=c(1,1,0,0),
                      percentage=10,effect=25)
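To make the statistics in 'Details' concrete without loading methylKit, the base-R sketch below simulates the same scenario as the example (4 samples, 2000 sites, treatment c(1,1,0,0), percentage=10, effect=25). The helper sim_methylation_sketch() is hypothetical and is not part of methylKit. It assumes that samples with treatment == 1 form the affected group, that effect is added to \mu as effect/100 (capped at 1), and that Beta(\mu,\theta) means shape1 = \mu\theta, shape2 = (1-\mu)\theta. It returns plain matrices rather than a methylBase object.

```r
# Hypothetical helper, NOT methylKit's implementation: a base-R sketch of the
# model described in 'Details'. Assumptions: treatment == 1 marks the affected
# samples, effect is added to mu as effect/100 (capped at 1), and
# Beta(mu, theta) uses shape1 = mu * theta, shape2 = (1 - mu) * theta.
sim_methylation_sketch <- function(sites, treatment, percentage = 10, effect = 25,
                                   alpha = 0.4, beta = 0.5, theta = 10) {
  n <- length(treatment)
  # Background methylation proportion for each site: mu ~ Beta(alpha, beta)
  mu <- rbeta(sites, alpha, beta)
  # Pick the sites affected by the treatment
  treated.idx <- sample.int(sites, round(sites * percentage / 100))
  # Coverage per site and sample: negative binomial with size = 1, prob = 0.01
  coverage <- matrix(rnbinom(sites * n, size = 1, prob = 0.01), nrow = sites)
  numCs <- matrix(0, nrow = sites, ncol = n)
  for (j in seq_len(n)) {
    m <- mu
    if (treatment[j] == 1) {
      # Shift the affected sites; in R, Beta(a, 0) is a point mass at 1,
      # so capping at 1 keeps rbeta() well defined
      m[treated.idx] <- pmin(m[treated.idx] + effect / 100, 1)
    }
    # Per-sample methylation proportion around m; theta controls the spread
    p <- rbeta(sites, m * theta, (1 - m) * theta)
    # Methylated read counts given the coverage and the proportion
    numCs[, j] <- rbinom(sites, size = coverage[, j], prob = p)
  }
  list(coverage = coverage, numCs = numCs, numTs = coverage - numCs,
       treated.idx = treated.idx)
}

set.seed(42)
sim <- sim_methylation_sketch(sites = 2000, treatment = c(1, 1, 0, 0))
```

Comparing the mean methylation proportion of samples 1 and 2 against samples 3 and 4 at sim$treated.idx should show a clear positive difference, smaller than 0.25 on average because sites with \mu close to 1 cannot rise by the full effect.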




al2na/methylKit documentation built on Feb. 1, 2024, 4:42 p.m.