simulateCounts: Simulate read counts of methylated and unmethylated cytosines

View source: R/simulateCounts.R

simulateCountsR Documentation

Simulate read counts of methylated and unmethylated cytosines

Description

Auxiliary function to simulate read counts of methylated and unmethylated cytosines

Usage

simulateCounts(
  num.samples,
  sites = NULL,
  alpha = NULL,
  beta = NULL,
  size = NULL,
  theta,
  sample.ids = NULL,
  chromosome = "1",
  start = NULL,
  end = NULL,
  strand = "*",
  type = c("on_sites", "on_regions"),
  regions = 10,
  min_width = 1000,
  max_width = 5000,
  minCountPerIndv = 8,
  maxCountPerIndv = 300,
  mu = NULL,
  noise = 0,
  seed = 123
)

Arguments

num.samples

Number of samples to generate.

sites

Number of cytosine sites for each sample.

alpha

Alpha parameter of beta distribution. Parameter shape1 from Beta function.

beta

Beta parameter of beta distribution. Parameter shape2 from Beta function.

size

number of trials (11 or more). Expected cytosine coverage.

theta

Parameter theta from rnegbin (overdispersion parameter).

sample.ids

Names for the samples.

chromosome

A character string naming the chromosome. Default '1'.

start

An nteger vector with the start positions for each cytosine site. Default start = NULL.

end

An integer vector with the end position for each cytosine site. Default end = start. Notice that if end > start, each counts will cover a region.

strand

One of the characters '*', '+', or '-' denoting the DNA strand. Default is '*'.

type

One of the string 'on_sites' or 'on_regions'. Default is 'on_sites'. If type == 'on_sites', then the counts are intended on single bases, otherwise the counts are covering regions.

regions, min_width, max_width, minCountPerIndv, minCountPerIndv, mu

Arguments to provide when type == 'on_regions':

regions

Number of regions carrying counts

min_width

Minimum size for a region

max_width

Maximum size for a region

minCountPerIndv

Each region must have more than 'minCountPerIndv' counts (on average) per individual (if 'mu' is not provided)

maxCountPerIndv

Each region must have less than 'maxCountPerIndv' counts (on average) per individual (if 'mu' is not provided)

mu

The expected value of the data generated with Negative Binomial distribution. This a vector of means. Short vectors are recycled. Default is NULL and, in this case, simulation is performed with mu = seq(minCountPerIndv, maxCountPerIndv).

noise

A single number from the interval [0, 1] or a numeric vector of lengh(sites) with all its elements from the interval [0, 1]. Adds noise to the read counts. The noise is added to the methylation levels 'p', which are used to compute the coverage. If for some site 'p' + noise > 1', then the noise is not added to the site. Default is zero.

seed

seed a single value, interpreted as an integer, or NULL, to set a seed for Random Number Generation. Default is seed = 123.

Details

Methylation coverages (minimum 10) are generated from a Negative Binomial distribution with function rnegbin from R package MASS. This function uses the representation of the Negative Binomial distribution as a continuous mixture of Poisson distributions with Gamma distributed means. Prior methylation levels are randomly generated with beta distribution using Beta function from R package “stats” and posterior methylation levels are generated according Bayes' theorem. The read of methylation counts are obtained as the product of coverage by the posterior methylation level.

Counts on regions are generated from a Negative Binomial distribution with function rnegbin with mean mu and variance: mu + mu^2/theta.

Value

A list of GRanges objects with the methylated and unmethylated counts in its metacolumn.

Author(s)

Robersy Sanchez (https://github.com/genomaths).

Examples

## *** Simulate samples with expected average of difference of methylation
## levels equal to 0.0427.
## === Expected mean of methylation levels ===
bmean <- function(alpha, beta) alpha/(alpha + beta)
bmean(0.03, 0.5) - bmean(0.007, 0.5) #' Expected difference = 0.04279707

## === The number of cytosine sitesto generate ===
sites = 5000

## === Simulate samples ===
ref = simulateCounts(num.samples = 1, sites = sites, alpha = 0.007,
                    beta = 0.5, size = 50, theta = 4.5, sample.ids = 'C1')
treat = simulateCounts(num.samples = 2, sites = sites, alpha = 0.03,
                    beta = 0.5, size = 50, theta = 4.5,
                    sample.ids = c('T1', 'T2'))
### === Simulate counts on regions ====
simulateCounts(num.samples = 7, 
               sample.ids = c(paste0('C',1:4), paste0('T', 1:3)),
               type = 'on_regions', theta = 2.5, regions = 10)
               

genomaths/MethylIT.utils documentation built on July 4, 2023, 12:05 a.m.