simInheritance: Simulate a multigenerational methylation experiment with...


View source: R/methylInheritanceSimInternalMethods.R

Description

Simulate a multigenerational methylation case versus control experiment, with inheritance between generations, using a real control dataset.

The simulation can be parametrized to fit different models. Tunable parameters include the number of cases and controls, the proportion of cases affected by the treatment (penetrance), the effect of the treatment on the mean of the distribution, the proportion of sites inherited, the proportion of differentially methylated sites inherited from the preceding generation, etc.

The function simulates a multigenerational dataset resembling a bisulfite sequencing experiment. The simulation includes control and case information for each generation.

Usage

simInheritance(pathOut, pref, k, nbCtrl, nbCase, treatment, sample.id,
  generation, stateInfo, propDiff, propDiffsd, diffValue, propInheritance,
  rateDiff, minRate, propInherite, propHetero, minReads, maxPercReads, context,
  assembly, meanCov, diffRes, saveGRanges, saveMethylKit, runAnalysis)

Arguments

pathOut

a character string or NULL, the path where the files created by the function will be saved. When NULL, the files are saved in the current directory.

pref

a character string representing the parameters of a specific simulation. The string is composed of the following elements, separated by "_":

  • a fileID

  • the chromosome number, a number between 1 and nbSynCHR

  • the number of samples, a number in the vNbSample vector

  • the mean proportion of samples that have, for a specific position, differentially methylated values, a number in the vpDiff vector

  • the proportion of C/T for a case differentially methylated that follows a shifted beta distribution, a number in the vDiff vector

  • the proportion of cases that inherit differentially methylated sites, a number in the vInheritance vector
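As a minimal sketch, a string of this shape can be assembled with paste(). The variable names below are illustrative assumptions, not part of the package. Note that the string used in the Examples section ("S1_6_0.9_0.8_0.5") carries five elements and no chromosome number, so the sketch reproduces that shorter form.

```r
## Hypothetical variable names, chosen only for this illustration
fileID <- "S1"
nbSample <- 6
propDiff <- 0.9
diffValue <- 0.8
propInheritance <- 0.5

## Join the elements with "_" as the separator
pref <- paste(fileID, nbSample, propDiff, diffValue, propInheritance,
    sep = "_")
print(pref)
```

The result matches the pref value passed in the Examples section.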

k

a positive integer, an ID for the current simulation.

nbCtrl

a positive integer, the number of controls.

nbCase

a positive integer, the number of cases.

treatment

a vector of integers denoting controls and cases. The vector length must equal the sum of the number of cases and controls.
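A minimal sketch of a vector with the required length. The coding of 0 for controls and 1 for cases is an assumption borrowed from the usual methylKit convention; the description above only constrains the length.

```r
nbCtrl <- 6
nbCase <- 6

## Assumed coding: 0 = control, 1 = case
treatment <- c(rep(0L, nbCtrl), rep(1L, nbCase))

## The length must equal the total number of samples
stopifnot(length(treatment) == nbCtrl + nbCase)
```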

sample.id

a matrix containing the name of each sample, with one row per generation and one column per case or control.
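The layout described above can be sketched in base R as follows. The naming scheme ("F1_C1", "F1_T1", ...) is purely hypothetical, and the object shipped in the package's example dataset may be organized differently.

```r
generation <- 3
nbCtrl <- 6
nbCase <- 6

## Hypothetical labels: C = control, T = case (treated)
labels <- c(paste0("C", seq_len(nbCtrl)), paste0("T", seq_len(nbCase)))

## One row per generation, one column per sample
sample.id <- outer(paste0("F", seq_len(generation)), labels,
    paste, sep = "_")
dim(sample.id)
```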

generation

a positive integer, the number of generations simulated.

stateInfo

a GRanges that contains the CpG (or methylated) sites. The GRanges has four metadata columns taken from the real dataset:

  • chrOri a numeric, the chromosome from the real dataset

  • startOri a numeric, the position of the site in the real dataset

  • meanCTRL a numeric, the mean of the control in the real dataset

  • varCTRL a numeric, the variance of the control in the real dataset.
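For illustration, an object with these four metadata columns could be built by hand with the GenomicRanges package. This is a sketch only: the sequence name "S", the positions and the numeric values are invented, and in practice the object comes from the package's own preprocessing of a real dataset.

```r
library(GenomicRanges)

## Three synthetic sites; all values below are made up for illustration
stateInfo <- GRanges(
    seqnames = rep("S", 3),
    ranges = IRanges(start = c(1000, 1050, 1100), width = 1),
    chrOri = c(1, 1, 1),
    startOri = c(3000000, 3000050, 3000100),
    meanCTRL = c(0.10, 0.55, 0.90),
    varCTRL = c(0.002, 0.010, 0.001)
)

## Inspect the metadata columns
mcols(stateInfo)
```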

propDiff

a double greater than 0 and less than or equal to 1, the mean value for the proportion of samples that will have, for a specific position, differentially methylated values. It can be interpreted as the penetrance.

propDiffsd

a non-negative double, the standard deviation associated with vpDiff. Note that vpDiff and vpDiffsd must be the same length.

diffValue

a non-negative double in [0,1], the proportion of C/T for a differentially methylated case. It follows a beta distribution whose mean is shifted by vDiff from the CTRL distribution.
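One way to picture a beta distribution with a shifted mean is the method-of-moments parametrization: for mean m and variance v (with v < m(1 - m)), the shape parameters are alpha = m * nu and beta = (1 - m) * nu, where nu = m(1 - m) / v - 1. The sketch below assumes an upward shift and reuses the control variance; neither choice is confirmed by this page, and betaShape is a hypothetical helper, not a package function.

```r
## Hypothetical helper: beta shape parameters from a mean and a variance
betaShape <- function(mu, sigma2) {
    stopifnot(mu > 0, mu < 1, sigma2 > 0, sigma2 < mu * (1 - mu))
    nu <- mu * (1 - mu) / sigma2 - 1
    list(alpha = mu * nu, beta = (1 - mu) * nu)
}

## Invented control statistics for one site
meanCTRL <- 0.2
varCTRL <- 0.01
diffValue <- 0.5

## Assumption: the case mean is the control mean shifted upward
shape <- betaShape(meanCTRL + diffValue, varCTRL)

## Draw methylation proportions for six cases
set.seed(1)
caseValues <- rbeta(6, shape$alpha, shape$beta)
```

With these inputs the shifted mean is 0.7, which gives alpha = 14 and beta = 6.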

propInheritance

a non-negative double in [0,1], the proportion of cases that inherit differentially methylated sites.

rateDiff

a positive double less than 1, the mean probability that a site is differentially methylated.

minRate

a non-negative double less than 1, the minimum rate for differentially methylated sites. Default: 0.01.

propInherite

a non-negative double less than or equal to 1, the proportion of differentially methylated regions that are inherited.

propHetero

a non-negative double in [0,1], the reduction of vDiff for the second and following generations.

minReads

a positive integer, sites and regions having lower coverage than this count are discarded. The parameter corresponds to the lo.count parameter in the methylKit package.

maxPercReads

a double in [0,100], the percentile of read counts used as the upper cutoff. Sites and regions having higher coverage than maxPercReads are discarded. This parameter is used for both CpG site and tile analyses. The parameter corresponds to the hi.perc parameter in the methylKit package.
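The combined effect of minReads and maxPercReads can be illustrated in base R on a plain coverage vector. This is a simplified sketch with invented counts; the actual filtering is delegated to methylKit and operates on its own objects.

```r
## Invented per-site coverage values
coverage <- c(5, 10, 20, 40, 1000)
minReads <- 10
maxPercReads <- 99

## Upper cutoff: the maxPercReads-th percentile of the coverage
hiCut <- quantile(coverage, probs = maxPercReads / 100)

## Keep sites that are neither below minReads nor above the percentile
kept <- coverage[coverage >= minReads & coverage <= hiCut]
print(kept)
```

Here the site with 5 reads falls below minReads and the site with 1000 reads exceeds the 99th percentile, so three sites remain.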

context

a character string, the short description of the methylation context, such as "CpG", "CpH", "CHH", etc.

assembly

a character string, the short description of the genome assembly, such as "mm9", "hg18", etc.

meanCov

a positive integer, the mean of the coverage at the simulated CpG sites.

diffRes

a list with 2 entries:

  • stateDiff a vector of integers (0 and 1) whose length corresponds to the length of stateInfo. The vector indicates, using a 1, the positions where the CpG sites are differentially methylated.

  • stateInherite a vector of integers (0 and 1) whose length corresponds to the length of stateInfo. The vector indicates, using a 1, the positions where the CpG values are inherited.

When NULL, new ones are generated with getDiffMeth.

saveGRanges

a logical. When TRUE, the package saves two types of files. The first, generated for each simulation, contains a list whose length corresponds to the number of generations. The generations are stored in order (first entry = first generation, second entry = second generation, etc.). All samples related to one generation are contained in a GRangesList, which stores a list of GRanges. Each GRanges stores the raw methylation data of one sample. The second file contains a numeric vector denoting controls and cases (one file is generated per entry in the vector parameter vNbSample).

saveMethylKit

a logical. When TRUE, the package saves a file that contains a list whose length corresponds to the number of generations. The generations are stored in order (first entry = first generation, second entry = second generation, etc.). All samples related to one generation are contained in an S4 methylRawList object. The methylRawList object contains two slots: 1. treatment: a numeric vector denoting controls and cases. 2. .Data: a list of methylRaw objects. Each object stores the raw methylation data of one sample.

runAnalysis

a logical. If TRUE, two files are saved:

  • The first file is the methylObj... file, formatted with the methylKit package as an S4 methylBase object (using the methylKit functions filterByCoverage, normalizeCoverage and unite).

  • The second file contains an S4 methylDiff object generated with the methylKit function calculateDiffMeth using the first file.

Value

0, indicating that the function completed successfully.

Author(s)

Pascal Belleau, Astrid Deschenes

Examples

## Name of the directory that will contain the generated files
temp_dir <- "test_simInheritance"

## Load dataset
data(dataSimExample)

## Generate a stateDiff object with length corresponding to
## nbBlock * nbCpG from stateInformation
stateDiff <- list()
stateDiff[["stateDiff"]] <- c(1, 0, 1)
stateDiff[["stateInherite"]] <- c(1, 0, 0)

## Simulate multigenerational methylation experiment with inheritance
methInheritSim:::simInheritance(pathOut = temp_dir,
    pref = "S1_6_0.9_0.8_0.5", k = 1, nbCtrl = 6, nbCase = 6, 
    treatment = dataSimExample$treatment, 
    sample.id = dataSimExample$sample.id,
    generation = 3, stateInfo = dataSimExample$stateInfo[1:3],
    propDiff = 0.9, propDiffsd = 0.1, diffValue = 0.8, 
    propInheritance = 0.5, rateDiff = 0.3, minRate = 0.3,
    propInherite = 0.3, propHetero = 0.5, minReads = 10, maxPercReads = 99, 
    assembly="RNOR_5.0", context="CpG", meanCov = 40, diffRes = stateDiff,
    saveGRanges = FALSE, saveMethylKit = FALSE, runAnalysis = FALSE)

## Delete directory
if (dir.exists(temp_dir)) {
    unlink(temp_dir, recursive = TRUE, force = FALSE)
}

belleau/methylInheritanceSim documentation built on April 1, 2020, 2:43 p.m.