R/methylInheritanceSimMethodsNew.R

Defines functions runSim

Documented in runSim

#' @title Simulate a multigeneration methylation experiment with inheritance 
#' 
#' @description Simulate a multigeneration methylation case versus control 
#' experiment 
#' with inheritance relation using a real control dataset. 
#' 
#' The simulation can 
#' be parametrized to fit different models. The number of cases and controls, 
#' the proportion of the case affected 
#' by the treatment (penetrance), the effect of the treatment on the mean of 
#' the distribution, the proportion of sites inherited, the proportion of the 
#' differentially methylated sites from the precedent generation inherited, 
#' etc..
#' 
#' The function simulates a multigeneration dataset like a bisulfite 
#' sequencing experiment. The simulation includes the information about 
#' control and case for each generation. Simulation dataset are saved in
#' multiple files created in the directory specified by the user.
#' 
#' @param methData an object of class \code{methylBase}, the CpG information
#' from controls (CTRL) that will be used to create the synthetic chromosome. 
#' The \code{methData} object can also contain information from cases but 
#' only the controls are used.
#' 
#' @param nbSynCHR a positive \code{integer}, the number of distinct synthetic 
#' chromosomes that will be generated. Default: \code{1}.
#' 
#' @param nbSimulation a positive \code{integer}, the number of simulations 
#' generated for each parameter (\code{vNbSample}, \code{vpDiff}, 
#' \code{vDiff} and \code{vInheritance}). 
#' The total number of simulation is 
#' nbSimulation * \code{length(vNbSample)} * \code{length(vpDiff)} *
#' \code{length(vInheritance)})
#' Default: \code{10}.
#'
#' @param nbBlock a positive \code{integer}, the number of blocks used 
#' for sampling.
#' Default: \code{100}.
#'
#' @param nbCpG a positive \code{integer}, the number of consecutive CpG 
#' positions used for sampling from \code{methInfo}.
#' Default: \code{50}.
#'
#' @param nbGeneration a positive \code{integer}, the number of generations
#' simulated.
#' Default: \code{3}.
#' 
#' @param vNbSample a \code{vector} of distinct positive \code{integer}, 
#' the number of controls (CTRL) and cases in the simulated dataset. In 
#' the simulated dataset, the number of CTRL equals the number of cases. 
#' The number of CTRL do not need to be equal to the number of Case in
#' the real \code{methData} dataset.
#' Default: \code{c(3, 6)}.
#'
#' @param vpDiff a \code{vector} of distinct \code{double} superior to 
#' \code{0} and inferior or equal 
#' to \code{1}, the mean value for the proportion of samples that will have,
#' for a specific position, differentially methylated values. It can be 
#' interpreted as the penetrance. Note that \code{vpDiff} and \code{vpDiffsd}
#'  must be the same length.
#' Default: \code{c(0.9)}.
#' 
#' @param vpDiffsd a \code{vector} of a non-negative \code{double}, the 
#' standard deviation associated to the \code{vpDiff}. Note that 
#' \code{vpDiff} and \code{vpDiffsd} must be the same length.
#' Default: \code{c(0.1)}.
#'
#' @param vDiff a \code{vector} of distinct non-negative \code{double} 
#' included in [0,1], the proportion of C/T for a case differentially 
#' methylated that follows 
#' a beta distribution where the mean is shifted by \code{vDiff} 
#' from the CTRL distribution.
#' Default: \code{c(0.8)}.
#'
#' @param vInheritance a \code{vector} of distinct non-negative \code{double} 
#' included in [0,1], the proportion of cases 
#' that inherits differentially methylated sites.
#' Default: \code{c(0.5)}.
#' 
#' @param rateDiff a positive \code{double} inferior to \code{1}, the mean of 
#' the chance that a site is differentially 
#' methylated.
#' Default: \code{0.01}.
#'
#' @param minRate a non-negative \code{double} inferior to \code{1}, the 
#' minimum rate for differentially methylated sites.
#' Default: \code{0.01}.
#'
#' @param propInherite a non-negative \code{double} inferior or equal 
#' to \code{1}, 
#' the proportion of differentially methylated regions that is inherited.
#' Default: \code{0.3}.
#'
#' @param propHetero a non-negative \code{double} between [0,1], the 
#' reduction of \code{vDiff} for the second and following generations.
#' Default: \code{0.5}.
#' 
#' @param keepDiff a \code{logical}, when \code{TRUE}, the 
#' differentially methylated sites
#' will be the same for all simulated datasets. Datasets generated using 
#' differents parameter values from vector parameters (\code{vpDiff}, 
#' \code{vDiff} and \code{vInheritance}) wil all have the same differentially 
#' methylated sites.
#' Default: \code{FALSE}.
#' 
#' @param outputDir a string of \code{character} or \code{NULL}, the path 
#' where the 
#' files created by the function will be saved. When \code{NULL}, the files
#' are saved in a directory called "outputDir" that is located in 
#' the current directory. Default: \code{NULL}.
#'
#' @param fileID a string of \code{character}, a identifiant that will be 
#' included in each output file name. Each output 
#' file name is 
#' composed of those elements, separated by "_":
#' \itemize{ 
#' \item a type name, ex: methylGR, methylObj, etc..
#' \item a \code{fileID}
#' \item the chromosome number, a number between 1 and \code{nbSynCHR}
#' \item the number of samples, a number in the \code{vNbSample} \code{vector}
#' \item the mean proportion of samples that has,
#' for a specific position, differentially methylated values, a 
#' number in the \code{vpDiff} \code{vector}
#' \item the proportion of 
#' C/T for a case differentially methylated that follows a shifted beta 
#' distribution, a
#' number in the \code{vDiff} \code{vector}
#' \item the 
#' proportion of cases that inherits differentially sites, a number in the
#' \code{vInheritance} \code{vector}
#' \item the identifiant for the simulation, a number 
#' between 1 and \code{nbSimulation}
#' \item the file extension ".rds"
#' }
#' Default: \code{"s"}.
#' 
#' @param minReads a positive \code{integer}, sites and regions having lower
#' coverage than this count are discarded. The parameter
#' corresponds to the \code{lo.count} parameter in 
#' the \code{methylKit} package.
#' Default: \code{10}.
#' 
#' @param maxPercReads a \code{double} between [0,100], the percentile of read
#' counts that is going to be used as upper cutoff. Sites and regions
#' having higher
#' coverage than \code{maxPercReads} are discarded. This parameter is used for 
#' both CpG sites and tiles analysis. The parameter
#' correspond to the \code{hi.perc} parameter in the \code{methylKit} package.
#' Default: \code{99.9}.
#' 
#' @param meanCov a positive \code{integer}, the mean of the coverage
#' at the simulated CpG sites.
#' Default: \code{80}.
#' 
#' @param context a string of \code{character}, the short description of the 
#' methylation context, such as "CpG", "CpH", "CHH", etc.. 
#' Default: \code{"CpG"}.
#' 
#' @param assembly a string of \code{character}, the short description of the 
#' genome assembly, such as "mm9", "hg18", etc..
#' Default: \code{"Rnor_5.0"}.
#' 
#' @param saveGRanges a \code{logical}, when \code{true}, the package save two 
#' files type. The first generate for each simulation contains a \code{list}. 
#' The length of the \code{list} corresponds to the number of generation. 
#' The generation are stored in order (first entry = first generation, 
#' second entry = second generation, etc..). All samples related to one 
#' generations are contained in a \code{GRangesList}. 
#' The \code{GRangeaList} store a \code{list} of \code{GRanges}. Each 
#' \code{GRanges} stores the raw mehylation data of one sample.
#' The second file a numeric \code{vector} denoting controls and cases 
#' (a file is generates by entry in the \code{vector} parameters 
#' \code{vNbSample}).
#' Default: \code{TRUE}.
#' 
#' @param saveMethylKit a \code{logical}, when \code{TRUE}, for each 
#' simulations save a file contains a \code{list}. The length of the 
#' \code{list} corresponds to the number of generation. The generation are 
#' stored in order (first entry = first generation, 
#' second entry = second generation, etc..). All samples related to one 
#' generations are contained in a S4 \code{methylRawList} object. The 
#' \code{methylRawList} object contains two Slots:
#' 1. treatment: A numeric \code{vector} denoting controls and cases.
#' 2. .Data: A \code{list} of \code{methylRaw} objects. Each object stores the 
#' raw methylation data of one sample.
#' Default: \code{TRUE}.
#' 
#' @param runAnalysis a \code{logical}, if \code{TRUE}, two files are saved 
#' for each simulation:
#' \itemize{
#' \item 1. The first file is the methylObj... file formated with 
#' the \code{methylkit} 
#' package in a S4 \code{methylBase} object (using the \code{methylKit} 
#' functions: \code{filterByCoverage}, \code{normalizeCoverage} and 
#' \code{unite}).
#' \item 2. The second file contains a S4 \code{calculateDiffMeth} object 
#' generated 
#' using the \code{methylKit} functions \code{calculateDiffMeth} on the 
#' first file.
#' }
#' Default: \code{FALSE}.
#' 
#' @param nbCores a positive \code{integer}, the number of cores used when
#' creating the simulated datasets. Default: \code{1} and always 
#' \code{1} for Windows.
#' 
#' @param vSeed a \code{integer}, a seed used when reproducible results are
#' needed. When a value inferior or equal to zero is given, a random integer
#' is used. Default: \code{-1}.
#'
#' @return \code{0} indicating that the function have been
#' successful.
#' 
#' @seealso the vignette for detail description of the files created
#' by the simulation.
#' 
#' @examples
#'
#' ## Load dataset containing methyl information
#' data(samplesForChrSynthetic)
#' 
#' ## Set the output directory where files will be created
#' temp_dir <- "test_runSim"
#' 
#' ## Create 2 simulated dataset (nbSimulation = 2) 
#' ## over 3 generations (nbGenration = 3) with
#' ## 6 cases and 6 controls (nNbsample = 6) using only one set
#' ## of parameters (vpDiff = 0.9, vpDiffsd = 0.1, vDiff = 0.8)
#' runSim(methData = samplesForChrSynthetic, nbSynCHR = 1, nbSimulation = 2, 
#'     nbGeneration = 3, nbBlock = 10, nbCpG = 20, vNbSample = c(6), 
#'     vpDiff = c(0.9), vpDiffsd = c(0.1), vDiff = c(0.8), 
#'     vInheritance = c(0.5), rateDiff = 0.3, minRate = 0.2,
#'     propInherite = 0.3, propHetero = 0.5, outputDir = temp_dir, 
#'     fileID = "F", nbCores = 1, vSeed = 32)
#' 
#' ## Delete the output directory and its content
#' if (dir.exists(temp_dir)) {
#'     unlink(temp_dir, recursive = TRUE, force = FALSE)
#' }
#' 
#' @author Pascal Belleau, Astrid Deschenes
#' @importFrom parallel mclapply nextRNGSubStream
#' @export
runSim <- function(methData, nbSynCHR = 1, nbSimulation = 10, nbBlock = 100, 
                nbCpG = 50, nbGeneration = 3, vNbSample = c(3, 6), 
                vpDiff = c(0.9), vpDiffsd = c(0.1), vDiff = c(0.8), 
                vInheritance = c(0.5), rateDiff = 0.01, minRate = 0.01, 
                propInherite = 0.3, propHetero = 0.5, keepDiff = FALSE, 
                outputDir = NULL, fileID = "s", minReads = 10, 
                maxPercReads = 99.9, meanCov = 80, context = "CpG", 
                assembly="Rnor_5.0", saveGRanges = TRUE, saveMethylKit = TRUE, 
                runAnalysis = FALSE, nbCores = 1, vSeed = -1) {
    
    ## Validate all parameters
    validateRunSimParameters(vpDiff = vpDiff, vpDiffsd = vpDiffsd, 
        vDiff = vDiff, vInheritance = vInheritance, 
        propInherite = propInherite, rateDiff = rateDiff, minRate = minRate, 
        propHetero = propHetero, maxPercReads = maxPercReads, 
        nbSynCHR = nbSynCHR, nbSimulation = nbSimulation, nbBlock = nbBlock, 
        nbCpG = nbCpG, vNbSample = vNbSample, nbGeneration = nbGeneration, 
        minReads = minReads, meanCov = meanCov, nbCores = nbCores, 
        vSeed = vSeed, keepDiff = keepDiff, saveGRanges = saveGRanges, 
        saveMethylKit = saveMethylKit, runAnalysis = runAnalysis, 
        outputDir = outputDir, fileID = fileID, methData = methData, 
        context = context, assembly = assembly)
    
    if (is.null(outputDir)) {
        outputDir <- "outputDir"
    }
    
    ## Remove ending slash
    if (substring(outputDir, nchar(outputDir), nchar(outputDir)) == "/") {
        outputDir <- substring(outputDir, 1, nchar(outputDir) - 1)
    }
    
    ## Create directory
    if (!dir.exists(outputDir)) {
        dir.create(outputDir, showWarnings = TRUE)
    }
    
    result <- runOnEachSynCHR(methData = methData, nbSynCHR = nbSynCHR, 
                nbSimulation = nbSimulation, nbBlock = nbBlock, nbCpG = nbCpG, 
                nbGeneration = nbGeneration, vNbSample = vNbSample, 
                vpDiff = vpDiff, vpDiffsd = vpDiffsd, vDiff = vDiff, 
                vInheritance = vInheritance, rateDiff = rateDiff, 
                minRate = minRate, propInherite = propInherite, 
                propHetero = propHetero, outputDir = outputDir, 
                fileID = fileID, minReads = minReads, 
                maxPercReads = maxPercReads, meanCov = meanCov, 
                context = context, assembly = assembly, keepDiff = keepDiff, 
                saveGRanges = saveGRanges, saveMethylKit = saveMethylKit, 
                runAnalysis = runAnalysis, nbCores = nbCores, vSeed = vSeed)
    
    return(result)
}
belleau/methInheritSim documentation built on April 1, 2020, 2:43 p.m.