simData: simData
In simulatorZ: Simulator for Collections of Independent Genomic Data Sets

simData performs non-parametric bootstrap resampling on a list of original data sets, at both the set level and the patient level, in order to simulate independent genomic data sets.
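
The two resampling levels described above can be sketched in base R. This is a minimal illustration under stated assumptions, not the simulatorZ implementation: `two_step_bootstrap` is a hypothetical helper, the inputs are plain matrices with samples in columns, and the names of the returned list elements are chosen here for illustration only.

```r
# Sketch only: a two-step non-parametric bootstrap on a list of matrices
# (columns = samples). `two_step_bootstrap` is a hypothetical helper,
# not a function from simulatorZ.
two_step_bootstrap <- function(mat.list, n.samples) {
  n.sets <- length(mat.list)
  # Step 1 (set level): resample the data sets themselves, with replacement
  sets.id <- sample(seq_len(n.sets), n.sets, replace = TRUE)
  # Step 2 (patient level): within each drawn set, resample columns with replacement
  indices <- lapply(sets.id, function(i) {
    sample(seq_len(ncol(mat.list[[i]])), n.samples, replace = TRUE)
  })
  sim <- Map(function(i, idx) mat.list[[i]][, idx, drop = FALSE],
             sets.id, indices)
  # Element names below are illustrative assumptions
  list(obj = sim, setsID = sets.id, indices = indices)
}

set.seed(1)
toy <- list(A = matrix(rnorm(50), nrow = 5),   # 5 features x 10 patients
            B = matrix(rnorm(40), nrow = 5))   # 5 features x 8 patients
res <- two_step_bootstrap(toy, n.samples = 6)
dim(res$obj[[1]])  # 5 features by 6 resampled patients
```

Skipping step 1 and resampling patients directly from each original set would correspond to the idea of a one-step bootstrap.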

simData(obj, n.samples, y.vars = list(), type = "two-steps",
    balance.variables = NULL)

`obj`	a list of ExpressionSets, matrices or RangedSummarizedExperiments. If the elements are matrices, columns represent samples.
`n.samples`	an integer indicating how many samples should be resampled from each set.
`y.vars`	a list of response variables; each can be a Surv object, or a matrix or data.frame with two columns.
`type`	string, either "one-step" or "two-steps". If type="one-step", the function skips resampling the data sets and resamples patients directly from the original list in obj.
`balance.variables`	a vector of covariate names that should be balanced in the simulation. After balancing, the prevalence of each covariate in every resulting set should match its overall distribution across all original data sets. Defaults to NULL, in which case no covariate is balanced. If it is not NULL, the elements of obj must be of class ExpressionSet.
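
To make the two-column form of `y.vars` concrete, the sketch below shows how a response stored as a data.frame can be kept aligned with resampled patients. It uses base R only, and every object name in it is made up for illustration. It shows the indexing idea, not how simData handles responses internally.

```r
# Sketch only: keep a two-column response (time, status) aligned with
# resampled patients. All names below are illustrative.
set.seed(2)
expr <- matrix(rnorm(40), nrow = 4,
               dimnames = list(paste0("gene", 1:4), paste0("p", 1:10)))
y <- data.frame(time = rexp(10, rate = 0.1),
                status = rbinom(10, 1, 0.6),
                row.names = colnames(expr))

# Resample patients with replacement
idx <- sample(seq_len(ncol(expr)), 6, replace = TRUE)
expr.sim <- expr[, idx, drop = FALSE]  # columns are samples
y.sim <- y[idx, , drop = FALSE]        # response rows follow the same index

# Compare by position: data.frame row names are made unique when rows repeat
stopifnot(identical(y.sim$time, y$time[idx]))
```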

Returns a list of simulated ExpressionSets, with names indicating the original set each one was drawn from, together with the indices of the original patients.

prob.desired and prob.real are only useful when balance.variables is set. prob.desired shows the overall distribution of the specified covariate; prob.real shows the sampling probability in each set after balancing.
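
The relationship between an overall covariate distribution and per-set sampling probabilities can be illustrated with a small base-R sketch. The weighting scheme here is one plausible way to balance a single categorical covariate and is an assumption, not necessarily what simulatorZ does internally. `stage.list` and `patient.weights` are hypothetical names.

```r
# Sketch only: balance one categorical covariate by weighted resampling.
# This scheme is an assumption for illustration, not the package's algorithm.
stage.list <- list(
  setA = c("early", "early", "early", "late"),
  setB = c("early", "late", "late", "late", "late", "late")
)

# Overall distribution of the covariate across all original sets
pooled <- unlist(stage.list, use.names = FALSE)
prob.desired <- prop.table(table(pooled))   # early 0.4, late 0.6

# Per-patient sampling weight: the desired prevalence of the patient's level,
# split evenly among the patients in the same set that share that level
patient.weights <- lapply(stage.list, function(stage) {
  w <- as.numeric(prob.desired[stage]) / as.numeric(table(stage)[stage])
  w / sum(w)
})

# Expected prevalence of each level per set under these weights
expected <- lapply(names(stage.list), function(s) {
  tapply(patient.weights[[s]], stage.list[[s]], sum)
})

# Draw a balanced resample of patients from the first set
set.seed(3)
idx <- sample(seq_along(stage.list$setA), 5, replace = TRUE,
              prob = patient.weights$setA)
```

Under these weights the expected prevalence in both sets is 0.4 early and 0.6 late, matching the pooled distribution, even though the raw sets are skewed in opposite directions. A level that is absent from a set cannot be recovered this way.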

Yuqing Zhang, Christoph Bernau, Levi Waldron

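library(simulatorZ)  # provides simData; needed when running this outside the package examples
library(curatedOvarianData)
library(GenomicRanges)

data(E.MTAB.386_eset)
data(GSE14764_eset)
# keep the first 100 features and 10 samples of each set to keep the example small
esets.list <- list(E.MTAB.386=E.MTAB.386_eset[1:100, 1:10],
                   GSE14764=GSE14764_eset[1:100, 1:10])
rm(E.MTAB.386_eset, GSE14764_eset)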

## simulate on multiple ExpressionSets
set.seed(8)
# one-step bootstrap: skip resampling set labels
simmodels <- simData(esets.list, 20, type="one-step")
# two-step non-parametric bootstrap
simmodels <- simData(esets.list, 10, type="two-steps")

## simulate one set
simmodels <- simData(list(esets.list[[1]]), 10, type="two-steps")

## balancing covariates
# single covariate
simmodels <- simData(list(esets.list[[1]]), 5, balance.variables="tumorstage")

# multiple covariates
simmodels <- simData(list(esets.list[[1]]), 5,
                     balance.variables=c("tumorstage", "age_at_initial_pathologic_diagnosis"))

## Support matrices
# extract the expression matrix from each ExpressionSet
X.list <- lapply(esets.list, exprs)
simmodels <- simData(X.list, 20, type="two-steps")

## Support RangedSummarizedExperiment
nrows <- 200; ncols <- 6
counts <- matrix(runif(nrows * ncols, 1, 1e4), nrows)
rowRanges <- GRanges(rep(c("chr1", "chr2"), c(50, 150)),
                     IRanges(floor(runif(200, 1e5, 1e6)), width=100),
                     strand=sample(c("+", "-"), 200, TRUE))
colData <- DataFrame(Treatment=rep(c("ChIP", "Input"), 3),
                     row.names=LETTERS[1:6])
sset <- SummarizedExperiment(assays=SimpleList(counts=counts),
                             rowRanges=rowRanges, colData=colData)

s.list <- list(sset[,1:3], sset[,4:6])
simmodels <- simData(s.list, 20, type="two-steps")