Description
simData performs non-parametric bootstrap resampling on a list of original data sets, at both the set level and the patient level, to simulate independent genomic data sets.
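The two levels of resampling can be sketched in base R as follows. This is a minimal illustration, not the simulatorZ implementation. It assumes the input is a list of matrices with samples in columns, and it ignores y.vars and covariate balancing. bootstrap_sets() is a hypothetical helper name used only here.

```r
# Minimal base-R sketch of a two-level non-parametric bootstrap.
# NOT the simulatorZ implementation: bootstrap_sets() is a hypothetical helper
# that assumes `sets` is a list of matrices with samples (patients) in columns.
bootstrap_sets <- function(sets, n.samples, type = c("two-steps", "one-step")) {
  type <- match.arg(type)
  set.names <- names(sets)
  if (is.null(set.names)) set.names <- paste0("set", seq_along(sets))

  # Set level: resample the set labels with replacement ("two-steps"),
  # or keep every original set exactly once ("one-step").
  chosen <- if (type == "two-steps") {
    sample(seq_along(sets), length(sets), replace = TRUE)
  } else {
    seq_along(sets)
  }

  # Patient level: draw n.samples columns with replacement from each chosen
  # set and remember which original columns were drawn.
  indices <- lapply(chosen, function(k) sample(ncol(sets[[k]]), n.samples, replace = TRUE))
  sim <- Map(function(k, idx) sets[[k]][, idx, drop = FALSE], chosen, indices)

  names(sim) <- set.names[chosen]
  names(indices) <- set.names[chosen]
  list(sets = sim, indices = indices)
}

# Usage with two small random matrices (5 x 10 and 5 x 8).
set.seed(8)
sets <- list(A = matrix(rnorm(50), nrow = 5), B = matrix(rnorm(40), nrow = 5))
res <- bootstrap_sets(sets, n.samples = 6)
names(res$sets)        # original set each simulated set was drawn from
sapply(res$sets, dim)  # every simulated set has 6 columns
```

With type = "two-steps" the same original set can appear more than once in the result, which is what lets the set-level variability enter the simulation.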
Arguments
obj: A list of ExpressionSets, matrices or RangedSummarizedExperiments. If the elements are matrices, columns represent samples.
n.samples: An integer giving the number of samples to resample from each set.
y.vars: A list of response variables. Each element can be a Surv object, or a matrix or data.frame with two columns.
type: A string, either "one-step" or "two-steps". With type="two-steps", the data sets are first resampled with replacement, and patients are then resampled within each selected set. With type="one-step", the set-level resampling is skipped and patients are resampled directly from each set in obj.
balance.variables: A vector of covariate names to balance in the simulation. After balancing, the prevalence of each covariate in every simulated set should match its overall distribution across all original data sets. The default, NULL, balances no covariate. If balance.variables is not NULL, every element of obj must be an ExpressionSet.
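One way to achieve this kind of balancing is weighted resampling: each patient is drawn with probability proportional to the pooled prevalence of their covariate level divided by that level's count in their own set. The sketch below shows that idea for a single categorical covariate. It is an assumption about how balancing can work, not simulatorZ's code; balance_probs() is a hypothetical helper, and the covariate vectors are assumed to have no missing values.

```r
# Sketch of covariate balancing by weighted resampling (an assumption about
# how balancing can work, not simulatorZ's code). balance_probs() is a
# hypothetical helper; `covariates` holds one vector per original set and
# is assumed to contain no missing values.
balance_probs <- function(covariates) {
  covariates <- lapply(covariates, as.character)

  # Overall prevalence of each level, pooled across all original sets.
  prob.desired <- prop.table(table(unlist(covariates, use.names = FALSE)))

  probs <- lapply(covariates, function(x) {
    counts <- table(x)
    # Weight = pooled prevalence of the patient's level / count of that level
    # in this set, so each level's total mass equals its pooled prevalence.
    w <- as.numeric(prob.desired[x]) / as.numeric(counts[x])
    w / sum(w)  # renormalise when some levels are absent from this set
  })
  list(prob.desired = prob.desired, probs = probs)
}

# Usage: set A is mostly late stage, set B is evenly split.
cov <- list(A = c(rep("early", 3), rep("late", 7)),
            B = c(rep("early", 5), rep("late", 5)))
bp <- balance_probs(cov)
bp$prob.desired                # pooled prevalence: early 0.4, late 0.6
set.seed(8)
idx <- sample(seq_along(cov$A), 1000, replace = TRUE, prob = bp$probs$A)
prop.table(table(cov$A[idx]))  # should be close to prob.desired
```

The sketch handles one covariate; to balance several at once, one option is to balance on their combination, for example by passing interaction(stage, grade, drop = TRUE) for each set.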
Value
Returns a list containing the simulated ExpressionSets, whose names indicate the original set each one was drawn from, and the indices of the original patients that were resampled.
prob.desired and prob.real are only meaningful when balance.variables is set.
prob.desired gives the overall distribution of the specified covariates across all original sets; prob.real gives the sampling probabilities within each set after balancing.
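A small worked example makes the two quantities concrete. Assume, purely as an illustration, that balancing weights each patient by the pooled prevalence of their covariate level divided by that level's count in their own set (simulatorZ may compute its per-set probabilities differently). Take a hypothetical set A with 6 early-stage and 4 late-stage patients and a set B with 2 early-stage and 8 late-stage patients, so 20 patients in total:

```latex
\begin{aligned}
\pi_{\text{early}} &= \frac{6 + 2}{20} = 0.4, & \pi_{\text{late}} &= \frac{4 + 8}{20} = 0.6,\\
q^{A}_{\text{early}} &= \frac{\pi_{\text{early}}}{6} \approx 0.0667, & q^{A}_{\text{late}} &= \frac{\pi_{\text{late}}}{4} = 0.15,\\
P_A(\text{early}) &= 6\, q^{A}_{\text{early}} = 0.4, & P_A(\text{late}) &= 4\, q^{A}_{\text{late}} = 0.6.
\end{aligned}
```

Here \pi plays the role of prob.desired and q^A is the per-patient sampling probability in set A. Without balancing, a patient drawn uniformly from set A would be early stage with probability 6/10 = 0.6; with these weights that probability matches the pooled prevalence of 0.4.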
Author(s)
Yuqing Zhang, Christoph Bernau, Levi Waldron
Examples
library(simulatorZ)
library(curatedOvarianData)
library(GenomicRanges)
library(SummarizedExperiment)
data(E.MTAB.386_eset)
data(GSE14764_eset)
esets.list <- list(E.MTAB.386=E.MTAB.386_eset[1:100, 1:10], GSE14764=GSE14764_eset[1:100, 1:10])
rm(E.MTAB.386_eset, GSE14764_eset)
## simulate on multiple ExpressionSets
set.seed(8)
# one-step bootstrap: skip resampling set labels
simmodels <- simData(esets.list, 20, type="one-step")
# two-step non-parametric bootstrap: resample sets, then patients
simmodels <- simData(esets.list, 10, type="two-steps")
## simulate one set
simmodels <- simData(list(esets.list[[1]]), 10, type="two-steps")
## balancing covariates
# single covariate
simmodels <- simData(list(esets.list[[1]]), 5, balance.variables="tumorstage")
# multiple covariates
simmodels <- simData(list(esets.list[[1]]), 5,
                     balance.variables=c("tumorstage", "age_at_initial_pathologic_diagnosis"))
## matrices are also supported (columns are samples)
X.list <- lapply(esets.list, exprs)
simmodels <- simData(X.list, 20, type="two-steps")
## RangedSummarizedExperiments are also supported
nrows <- 200; ncols <- 6
counts <- matrix(runif(nrows * ncols, 1, 1e4), nrows)
rowRanges <- GRanges(rep(c("chr1", "chr2"), c(50, 150)),
                     IRanges(floor(runif(nrows, 1e5, 1e6)), width=100),
                     strand=sample(c("+", "-"), nrows, TRUE))
colData <- DataFrame(Treatment=rep(c("ChIP", "Input"), 3),
                     row.names=LETTERS[1:6])
sset <- SummarizedExperiment(assays=SimpleList(counts=counts),
                             rowRanges=rowRanges, colData=colData)
s.list <- list(sset[,1:3], sset[,4:6])
simmodels <- simData(s.list, 20, type="two-steps")