simData: simData


Description

simData performs non-parametric bootstrap resampling on a list of (original) data sets, at both the set level and the patient level, in order to simulate independent genomic sets.
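
The two resampling levels described above can be sketched in base R. This is only an illustration of the idea under stated assumptions, not the simulatorZ implementation: the helper name two_step_bootstrap, the plain-matrix input, and the choice to draw as many simulated sets as there are original sets are all assumptions made for the example.

```r
# Hypothetical helper, not part of simulatorZ: a base-R sketch of a
# two-step non-parametric bootstrap over a list of matrices
# (rows = features, columns = samples/patients).
two_step_bootstrap <- function(mat.list, n.samples) {
  n.sets <- length(mat.list)
  # Step 1 (set level): resample the data sets themselves, with replacement.
  # Assumption: the number of simulated sets equals the number of originals.
  set.ids <- sample(seq_len(n.sets), n.sets, replace = TRUE)
  # Step 2 (patient level): within each drawn set, resample columns
  # with replacement and keep track of where each column came from.
  lapply(set.ids, function(i) {
    cols <- sample(seq_len(ncol(mat.list[[i]])), n.samples, replace = TRUE)
    list(set = i, indices = cols, data = mat.list[[i]][, cols, drop = FALSE])
  })
}

set.seed(1)
toy <- list(A = matrix(rnorm(50), nrow = 5),   # 5 features x 10 patients
            B = matrix(rnorm(40), nrow = 5))   # 5 features x 8 patients
sim <- two_step_bootstrap(toy, n.samples = 6)
length(sim)          # one simulated set per original set
dim(sim[[1]]$data)   # 5 features by 6 resampled patients
```

Skipping step 1 and looping over the original sets directly would correspond to the one-step variant, in which only patients are resampled.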

Usage

simData(obj, n.samples, y.vars = list(), type = "two-steps",
    balance.variables = NULL)

Arguments

obj

a list of ExpressionSets, matrices or RangedSummarizedExperiments. If the elements are matrices, columns represent samples

n.samples

an integer indicating how many samples should be resampled from each set

y.vars

a list of response variables; each element can be a Surv object, or a matrix or data.frame with two columns

type

string, either "one-step" or "two-steps". If type="one-step", the function skips resampling the data sets and resamples directly from the original list in obj

balance.variables

a vector of covariate names that should be balanced in the simulation. After balancing, the prevalence of each covariate in every result set should match the overall distribution across all original data sets. The default is NULL, in which case no covariate is balanced. If it is not NULL, the elements of obj should all be of class ExpressionSet
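
To make the balancing idea concrete, the following base-R sketch computes a pooled covariate distribution and per-patient sampling weights whose expected prevalence in each set matches it. It is a hypothetical illustration, not the simulatorZ code: the toy covariate values, the helper balance_weights, and the weighting scheme itself are assumptions, and the sketch assumes every covariate level occurs in every set.

```r
# Hypothetical illustration, not simulatorZ code.
# One categorical covariate observed in two toy data sets.
covariate <- list(
  set1 = c("early", "early", "early", "late"),
  set2 = c("late", "late", "early", "late", "late")
)

# Desired prevalence: the pooled distribution over all original sets.
prob.desired <- prop.table(table(unlist(covariate)))

# Per-patient sampling weights within one set: each level's desired
# probability is split evenly among the patients carrying that level.
balance_weights <- function(x, prob.desired) {
  counts <- table(x)
  w <- as.numeric(prob.desired[x]) / as.numeric(counts[x])
  w / sum(w)  # renormalise so the weights sum to one
}
weights <- lapply(covariate, balance_weights, prob.desired = prob.desired)

# Expected prevalence per set under these weights; compare to prob.desired.
expected <- lapply(names(covariate), function(s) {
  tapply(weights[[s]], covariate[[s]], sum)
})

# Patient-level resampling for set1 using the balanced weights.
set.seed(2)
idx <- sample(seq_along(covariate$set1), 5, replace = TRUE,
              prob = weights$set1)
```

Under this scheme a level that is rare in one set is oversampled there, so all simulated sets share roughly the same covariate mix.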

Value

Returns a list of simulated ExpressionSets, with names indicating the original set of each, together with the indices of the original patients.

prob.desired and prob.real are only useful when balance.variables is set. prob.desired shows the overall distribution of the specified covariate. prob.list shows the sampling probability in each set after balancing.

Author(s)

Yuqing Zhang, Christoph Bernau, Levi Waldron

Examples

library(simulatorZ)
library(curatedOvarianData)
library(GenomicRanges)
# SummarizedExperiment() lives in its own package in current Bioconductor
library(SummarizedExperiment)

data(E.MTAB.386_eset)
data(GSE14764_eset)
# keep the example small: first 100 features and first 10 samples of each set
esets.list <- list(E.MTAB.386=E.MTAB.386_eset[1:100, 1:10],
                   GSE14764=GSE14764_eset[1:100, 1:10])
rm(E.MTAB.386_eset, GSE14764_eset)

## simulate on multiple ExpressionSets
set.seed(8)
# one-step bootstrap: skip resampling set labels
simmodels <- simData(esets.list, 20, type="one-step")
# two-step non-parametric bootstrap
simmodels <- simData(esets.list, 10, type="two-steps")

## simulate one set
simmodels <- simData(list(esets.list[[1]]), 10, type="two-steps")

## balancing covariates
# single covariate
simmodels <- simData(list(esets.list[[1]]), 5, balance.variables="tumorstage")

# multiple covariates
simmodels <- simData(list(esets.list[[1]]), 5,
                     balance.variables=c("tumorstage", "age_at_initial_pathologic_diagnosis"))

## Support matrices
X.list <- lapply(esets.list, function(eset){
  return(exprs(eset))
})
simmodels <- simData(X.list, 20, type="two-steps")

## Support RangedSummarizedExperiment
nrows <- 200; ncols <- 6
counts <- matrix(runif(nrows * ncols, 1, 1e4), nrows)
rowRanges <- GRanges(rep(c("chr1", "chr2"), c(50, 150)),
                     IRanges(floor(runif(200, 1e5, 1e6)), width=100),
                     strand=sample(c("+", "-"), 200, TRUE))
colData <- DataFrame(Treatment=rep(c("ChIP", "Input"), 3),
                     row.names=LETTERS[1:6])
sset <- SummarizedExperiment(assays=SimpleList(counts=counts),
                             rowRanges=rowRanges, colData=colData)

s.list <- list(sset[,1:3], sset[,4:6])
simmodels <- simData(s.list, 20, type="two-steps")

simulatorZ documentation built on Nov. 8, 2020, 5 p.m.