Workhorse for simulation studies

Description

Generates data for every constellation (row) in dataGrid and applies every constellation (row) in procGrid to each generated data set.

Usage

evalGrids(dataGrid, procGrid = expandGrid(proc = "length"),
  replications = 1, discardGeneratedData = FALSE, progress = FALSE,
  summary.fun = NULL, ncpus = 1L, cluster = NULL,
  clusterSeed = rep(12345, 6), clusterLibraries = NULL,
  clusterGlobalObjects = NULL, fallback = NULL, envir = globalenv(), ...)

Arguments

dataGrid

a data.frame where the first column is a character vector with function names. The other columns contain parameters for the functions specified in the first column. Parameters with NA are ignored.
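
The row-to-call mapping described above can be sketched in base R. This is only an illustration of the documented semantics, not the package's implementation; the grid contents and the helper callRow are hypothetical.

```r
# A hand-built grid: first column holds function names, the rest parameters.
dataGrid <- data.frame(
  fun = c("runif", "rnorm"),
  n   = c(3, 3),
  max = c(2, NA),   # only meaningful for runif
  sd  = c(NA, 5),   # only meaningful for rnorm
  stringsAsFactors = FALSE
)

# Hypothetical helper: turn row i into a function call, dropping NA parameters.
callRow <- function(grid, i) {
  args <- as.list(grid[i, -1, drop = FALSE])
  args <- args[!vapply(args, is.na, logical(1))]
  do.call(grid[[1]][i], args)
}

set.seed(1)
x1 <- callRow(dataGrid, 1)  # equivalent to runif(n = 3, max = 2)
x2 <- callRow(dataGrid, 2)  # equivalent to rnorm(n = 3, sd = 5)
```

Each row thus stands for one complete call of a data generating function.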

procGrid

similar to dataGrid: the first column must contain function names, and the other columns contain parameters for the functions specified in the first column. The data generated according to dataGrid will always be passed to the first unspecified argument of the functions specified in the first column of procGrid.
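
What "first unspecified argument" means can be illustrated with R's ordinary argument matching: named parameters are matched first, and an unnamed value fills the first formal argument that is still free. The sketch below assumes the data is supplied as an unnamed argument; the function f and the sample data are hypothetical.

```r
dat <- c(4, 8, 15, 16, 23, 42)

# probs is given by name, so dat ends up in quantile's first argument x
med <- do.call("quantile", c(list(dat), list(probs = 0.5)))

# here the first argument a is already specified, so the data goes to b
f <- function(a, b) a - b
res <- do.call(f, c(list(10), list(a = 1)))  # evaluates f(a = 1, b = 10)
```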

replications

number of replications for the simulation

discardGeneratedData

if TRUE the generated data is deleted after all function constellations in procGrid have been applied. Otherwise, ALL generated data sets will be part of the returned object.

progress

if TRUE a progress bar is shown in the console.

summary.fun

univariate functions to summarize the results (numeric or logical) over the replications, e.g. mean, sd. Alternatively, summary.fun can be one function that may return a vector.
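
As a rough picture of what summarizing over the replications means, the base R sketch below applies mean and sd to each component of a result across ten made-up replications. It does not reproduce the package's internals; all object names are hypothetical.

```r
set.seed(42)

# Ten hypothetical replications; each result is a named numeric vector.
results <- replicate(10, {
  x <- runif(5)
  c(min = min(x), max = max(x))
}, simplify = FALSE)

resultMatrix <- do.call(rbind, results)  # one row per replication

# Apply every summary function to every result component.
summaryFuns <- list(mean = mean, sd = sd)
summarized <- sapply(summaryFuns, function(f) apply(resultMatrix, 2, f))
summarized
```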

ncpus

a cluster of ncpus workers (R-processes) is created on the local machine to conduct the simulation. If ncpus equals one no cluster is created and the simulation is conducted by the current R-process.

cluster

a cluster generated by the parallel package that will be used to conduct the simulation. If cluster is specified, then ncpus will be ignored.

clusterSeed

if the simulation is run in parallel, the combined multiple-recursive generator of L'Ecuyer (1999) is used to generate random numbers. clusterSeed must therefore be a (signed) integer vector of length 6. The 6 elements of the seed are internally regarded as 32-bit unsigned integers. Neither the first three nor the last three should be all zero, and they are limited to less than 4294967087 and 4294944443 respectively.
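
The constraints on the seed can be checked before starting a long simulation. The helper below is a hypothetical sketch and not part of the package; it assumes that "regarded as 32-bit unsigned" means reducing each element modulo 2^32.

```r
# Hypothetical helper: TRUE if seed satisfies the constraints stated above.
isValidClusterSeed <- function(seed) {
  if (length(seed) != 6L || anyNA(seed)) return(FALSE)
  u <- as.numeric(seed) %% 2^32  # assumption: view signed values as unsigned
  first <- u[1:3]
  last  <- u[4:6]
  !all(first == 0) && !all(last == 0) &&
    all(first < 4294967087) && all(last < 4294944443)
}

okDefault <- isValidClusterSeed(rep(12345, 6))        # the default seed
okZeros   <- isValidClusterSeed(c(0, 0, 0, 1, 2, 3))  # first three all zero
okShort   <- isValidClusterSeed(1:5)                  # wrong length
```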

clusterLibraries

a character vector specifying the packages that should be loaded by the workers.

clusterGlobalObjects

a character vector specifying the names of R objects in the global environment that should be exported to the global environment of every worker.

fallback

must be missing or a character string specifying a file. Each time the data generation function changes, the results obtained so far are saved in the file specified by fallback.

envir

must be provided if the functions specified in dataGrid or procGrid are not part of the global environment.

...

only needed to alert the user if some deprecated arguments were used.

Value

The returned object is a list of the class evalGrid, where the fourth element is a list of lists named simulation. simulation[[i]][[r]] contains:

data

the data set that was generated by the ith constellation (ith row) of dataGrid in the rth replication

results

a list containing nrow(procGrid) objects. The jth object is the returned value of the function specified by the jth constellation (jth row) of procGrid applied to the data set contained in data
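
The nesting can be explored on a mock object of the same shape. The list below is built by hand purely for illustration (two data constellations, three replications, one procedure); in a real result the same indexing would apply to the element named simulation.

```r
set.seed(1)

# Mock of the documented structure: simulation[[i]][[r]] holds data and results.
simulation <- lapply(1:2, function(i) {
  lapply(1:3, function(r) {
    d <- runif(i + 1)
    list(data = d, results = list(range(d)))
  })
})

# Second data constellation, third replication:
entry <- simulation[[2]][[3]]
entry$data          # the generated data set
entry$results[[1]]  # value returned by the first procedure
```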

Note

If cluster is provided by the user, evalGrids will NOT stop the cluster; this has to be done by the user. Conducting parallel simulations by specifying ncpus will internally create a cluster and stop it after the simulation is done.
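
A user-managed cluster might be handled as in the following sketch, which uses only the parallel package. The worker count and the toy computation are arbitrary; the commented line shows where such a cluster would be passed on.

```r
library(parallel)

cl <- makeCluster(2)  # two local worker processes

# cl is the kind of object the cluster argument expects, e.g.
# evalGrids(dataGrid, procGrid, cluster = cl)

# Stop the cluster even if the computation fails.
squares <- tryCatch(
  parSapply(cl, 1:4, function(i) i^2),
  finally = stopCluster(cl)
)
```

Wrapping the work in tryCatch with a finally clause keeps worker processes from lingering after an error.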

Author(s)

Marsel Scheer

See Also

as.data.frame.evalGrid

Examples

rng = function(data, ...) {
 ret = range(data)
 names(ret) = c("min", "max")
 ret
}

# call runif(n=1), runif(n=2), runif(n=3)
# and range on the three "datasets"
# generated by runif(n=1), runif(n=2), runif(n=3)
eg = evalGrids(
 expandGrid(fun="runif", n=1:3),
 expandGrid(proc="rng"),
 replications=10
)
eg

# summarizing the results in a data.frame
as.data.frame(eg)

# we now generate data for a regression
# and fit different regression models

# note that we use SD and not sd (the
# reason for this is the cast() call below)
regData = function(n, SD){
 data.frame(
   x=seq(0,1,length=n),
   y=rnorm(n, sd=SD))
}

eg = evalGrids(
 expandGrid(fun="regData", n=20, SD=1:2),
 expandGrid(proc="lm", formula=c("y~x", "y~I(x^2)")),
 replications=2)

# cannot be converted to a data.frame, because
# an object of class "lm" cannot be converted to
# a data.frame
try(as.data.frame(eg))

# for the data.frame we just extract the r.squared
# from the fitted model
as.data.frame(eg, convert.result.fun=function(fit) c(rsq=summary(fit)$r.squared))

# for the data.frame we just extract the coefficients
# from the fitted model
df = as.data.frame(eg, convert.result.fun=coef)

# since we have done 2 replications we can calculate
# some summary statistics
library("reshape")
df$replication=NULL
mdf = melt(df, id=1:7, na.rm=TRUE)
cast(mdf, ... ~ ., c(mean, length, sd))

# note: if the data.frame contained a column
# named "sd" instead of "SD", cast would raise
# an error
names(df)[5] = "sd"
mdf = melt(df, id=1:7, na.rm=TRUE)
try(cast(mdf, ... ~ ., c(mean, length, sd)))


# extracting the summary of the fitted model
as.data.frame(eg, convert.result.fun=function(x) {
 ret = coef(summary(x))
 data.frame(valueName = rownames(ret), ret, check.names=FALSE)
})



# we now compare two methods for
# calculating quantiles

# the functions and parameters
# that generate the data
N = c(10, 50, 100)
library("plyr")
dg = rbind.fill(
 expandGrid(fun="rbeta", n=N, shape1=4, shape2=4),
 expandGrid(fun="rnorm", n=N))

# definition of the two quantile methods
emp.q = function(data, probs) c(quantile(data, probs=probs))
nor.q = function(data, probs) {
 ret = qnorm(probs, mean=mean(data), sd=sd(data))
 names(ret) = names(quantile(1, probs=probs))
 ret
}

# the functions and parameters that are
# applied to the generated data
pg = rbind.fill(expandGrid(proc=c("emp.q", "nor.q"), probs=c(0.01, 0.025, 0.05)))

# generate data and apply quantile methods
set.seed(1234)
eg = evalGrids(dg, pg, replications=50, progress=TRUE)

# convert the results to a data.frame
df = as.data.frame(eg)
df$replication=NULL
mdf = melt(df, id=1:8, na.rm=TRUE)

# calculate, print and plot summary statistics
require("ggplot2")
print(a <- arrange(cast(mdf, ... ~ ., c(mean, sd)), n))
ggplot(a, aes(x=fun, y=mean, color=proc)) + geom_point(size=I(3)) + facet_grid(probs ~ n)