summarize: Summarize simulation results

summarize    R Documentation

Description

This function calculates summary statistics for simulation results. Options for summary statistics include descriptive statistics (e.g. measures of center or spread) and inferential statistics (e.g. bias or confidence interval coverage). All summary statistics are calculated over simulation replicates within a single simulation level.

Usage

summarize(sim, ...)

Arguments

sim

A simulation object of class sim_obj, usually created by new_sim().

...

Name-value pairs of summary statistic functions. The possible functions (names) are listed below. The value for each summary function is a list of summaries to perform.

  • mean: Each mean summary is a named list of three arguments. name gives a name for the summary, x gives the name of the variable in sim$results on which to calculate the mean, and na.rm indicates whether to exclude NA values when performing the calculation.

  • median: Each median summary is a named list of three arguments. name gives a name for the summary, x gives the name of the variable in sim$results on which to calculate the median, and na.rm indicates whether to exclude NA values when performing the calculation.

  • var: Each var (variance) summary is a named list of three arguments. name gives a name for the summary, x gives the name of the variable in sim$results on which to calculate the variance, and na.rm indicates whether to exclude NA values when performing the calculation.

  • sd: Each sd (standard deviation) summary is a named list of three arguments. name gives a name for the summary, x gives the name of the variable in sim$results on which to calculate the standard deviation, and na.rm indicates whether to exclude NA values when performing the calculation.

  • mad: Each mad (mean absolute deviation) summary is a named list of three arguments. name gives a name for the summary, x gives the name of the variable in sim$results on which to calculate the MAD, and na.rm indicates whether to exclude NA values when performing the calculation.

  • iqr: Each iqr (interquartile range) summary is a named list of three arguments. name gives a name for the summary, x gives the name of the variable in sim$results on which to calculate the IQR, and na.rm indicates whether to exclude NA values when performing the calculation.

  • min: Each min (minimum) summary is a named list of three arguments. name gives a name for the summary, x gives the name of the variable in sim$results on which to calculate the minimum, and na.rm indicates whether to exclude NA values when performing the calculation.

  • max: Each max (maximum) summary is a named list of three arguments. name gives a name for the summary, x gives the name of the variable in sim$results on which to calculate the maximum, and na.rm indicates whether to exclude NA values when performing the calculation.

  • quantile: Each quantile summary is a named list of four arguments. name gives a name for the summary, x gives the name of the variable in sim$results on which to calculate the quantile, prob is a number in [0,1] denoting the desired quantile, and na.rm indicates whether to exclude NA values when performing the calculation.

  • bias: Each bias summary is a named list of four arguments. name gives a name for the summary, estimate gives the name of the variable in sim$results containing the estimator of interest, truth is the estimand of interest (see Details), and na.rm indicates whether to exclude NA values when performing the calculation.

  • bias_pct: Each bias_pct summary is a named list of four arguments. name gives a name for the summary, estimate gives the name of the variable in sim$results containing the estimator of interest, truth is the estimand of interest (see Details), and na.rm indicates whether to exclude NA values when performing the calculation.

  • mse: Each mse (mean squared error) summary is a named list of four arguments. name gives a name for the summary, estimate gives the name of the variable in sim$results containing the estimator of interest, truth is the estimand of interest (see Details), and na.rm indicates whether to exclude NA values when performing the calculation.

  • mae: Each mae (mean absolute error) summary is a named list of four arguments. name gives a name for the summary, estimate gives the name of the variable in sim$results containing the estimator of interest, truth is the estimand of interest (see Details), and na.rm indicates whether to exclude NA values when performing the calculation.

  • coverage: Each coverage (confidence interval coverage) summary is a named list of five arguments: name, truth, na.rm, and either the pair (estimate, se) or the pair (lower, upper). name gives a name for the summary. estimate gives the name of the variable in sim$results containing the estimator of interest, and se gives the name of the variable in sim$results containing its standard error. lower and upper give the names of the variables in sim$results containing the confidence interval lower and upper bounds. truth is the estimand of interest, and na.rm indicates whether to exclude NA values when performing the calculation. See Details.
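As a rough illustration of what several of these summaries compute, the sketch below reproduces mean, bias, mse, and mae by hand in base R. It does not call SimEngine: the data frame results is a made-up stand-in for sim$results, the column level_id is a hypothetical level identifier, and the formulas are the conventional textbook definitions, assumed rather than taken from the package source.

```r
# Hypothetical stand-in for sim$results: two levels, three replicates each.
results <- data.frame(
  level_id   = c(1, 1, 1, 2, 2, 2),
  lambda_hat = c(4, 5, 6, 4.5, 5, 5.5)
)
truth <- 5  # a single number: the same estimand for every replicate

# Split the replicates by level, then summarize within each level.
by_level <- split(results, results$level_id)
summ <- do.call(rbind, lapply(by_level, function(d) {
  err <- d$lambda_hat - truth  # estimation error per replicate
  data.frame(
    level_id        = d$level_id[1],
    mean_lambda_hat = mean(d$lambda_hat, na.rm = TRUE),
    bias            = mean(err, na.rm = TRUE),       # assumed: mean error
    mse             = mean(err^2, na.rm = TRUE),     # assumed: mean squared error
    mae             = mean(abs(err), na.rm = TRUE)   # assumed: mean absolute error
  )
}))
print(summ)
```

The result has one row per level and one column per summary, which mirrors the shape described under Value.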

Details

  • For all summaries besides coverage, the name argument is optional. If name is not provided, a name will be formed from the type of summary and the column on which the summary is performed.

  • For all inferential summaries, there are three ways to specify truth: (1) a single number, meaning the estimand is the same across all simulation replicates and levels, (2) a numeric vector whose length equals the number of rows in sim$results, or (3) the name of a variable in sim$results containing the estimand of interest.

  • There are two ways to specify the confidence interval bounds for coverage. The first is to provide an estimate and its associated se (standard error), both of which should be variables in sim$results. The function then constructs a 95% Wald-type confidence interval of the form (estimate - 1.96*se, estimate + 1.96*se). The alternative is to provide lower and upper bounds, which should also be variables in sim$results. In this case, the confidence interval is (lower, upper). The coverage is the proportion of simulation replicates for a given level in which truth lies within the interval.
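The two points above on truth and coverage can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's implementation: resolve_truth and wald_coverage are hypothetical helpers, results is a made-up stand-in for the rows of sim$results belonging to one level, and the interval is treated as closed (bounds inclusive), which the text above does not specify.

```r
# Hypothetical stand-in for one level's rows of sim$results.
results <- data.frame(
  est   = c(4.0, 5.2, 6.5, 4.9),
  se    = c(0.6, 0.5, 0.7, 0.4),
  theta = c(5, 5, 5, 5)
)

# Hypothetical helper: expand the three accepted forms of `truth`
# into one value per row of `results`.
resolve_truth <- function(results, truth) {
  if (is.character(truth)) return(results[[truth]])          # (3) column name
  if (length(truth) == 1) return(rep(truth, nrow(results)))  # (1) single number
  stopifnot(length(truth) == nrow(results))                  # (2) full-length vector
  truth
}

# Hypothetical helper: proportion of replicates whose 95% Wald interval
# contains the truth (bounds assumed inclusive).
wald_coverage <- function(results, estimate, se, truth, na.rm = FALSE) {
  tr    <- resolve_truth(results, truth)
  lower <- results[[estimate]] - 1.96 * results[[se]]
  upper <- results[[estimate]] + 1.96 * results[[se]]
  mean(lower <= tr & tr <= upper, na.rm = na.rm)
}

cov_number <- wald_coverage(results, "est", "se", truth = 5)
cov_vector <- wald_coverage(results, "est", "se", truth = rep(5, 4))
cov_column <- wald_coverage(results, "est", "se", truth = "theta")
```

All three calls describe the same estimand, so they give the same coverage. When lower and upper columns are supplied instead, the same comparison applies with those columns used directly in place of the computed bounds.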

Value

A data frame with one row per simulation level, containing the result of each specified summary function as a column.

Examples

# The following is a toy example of a simulation, illustrating the use of
# the summarize function.
library(SimEngine)
sim <- new_sim()
create_data <- function(n) { rpois(n, lambda=5) }
est_mean <- function(dat, type) {
  if (type=="M") { return(mean(dat)) }
  if (type=="V") { return(var(dat)) }
}
sim %<>% set_levels(n=c(10,100,1000), est=c("M","V"))
sim %<>% set_config(num_sim=5)
sim %<>% set_script(function() {
  dat <- create_data(L$n)
  lambda_hat <- est_mean(dat=dat, type=L$est)
  return(list("lambda_hat"=lambda_hat))
})
sim %<>% run()
sim %>% summarize(
  mean = list(name="mean_lambda_hat", x="lambda_hat"),
  mse = list(name="lambda_mse", estimate="lambda_hat", truth=5)
)

Avi-Kenny/SimEngine documentation built on June 23, 2022, 11:09 a.m.