In ellessenne/rsimsum: Analysis of Simulation Studies Including Monte Carlo Error


Single Estimand

rsimsum supports custom input values for the true value of the estimand and for confidence interval limits (used to calculate coverage probability).

To illustrate this feature, we can use the tt dataset (bundled with rsimsum):

library(rsimsum)
data("tt", package = "rsimsum")
head(tt)

This includes the results of a simulation study assessing the robustness of the t-test when estimating the difference between means. Under its assumptions, the t-test statistic follows a t distribution, hence confidence intervals for the estimated difference are generally based on the t distribution. See for instance the example from the t-test documentation (?t.test):

t.test(extra ~ group, data = sleep)
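
To make the role of the t distribution concrete, the following sketch reconstructs the interval reported by t.test() by hand and compares it with a normal-theory (Wald-type) interval. It assumes the default Welch test (unequal variances); the variable names are illustrative only.

```r
# Welch two-sample t-test on the built-in sleep data
fit <- t.test(extra ~ group, data = sleep)

x <- sleep$extra[sleep$group == "1"]
y <- sleep$extra[sleep$group == "2"]

# Variance of each sample mean
vx <- var(x) / length(x)
vy <- var(y) / length(y)

est <- mean(x) - mean(y)   # estimated difference between means
se <- sqrt(vx + vy)        # standard error of the difference

# Welch-Satterthwaite approximation to the degrees of freedom
df <- (vx + vy)^2 / (vx^2 / (length(x) - 1) + vy^2 / (length(y) - 1))

# Same point estimate and standard error, different critical values
ci_t <- est + c(-1, 1) * qt(0.975, df = df) * se
ci_z <- est + c(-1, 1) * qnorm(0.975) * se

rbind(t_based = ci_t, wald = ci_z)
```

The t-based limits should match the interval printed by t.test(), while the Wald-type limits are narrower because the normal critical value is smaller.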

We can incorporate custom confidence intervals by passing the names of two columns in data as the ci.limits argument:

s1 <- simsum(data = tt, estvarname = "diff", true = -1, se = "se", ci.limits = c("conf.low", "conf.high"), methodvar = "method", by = "dgm")
summary(s1, stats = "cover")

By doing so, we can incorporate different types of confidence intervals in the analysis of Monte Carlo simulation studies. Compare with the default setting:

s2 <- simsum(data = tt, estvarname = "diff", true = -1, se = "se", methodvar = "method", by = "dgm")
summary(s2, stats = "cover")
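
As a rough illustration of what changes between s1 and s2, the sketch below computes coverage by hand as the proportion of replications whose interval contains the true value. The simulated data and the helper coverage() are hypothetical and are not part of rsimsum. Because the Wald-type interval is always narrower than the t-based one here, it can never cover more often.

```r
set.seed(1)
true <- -1
n <- 5          # observations per replication (hypothetical)
nsim <- 2000    # number of replications (hypothetical)

# Simulate replications: estimate a mean and its SE from small normal samples
est <- se <- numeric(nsim)
for (i in seq_len(nsim)) {
  y <- rnorm(n, mean = true, sd = 1)
  est[i] <- mean(y)
  se[i] <- sd(y) / sqrt(n)
}

# Hypothetical helper: proportion of intervals containing the true value
coverage <- function(lower, upper, true) mean(lower <= true & true <= upper)

# t-based limits, as one might store in columns and pass via ci.limits
crit_t <- qt(0.975, df = n - 1)
cover_t <- coverage(est - crit_t * se, est + crit_t * se, true)

# Normal-theory (Wald-type) limits
crit_z <- qnorm(0.975)
cover_z <- coverage(est - crit_z * se, est + crit_z * se, true)

c(t_based = cover_t, wald = cover_z)
```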

The ci.limits argument is also useful when confidence intervals are not symmetric around the point estimate, e.g. when using bootstrapped confidence intervals.

A pair of numeric values can also be passed to simsum as the ci.limits argument, in which case the same fixed limits are used for every replication:

s3 <- simsum(data = tt, estvarname = "diff", true = -1, se = "se", ci.limits = c(-1.5, -0.5), methodvar = "method", by = "dgm")
summary(s3, stats = "cover")

If you have a better example of the utility of this method please get in touch: I'd love to hear from you!

By default, simsum will calculate confidence intervals using normal-theory, Wald-type intervals. It is possible to use t-based critical values instead by providing a column with the (replication-specific) degrees of freedom, analogously to passing confidence bounds to ci.limits:

s4 <- simsum(data = tt, estvarname = "diff", true = -1, se = "se", df = "df", methodvar = "method", by = "dgm")

Given that the confidence intervals in (conf.low, conf.high) are obtained by using critical values from a t distribution, the results of s4 will be equivalent to the results of s1:

all.equal(tidy(s1), tidy(s4))
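
The difference between the normal-theory and t-based settings comes down to the critical value that multiplies the standard error. A small sketch, with arbitrarily chosen degrees of freedom, shows that t-based critical values exceed the normal one and approach it as the degrees of freedom grow:

```r
df <- c(5, 10, 30, 100, 1000)   # illustrative degrees of freedom
crit_t <- qt(0.975, df = df)     # t-based critical values
crit_z <- qnorm(0.975)           # normal-theory critical value

data.frame(df = df, t_crit = crit_t, ratio_to_normal = crit_t / crit_z)
```

When the degrees of freedom are large, the two kinds of interval, and hence the estimated coverage, will be nearly indistinguishable.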

We can pass a column of values for true as well:

tt$true <- -1
s5 <- simsum(data = tt, estvarname = "diff", true = "true", se = "se", ci.limits = c("conf.low", "conf.high"), methodvar = "method", by = "dgm")
summary(s5, stats = "cover")

Compare with the default settings:

summary(s2, stats = "cover")

Finally, methods can be identified by multiple columns as well. This uses the MIsim and MIsim2 datasets, which are bundled with {rsimsum}:

data("MIsim", package = "rsimsum")
data("MIsim2", package = "rsimsum")
head(MIsim)
head(MIsim2)

The syntax when calling simsum() is essentially the same; the only difference is that methodvar takes a vector of column names:

s6 <- simsum(data = MIsim, estvarname = "b", true = 0.50, se = "se", methodvar = "method")
s7 <- simsum(data = MIsim2, estvarname = "b", true = 0.50, se = "se", methodvar = c("m1", "m2"))

See the inferred methods:

print(s6)
print(s7)

And of course, the estimated performance measures are the same:

all.equal(tidy(s6)$est, tidy(s7)$est)
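
Conceptually, several method columns identify the same groups as a single column holding their combinations. The sketch below shows this in base R on a hypothetical toy dataset (the column values are made up and are not those of MIsim2); it does not show how rsimsum handles the columns internally.

```r
# Hypothetical results: two columns jointly identifying four methods
toy <- data.frame(
  m1 = rep(c("A", "B"), each = 4),
  m2 = rep(c("x", "y"), times = 4),
  b = c(0.48, 0.52, 0.51, 0.47, 0.55, 0.50, 0.53, 0.49)
)

# Collapse the two columns into a single method identifier
toy$method <- paste(toy$m1, toy$m2, sep = ":")

# Mean estimate per method, grouping by one column or by two
by_one <- aggregate(b ~ method, data = toy, FUN = mean)
by_two <- aggregate(b ~ m1 + m2, data = toy, FUN = mean)
by_two$method <- paste(by_two$m1, by_two$m2, sep = ":")

# Align rows before comparing the group means
by_two <- by_two[match(by_one$method, by_two$method), ]
all.equal(by_one$b, by_two$b)
```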

Multiple Estimands at Once

multisimsum is as flexible as simsum. Recall the default behaviour:

data("frailty", package = "rsimsum")
ms1 <- multisimsum(
  data = frailty,
  par = "par", true = c(trt = -0.50, fv = 0.75),
  estvarname = "b", se = "se", methodvar = "model",
  by = "fv_dist"
)
summary(ms1, stats = "bias")

In this example, we pass the true values of each estimand as the named vector c(trt = -0.50, fv = 0.75).

Say instead we stored the true value of each estimand as a column in our dataset:

frailty$true <- ifelse(frailty$par == "trt", -0.50, 0.75)
head(frailty)
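
With more than two estimands, nested ifelse() calls become unwieldy. One alternative, sketched here on a hypothetical vector of parameter names, is to index a named vector of true values by name:

```r
# Named vector of true values, in the same form passed to multisimsum()
true_values <- c(trt = -0.50, fv = 0.75)

# Hypothetical column of parameter names
par <- c("trt", "fv", "fv", "trt")

# Look up each row's true value by name; as.character() guards against factors
true_lookup <- unname(true_values[as.character(par)])
true_ifelse <- ifelse(par == "trt", -0.50, 0.75)

identical(true_lookup, true_ifelse)
```

Applied to the data above, this would read frailty$true <- unname(true_values[as.character(frailty$par)]).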

With this data structure, we can pass a string value to multisimsum that will identify the true column in our dataset:

ms2 <- multisimsum(
  data = frailty,
  par = "par", true = "true",
  estvarname = "b", se = "se", methodvar = "model",
  by = "fv_dist"
)
summary(ms2, stats = "bias")

We can confirm that we obtain the same results with the two approaches:

identical(tidy(ms1), tidy(ms2))

This approach is particularly useful when the true value might vary across replications (e.g. when it depends on the simulated dataset).
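
For instance, if the estimand is a property of each simulated dataset rather than a fixed parameter, bias has to be computed against a different reference in each replication. A minimal sketch with made-up numbers, taking bias as the mean difference between estimates and true values:

```r
# Hypothetical estimates and replication-specific true values
est <- c(0.52, 0.47, 0.61, 0.55)
true <- c(0.50, 0.45, 0.60, 0.50)

# Bias against replication-specific true values
bias_varying <- mean(est - true)

# Bias against a single fixed value ignores the variation in the truth
bias_fixed <- mean(est) - 0.50

c(varying = bias_varying, fixed = bias_fixed)
```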

Of course, it can be combined with custom confidence interval limits for coverage as well:

frailty$conf.low <- frailty$b - qt(1 - 0.05 / 2, df = 10) * frailty$se
frailty$conf.high <- frailty$b + qt(1 - 0.05 / 2, df = 10) * frailty$se

ms3 <- multisimsum(
  data = frailty,
  par = "par", true = "true",
  estvarname = "b", se = "se", methodvar = "model",
  by = "fv_dist",
  ci.limits = c("conf.low", "conf.high")
)
summary(ms3, stats = "cover")

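Because these limits use t critical values with 10 degrees of freedom rather than normal ones, the estimated coverage will differ from the default: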

summary(ms2, stats = "cover")

Multiple columns identifying methods are supported with multisimsum() as well; examples are omitted here, but it works analogously to simsum().

ellessenne/rsimsum documentation built on March 10, 2024, 1:21 p.m.
