knitr::opts_chunk$set(
  collapse = TRUE,
  cache = TRUE,
  comment = "#>",
  fig.path = "../man/figures/"
)

Here we walk through a simple example of building a simulatr specifier object, checking and running the simulation locally, and visualizing the results. We consider estimating the coefficients in a linear regression model via ordinary least squares and lasso, varying the number of samples.

library(simulatr)
library(ggplot2)

1. Assemble simulation components
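The component definitions are not shown above, so here is a minimal sketch of what they might look like for this OLS-versus-lasso example. Everything in this block is an illustrative assumption rather than simulatr's exact interface: the helper names (ols_fun, lasso_fun, rmse_fun), the use of glmnet for the lasso, and the plain-function form of each component. In particular, simulatr may require each function to be wrapped in its own constructor object with argument metadata, so consult the package documentation for the exact signatures before running.

```r
library(glmnet)  # assumed here for the lasso fits

# Parameter grid: vary the number of samples n
parameter_grid <- data.frame(n = c(25, 50, 100, 200))

# Fixed parameters shared across all problem settings
fixed_parameters <- list(
  p = 10,                          # number of predictors
  beta = c(rep(1, 5), rep(0, 5)),  # true coefficient vector
  B = 500                          # number of Monte Carlo replicates
)

# Data-generating function: draw (X, y) from a linear model
generate_data_function <- function(n, p, beta) {
  X <- matrix(rnorm(n * p), nrow = n, ncol = p)
  y <- as.numeric(X %*% beta + rnorm(n))
  list(X = X, y = y)
}

# Method functions: OLS and cross-validated lasso coefficient estimates
ols_fun <- function(data) unname(coef(lm(data$y ~ data$X - 1)))
lasso_fun <- function(data) {
  fit <- glmnet::cv.glmnet(data$X, data$y, intercept = FALSE)
  as.numeric(coef(fit, s = "lambda.min"))[-1]  # drop the intercept row
}
run_method_functions <- list(ols = ols_fun, lasso = lasso_fun)

# Evaluation function: RMSE of the estimated coefficients
rmse_fun <- function(output, beta) sqrt(mean((output - beta)^2))
evaluation_functions <- list(rmse = rmse_fun)
```

With components in roughly this shape, each row of parameter_grid defines one problem setting, and the fixed parameters are shared across all of them.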

2. Create a simulatr specifier object

This is the easiest step: simply pass the components assembled above to the function simulatr_specifier():

simulatr_spec <- simulatr_specifier(
  parameter_grid,
  fixed_parameters,
  generate_data_function, 
  run_method_functions,
  evaluation_functions
)

3. Check and, if necessary, update the simulatr specifier object

check_results <- check_simulatr_specifier_object(simulatr_spec, B_in = 2)

If the check completes without reporting errors, the simulation ran cleanly on the first two data realizations (B_in = 2), and we are free to move on to running the full simulation.

4. Run the simulation on your laptop

Since this example simulation is small, we can run it locally in RStudio:

sim_results <- check_simulatr_specifier_object(simulatr_spec)

5. Summarize and/or visualize the results

Let's take a look at the results:

sim_results$metrics

We have both the mean and Monte Carlo standard error for the metric (RMSE) for each method in each problem setting. We can plot these as follows:

sim_results$metrics |>
  ggplot(aes(x = n,
             y = mean,
             ymin = mean - 2 * se,
             ymax = mean + 2 * se,
             color = method)) +
  geom_point() +
  geom_line() +
  geom_errorbar(width = 1) +
  labs(x = "Sample size",
       y = "RMSE") +
  theme(legend.position = "bottom")

It looks like the lasso performs better at small sample sizes, while OLS performs better at large sample sizes.



timothy-barry/simulatr documentation built on Sept. 6, 2024, 7:10 p.m.