knitr::opts_chunk$set(echo = TRUE, fig.align = "center")
knitr::opts_chunk$set(fig.width = 6, fig.height = 4)
knitr::opts_chunk$set(comment = "#>")
options(width = 100)
In this vignette, we show how to use the three main functions of the package datasim: sim_model, model_frame and model_response. This is an introductory tutorial, and only the simulation of linear Gaussian models is presented.
library(datasim)
First, we need to define a list of formulas specifying the types of effects included in the linear predictor of each parameter. For example, this list can be defined as follows.
f <- list(
  mean ~ I(5) + I(0.5 * x1) + fa(sex, beta = c(0, 1)),
  sd ~ I(1)
)
In this formula, an intercept, a linear effect of x1 and a factor effect of sex are included in the mean parameter, while the standard deviation sd is constant. The simulation of the dataset can be done with the function sim_model, which implements the simulation in two parts:

1. First, the covariates are simulated with the function model_frame.
2. Given the output of model_frame, the response variable is simulated with the function model_response.

The names of these two functions, model_frame and model_response, were chosen by analogy with the functions model.frame and model.response, which return the predictors and response variable for a given formula and data.frame.
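For readers less familiar with the base R functions that inspired these names, the following minimal sketch shows model.frame and model.response on a small hand-made data.frame. The data and column names (toy, y) are invented for this illustration and are not part of datasim.

```r
# Toy data; the values and column names are made up for this illustration.
toy <- data.frame(
  y   = c(5.1, 6.3, 5.8, 7.0),
  x1  = c(0.2, 1.5, 0.7, 2.1),
  sex = factor(c("male", "female", "male", "female"))
)

# model.frame() collects the variables referenced by the formula.
mf <- model.frame(y ~ x1 + sex, data = toy)

# model.response() extracts the response column from that model frame.
resp <- model.response(mf)

names(mf)  # variables used by the formula
resp       # response values
```

The datasim functions go in the opposite direction: instead of extracting existing columns, they simulate them.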
The data for our model can be simulated with the function sim_model. The two main arguments of sim_model, when working with linear Gaussian models, are the formula and the sample size n. To obtain a reproducible dataset, a seed must be set, either with the function set.seed or through the argument seed of sim_model.
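As a reminder of why the seed matters, here is a small base R sketch showing that resetting the seed reproduces the same random draws. It uses only set.seed and rnorm; how sim_model handles its seed argument internally is not shown here, although it presumably serves the same purpose.

```r
# Draw five normal values after fixing the seed.
set.seed(1)
first <- rnorm(5)

# Reset the seed to the same value and draw again.
set.seed(1)
second <- rnorm(5)

# The two draws are identical because the generator state was the same.
identical(first, second)  # TRUE
```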
data_model <- sim_model(formula = f, n = 100, seed = 1)
The first 10 rows of the generated dataset look as follows:
knitr::kable(head(data_model, 10))
It contains a unique id for each individual, all the predictors included in the formula (i.e. x1 and sex), the parameters (mean and sd), and the simulated response variable.
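To make the meaning of these columns concrete, the following base R sketch builds a dataset with the same structure by hand. It is not how datasim works internally: the covariate distributions (standard normal x1, equally likely levels of sex) are assumptions made for this illustration, so the numbers will not match the output of sim_model.

```r
set.seed(1)
n <- 100

# Assumed covariate distributions (illustrative only, not taken from datasim).
x1  <- rnorm(n)
sex <- factor(sample(1:2, n, replace = TRUE), levels = 1:2)

# Linear predictor of the mean: intercept 5, slope 0.5 on x1,
# and a factor effect of 0 for the first level and 1 for the second.
beta_sex <- c(0, 1)
mu <- 5 + 0.5 * x1 + beta_sex[as.integer(sex)]

# Constant standard deviation, as in sd ~ I(1).
sigma <- rep(1, n)

# Gaussian response given the two parameters.
response <- rnorm(n, mean = mu, sd = sigma)

by_hand <- data.frame(
  id = seq_len(n), x1 = x1, sex = sex,
  mean = mu, sd = sigma, response = response
)
head(by_hand)
```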
The effects can be customized. For instance, labels for the levels of the factor can be supplied with the option levels in the function fa.
f <- list(
  mean ~ I(5) + I(0.5 * x1) + fa(sex, beta = c(0, 1), levels = c("male", "female")),
  sd ~ I(1)
)
data_model <- sim_model(formula = f, n = 100, seed = 1)
The modified formula generates the same dataset, but with labeled levels for the factor sex.
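A loosely analogous operation in base R is attaching labels to a factor: the labels change how the levels are displayed but not the underlying integer codes. The sketch below uses only base R; how fa applies its levels option internally is an assumption not verified here.

```r
# Integer codes standing in for an unlabeled two-level factor.
codes <- c(1, 2, 2, 1)

# Attach labels to the two levels; the order matches beta = c(0, 1),
# so "male" is the reference level and "female" carries the effect of 1.
sex <- factor(codes, levels = c(1, 2), labels = c("male", "female"))

levels(sex)      # labels of the levels
as.integer(sex)  # underlying codes are unchanged
```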
knitr::kable(head(data_model, 10))
sim_model simulates the entire dataset. If only the predictors need to be simulated, for example to have more control over the process, the function model_frame can be used. The two main arguments of this function are the formula and the sample size n.
data_frame <- model_frame(formula = f, n = 100, seed = 1)
The first 10 rows of the generated dataset look as follows. As expected, only the covariates are simulated.
knitr::kable(head(data_frame, 10))
If the covariates are already available, we can simulate the response variable with the function model_response.
data_frame <- model_response(data_frame, formula = f)
The first 10 rows of the updated dataset look as follows. As expected, only the response variable and its associated parameters are simulated in this step.
knitr::kable(head(data_frame, 10))
Notice that the same results can be obtained using either sim_model alone or model_frame and model_response together.
data_model <- model_frame(f, n = 100, seed = 1) %>% model_response()
knitr::kable(head(data_model, 10))
Once the data are simulated, we can use them to compare models, effects, etc. For example, we can fit a linear model to our simulated data:
lm_data <- lm(response ~ x1 + sex, data = data_frame)
lm_sum <- summary(lm_data)
knitr::kable(lm_sum$coefficients)
The estimated effects are as expected: close to 0.5 for the effect of x1, close to 1 for the effect of females, and an intercept close to 5.
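The same kind of check can be sketched without datasim, using data simulated by hand in base R with a larger sample so that the estimates are tighter. The covariate distributions (standard normal x1, equally likely levels of sex) are assumptions made for this illustration and are not taken from the package.

```r
set.seed(1)
n <- 10000

# Assumed covariate distributions (illustrative only).
x1  <- rnorm(n)
sex <- factor(sample(c("male", "female"), n, replace = TRUE),
              levels = c("male", "female"))

# True model: intercept 5, slope 0.5 on x1, effect 1 for females, sd 1.
beta_sex <- c(male = 0, female = 1)
response <- rnorm(n, mean = 5 + 0.5 * x1 + beta_sex[as.character(sex)], sd = 1)

# Fit the linear model and compare estimates with the true values.
fit   <- lm(response ~ x1 + sex)
est   <- coef(fit)
truth <- c(5, 0.5, 1)

cbind(truth = truth, estimate = est)
confint(fit)  # 95% confidence intervals for the three coefficients
```

With a sample of this size, the estimates should lie close to the true values, which is a quick sanity check that the simulation and the fitted model agree.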