In ErickChacon/datasim: Data Simulation Based on Models Formulas

knitr::opts_chunk$set(echo = TRUE, fig.align = "center")
knitr::opts_chunk$set(fig.width = 6, fig.height = 4)
knitr::opts_chunk$set(comment = "#>")
options(width = 100)

In this vignette, we show how to simulate multivariate linear Gaussian models.

Required packages

library(datasim)
library(ggplot2)

Multivariate Linear Model

First, we need to define a list of formulas specifying the type of effect that are included in the linear predictor of each parameter. For example, this list can be defined as follows.

f <- list(
  mean ~ mfe(x, beta = c(-10, 0, 10)),
  sd ~ mfa(ones, beta = matrix(c(0.1, 0.5, 3), nrow = 1))
  )

In this formula, it can be seen a multivariate fixed effect on x included on the mean parameter and a multivariate factor effect on ones. Note the given that length of beta is three, then the number of response variables is three.

Simulate the dataset

The data for our model can be simulated with the function sim_model. The two main arguments of sim_model function when working with multivariate linear Gaussian models, are the formula, the sample size n and the number of responses. In order to obtain a reproducible dataset, a seed must be defined with the function set.seed or by using the argument seed in sim_model.

data_model <- sim_model(formula = f, n = 1000, responses = 3, seed = 1)

The first 10 rows of the generated dataset looks as follows:

knitr::kable(head(data_model, 10))

it contains an unique id for each individual, all the predictors included in the formula (i.e. x1 and ones), the parameters (mean and sd), the simulated response variable, and the number of response response_label.

Visualizing the multivariate fixed effects

ggplot(data_model, aes(x, response)) +
  geom_point(aes(col = factor(response_label)), alpha = 1/5)

Multivariate Linear Mixed Model

Simulate the dataset

Cor <- matrix(c(1, -0.9, 0, -0.9, 1, 0, 0, 0, 1), nrow = 3)
vars <- diag(c(1,1,3))
Sigma <- vars %*% Cor %*% vars

f <- list(
  mean ~ mfe(x, beta = c(-10, 0, 10)) + mre(factor(id), sigma = get("Sigma")),
  sd ~ I(0)
  )

data_model <- sim_model(formula = f, n = 1000, seed = 1, responses = 3)
knitr::kable(head(data_model, 10))

Visualizing the multivariate fixed effects

ggplot(data_model, aes(x, response)) +
  geom_point(aes(col = factor(response_label)), alpha = 1/5)

Visualizing the multivariate random effects

ggplot(data_model, aes(id, mre.factor.mean)) +
  geom_line(aes(col = factor(response_label)), alpha = 0.8) +
  facet_wrap(~ response_label, ncol = 1, scales = "free_y") +
  theme(legend.position = "bottom")

cor(matrix(data_model$mre.factor.mean, ncol = 3))