knitr::opts_chunk$set(echo = TRUE, fig.align = "center")
knitr::opts_chunk$set(fig.width = 6, fig.height = 4)
knitr::opts_chunk$set(comment = "#>")
options(width = 100)

In this vignette, we show how to simulate a geostatistical model.

Required packages

library(datasim)
library(ggplot2)

Geostatistical Model

First, we need to define a list of formulas specifying the type of effect that are included in the linear predictor of each parameter. For example, this list can be defined as follows.

f <- list(
  mean ~ I(5) +
    gp(list(s1, s2), cor.model = "exp_cor", cor.params = list(phi = 0.02)),
  sd ~ I(1)
  )

In this formula, it can be seen that an intercept, a linear effect on x1 and an spatial Gaussian process in included on the mean parameter, while the standard deviation sd is constant. The simulation of the dataset can be done with the function sim_model.

Simulate the dataset

The two main arguments of sim_model function, when working with linear Gaussian models, are the formula and the sample size n. In order to obtain a reproducible dataset, a seed must be defined with the function set.seed or by using the argument seed in sim_model.

data_model <- sim_model(formula = f, n = 500, seed = 1)

The first 10 rows of the generated dataset looks as follows:

knitr::kable(head(data_model, 10))

it contains an unique id for each individual, the coordinates of the spatial effect (s1 and s2), the realization of the Gaussian process gp.list.mean, the parameters (mean and sd) and the simulated response variable.

Visualize

ggplot(data_model, aes(s1, s2)) +
  geom_point(aes(col = gp.list.mean)) +
  scale_colour_distiller(palette = "RdYlBu")

Simulate coordinates or predictors only

sim_model simulate the entire dataset. If only predictors want to be simulated to have more control, the function model_frame can be used. The two main arguments of this function are the formula and the sample size n.

data_frame <- model_frame(formula = f, n = 100, seed = 1)

The first 10 rows of the generated dataset looks as follows. As expected, only the covariates are simulated.

knitr::kable(head(data_frame, 10))

Simulate the response variable only

If the data is already obtained, we can simulate the response variable using the function model_response.

data_frame <- model_response(data_frame, formula = f)

The first 10 rows of the generated dataset looks as follows. As expected, only the covariates are simulated.

knitr::kable(head(data_frame, 10))


ErickChacon/datasim documentation built on March 25, 2020, 7:53 p.m.