| data_sim | R Documentation |
A tidy reimplementation of the data simulation functions in mgcv::gamSim(),
producing data that can be used to fit GAMs. A new feature is that the
sampling distribution can be applied to all the example types.
data_sim(
  model = "eg1",
  n = 400,
  scale = NULL,
  theta = 3,
  power = 1.5,
  dist = c("normal", "poisson", "binary", "negbin", "tweedie", "gamma", "ocat",
    "ordered categorical"),
  n_cat = 4,
  cuts = c(-1, 0, 5),
  seed = NULL,
  gfam_families = c("binary", "tweedie", "normal")
)
model |
character; either |
n |
numeric; the number of observations to simulate. |
scale |
numeric; the level of noise to use. |
theta |
numeric; the dispersion parameter. |
power |
numeric; the Tweedie power parameter. |
dist |
character; a sampling distribution for the response
variable. |
n_cat |
integer; the number of categories for categorical response.
Currently only used for |
cuts |
numeric; vector of cut points on the latent variable, excluding
the end points |
seed |
numeric; the seed for the random number generator. Passed to
|
gfam_families |
character; a vector of distributions to use in
generating data with grouped families for use with |
data_sim() can simulate data from several underlying models of
known true functions. The available options currently are:
"eg1": a four term additive true model. This is the classic Gu & Wahba
four univariate term test model. See gw_functions for more details of
the underlying four functions.
"eg2": a bivariate smooth true model.
"eg3": an example containing a continuous by smooth (varying
coefficient) true model. The model is \hat{y}_i = f_2(x_{1i}) x_{2i}, where the function f_2() is
f_2(x) = 0.2 x^{11} (10 (1 - x))^6 + 10 (10 x)^3 (1 - x)^{10}.
"eg4": a factor by smooth true model. The true model contains a factor
with 3 levels, where the response for the nth level follows the nth
Gu & Wahba function (for n \in {1, 2, 3}).
"eg5": an additive plus factor true model. The response is a linear
combination of the Gu & Wahba functions 2, 3, 4 (the latter is a null
function) plus a factor term with four levels.
"eg6": an additive plus random effect term true model.
"eg7": a version of the model in "eg1", but where the covariates are
correlated.
"gwf2": a model where the response is Gu & Wahba's
f_2(x_i) plus noise.
"lwf6": a model where the response is Luo & Wahba's "example 6"
function sin(2(4x-2)) + 2 exp(-256(x-0.5)^2) plus noise.
"gfam": simulates data for use with GAMs with
family = gfam(families). See example in mgcv::gfam(). If this model
is specified then dist is ignored and gfam_families is used to
specify which distributions are included in the simulated data;
gfam_families can be a vector of any of the families allowed by dist. For
"ocat" %in% gfam_families (or "ordered categorical"), 4 classes are
assumed, which can't be changed. Link functions used are "identity"
for "normal", "logit" for "binary", "ocat", and
"ordered categorical", and "exp" elsewhere.
The random component providing noise or sampling variation can follow one
of the following distributions, specified via argument dist:
"normal": Gaussian,
"poisson": Poisson,
"binary": Bernoulli,
"negbin": Negative binomial,
"tweedie": Tweedie,
"gamma": gamma, and
"ocat" or "ordered categorical": ordered categorical.
Other arguments provide the parameters for the distribution.
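The two true functions whose formulas are written out in the model list above (f_2 for "eg3" and "gwf2", and the Luo & Wahba "example 6" function for "lwf6") can be sketched in base R as follows. This is only an illustration transcribed from the formulas on this page; the names gw_f2_sketch and lw_f6_sketch are hypothetical and are not claimed to match the package's internal implementation.

```r
# Hypothetical helper: Gu & Wahba f_2, transcribed from the "eg3" formula
gw_f2_sketch <- function(x) {
  0.2 * x^11 * (10 * (1 - x))^6 + 10 * (10 * x)^3 * (1 - x)^10
}

# Hypothetical helper: Luo & Wahba "example 6", transcribed from "lwf6"
lw_f6_sketch <- function(x) {
  sin(2 * (4 * x - 2)) + 2 * exp(-256 * (x - 0.5)^2)
}

# Evaluate both functions on a small grid over [0, 1]
x <- seq(0, 1, length.out = 5)
f2_vals <- gw_f2_sketch(x)
f6_vals <- lw_f6_sketch(x)

# The "eg3" true model multiplies f_2(x1) by a second covariate x2
x1 <- c(0.2, 0.5, 0.8)
x2 <- c(1.0, 2.0, 3.0)
eg3_truth <- gw_f2_sketch(x1) * x2
```

Both terms of f_2 vanish at x = 0 and x = 1, and the Luo & Wahba function has a sharp bump of height 2 centred at x = 0.5.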
Gu, C., Wahba, G. (1993). Smoothing Spline ANOVA with Component-Wise Bayesian "Confidence Intervals". J. Comput. Graph. Stat. 2, 97–117.
Luo, Z., Wahba, G. (1997). Hybrid adaptive splines. J. Am. Stat. Assoc. 92, 107–116.
data_sim("eg1", n = 100, seed = 1)
# an ordered categorical response
data_sim("eg1", n = 100, dist = "ocat", n_cat = 4, cuts = c(-1, 0, 5))
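As a further sketch, simulated data with a non-Gaussian sampling distribution can be passed straight to mgcv::gam(). The example below assumes that data_sim() is provided by the gratia package and that the "eg1" output contains a response column y and covariates x0 to x3; check names(df) if your version differs.

```r
library(gratia)
library(mgcv)

# Simulate Poisson counts from the four-term "eg1" model
df <- data_sim("eg1", n = 400, dist = "poisson", seed = 42)

# Fit a matching GAM; the column names y and x0..x3 are assumed here
m <- gam(y ~ s(x0) + s(x1) + s(x2) + s(x3),
         data = df, family = poisson(), method = "REML")

summary(m)
```

Matching the family argument of gam() to the dist used in the simulation keeps the fitted model consistent with the data-generating process.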