data_sim | R Documentation |
A tidy reimplementation of the example simulations implemented in
mgcv::gamSim(), producing data that can be used to fit GAMs. A new feature
is that the sampling distribution can be applied to all the example types.
data_sim(
model = "eg1",
n = 400,
scale = NULL,
theta = 3,
power = 1.5,
dist = c("normal", "poisson", "binary", "negbin", "tweedie", "gamma", "ocat",
"ordered categorical"),
n_cat = 4,
cuts = c(-1, 0, 5),
seed = NULL,
gfam_families = c("binary", "tweedie", "normal")
)
model |
character; either |
n |
numeric; the number of observations to simulate. |
scale |
numeric; the level of noise to use. |
theta |
numeric; the dispersion parameter |
power |
numeric; the Tweedie power parameter. |
dist |
character; a sampling distribution for the response
variable. |
n_cat |
integer; the number of categories for categorical response.
Currently only used for |
cuts |
numeric; vector of cut points on the latent variable, excluding
the end points |
seed |
numeric; the seed for the random number generator. Passed to
|
gfam_families |
character; a vector of distributions to use in
generating data with grouped families for use with |
data_sim()
can simulate data from several underlying models of
known true functions. The available options currently are:
"eg1"
: a four term additive true model. This is the classic Gu & Wahba
four univariate term test model. See gw_functions
for more details of
the underlying four functions.
"eg2"
: a bivariate smooth true model.
"eg3"
: an example containing a continuous by smooth (varying
coefficient) true model. The model is \hat{y}_i = f_2(x_{1i})x_{2i}
where the function f_2()
is f_2(x) = 0.2 * x^{11} *
(10 * (1 - x))^6 + 10 * (10 * x)^3 * (1 - x)^{10}
.
"eg4"
: a factor by smooth true model. The true model contains a factor
with 3 levels, where the response for the nth level follows the nth
Gu & Wahba function (for n \in {1, 2, 3}
).
"eg5"
: an additive plus factor true model. The response is a linear
combination of the Gu & Wahba functions 2, 3, and 4 (the last of which is
a null function) plus a factor term with four levels.
"eg6"
: an additive plus random effect term true model.
"eg7"
: a version of the model in "eg1", but where the covariates are
correlated.
"gwf2"
: a model where the response is Gu & Wahba's
f_2(x_i)
plus noise.
"lwf6"
: a model where the response is Luo & Wahba's "example 6"
function sin(2(4x-2)) + 2 exp(-256(x-0.5)^2)
plus noise.
"gfam"
: simulates data for use with GAMs with
family = gfam(families)
. See example in mgcv::gfam()
. If this model
is specified then dist
is ignored and gfam_families
is used to
specify which distributions are included in the simulated data. Can be a
vector of any of the families allowed by dist
. For
"ocat" %in% gfam_families
(or "ordered categorical"
), 4 classes are
assumed, which can't be changed. Link functions used are "identity"
for "normal"
, "logit"
for "binary"
, "ocat"
, and
"ordered categorical"
, and "exp"
elsewhere.
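The two formulas quoted in the list above can be transcribed directly into base R. The block below is a minimal sketch rather than the package's own code: the names gw_f2_sketch() and lw_f6_sketch() are hypothetical, and the uniform covariates and unit noise standard deviation are assumptions made only for illustration.

```r
# Gu & Wahba f_2, transcribed from the formula given under "eg3"
# (hypothetical helper name, not a function exported by the package)
gw_f2_sketch <- function(x) {
  0.2 * x^11 * (10 * (1 - x))^6 + 10 * (10 * x)^3 * (1 - x)^10
}

# Luo & Wahba "example 6", transcribed from the formula given under "lwf6"
# (hypothetical helper name)
lw_f6_sketch <- function(x) {
  sin(2 * (4 * x - 2)) + 2 * exp(-256 * (x - 0.5)^2)
}

set.seed(1)
n  <- 100
x1 <- runif(n)  # assumption: covariates drawn uniformly on [0, 1]
x2 <- runif(n)

# "eg3"-style varying coefficient mean: f_2(x1) multiplied by x2
mu_eg3 <- gw_f2_sketch(x1) * x2

# "gwf2"-style response: f_2(x) plus Gaussian noise (sd = 1 is illustrative)
y_gwf2 <- gw_f2_sketch(x1) + rnorm(n, sd = 1)

# "lwf6"-style response: the Luo & Wahba function plus Gaussian noise
y_lwf6 <- lw_f6_sketch(x1) + rnorm(n, sd = 1)
```

Evaluating these helpers on a grid, for example curve(gw_f2_sketch, 0, 1), is a quick way to see the shape of the true function before simulating from it.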
The argument dist specifies the distribution of the random component that
provides noise or sampling variation. The available distributions are:
"normal"
: Gaussian,
"poisson"
: Poisson,
"binary"
: Bernoulli,
"negbin"
: Negative binomial,
"tweedie"
: Tweedie,
"gamma"
: gamma, and
"ocat" or "ordered categorical"
: ordered categorical.
Other arguments provide the parameters for the distribution.
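To illustrate what changing the sampling distribution means, the base R sketch below draws several kinds of response around the same linear predictor. It is not the package's implementation: the linear predictor eta is a hypothetical stand-in for a true function, and the way the noise level and dispersion values are mapped to each sampler here is an assumption. The Tweedie case is omitted because base R has no Tweedie sampler.

```r
set.seed(42)
n   <- 200
x   <- runif(n)
eta <- sin(2 * pi * x)  # hypothetical linear predictor, for illustration only

# Gaussian: identity link; sd stands in for a noise level such as `scale`
y_normal <- rnorm(n, mean = eta, sd = 0.5)

# Poisson: the mean is exp(eta)
y_poisson <- rpois(n, lambda = exp(eta))

# Bernoulli: the success probability is the inverse logit of eta
y_binary <- rbinom(n, size = 1, prob = plogis(eta))

# Negative binomial: mean exp(eta); `size` is assumed to play the role of theta
y_negbin <- rnbinom(n, mu = exp(eta), size = 3)

# Gamma: shape / rate = exp(eta), so the mean is exp(eta);
# the dispersion value 0.5 is an illustrative choice
disp    <- 0.5
y_gamma <- rgamma(n, shape = 1 / disp, rate = (1 / disp) / exp(eta))
```

In each case the systematic part (eta) is the same and only the random component changes, which is the idea behind applying dist to every example type.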
Gu, C., Wahba, G. (1993). Smoothing Spline ANOVA with Component-Wise Bayesian "Confidence Intervals." J. Comput. Graph. Stat. 2, 97–117.
Luo, Z., Wahba, G. (1997). Hybrid adaptive splines. J. Am. Stat. Assoc. 92, 107–116.
data_sim("eg1", n = 100, seed = 1)
# an ordered categorical response
data_sim("eg1", n = 100, dist = "ocat", n_cat = 4, cuts = c(-1, 0, 5))
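The cuts argument in the ordered categorical example above gives the interior cut points on a latent variable, so three cut points yield four categories. The base R sketch below shows that binning step on its own. It is a rough illustration under stated assumptions, not the package's code: the latent variable here is a hypothetical linear predictor plus logistic noise.

```r
set.seed(1)
n    <- 100
cuts <- c(-1, 0, 5)  # three interior cut points give four categories

# hypothetical latent variable: linear predictor plus logistic noise (assumption)
eta    <- rnorm(n)
latent <- eta + rlogis(n)

# add the open end points that `cuts` excludes, then bin the latent variable
breaks <- c(-Inf, cuts, Inf)
y_cat  <- cut(latent, breaks = breaks, labels = FALSE)

# counts per category, keeping categories that happen to be empty
table(factor(y_cat, levels = seq_len(length(cuts) + 1)))
```

With a wide final gap such as the one between 0 and 5, most of the latent values above 0 fall into the third category and the fourth category is comparatively rare.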