Description
Functions for fitting GAMLSS (generalized additive models for location, scale and shape) using boosting techniques. The algorithm iteratively rotates between the distribution parameters, updating one while using the current fits of the others as offsets (for details see Mayr et al., 2012).
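The cyclical updating scheme described above can be illustrated with a small base-R sketch. It assumes a Gaussian distribution with an identity link for mu and a log link for sigma, and it uses componentwise linear least-squares base-learners. The function name cyclic_boost_lss and all of its internals are hypothetical. It does not reproduce the gamboostLSS implementation: there is no stabilization and no early-stopping logic. In each iteration, mu is updated while the current sigma fit is held fixed as an offset, and then sigma is updated using the freshly updated mu fit.

```r
# Hypothetical sketch (not gamboostLSS code): cyclical boosting for a
# Gaussian GAMLSS with mu (identity link) and sigma (log link), using
# componentwise linear least-squares base-learners.
cyclic_boost_lss <- function(X, y, mstop = 100, nu = 0.1) {
  X <- cbind(intercept = 1, scale(X, scale = FALSE))  # intercept + centred covariates
  eta_mu <- rep(mean(y), length(y))                   # offset for mu
  eta_sigma <- rep(log(sd(y)), length(y))             # offset for log(sigma)
  zero <- setNames(numeric(ncol(X)), colnames(X))
  beta <- list(mu = zero, sigma = zero)               # coefficients added to the offsets

  # fit each column separately to u by least squares; keep the best one
  best_learner <- function(u) {
    b <- colSums(X * u) / colSums(X^2)
    sse <- colSums((u - sweep(X, 2, b, `*`))^2)
    j <- which.min(sse)
    list(j = j, b = b[[j]])
  }

  for (m in seq_len(mstop)) {
    sigma <- exp(eta_sigma)
    # step 1: update mu, current sigma fit held fixed
    u_mu <- (y - eta_mu) / sigma^2                    # negative gradient w.r.t. eta_mu
    bl <- best_learner(u_mu)
    eta_mu <- eta_mu + nu * bl$b * X[, bl$j]
    beta$mu[bl$j] <- beta$mu[bl$j] + nu * bl$b

    # step 2: update sigma, updated mu fit held fixed
    u_sigma <- (y - eta_mu)^2 / sigma^2 - 1           # negative gradient w.r.t. log(sigma)
    bl <- best_learner(u_sigma)
    eta_sigma <- eta_sigma + nu * bl$b * X[, bl$j]
    beta$sigma[bl$j] <- beta$sigma[bl$j] + nu * bl$b
  }
  list(beta = beta, fitted_mu = eta_mu, fitted_sigma = exp(eta_sigma))
}

# usage on simulated data: mu depends on x1, sigma on x2
set.seed(1)
n <- 2000
X <- cbind(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
y <- rnorm(n, mean = 1 + 2 * X[, "x1"], sd = exp(0.5 * X[, "x2"]))
fit <- cyclic_boost_lss(X, y, mstop = 1000)
round(fit$beta$sigma, 2)
```

The point of the sketch is the order of the two updates inside one iteration. Each parameter sees the other one only through its current fit, which is what "using the current fits of the others as offsets" means.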
Usage
mboostLSS(formula, data = list(), families = GaussianLSS(),
          control = boost_control(), weights = NULL, ...)
glmboostLSS(formula, data = list(), families = GaussianLSS(),
            control = boost_control(), weights = NULL, ...)
gamboostLSS(formula, data = list(), families = GaussianLSS(),
            control = boost_control(), weights = NULL, ...)
blackboostLSS(formula, data = list(), families = GaussianLSS(),
              control = boost_control(), weights = NULL, ...)

## fit function:
mboostLSS_fit(formula, data = list(), families = GaussianLSS(),
              control = boost_control(), weights = NULL,
              fun = mboost, funchar = "mboost", call = NULL, ...)
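Arguments
formula: a symbolic description of the model to be fit. Either a single formula, which is then used for all distribution parameters, or a named list of formulas with one element per distribution parameter, whose names correspond to the names of the distribution parameters in families (see Examples).
data: a data frame containing the variables in the model.
families: a families object specifying the GAMLSS distribution, e.g., GaussianLSS() (the default) or NBinomialLSS(). See Families for the available distributions.
control: a list of parameters controlling the algorithm. For more details see boost_control.
weights: a numeric vector of weights (optional).
fun: fit function. Either mboost, glmboost, gamboost or blackboost.
funchar: character representation of the fit function. Either "mboost", "glmboost", "gamboost" or "blackboost".
call: used to forward the call from mboostLSS, glmboostLSS, gamboostLSS and blackboostLSS.
...: further arguments to be passed to the underlying fit function (see Details).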
Details
For information on GAMLSS theory see Rigby and Stasinopoulos (2005) or the information provided at http://gamlss.org. For a tutorial on gamboostLSS see Hofner et al. (2014).
glmboostLSS uses glmboost to fit the distribution parameters of a GAMLSS: a linear boosting model is fitted for each parameter.
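For a single distribution parameter under squared-error loss, the linear boosting model fitted per parameter reduces to componentwise L2 boosting. The base-R sketch below selects, in each iteration, the single covariate whose least-squares fit to the negative gradient is best. It shows that with many iterations the coefficients approach the ordinary least-squares estimates; stopping earlier yields shrunken coefficients. The function name l2_boost_linear is hypothetical, and centring the covariates is an assumption that mirrors center = TRUE in the Examples.

```r
# Hypothetical sketch: componentwise L2 boosting with linear base-learners
# for one parameter (squared-error loss), intercept handled via an offset.
l2_boost_linear <- function(X, y, mstop = 2000, nu = 0.1) {
  Xc <- scale(X, scale = FALSE)              # centred covariates
  offset <- mean(y)
  f <- rep(offset, length(y))
  beta <- setNames(numeric(ncol(Xc)), colnames(X))
  for (m in seq_len(mstop)) {
    u <- y - f                               # negative gradient of squared error / 2
    b <- colSums(Xc * u) / colSums(Xc^2)      # least-squares slope per covariate
    sse <- colSums((u - sweep(Xc, 2, b, `*`))^2)
    j <- which.min(sse)                       # best-fitting base-learner
    beta[j] <- beta[j] + nu * b[[j]]
    f <- f + nu * b[[j]] * Xc[, j]
  }
  # back-transform: f = offset + (X - centre) %*% beta
  c(intercept = offset - sum(beta * attr(Xc, "scaled:center")), beta)
}

set.seed(42)
X <- cbind(x1 = rnorm(200), x2 = rnorm(200), x3 = rnorm(200))
y <- 1 + 2 * X[, "x1"] - X[, "x2"] + rnorm(200)
boosted <- l2_boost_linear(X, y)      # many iterations: close to OLS
ols <- coef(lm(y ~ X))
early <- l2_boost_linear(X, y, mstop = 10)  # few iterations: shrunken
rbind(boosted = boosted, ols = unname(ols), early = early)
```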
gamboostLSS uses gamboost to fit the distribution parameters of a GAMLSS: an additive boosting model (by default with smooth effects) is fitted for each parameter. With the formula argument, a wide range of different base-learners can be specified (see baselearners). The base-learners determine the type of effect each covariate has on the corresponding distribution parameter.
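To illustrate how a smooth base-learner produces a non-linear effect, the sketch below boosts a single parameter under squared-error loss. In every iteration it fits a smoothing spline with fixed degrees of freedom to the negative gradient. stats::smooth.spline is used here as a stand-in for the P-spline base-learners of mboost. The value df = 4 and the name boost_smooth are assumptions chosen for illustration.

```r
# Hypothetical sketch: L2 boosting with a smooth base-learner.
# stats::smooth.spline (fixed df) stands in for mboost's P-spline learners.
boost_smooth <- function(x, y, mstop = 200, nu = 0.1, df = 4) {
  f <- rep(mean(y), length(y))               # offset
  for (m in seq_len(mstop)) {
    u <- y - f                               # negative gradient (squared error)
    bl <- smooth.spline(x, u, df = df)       # smooth base-learner fit to u
    f <- f + nu * predict(bl, x)$y           # damped update of the fit
  }
  f
}

set.seed(2)
x <- runif(300, -3, 3)
y <- sin(x) + rnorm(300, sd = 0.3)
f_hat <- boost_smooth(x, y)
mean((f_hat - sin(x))^2)   # error relative to the true curve
```

Each base-learner is only a gentle smoother. The accumulated fit becomes more flexible as the number of iterations grows, which is why mstop also controls smoothness.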
mboostLSS uses mboost to fit the distribution parameters of a GAMLSS. The type of model (linear, tree-based or smooth) is specified by fun.
blackboostLSS uses blackboost to fit the distribution parameters of a GAMLSS: a tree-based boosting model is fitted for each parameter.
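A tree-based base-learner partitions the covariate space and fits a constant in each region. The sketch below uses the simplest possible tree, a least-squares stump with a single split, inside an L2-boosting loop for one parameter. blackboost itself grows conditional inference trees, so both this substitution and the helper names fit_stump, predict_stump and boost_stumps are assumptions made for illustration only.

```r
# Hypothetical sketch: a least-squares stump (one split on one covariate)
# as a minimal tree base-learner, boosted under squared-error loss.
fit_stump <- function(X, u) {
  best <- list(sse = Inf)
  for (j in seq_len(ncol(X))) {
    xs <- sort(unique(X[, j]))
    if (length(xs) < 2) next
    cuts <- (head(xs, -1) + tail(xs, -1)) / 2   # midpoints between observed values
    for (s in cuts) {
      left <- X[, j] <= s
      pred <- ifelse(left, mean(u[left]), mean(u[!left]))
      sse <- sum((u - pred)^2)
      if (sse < best$sse)
        best <- list(sse = sse, j = j, split = s,
                     left = mean(u[left]), right = mean(u[!left]))
    }
  }
  best
}

predict_stump <- function(st, X) ifelse(X[, st$j] <= st$split, st$left, st$right)

boost_stumps <- function(X, y, mstop = 100, nu = 0.1) {
  f <- rep(mean(y), length(y))               # offset
  for (m in seq_len(mstop)) {
    st <- fit_stump(X, y - f)                # stump fitted to the negative gradient
    f <- f + nu * predict_stump(st, X)
  }
  f
}

set.seed(3)
X <- cbind(x1 = runif(200), x2 = runif(200))
y <- ifelse(X[, "x1"] > 0.5, 2, 0) + rnorm(200, sd = 0.2)
st1 <- fit_stump(X, y - mean(y))             # first base-learner: split on x1 near 0.5
f_hat <- boost_stumps(X, y)
```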
mboostLSS, glmboostLSS, gamboostLSS and blackboostLSS all call mboostLSS_fit, with fun set to the corresponding mboost function, i.e., the same function name without LSS. For further possible arguments see these functions as well as mboost_fit.
In all four fitting functions it is possible to specify one or multiple mstop and nu values via boost_control. If a single value is given, it is used for all distribution parameters of the GAMLSS model. Alternatively, a (named) vector or a (named) list can be used to specify a separate value for each parameter of the GAMLSS model. The names of the list must correspond to the names of the distribution parameters of the GAMLSS family. If no names are given, the mstop or nu values are assumed to be in the same order as the components in the families. For one-dimensional stopping, the user can therefore specify, e.g., mstop = 100 via boost_control. For multi-dimensional stopping, one can specify, e.g., mstop = list(mu = 100, sigma = 200) (see Examples).
To (potentially) stabilize the model estimation, the negative gradients can be standardized via the argument stabilization of the families. See Families for details.
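As an illustration of what standardizing the negative gradients means, the sketch below rescales a gradient vector by its median absolute deviation (MAD). This puts the gradients of different distribution parameters on a comparable scale. In the Gaussian case, for instance, the gradient for mu is (y - mu)/sigma^2 and therefore depends on the scale of sigma. The MAD rule, the lower bound of 1e-4 on the divisor and the name stabilize_mad are assumptions made for this sketch. See Families for the options that are actually available.

```r
# Hypothetical sketch: standardize a negative gradient vector by its
# median absolute deviation (MAD), making its scale unit-free.
stabilize_mad <- function(u) {
  div <- median(abs(u - median(u)))  # MAD without the 1.4826 consistency factor
  div <- max(div, 1e-4)              # assumed guard against dividing by ~0
  u / div
}

set.seed(7)
u <- rnorm(100, sd = 25)
s <- stabilize_mad(u)
median(abs(s - median(s)))   # equals 1 up to floating-point error
```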
Value
An object of class mboostLSS with corresponding methods to extract information.
References
Hofner, B., Mayr, A. and Schmid, M. (2014). gamboostLSS: An R Package for Model Building and Variable Selection in the GAMLSS Framework. Technical Report, arXiv:1407.1774.
Mayr, A., Fenske, N., Hofner, B., Kneib, T. and Schmid, M. (2012). Generalized additive models for location, scale and shape for high-dimensional data - a flexible approach based on boosting. Journal of the Royal Statistical Society, Series C (Applied Statistics), 61(3), 403-427.
Schmid, M., Potapov, S., Pfahlberg, A. and Hothorn, T. (2010). Estimation and regularization techniques for regression models with multidimensional prediction functions. Statistics and Computing, 20(2), 139-150.
Rigby, R. A. and Stasinopoulos, D. M. (2005). Generalized additive models for location, scale and shape (with discussion). Journal of the Royal Statistical Society, Series C (Applied Statistics), 54, 507-554.
Buehlmann, P. and Hothorn, T. (2007). Boosting algorithms: Regularization, prediction and model fitting. Statistical Science, 22(4), 477-505.
See Also
Families for documentation of the available GAMLSS distributions.
The underlying boosting functions mboost, gamboost, glmboost and blackboost are contained in the mboost package.
See, for example, risk or coef for methods that can be used to extract information from mboostLSS objects.
Examples
library("gamboostLSS")

### Data generating process:
set.seed(1907)
x1 <- rnorm(1000)
x2 <- rnorm(1000)
x3 <- rnorm(1000)
x4 <- rnorm(1000)
x5 <- rnorm(1000)
x6 <- rnorm(1000)
mu <- exp(1.5 +1 * x1 +0.5 * x2 -0.5 * x3 -1 * x4)
sigma <- exp(-0.4 * x3 -0.2 * x4 +0.2 * x5 +0.4 * x6)
y <- numeric(1000)
for( i in 1:1000)
y[i] <- rnbinom(1, size = sigma[i], mu = mu[i])
dat <- data.frame(x1, x2, x3, x4, x5, x6, y)
### linear model with y ~ . for both components: 400 boosting iterations
model <- glmboostLSS(y ~ ., families = NBinomialLSS(), data = dat,
control = boost_control(mstop = 400),
center = TRUE)
coef(model, off2int = TRUE)
### estimate model with different formulas for mu and sigma:
names(NBinomialLSS()) # names of the family
# Note: Multiple formulas must be specified via a _named list_
# where the names correspond to the names of the distribution parameters
# in the family (see above)
model2 <- glmboostLSS(formula = list(mu = y ~ x1 + x2 + x3 + x4,
sigma = y ~ x3 + x4 + x5 + x6),
families = NBinomialLSS(), data = dat,
control = boost_control(mstop = 400, trace = TRUE),
center = TRUE)
coef(model2, off2int = TRUE)
### Offsets need to be specified via the arguments of the families object:
model <- glmboostLSS(y ~ ., data = dat,
families = NBinomialLSS(mu = mean(mu),
sigma = mean(sigma)),
control = boost_control(mstop = 10),
center = TRUE)
# Note: mu-offset = log(mean(mu)) and sigma-offset = log(mean(sigma))
# as we use a log-link for both distribution parameters
coef(model)
log(mean(mu))
log(mean(sigma))
### use different mstop values for the two distribution parameters
### (two-dimensional early stopping)
### the number of iterations is passed to boost_control via a named list
model3 <- glmboostLSS(formula = list(mu = y ~ x1 + x2 + x3 + x4,
sigma = y ~ x3 + x4 + x5 + x6),
families = NBinomialLSS(), data = dat,
control = boost_control(mstop = list(mu = 400,
sigma = 300),
trace = TRUE),
center = TRUE)
coef(model3, off2int = TRUE)
### Alternatively we can subset model2:
# here it is assumed that the first element in the vector corresponds to
# the first distribution parameter of model2 etc.
model2[c(400, 300)]
par(mfrow = c(1,2))
plot(model2, xlim = c(0, max(mstop(model2))))
## all.equal(coef(model2), coef(model3)) # same!
### WARNING: Subsetting via model[mstopnew] changes the model directly!
### For the original fit one has to subset again: model[mstop]