The function fits generalized linear models with regularized categorical effects, categorical effect modifiers, continuous effects and smooth effects. The model is specified by giving a symbolic description of the linear predictor and a description of the error distribution. Estimation employs different regularization and model selection strategies. These strategies are either a penalty or a forward selection strategy employing AIC/BIC. For non-differentiable penalties, a local quadratic approximation is employed, see Oelker and Tutz (2013).
gvcm.cat(formula, data, family = gaussian, method = c("lqa", "AIC", "BIC"),
    tuning = list(lambda=TRUE, specific=FALSE, phi=0.5, grouped.fused=0.5,
    elastic=0.5, vs=0.5, spl=0.5), weights, offset, start, control,
    model = FALSE, x = FALSE, y = FALSE, plot=FALSE, ...)

pest(x, y, indices, family = gaussian,
    tuning = list(lambda=TRUE, specific=FALSE, phi=0.5, grouped.fused=0.5,
    elastic=0.5, vs=0.5, spl=0.5), weights, offset, start = NULL,
    control = cat_control(), plot=FALSE, ...)

abc(x, y, indices, family = gaussian, tuning = c("AIC", "BIC"),
    weights, offset, start, control = cat_control(), plot=FALSE, ...)
formula: an object of class "formula"; a symbolic description of the model to be fitted
data: a data frame containing the variables in the model
family: a "family" object describing the error distribution and link function to be used in the model; see family
method: fitting method; one out of "lqa", "AIC" or "BIC"
tuning: a list; tuning parameters for penalized estimation with method "lqa"
weights: an optional weight vector for the observations
offset: an optional offset
start: initial values for the PIRLS algorithm for method "lqa"
control: a list of parameters for controlling the fitting process; if empty, set to cat_control()
model: logical; if TRUE, the employed model frame is returned
x, y: for function gvcm.cat: logicals; if TRUE, the model matrix/the response vector is returned; for functions pest and abc: the model matrix/the response vector
plot: logical; if TRUE, the information needed for plotting is computed
indices: for functions pest and abc: the index argument describing the model terms; see function index
...: further arguments passed to or from other methods
A typical formula has the form response ~ 1 + terms, where response is the response vector and terms is a series of terms specifying a linear predictor. There are special terms for regularized effects:

v(x, u, n="L1", bj=TRUE): varying coefficients enter the formula as v(x,u), where u denotes the categorical effect modifier and x the modified covariate. A varying intercept is denoted by v(1,u). Varying coefficients with categorical effect modifiers are penalized as described in Oelker et al. (2012). The argument bj and the element phi of argument tuning allow for the weights described there.

p(u, n="L1"): ordinal/nominal covariates u given as p(u) are penalized as described in Gertheiss and Tutz (2010). For numeric covariates, p(u) indicates a pure Lasso penalty.

grouped(u, ...): penalizes a group of covariates with the grouped Lasso penalty of Yuan and Lin (2006); so far, this works for categorical covariates only.

sp(x, knots=20, n="L2"): implements a continuous covariate x non-parametrically as f(x); f(x) is represented by centered evaluations of basis functions (cubic B-splines, with the number of knots given by knots); for n="L2", the curvature of f(x) is penalized by a Ridge penalty; see Eilers and Marx (1996).

SCAD(u): penalizes a covariate u with the SCAD penalty of Fan and Li (2001); for categorical covariates u, differences of coefficients are penalized by a SCAD penalty, see Gertheiss and Tutz (2010).

elastic(u): penalizes a covariate u with the elastic net penalty of Zou and Hastie (2005); for categorical covariates u, differences of coefficients are penalized by the elastic net penalty, see Gertheiss and Tutz (2010).
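To illustrate, the special terms above can be combined in a single formula. The following is only a sketch: it assumes a data frame dat with hypothetical columns y, x1, x2, u1, u2 (these names are not taken from the package documentation).

```r
## hypothetical sketch: combining special terms in one gvcm.cat formula
## (assumes a data frame 'dat' with columns y, x1, x2, u1, u2)
library(gvcm.cat)

f <- y ~ 1 +
  v(x1, u2) +         # effect of x1 varies over the categorical modifier u2
  p(u1) +             # penalized ordinal/nominal covariate u1
  sp(x2, knots = 20)  # penalized smooth effect of the continuous x2

## m <- gvcm.cat(f, data = dat, family = gaussian())
```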
If the formula contains no (varying) intercept, gvcm.cat assumes a constant intercept; there is no way to avoid an intercept.

For specials p and v, there is the special argument n:
if n="L1" (the default), the penalty employs the absolute values of the respective terms (Lasso-type penalty);
if n="L2", the absolute values in the penalty are replaced by quadratic, Ridge-type terms;
if n="L0", the absolute values in the penalty are replaced by an indicator for non-zero entries of the same terms.
For methods "AIC" and "BIC", the coefficients are not penalized but selected by a forward selection strategy whenever selection makes sense; for the special v(x,u), the selection strategy is described in Oelker et al. (2012); the approach for the other specials follows the same idea. For binomial families, the response can also be a success/failure rate or a two-column matrix with the columns giving the numbers of successes and failures.
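As in glm, the two-column binomial response might be written as follows; this is a sketch with hypothetical count columns succ and fail, not an example from the package documentation.

```r
## hypothetical sketch: two-column binomial response, as in stats::glm
## (assumes counts 'succ'/'fail' and a factor 'u' in a data frame 'dat')
f.binom <- cbind(succ, fail) ~ 1 + p(u)
## m <- gvcm.cat(f.binom, data = dat, family = binomial())
```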
Function pest computes penalized estimates, that is, it implements method "lqa" (the PIRLS algorithm). Function abc implements the forward selection strategy employing AIC/BIC.

Categorical effect modifiers and penalized categorical covariates are dummy coded as required by the penalty. If x in v(x,u) is binary, it is effect coded (the first category refers to -1). Other covariates are coded as given by getOption.
There is a summary function: summary.gvcm.cat
gvcm.cat returns an object of class "gvcm.cat", which inherits from class "glm", which in turn inherits from class "lm". An object of class "gvcm.cat" contains:
coefficients: named vector of coefficients
coefficients.reduced: reduced vector of coefficients; selected coefficients/differences of coefficients are set to zero
coefficients.refitted: refitted vector of coefficients, i.e. the maximum likelihood estimate of the model containing the selected covariates only; same length as coefficients
coefficients.oml: maximum likelihood estimate of the full model
residuals: deviance residuals
fitted.values: fitted mean values
rank: degrees of freedom of the model
family: the family object used
linear.predictors: linear fit on the link scale
deviance: scaled deviance
aic: a version of Akaike's Information Criterion: minus twice the maximized log-likelihood plus twice the rank. For binomial and Poisson families the dispersion is fixed at one; for a Gaussian family the dispersion is estimated from the residual deviance, and the number of parameters is the rank plus one.
null.deviance: the deviance for the null model, comparable with deviance
iter: number of iterations
weights: working weights of the final iteration
df.residual: the residual degrees of freedom (degrees of freedom error)
df.null: the residual degrees of freedom for the null model
converged: logical; did the PIRLS algorithm meet the given convergence conditions?
boundary: logical; is the fitted value on the boundary of the attainable values?
offset: the offset vector used
control: the value of the control argument used
contrasts: the contrasts used
na.action: information returned by model.frame on the special handling of NAs
plot: in principle, a list containing two matrices needed for different types of plots; see the input option plot
tuning: a list; the employed tuning parameters
indices: the index argument used; see function index
number.selectable.parameters: number of coefficients that could be selected
number.removed.parameters: number of coefficients actually removed
x.reduction: a matrix; transforms the model matrix to that of the reduced model
beta.reduction: a matrix; transforms the coefficients to the reduced coefficients
call: the matched call
formula: the formula supplied
terms: the terms object used
data: the data argument
x, y: if requested, the model matrix/the response vector
model: if requested, the model frame
xlevels: a record of the levels of the factors used in fitting
bootstrap.errors: experimental
method: same as the input argument method
In addition, non-empty fits will have components qr, R and effects relating to the final weighted linear fit.
Please note that the functions gvcm.cat, pest and the fitting procedure for penalized estimation, gvcmcatfit, are organized like the functions glm/glm.fit whenever possible. This was done to avoid mistakes and to provide a well-known structure.
Margret-Ruth Oelker (margret.oelker@stat.uni-muenchen.de)
Eilers, P. H. C. and B. D. Marx (1996). Flexible smoothing with B-splines and penalties. Statist. Sci. 11 (2), 89-121.
Fan, J. and R. Li (2001). Variable selection via nonconcave penalized likelihood and its oracle properties.
Journal of the American Statistical Association 96(456), 1348-1360.
Gertheiss, J. and G. Tutz (2010). Sparse modeling of categorial explanatory variables.
The Annals of Applied Statistics 4(4), 2150-2180.
Oelker, M.-R., J. Gertheiss and G. Tutz (2012). Regularization and model selection with categorial predictors and effect
modifiers in generalized linear models. Department of Statistics at the University of Munich: Technical Report 122.
Oelker, M.-R., J. Gertheiss and G. Tutz (2013). A general family of penalties for combining differing types of penalties in
generalized structured models. Department of Statistics at the University of Munich: Technical Report 139.
Yuan, M. and Y. Lin (2006). Model selection and estimation in regression with grouped variables. J. R. Stat. Soc. Ser. B Stat.
Methodol. 68 (1), 49-67.
Zou, H. and T. Hastie (2005). Regularization and variable selection via the elastic net. J. R. Stat. Soc. Ser. B Stat.
Methodol. 67 (2), 301-320.
Functions index, cat_control, plot.gvcm.cat, predict.gvcm.cat, simulation
## example for function simulation()
covariates <- list(x1=list("unif", c(0,2)),
x2=list("unif", c(0,2)),
x3=list("unif", c(0,2)),
u=list("multinom",c(0.3,0.4,0.3), "nominal")
)
true.f <- y ~ 1 + v(x1,u) + x2
true.coefs <- c(0.2, 0.3, 0.7, 0.7, -0.5)
data <- simulation(400, covariates, NULL, true.f, true.coefs, binomial(), seed=456)
## example for function gvcm.cat()
f <- y ~ v(1,u) + v(x1,u) + v(x2,u)
m1 <- gvcm.cat(f, data, binomial(), plot=TRUE, control=cat_control(lambda.upper=19))
summary(m1)
## example for function predict.gvcm.cat
newdata <- simulation(200, covariates, NULL, true.f, true.coefs, binomial(), seed=789)
prediction <- predict.gvcm.cat(m1, newdata)
## example for function plot.gvcm.cat
plot(m1)
plot(m1, type="score")
plot(m1, type="coefs")