Selection and smoothing of dummy coefficients of ordinal predictors
Description

Fits dummy coefficients of ordinally scaled independent variables with a group lasso penalty on differences of adjacent dummy coefficients.

Usage

ordSelect(x, y, u = NULL, z = NULL, 
  offset = rep(0,length(y)), lambda, nu = 1, zeta = 1,  
  model = c("linear", "logit", "poisson"), penscale = sqrt, 
  scalex = TRUE, scalez = TRUE, scaleu = TRUE,
  nonpenx = NULL, nonpenz = NULL, nonpenu = NULL, 
  intercept = TRUE, eps = 1e-3, ...)

Arguments

x

the matrix of ordinal predictors, with each column corresponding to one predictor and containing numeric values from {1,2,...}; for each covariate, category 1 is taken as the reference category with zero dummy coefficient.

y

the response vector.

u

a matrix (or data.frame) of additional categorical (nominal) predictors, with each column corresponding to one (additional) predictor and containing numeric values from {1,2,...}; the corresponding dummy coefficients will be regularized using a groupwise simple ridge penalty, and for each covariate, category 1 is taken as the reference category.

z

a matrix (or data.frame) of additional metric predictors, with each column corresponding to one (additional) predictor; the corresponding coefficients will be regularized using a simple lasso penalty.

offset

vector of offset values.

lambda

vector of penalty parameters (in decreasing order). Optimization starts with the first component. See details below.

nu

additional tuning parameter to control the strength of the penalty imposed on dummy coefficients corresponding to u. See details below.

zeta

additional tuning parameter to control the strength of the penalty imposed on coefficients corresponding to z. See details below.

model

the model which is to be fitted. Possible choices are "linear" (default), "logit" or "poisson". See details below.

penscale

rescaling function to adjust the value of the penalty parameter to the degrees of freedom of the parameter group. See the references below.

scalex, scaleu, scalez

logical. Should the (split/dummy-coded) design matrices corresponding to x, u, and z, respectively, be scaled to have unit variance over columns? See details below.

nonpenx, nonpenu, nonpenz

vectors of indices indicating columns of x, u, and z, respectively, whose regression coefficients are not penalized.

intercept

logical. Should a (non-penalized) intercept be included in the model? Default is TRUE.

eps

a (small) constant to be added to the columnwise standard deviations when scaling design matrices, to control the effect of very small standard deviations. See details below.

...

additional arguments.

Details

The method assumes that categorical covariates (contained in x and u) take values 1,2,...,max, where max denotes the (columnwise) highest level observed in the data. If a level between 1 and max is not observed, a corresponding (dummy) coefficient is fitted anyway. If a level larger than max is possible in principle but not observed in the data, and a corresponding coefficient should be fitted, the easiest way is to add a row to x (and u, z) carrying that level and set the corresponding y value to NA.
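
A minimal sketch of this trick (hypothetical data; it assumes the ordPen package is attached): level 9 of x1 is possible in principle but never observed, so a row with x1 = 9 and an NA response is appended before fitting.

library(ordPen)
set.seed(1)
x1  <- sample(1:8, 50, replace = TRUE)   # level 9 is possible but never observed
y   <- log(x1) + rnorm(50)
x   <- cbind(x1 = c(x1, 9))              # append a row carrying the unobserved level
yNA <- c(y, NA)                          # ... with NA as its response value
fit <- ordSelect(x = x, y = yNA, lambda = c(100, 10, 1))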

To adjust the strength of the penalty imposed on nominal or metric covariates, nu and zeta, respectively, can be chosen. The penalty parameter used for nominal variables (contained in u) is lambda*nu; for metric predictors (contained in z) it is lambda*zeta.
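
For illustration (hypothetical data), with nu = 0.5 and zeta = 2, the dummy coefficients for u are effectively penalized with lambda*0.5 and the coefficients for z with lambda*2, while the groups built from x use lambda itself:

library(ordPen)
set.seed(2)
x <- cbind(x1 = sample(1:5, 100, replace = TRUE))   # ordinal predictor
u <- cbind(u1 = sample(1:3, 100, replace = TRUE))   # nominal predictor
z <- cbind(z1 = rnorm(100))                         # metric predictor
y <- x[, 1] + rnorm(100)
fit <- ordSelect(x = x, y = y, u = u, z = z,
                 lambda = c(50, 10, 1), nu = 0.5, zeta = 2)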

If a linear regression model is fitted, the response vector y may contain any numeric values; if a logit model is fitted, y has to be 0/1 coded; if a Poisson model is fitted, y has to contain count data.
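
As a sketch (hypothetical data), a logit fit requires a 0/1-coded response:

library(ordPen)
set.seed(3)
x <- cbind(x1 = sample(1:6, 200, replace = TRUE))
y <- rbinom(200, 1, plogis(-1 + 0.4 * x[, 1]))   # 0/1-coded response
fitLogit <- ordSelect(x = x, y = y, lambda = c(20, 5, 1), model = "logit")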

If scalex, scaleu or scalez are TRUE, the design matrices constructed from x, u, and z, respectively, are scaled to have unit variance over columns. In the case of x, the design matrix is split-coded; in the case of u, it is dummy-coded; and in the case of z, it is just z itself. If a certain x or u category, however, is observed only a few times, variances may become very small; scaling then has an enormous effect on the result and may cause numerical problems. Hence a small constant eps can be added to each standard deviation before it is used for scaling.
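
A sketch of how this can matter (hypothetical data; the eps value is purely illustrative): if a category is observed only twice, one may use a larger eps, or switch off scaling of x altogether:

library(ordPen)
set.seed(4)
x <- cbind(x1 = c(sample(1:4, 98, replace = TRUE), 5, 5))   # level 5 observed only twice
y <- rnorm(100)
fitA <- ordSelect(x = x, y = y, lambda = c(10, 1), eps = 1e-1)      # larger eps
fitB <- ordSelect(x = x, y = y, lambda = c(10, 1), scalex = FALSE)  # no scaling of x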

Value

An ordPen object, which is a list containing:

fitted

the matrix of fitted response values of the training data. Columns correspond to different lambda values.

coefficients

the matrix of fitted coefficients with respect to dummy-coded (ordinal or nominal) categorical input variables (including the reference category) as well as metric predictors. Columns correspond to different lambda values.

model

the type of the fitted model: "linear", "logit", or "poisson".

lambda

the used lambda values.

xlevels

a vector giving the number of levels of the ordinal predictors.

ulevels

a vector giving the number of levels of the nominal predictors.

zcovars

the number of metric covariates.
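
As a quick illustration (hypothetical data), these components can be inspected as follows:

library(ordPen)
set.seed(5)
x <- cbind(x1 = sample(1:5, 60, replace = TRUE), x2 = sample(1:4, 60, replace = TRUE))
y <- rnorm(60)
fit <- ordSelect(x = x, y = y, lambda = c(10, 1))
dim(fit$coefficients)   # one column per value in fit$lambda
fit$model               # "linear"
fit$xlevels             # numbers of levels of the ordinal predictors
fit$zcovars             # number of metric covariates (none supplied here)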

Author(s)

Jan Gertheiss

References

Gertheiss, J., S. Hogger, C. Oberhauser and G. Tutz (2011). Selection of ordinally scaled independent variables with applications to international classification of functioning core sets. Journal of the Royal Statistical Society C (Applied Statistics), 60, 377-395.

Meier, L., S. van de Geer and P. Buehlmann (2008). The group lasso for logistic regression. Journal of the Royal Statistical Society B, 70, 53-71.

Yuan, M. and Y. Lin (2006). Model selection and estimation in regression with grouped variables. Journal of the Royal Statistical Society B, 68, 49-67.

See Also

plot.ordPen, predict.ordPen, ICFCoreSetCWP

Examples

# smoothing and selection of ordinal covariates using a random dataset
set.seed(123)

# generate (ordinal) predictors
x1 <- sample(1:8,100,replace=TRUE)
x2 <- sample(1:6,100,replace=TRUE)
x3 <- sample(1:7,100,replace=TRUE)

# the response
y <- -1 + log(x1) + sin(3*(x2-1)/pi) + rnorm(100)

# x matrix
x <- cbind(x1,x2,x3)

# lambda values
lambda <- c(1000,500,200,100,50,30,20,10,1)

# smoothing and selection
o2 <- ordSelect(x = x, y = y, lambda = lambda)

# results
round(o2$coef,digits=3)
plot(o2)

# If for a certain plot the x-axis should be annotated in a different way,
# this can (for example) be done as follows:
plot(o2, whichx = 1, xlim = c(0,9), xaxt = "n")
axis(side = 1, at = c(1,8), labels = c("no agreement","total agreement"))