predict.grpnet: Predict Method for grpnet Fits

View source: R/predict.grpnet.R

predict.grpnetR Documentation

Predict Method for grpnet Fits

Description

Obtain predictions from a fit group elastic net regularized GLM (grpnet) object.

Usage

## S3 method for class 'grpnet'
predict(object, 
        newx,
        newdata,
        s = NULL,
        type = c("link", "response", "class", "terms", 
                 "importance", "coefficients", "nonzero", "groups", 
                 "ncoefs", "ngroups", "norm", "znorm"),
        ...)

Arguments

object

Object of class "grpnet"

newx

Matrix of new x scores for prediction (default S3 method). Must have p columns arranged in the same order as the x matrix used to fit the model. Ignored for the last six types of predictions.

newdata

Data frame of new data scores for prediction (S3 "formula" method). Must contain all variables in the formula used to fit the model. Ignored for the last six types of predictions.

s

Lambda value(s) at which predictions should be obtained. Default uses s = object$lambda. Interpolation is used for s values that are not included in object$lambda.

type

Type of prediction to return. "link" gives predictions on the link scale (\eta). "response" gives predictions on the mean scale (\mu). "class" gives predicted class labels (for "binomial" and "multinomial" families). "terms" gives the predictions for each term (group) in the model (\eta_k). "importance" gives the variable importance index for each term (group) in the model. "coefficients" returns the coefficients used for predictions. "nonzero" returns a list giving the indices of non-zero coefficients for each s. "groups" returns a list giving the labels of non-zero groups for each s. "ncoefs" returns the number of non-zero coefficients for each s. "ngroups" returns the number of non-zero groups for each s. "norm" returns the L2 norm of each group's (raw) coefficients for each s. "znorm" returns the L2 norm of each group's standardized coefficients for each s.

...

Additional arguments (ignored)

Details

When type == "link", the predictions for each \lambda have the form

\boldsymbol\eta_\lambda = \mathbf{X}_{\mathrm{new}} \boldsymbol\beta_\lambda

where \mathbf{X}_{\mathrm{new}} is the argument newx (or the design matrix created from newdata by applying object$formula) and \boldsymbol\beta_\lambda is the coefficient vector corresponding to \lambda.

When type == "response", the predictions for each \lambda have the form

\boldsymbol\mu_\lambda = g^{-1}(\boldsymbol\eta_\lambda)

where g^{-1}(\cdot) is the inverse link function stored in object$family$linkinv.

When type == "class", the predictions for each \lambda have the form

\mathbf{y}_\lambda = \arg\max_l \boldsymbol\mu_\lambda(l)

where \boldsymbol\mu_\lambda(l) gives the predicted probability that each observation belongs to the l-th category (for l = 1,\ldots,m) using the regularization parameter \lambda.

When type == "terms", the groupwise predictions for each \lambda have the form

\boldsymbol\eta_{k\lambda} = \mathbf{X}_k^{\mathrm{(new)}} \boldsymbol\beta_{k\lambda}

where \mathbf{X}_k^{\mathrm{(new)}} is the portion of the argument newx (or the design matrix created from newdata by applying object$formula) that corresponds to the k-th term/group, and \boldsymbol\beta_{k\lambda} are the corresponding coefficients.

When type == "importance", the variable importance indices are defined as

\pi_k = \left( \boldsymbol\eta_{k\lambda}^\top \mathbf{C} \boldsymbol\eta_{0\lambda} \right) \left( \boldsymbol\eta_{0\lambda}^\top \mathbf{C} \boldsymbol\eta_{0\lambda} \right)^{-1}

where \mathbf{C} = (\mathbf{I}_n - \frac{1}{n} \mathbf{1}_n \mathbf{1}_n^\top) denotes a centering matrix, and \boldsymbol\eta_{0\lambda} = \sum_{k=1}^K \boldsymbol\eta_{k\lambda}. Note that \sum_{k=1}^K \pi_k = 1, but some \pi_k could be negative. When they are positive, \pi_k gives the approximate proportion of model (explained) variation that is attributed to the k-th term.

Value

Depends on three factors...
1. the exponential family distribution
2. the length of the input s
3. the type of prediction requested

For most response variables, the typical output will be...

*

a matrix of dimension c(newnobs, length(s)) if length(s) > 1

*

a vector of length newnobs if length(s) == 1

For multinomial response variables, the typical output will be...

*

an array of dimension c(newnobs, length(object$ylev), length(s)) if type %in% c("link", "response")

*

a matrix of dimension c(newobs, length(s)) if type == "class"

Note: if type == "class", then the output will be the same class as object$ylev. Otherwise, the output will be real-valued (or integer for the counts).

If type == "terms" and family != "multinomial", the output will be...

*

an array of dimension c(newnobs, nterms, length(s)) if length(s) > 1

*

a matrix of dimension c(newnobs, nterms) if length(s) == 1

If type == "terms" and family == "multinomial", the output will be a list of length length(object$ylev) where each element gives the terms for the corresponding response class.

If type == "importance" and family != "multinomial", the output will be...

*

a matrix of dimension c(nterms, length(s)) if length(s) > 1

*

a vector of length nterms if length(s) == 1

If type == "importance" and family == "multinomial", the output will be a list of length length(object$ylev) where each element gives the importance for the corresponding response class. If length(s) == 1, the output will be simplified to matrix.

If type == "coefficients", the output will be the same as that produced by coef.grpnet.

If type == "nonzero", the output will be a list of length length(s) where each element is a vector of integers (indices).

If type == "groups", the output will be a list of length length(s) where each element is a vector of characters (term.labels).

If type %in% c("ncoefs", "ngroups"), the output will be a vector of length length(s) where each element is an integer.

If type == "norm", the output will be a matrix of dimension c(K, length(s)), where each cell gives the L2 norm for the corresponding group and smoothing parameter. Note that K denotes the number of groups.

Note

Some internal code (e.g., used for the interpolation) is borrowed from the predict.glmnet function in the glmnet package (Friedman, Hastie, & Tibshirani, 2010).

Author(s)

Nathaniel E. Helwig <helwig@umn.edu>

References

Friedman, J., Hastie, T., & Tibshirani, R. (2010). Regularization paths for generalized linear models via coordinate descent. Journal of Statistical Software, 33(1), 1-22. \Sexpr[results=rd]{tools:::Rd_expr_doi("10.18637/jss.v033.i01")}

Helwig, N. E. (2024). Versatile descent algorithms for group regularization and variable selection in generalized linear models. Journal of Computational and Graphical Statistics. \Sexpr[results=rd]{tools:::Rd_expr_doi("10.1080/10618600.2024.2362232")}

See Also

grpnet for fitting grpnet regularization paths

predict.cv.grpnet for predicting from cv.grpnet objects

Examples

######***######   family = "gaussian"   ######***######

# load data
data(auto)

# fit model (formula method, response = mpg)
mod <- grpnet(mpg ~ ., data = auto)

# get fitted values for regularization path (output = 392 x 100 matrix)
fit.path <- predict(mod, newdata = auto)

# get fitted values at 3 particular points (output = 392 x 3 matrix)
fit.some <- predict(mod, newdata = auto, s = c(1.5, 1, 0.5))

# compare rmse for solutions
rmse.path <- sqrt(colMeans((auto$mpg - fit.path)^2))
rmse.some <- sqrt(colMeans((auto$mpg - fit.some)^2))
plot(log(mod$lambda), rmse.path, cex = 0.5)
points(log(c(1.5, 1, 0.5)), rmse.some, pch = 0, col = "red")



######***######   family = "binomial"   ######***######

# load data
data(auto)

# redefine origin (Domestic vs Foreign)
auto$origin <- ifelse(auto$origin == "American", "Domestic", "Foreign")

# fit model (formula method, response = origin with 2 levels)
mod <- grpnet(origin ~ ., data = auto, family = "binomial")

# get predicted classes for regularization path (output = 392 x 100 matrix)
fit.path <- predict(mod, newdata = auto, type = "class")

# get predicted classes at 3 particular points (output = 392 x 3 matrix)
fit.some <- predict(mod, newdata = auto, type = "class", s = c(.15, .1, .05))

# compare misclassification rate for solutions
miss.path <- 1 - colMeans(auto$origin == fit.path)
miss.some <- 1 - colMeans(auto$origin == fit.some)
plot(log(mod$lambda), miss.path, cex = 0.5)
points(log(c(.15, .1, .05)), miss.some, pch = 0, col = "red")



######***######   family = "multinomial"   ######***######

# load data
data(auto)

# fit model (formula method, response = origin with 3 levels)
mod <- grpnet(origin ~ ., data = auto, family = "multinomial")

# get predicted classes for regularization path (output = 392 x 100 matrix)
fit.path <- predict(mod, newdata = auto, type = "class")

# get predicted classes at 3 particular points (output = 392 x 3 matrix)
fit.some <- predict(mod, newdata = auto, type = "class", s = c(.1, .01, .001))

# compare misclassification rate for solutions
miss.path <- 1 - colMeans(auto$origin == fit.path)
miss.some <- 1 - colMeans(auto$origin == fit.some)
plot(log(mod$lambda), miss.path, cex = 0.5)
points(log(c(.1, .01, .001)), miss.some, pch = 0, col = "red")



######***######   family = "poisson"   ######***######

# load data
data(auto)

# fit model (formula method, response = horsepower)
mod <- grpnet(horsepower ~ ., data = auto, family = "poisson")

# get fitted values for regularization path (output = 392 x 100 matrix)
fit.path <- predict(mod, newdata = auto, type = "response")

# get fitted values at 3 particular points (output = 392 x 3 matrix)
fit.some <- predict(mod, newdata = auto, type = "response", s = c(15, 10, 5))

# compare rmse for solutions
rmse.path <- sqrt(colMeans((auto$horsepower - fit.path)^2))
rmse.some <- sqrt(colMeans((auto$horsepower - fit.some)^2))
plot(log(mod$lambda), rmse.path, cex = 0.5)
points(log(c(15, 10, 5)), rmse.some, pch = 0, col = "red")



######***######   family = "negative.binomial"   ######***######

# load data
data(auto)

# fit model (formula method, response = horsepower)
mod <- grpnet(horsepower ~ ., data = auto, family = "negative.binomial")

# get fitted values for regularization path (output = 392 x 100 matrix)
fit.path <- predict(mod, newdata = auto, type = "response")

# get fitted values at 3 particular points (output = 392 x 3 matrix)
fit.some <- predict(mod, newdata = auto, type = "response", s = c(0.1, 0.01, 0.001))

# compare rmse for solutions
rmse.path <- sqrt(colMeans((auto$horsepower - fit.path)^2))
rmse.some <- sqrt(colMeans((auto$horsepower - fit.some)^2))
plot(log(mod$lambda), rmse.path, cex = 0.5)
points(log(c(0.1, 0.01, 0.001)), rmse.some, pch = 0, col = "red")



######***######   family = "Gamma"   ######***######

# load data
data(auto)

# fit model (formula method, response = mpg)
mod <- grpnet(mpg ~ ., data = auto, family = "Gamma")

# get fitted values for regularization path (output = 392 x 100 matrix)
fit.path <- predict(mod, newdata = auto, type = "response")

# get fitted values at 3 particular points (output = 392 x 3 matrix)
fit.some <- predict(mod, newdata = auto, type = "response", s = c(0.1, 0.01, 0.001))

# compare rmse for solutions
rmse.path <- sqrt(colMeans((auto$mpg - fit.path)^2))
rmse.some <- sqrt(colMeans((auto$mpg - fit.some)^2))
plot(log(mod$lambda), rmse.path, cex = 0.5)
points(log(c(0.1, 0.01, 0.001)), rmse.some, pch = 0, col = "red")



######***######   family = "inverse.gaussian"   ######***######

# load data
data(auto)

# fit model (formula method, response = mpg)
mod <- grpnet(mpg ~ ., data = auto, family = "inverse.gaussian")

# get fitted values for regularization path (output = 392 x 100 matrix)
fit.path <- predict(mod, newdata = auto, type = "response")

# get fitted values at 3 particular points (output = 392 x 3 matrix)
fit.some <- predict(mod, newdata = auto, type = "response", s = c(0.005, 0.001, 0.0001))

# compare rmse for solutions
rmse.path <- sqrt(colMeans((auto$mpg - fit.path)^2))
rmse.some <- sqrt(colMeans((auto$mpg - fit.some)^2))
plot(log(mod$lambda), rmse.path, cex = 0.5)
points(log(c(0.005, 0.001, 0.0001)), rmse.some, pch = 0, col = "red")


grpnet documentation built on Oct. 12, 2024, 1:07 a.m.