View source: R/predict.grpnet.R
| predict.grpnet | R Documentation |
Obtain predictions from a fit group elastic net regularized GLM (grpnet) object.
## S3 method for class 'grpnet'
predict(object,
newx,
newdata,
s = NULL,
type = c("link", "response", "class", "terms",
"importance", "coefficients", "nonzero", "groups",
"ncoefs", "ngroups", "norm", "znorm"),
...)
object |
Object of class "grpnet" |
newx |
Matrix of new |
newdata |
Data frame of new |
s |
Lambda value(s) at which predictions should be obtained. Default uses |
type |
Type of prediction to return. "link" gives predictions on the link scale ( |
... |
Additional arguments (ignored) |
When type == "link", the predictions for each \lambda have the form
\boldsymbol\eta_\lambda = \mathbf{X}_{\mathrm{new}} \boldsymbol\beta_\lambda
where \mathbf{X}_{\mathrm{new}} is the argument newx (or the design matrix created from newdata by applying object$formula) and \boldsymbol\beta_\lambda is the coefficient vector corresponding to \lambda.
When type == "response", the predictions for each \lambda have the form
\boldsymbol\mu_\lambda = g^{-1}(\boldsymbol\eta_\lambda)
where g^{-1}(\cdot) is the inverse link function stored in object$family$linkinv.
When type == "class", the predictions for each \lambda have the form
\mathbf{y}_\lambda = \arg\max_l \boldsymbol\mu_\lambda(l)
where \boldsymbol\mu_\lambda(l) gives the predicted probability that each observation belongs to the l-th category (for l = 1,\ldots,m) using the regularization parameter \lambda.
When type == "terms", the groupwise predictions for each \lambda have the form
\boldsymbol\eta_{k\lambda} = \mathbf{X}_k^{\mathrm{(new)}} \boldsymbol\beta_{k\lambda}
where \mathbf{X}_k^{\mathrm{(new)}} is the portion of the argument newx (or the design matrix created from newdata by applying object$formula) that corresponds to the k-th term/group, and \boldsymbol\beta_{k\lambda} are the corresponding coefficients.
When type == "importance", the variable importance indices are defined as
\pi_k = \left( \boldsymbol\eta_{k\lambda}^\top \mathbf{C} \boldsymbol\eta_{0\lambda} \right) \left( \boldsymbol\eta_{0\lambda}^\top \mathbf{C} \boldsymbol\eta_{0\lambda} \right)^{-1}
where \mathbf{C} = (\mathbf{I}_n - \frac{1}{n} \mathbf{1}_n \mathbf{1}_n^\top) denotes a centering matrix, and \boldsymbol\eta_{0\lambda} = \sum_{k=1}^K \boldsymbol\eta_{k\lambda}. Note that \sum_{k=1}^K \pi_k = 1, but some \pi_k could be negative. When they are positive, \pi_k gives the approximate proportion of model (explained) variation that is attributed to the k-th term.
Depends on three factors...
1. the exponential family distribution
2. the length of the input s
3. the type of prediction requested
For most response variables, the typical output will be...
* |
a matrix of dimension |
* |
a vector of length |
For multigaussian and multinomial response variables, the typical output will be...
* |
an array of dimension |
* |
a matrix of dimension |
Note: if type == "class", then the output will be the same class as object$ylev. Otherwise, the output will be real-valued (or integer for the counts).
If type == "terms" and !(family %in% c("multigaussian","multinomial")), the output will be...
* |
an array of dimension |
* |
a matrix of dimension |
If type == "terms" and family %in% c("multigaussian","multinomial"), the output will be a list of length length(object$ylev) where each element gives the terms for the corresponding response dimension/class.
If type == "importance" and family != "multinomial", the output will be...
* |
a matrix of dimension |
* |
a vector of length |
If type == "importance" and family == "multinomial", the output will be a list of length length(object$ylev) where each element gives the importance for the corresponding response class. If length(s) == 1, the output will be simplified to matrix.
If type == "coefficients", the output will be the same as that produced by coef.grpnet.
If type == "nonzero", the output will be a list of length length(s) where each element is a vector of integers (indices).
If type == "groups", the output will be a list of length length(s) where each element is a vector of characters (term.labels).
If type %in% c("ncoefs", "ngroups"), the output will be a vector of length length(s) where each element is an integer.
If type == "norm", the output will be a matrix of dimension c(K, length(s)), where each cell gives the L2 norm for the corresponding group and smoothing parameter. Note that K denotes the number of groups.
Some internal code (e.g., used for the interpolation) is borrowed from the predict.glmnet function in the glmnet package (Friedman, Hastie, & Tibshirani, 2010).
Nathaniel E. Helwig <helwig@umn.edu>
Friedman, J., Hastie, T., & Tibshirani, R. (2010). Regularization paths for generalized linear models via coordinate descent. Journal of Statistical Software, 33(1), 1-22. \Sexpr[results=rd]{tools:::Rd_expr_doi("10.18637/jss.v033.i01")}
Helwig, N. E. (2025). Versatile descent algorithms for group regularization and variable selection in generalized linear models. Journal of Computational and Graphical Statistics, 34(1), 239-252. \Sexpr[results=rd]{tools:::Rd_expr_doi("10.1080/10618600.2024.2362232")}
grpnet for fitting grpnet regularization paths
predict.cv.grpnet for predicting from cv.grpnet objects
######***###### family = "gaussian" ######***######
# load data
data(auto)
# fit model (formula method, response = mpg)
mod <- grpnet(mpg ~ ., data = auto)
# get fitted values for regularization path (output = 392 x 100 matrix)
fit.path <- predict(mod, newdata = auto)
# get fitted values at 3 particular points (output = 392 x 3 matrix)
fit.some <- predict(mod, newdata = auto, s = c(1.5, 1, 0.5))
# compare rmse for solutions
rmse.path <- sqrt(colMeans((auto$mpg - fit.path)^2))
rmse.some <- sqrt(colMeans((auto$mpg - fit.some)^2))
plot(log(mod$lambda), rmse.path, cex = 0.5)
points(log(c(1.5, 1, 0.5)), rmse.some, pch = 0, col = "red")
######***###### family = "binomial" ######***######
# load data
data(auto)
# redefine origin (Domestic vs Foreign)
auto$origin <- ifelse(auto$origin == "American", "Domestic", "Foreign")
# fit model (formula method, response = origin with 2 levels)
mod <- grpnet(origin ~ ., data = auto, family = "binomial")
# get predicted classes for regularization path (output = 392 x 100 matrix)
fit.path <- predict(mod, newdata = auto, type = "class")
# get predicted classes at 3 particular points (output = 392 x 3 matrix)
fit.some <- predict(mod, newdata = auto, type = "class", s = c(.15, .1, .05))
# compare misclassification rate for solutions
miss.path <- 1 - colMeans(auto$origin == fit.path)
miss.some <- 1 - colMeans(auto$origin == fit.some)
plot(log(mod$lambda), miss.path, cex = 0.5)
points(log(c(.15, .1, .05)), miss.some, pch = 0, col = "red")
######***###### family = "multinomial" ######***######
# load data
data(auto)
# fit model (formula method, response = origin with 3 levels)
mod <- grpnet(origin ~ ., data = auto, family = "multinomial")
# get predicted classes for regularization path (output = 392 x 100 matrix)
fit.path <- predict(mod, newdata = auto, type = "class")
# get predicted classes at 3 particular points (output = 392 x 3 matrix)
fit.some <- predict(mod, newdata = auto, type = "class", s = c(.1, .01, .001))
# compare misclassification rate for solutions
miss.path <- 1 - colMeans(auto$origin == fit.path)
miss.some <- 1 - colMeans(auto$origin == fit.some)
plot(log(mod$lambda), miss.path, cex = 0.5)
points(log(c(.1, .01, .001)), miss.some, pch = 0, col = "red")
######***###### family = "poisson" ######***######
# load data
data(auto)
# fit model (formula method, response = horsepower)
mod <- grpnet(horsepower ~ ., data = auto, family = "poisson")
# get fitted values for regularization path (output = 392 x 100 matrix)
fit.path <- predict(mod, newdata = auto, type = "response")
# get fitted values at 3 particular points (output = 392 x 3 matrix)
fit.some <- predict(mod, newdata = auto, type = "response", s = c(15, 10, 5))
# compare rmse for solutions
rmse.path <- sqrt(colMeans((auto$horsepower - fit.path)^2))
rmse.some <- sqrt(colMeans((auto$horsepower - fit.some)^2))
plot(log(mod$lambda), rmse.path, cex = 0.5)
points(log(c(15, 10, 5)), rmse.some, pch = 0, col = "red")
######***###### family = "negative.binomial" ######***######
# load data
data(auto)
# fit model (formula method, response = horsepower)
mod <- grpnet(horsepower ~ ., data = auto, family = "negative.binomial")
# get fitted values for regularization path (output = 392 x 100 matrix)
fit.path <- predict(mod, newdata = auto, type = "response")
# get fitted values at 3 particular points (output = 392 x 3 matrix)
fit.some <- predict(mod, newdata = auto, type = "response", s = c(0.1, 0.01, 0.001))
# compare rmse for solutions
rmse.path <- sqrt(colMeans((auto$horsepower - fit.path)^2))
rmse.some <- sqrt(colMeans((auto$horsepower - fit.some)^2))
plot(log(mod$lambda), rmse.path, cex = 0.5)
points(log(c(0.1, 0.01, 0.001)), rmse.some, pch = 0, col = "red")
######***###### family = "Gamma" ######***######
# load data
data(auto)
# fit model (formula method, response = mpg)
mod <- grpnet(mpg ~ ., data = auto, family = "Gamma")
# get fitted values for regularization path (output = 392 x 100 matrix)
fit.path <- predict(mod, newdata = auto, type = "response")
# get fitted values at 3 particular points (output = 392 x 3 matrix)
fit.some <- predict(mod, newdata = auto, type = "response", s = c(0.1, 0.01, 0.001))
# compare rmse for solutions
rmse.path <- sqrt(colMeans((auto$mpg - fit.path)^2))
rmse.some <- sqrt(colMeans((auto$mpg - fit.some)^2))
plot(log(mod$lambda), rmse.path, cex = 0.5)
points(log(c(0.1, 0.01, 0.001)), rmse.some, pch = 0, col = "red")
######***###### family = "inverse.gaussian" ######***######
# load data
data(auto)
# fit model (formula method, response = mpg)
mod <- grpnet(mpg ~ ., data = auto, family = "inverse.gaussian")
# get fitted values for regularization path (output = 392 x 100 matrix)
fit.path <- predict(mod, newdata = auto, type = "response")
# get fitted values at 3 particular points (output = 392 x 3 matrix)
fit.some <- predict(mod, newdata = auto, type = "response", s = c(0.005, 0.001, 0.0001))
# compare rmse for solutions
rmse.path <- sqrt(colMeans((auto$mpg - fit.path)^2))
rmse.some <- sqrt(colMeans((auto$mpg - fit.some)^2))
plot(log(mod$lambda), rmse.path, cex = 0.5)
points(log(c(0.005, 0.001, 0.0001)), rmse.some, pch = 0, col = "red")
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.