View source: R/predict.grpnet.R
predict.grpnet | R Documentation |
Obtain predictions from a fit group elastic net regularized GLM (grpnet) object.
## S3 method for class 'grpnet'
predict(object,
newx,
newdata,
s = NULL,
type = c("link", "response", "class", "terms",
"importance", "coefficients", "nonzero", "groups",
"ncoefs", "ngroups", "norm", "znorm"),
...)
object |
Object of class "grpnet" |
newx |
Matrix of new |
newdata |
Data frame of new |
s |
Lambda value(s) at which predictions should be obtained. Default uses |
type |
Type of prediction to return. "link" gives predictions on the link scale ( |
... |
Additional arguments (ignored) |
When type == "link"
, the predictions for each \lambda
have the form
\boldsymbol\eta_\lambda = \mathbf{X}_{\mathrm{new}} \boldsymbol\beta_\lambda
where \mathbf{X}_{\mathrm{new}}
is the argument newx
(or the design matrix created from newdata
by applying object$formula
) and \boldsymbol\beta_\lambda
is the coefficient vector corresponding to \lambda
.
When type == "response"
, the predictions for each \lambda
have the form
\boldsymbol\mu_\lambda = g^{-1}(\boldsymbol\eta_\lambda)
where g^{-1}(\cdot)
is the inverse link function stored in object$family$linkinv
.
When type == "class"
, the predictions for each \lambda
have the form
\mathbf{y}_\lambda = \arg\max_l \boldsymbol\mu_\lambda(l)
where \boldsymbol\mu_\lambda(l)
gives the predicted probability that each observation belongs to the l
-th category (for l = 1,\ldots,m
) using the regularization parameter \lambda
.
When type == "terms"
, the groupwise predictions for each \lambda
have the form
\boldsymbol\eta_{k\lambda} = \mathbf{X}_k^{\mathrm{(new)}} \boldsymbol\beta_{k\lambda}
where \mathbf{X}_k^{\mathrm{(new)}}
is the portion of the argument newx
(or the design matrix created from newdata
by applying object$formula
) that corresponds to the k
-th term/group, and \boldsymbol\beta_{k\lambda}
are the corresponding coefficients.
When type == "importance"
, the variable importance indices are defined as
\pi_k = \left( \boldsymbol\eta_{k\lambda}^\top \mathbf{C} \boldsymbol\eta_{0\lambda} \right) \left( \boldsymbol\eta_{0\lambda}^\top \mathbf{C} \boldsymbol\eta_{0\lambda} \right)^{-1}
where \mathbf{C} = (\mathbf{I}_n - \frac{1}{n} \mathbf{1}_n \mathbf{1}_n^\top)
denotes a centering matrix, and \boldsymbol\eta_{0\lambda} = \sum_{k=1}^K \boldsymbol\eta_{k\lambda}
. Note that \sum_{k=1}^K \pi_k = 1
, but some \pi_k
could be negative. When they are positive, \pi_k
gives the approximate proportion of model (explained) variation that is attributed to the k
-th term.
Depends on three factors...
1. the exponential family distribution
2. the length of the input s
3. the type
of prediction requested
For most response variables, the typical output will be...
* |
a matrix of dimension |
* |
a vector of length |
For multinomial response variables, the typical output will be...
* |
an array of dimension |
* |
a matrix of dimension |
Note: if type == "class"
, then the output will be the same class as object$ylev
. Otherwise, the output will be real-valued (or integer for the counts).
If type == "terms"
and family != "multinomial"
, the output will be...
* |
an array of dimension |
* |
a matrix of dimension |
If type == "terms"
and family == "multinomial"
, the output will be a list of length length(object$ylev)
where each element gives the terms for the corresponding response class.
If type == "importance"
and family != "multinomial"
, the output will be...
* |
a matrix of dimension |
* |
a vector of length |
If type == "importance"
and family == "multinomial"
, the output will be a list of length length(object$ylev)
where each element gives the importance for the corresponding response class. If length(s) == 1
, the output will be simplified to matrix.
If type == "coefficients"
, the output will be the same as that produced by coef.grpnet
.
If type == "nonzero"
, the output will be a list of length length(s)
where each element is a vector of integers (indices).
If type == "groups"
, the output will be a list of length length(s)
where each element is a vector of characters (term.labels
).
If type %in% c("ncoefs", "ngroups")
, the output will be a vector of length length(s)
where each element is an integer.
If type == "norm"
, the output will be a matrix of dimension c(K, length(s))
, where each cell gives the L2 norm for the corresponding group and smoothing parameter. Note that K
denotes the number of groups.
Some internal code (e.g., used for the interpolation) is borrowed from the predict.glmnet
function in the glmnet package (Friedman, Hastie, & Tibshirani, 2010).
Nathaniel E. Helwig <helwig@umn.edu>
Friedman, J., Hastie, T., & Tibshirani, R. (2010). Regularization paths for generalized linear models via coordinate descent. Journal of Statistical Software, 33(1), 1-22. \Sexpr[results=rd]{tools:::Rd_expr_doi("10.18637/jss.v033.i01")}
Helwig, N. E. (2024). Versatile descent algorithms for group regularization and variable selection in generalized linear models. Journal of Computational and Graphical Statistics. \Sexpr[results=rd]{tools:::Rd_expr_doi("10.1080/10618600.2024.2362232")}
grpnet
for fitting grpnet regularization paths
predict.cv.grpnet
for predicting from cv.grpnet
objects
######***###### family = "gaussian" ######***######
# load data
data(auto)
# fit model (formula method, response = mpg)
mod <- grpnet(mpg ~ ., data = auto)
# get fitted values for regularization path (output = 392 x 100 matrix)
fit.path <- predict(mod, newdata = auto)
# get fitted values at 3 particular points (output = 392 x 3 matrix)
fit.some <- predict(mod, newdata = auto, s = c(1.5, 1, 0.5))
# compare rmse for solutions
rmse.path <- sqrt(colMeans((auto$mpg - fit.path)^2))
rmse.some <- sqrt(colMeans((auto$mpg - fit.some)^2))
plot(log(mod$lambda), rmse.path, cex = 0.5)
points(log(c(1.5, 1, 0.5)), rmse.some, pch = 0, col = "red")
######***###### family = "binomial" ######***######
# load data
data(auto)
# redefine origin (Domestic vs Foreign)
auto$origin <- ifelse(auto$origin == "American", "Domestic", "Foreign")
# fit model (formula method, response = origin with 2 levels)
mod <- grpnet(origin ~ ., data = auto, family = "binomial")
# get predicted classes for regularization path (output = 392 x 100 matrix)
fit.path <- predict(mod, newdata = auto, type = "class")
# get predicted classes at 3 particular points (output = 392 x 3 matrix)
fit.some <- predict(mod, newdata = auto, type = "class", s = c(.15, .1, .05))
# compare misclassification rate for solutions
miss.path <- 1 - colMeans(auto$origin == fit.path)
miss.some <- 1 - colMeans(auto$origin == fit.some)
plot(log(mod$lambda), miss.path, cex = 0.5)
points(log(c(.15, .1, .05)), miss.some, pch = 0, col = "red")
######***###### family = "multinomial" ######***######
# load data
data(auto)
# fit model (formula method, response = origin with 3 levels)
mod <- grpnet(origin ~ ., data = auto, family = "multinomial")
# get predicted classes for regularization path (output = 392 x 100 matrix)
fit.path <- predict(mod, newdata = auto, type = "class")
# get predicted classes at 3 particular points (output = 392 x 3 matrix)
fit.some <- predict(mod, newdata = auto, type = "class", s = c(.1, .01, .001))
# compare misclassification rate for solutions
miss.path <- 1 - colMeans(auto$origin == fit.path)
miss.some <- 1 - colMeans(auto$origin == fit.some)
plot(log(mod$lambda), miss.path, cex = 0.5)
points(log(c(.1, .01, .001)), miss.some, pch = 0, col = "red")
######***###### family = "poisson" ######***######
# load data
data(auto)
# fit model (formula method, response = horsepower)
mod <- grpnet(horsepower ~ ., data = auto, family = "poisson")
# get fitted values for regularization path (output = 392 x 100 matrix)
fit.path <- predict(mod, newdata = auto, type = "response")
# get fitted values at 3 particular points (output = 392 x 3 matrix)
fit.some <- predict(mod, newdata = auto, type = "response", s = c(15, 10, 5))
# compare rmse for solutions
rmse.path <- sqrt(colMeans((auto$horsepower - fit.path)^2))
rmse.some <- sqrt(colMeans((auto$horsepower - fit.some)^2))
plot(log(mod$lambda), rmse.path, cex = 0.5)
points(log(c(15, 10, 5)), rmse.some, pch = 0, col = "red")
######***###### family = "negative.binomial" ######***######
# load data
data(auto)
# fit model (formula method, response = horsepower)
mod <- grpnet(horsepower ~ ., data = auto, family = "negative.binomial")
# get fitted values for regularization path (output = 392 x 100 matrix)
fit.path <- predict(mod, newdata = auto, type = "response")
# get fitted values at 3 particular points (output = 392 x 3 matrix)
fit.some <- predict(mod, newdata = auto, type = "response", s = c(0.1, 0.01, 0.001))
# compare rmse for solutions
rmse.path <- sqrt(colMeans((auto$horsepower - fit.path)^2))
rmse.some <- sqrt(colMeans((auto$horsepower - fit.some)^2))
plot(log(mod$lambda), rmse.path, cex = 0.5)
points(log(c(0.1, 0.01, 0.001)), rmse.some, pch = 0, col = "red")
######***###### family = "Gamma" ######***######
# load data
data(auto)
# fit model (formula method, response = mpg)
mod <- grpnet(mpg ~ ., data = auto, family = "Gamma")
# get fitted values for regularization path (output = 392 x 100 matrix)
fit.path <- predict(mod, newdata = auto, type = "response")
# get fitted values at 3 particular points (output = 392 x 3 matrix)
fit.some <- predict(mod, newdata = auto, type = "response", s = c(0.1, 0.01, 0.001))
# compare rmse for solutions
rmse.path <- sqrt(colMeans((auto$mpg - fit.path)^2))
rmse.some <- sqrt(colMeans((auto$mpg - fit.some)^2))
plot(log(mod$lambda), rmse.path, cex = 0.5)
points(log(c(0.1, 0.01, 0.001)), rmse.some, pch = 0, col = "red")
######***###### family = "inverse.gaussian" ######***######
# load data
data(auto)
# fit model (formula method, response = mpg)
mod <- grpnet(mpg ~ ., data = auto, family = "inverse.gaussian")
# get fitted values for regularization path (output = 392 x 100 matrix)
fit.path <- predict(mod, newdata = auto, type = "response")
# get fitted values at 3 particular points (output = 392 x 3 matrix)
fit.some <- predict(mod, newdata = auto, type = "response", s = c(0.005, 0.001, 0.0001))
# compare rmse for solutions
rmse.path <- sqrt(colMeans((auto$mpg - fit.path)^2))
rmse.some <- sqrt(colMeans((auto$mpg - fit.some)^2))
plot(log(mod$lambda), rmse.path, cex = 0.5)
points(log(c(0.005, 0.001, 0.0001)), rmse.some, pch = 0, col = "red")
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.