View source: R/predict.grpnet.R
| predict.grpnet | R Documentation |
Obtain predictions from a fit group elastic net regularized GLM (grpnet) object.
## S3 method for class 'grpnet'
predict(object,
newx,
newdata,
s = NULL,
type = c("link", "response", "class", "terms",
"importance", "coefficients", "nonzero", "groups",
"ncoefs", "ngroups", "norm", "znorm"),
...)
object |
Object of class "grpnet" |
newx |
Matrix of new |
newdata |
Data frame of new |
s |
Lambda value(s) at which predictions should be obtained. Default uses |
type |
Type of prediction to return. "link" gives predictions on the link scale ( |
... |
Additional arguments (ignored) |
When type == "link", the predictions for each \lambda have the form
\boldsymbol\eta_\lambda = \mathbf{X}_{\mathrm{new}} \boldsymbol\beta_\lambda
where \mathbf{X}_{\mathrm{new}} is the argument newx (or the design matrix created from newdata by applying object$formula) and \boldsymbol\beta_\lambda is the coefficient vector corresponding to \lambda.
When type == "response", the predictions for each \lambda have the form
\boldsymbol\mu_\lambda = g^{-1}(\boldsymbol\eta_\lambda)
where g^{-1}(\cdot) is the inverse link function stored in object$family$linkinv.
When type == "class", the predictions for each \lambda have the form
\mathbf{y}_\lambda = \arg\max_l \boldsymbol\mu_\lambda(l)
where \boldsymbol\mu_\lambda(l) gives the predicted probability that each observation belongs to the l-th category (for l = 1,\ldots,m) using the regularization parameter \lambda.
When type == "terms", the groupwise predictions for each \lambda have the form
\boldsymbol\eta_{k\lambda} = \mathbf{X}_k^{\mathrm{(new)}} \boldsymbol\beta_{k\lambda}
where \mathbf{X}_k^{\mathrm{(new)}} is the portion of the argument newx (or the design matrix created from newdata by applying object$formula) that corresponds to the k-th term/group, and \boldsymbol\beta_{k\lambda} are the corresponding coefficients.
When type == "importance", the variable importance indices are defined as
\pi_k = \left( \boldsymbol\eta_{k\lambda}^\top \mathbf{C} \boldsymbol\eta_{0\lambda} \right) \left( \boldsymbol\eta_{0\lambda}^\top \mathbf{C} \boldsymbol\eta_{0\lambda} \right)^{-1}
where \mathbf{C} = (\mathbf{I}_n - \frac{1}{n} \mathbf{1}_n \mathbf{1}_n^\top) denotes a centering matrix, and \boldsymbol\eta_{0\lambda} = \sum_{k=1}^K \boldsymbol\eta_{k\lambda}. Note that \sum_{k=1}^K \pi_k = 1, but some \pi_k could be negative. When they are positive, \pi_k gives the approximate proportion of model (explained) variation that is attributed to the k-th term.
Depends on three factors...
1. the exponential family distribution
2. the length of the input s
3. the type of prediction requested
For most response variables, the typical output will be...
* |
a matrix of dimension |
* |
a vector of length |
For multigaussian and multinomial response variables, the typical output will be...
* |
an array of dimension |
* |
a matrix of dimension |
Note: if type == "class", then the output will be the same class as object$ylev. Otherwise, the output will be real-valued (or integer for the counts).
If type == "terms" and !(family %in% c("multigaussian","multinomial")), the output will be...
* |
an array of dimension |
* |
a matrix of dimension |
If type == "terms" and family %in% c("multigaussian","multinomial"), the output will be a list of length length(object$ylev) where each element gives the terms for the corresponding response dimension/class.
If type == "importance" and family != "multinomial", the output will be...
* |
a matrix of dimension |
* |
a vector of length |
If type == "importance" and family == "multinomial", the output will be a list of length length(object$ylev) where each element gives the importance for the corresponding response class. If length(s) == 1, the output will be simplified to matrix.
If type == "coefficients", the output will be the same as that produced by coef.grpnet.
If type == "nonzero", the output will be a list of length length(s) where each element is a vector of integers (indices).
If type == "groups", the output will be a list of length length(s) where each element is a vector of characters (term.labels).
If type %in% c("ncoefs", "ngroups"), the output will be a vector of length length(s) where each element is an integer.
If type == "norm", the output will be a matrix of dimension c(K, length(s)), where each cell gives the L2 norm for the corresponding group and smoothing parameter. Note that K denotes the number of groups.
Some internal code (e.g., used for the interpolation) is borrowed from the predict.glmnet function in the glmnet package (Friedman, Hastie, & Tibshirani, 2010).
Nathaniel E. Helwig <helwig@umn.edu>
Friedman, J., Hastie, T., & Tibshirani, R. (2010). Regularization paths for generalized linear models via coordinate descent. Journal of Statistical Software, 33(1), 1-22. \Sexpr[results=rd]{tools:::Rd_expr_doi("10.18637/jss.v033.i01")}
Helwig, N. E. (2025). Versatile descent algorithms for group regularization and variable selection in generalized linear models. Journal of Computational and Graphical Statistics, 34(1), 239-252. \Sexpr[results=rd]{tools:::Rd_expr_doi("10.1080/10618600.2024.2362232")}
grpnet for fitting grpnet regularization paths
predict.cv.grpnet for predicting from cv.grpnet objects
######***###### family = "gaussian" ######***######
# load data
data(auto)
# fit model (formula method, response = mpg)
mod <- grpnet(mpg ~ ., data = auto)
# get fitted values for regularization path (output = 392 x 100 matrix)
fit.path <- predict(mod, newdata = auto)
# get fitted values at 3 particular points (output = 392 x 3 matrix)
lam <- quantile(mod$lambda, probs = c(0.25, 0.5, 0.75))
fit.some <- predict(mod, newdata = auto, s = lam)
# compare rmse for solutions
rmse.path <- sqrt(colMeans((auto$mpg - fit.path)^2))
rmse.some <- sqrt(colMeans((auto$mpg - fit.some)^2))
plot(log(mod$lambda), rmse.path, cex = 0.5)
points(log(lam), rmse.some, pch = 0, col = "red")
######***###### family = "multigaussian" ######***######
# load data
data(auto)
# fit model (formula method, response = (mpg, displacement))
y <- as.matrix(auto[,c(1,3)])
mod <- grpnet(y ~ ., data = auto[,-c(1,3)], family = "multigaussian",
standardize.response = TRUE)
# get fitted values for regularization path (output = 392 x 2 x 100 array)
fit.path <- predict(mod, newdata = auto)
# get fitted values at 3 particular points (output = 392 x 2 x 3 array)
lam <- quantile(mod$lambda, probs = c(0.25, 0.5, 0.75))
fit.some <- predict(mod, newdata = auto, s = lam)
# compare rmse for solutions (mpg)
rmse1.path <- sqrt(colMeans((y[,1] - fit.path[,1,])^2))
rmse1.some <- sqrt(colMeans((y[,1] - fit.some[,1,])^2))
plot(log(mod$lambda), rmse1.path, cex = 0.5)
points(log(lam), rmse1.some, pch = 0, col = "red")
# compare rmse for solutions (displacement)
rmse2.path <- sqrt(colMeans((y[,2] - fit.path[,2,])^2))
rmse2.some <- sqrt(colMeans((y[,2] - fit.some[,2,])^2))
plot(log(mod$lambda), rmse2.path, cex = 0.5)
points(log(lam), rmse2.some, pch = 0, col = "red")
######***###### family = "svm1" ######***######
# load data
data(auto)
# redefine origin (Domestic vs Foreign)
auto$origin <- ifelse(auto$origin == "American", "Domestic", "Foreign")
# fit model (formula method, response = origin with 2 levels)
mod <- grpnet(origin ~ ., data = auto, family = "svm1")
# get predicted classes for regularization path (output = 392 x 100 matrix)
fit.path <- predict(mod, newdata = auto, type = "class")
# get predicted classes at 3 particular points (output = 392 x 3 matrix)
lam <- quantile(mod$lambda, probs = c(0.25, 0.5, 0.75))
fit.some <- predict(mod, newdata = auto, type = "class", s = lam)
# compare misclassification rate for solutions
miss.path <- 1 - colMeans(auto$origin == fit.path)
miss.some <- 1 - colMeans(auto$origin == fit.some)
plot(log(mod$lambda), miss.path, cex = 0.5)
points(log(lam), miss.some, pch = 0, col = "red")
######***###### family = "svm2" ######***######
# load data
data(auto)
# redefine origin (Domestic vs Foreign)
auto$origin <- ifelse(auto$origin == "American", "Domestic", "Foreign")
# fit model (formula method, response = origin with 2 levels)
mod <- grpnet(origin ~ ., data = auto, family = "svm2")
# get predicted classes for regularization path (output = 392 x 100 matrix)
fit.path <- predict(mod, newdata = auto, type = "class")
# get predicted classes at 3 particular points (output = 392 x 3 matrix)
lam <- quantile(mod$lambda, probs = c(0.25, 0.5, 0.75))
fit.some <- predict(mod, newdata = auto, type = "class", s = lam)
# compare misclassification rate for solutions
miss.path <- 1 - colMeans(auto$origin == fit.path)
miss.some <- 1 - colMeans(auto$origin == fit.some)
plot(log(mod$lambda), miss.path, cex = 0.5)
points(log(lam), miss.some, pch = 0, col = "red")
######***###### family = "logit" ######***######
# load data
data(auto)
# redefine origin (Domestic vs Foreign)
auto$origin <- ifelse(auto$origin == "American", "Domestic", "Foreign")
# fit model (formula method, response = origin with 2 levels)
mod <- grpnet(origin ~ ., data = auto, family = "logit")
# get predicted classes for regularization path (output = 392 x 100 matrix)
fit.path <- predict(mod, newdata = auto, type = "class")
# get predicted classes at 3 particular points (output = 392 x 3 matrix)
lam <- quantile(mod$lambda, probs = c(0.25, 0.5, 0.75))
fit.some <- predict(mod, newdata = auto, type = "class", s = lam)
# compare misclassification rate for solutions
miss.path <- 1 - colMeans(auto$origin == fit.path)
miss.some <- 1 - colMeans(auto$origin == fit.some)
plot(log(mod$lambda), miss.path, cex = 0.5)
points(log(lam), miss.some, pch = 0, col = "red")
######***###### family = "binomial" ######***######
# load data
data(auto)
# redefine origin (Domestic vs Foreign)
auto$origin <- ifelse(auto$origin == "American", "Domestic", "Foreign")
# fit model (formula method, response = origin with 2 levels)
mod <- grpnet(origin ~ ., data = auto, family = "binomial")
# get predicted classes for regularization path (output = 392 x 100 matrix)
fit.path <- predict(mod, newdata = auto, type = "class")
# get predicted classes at 3 particular points (output = 392 x 3 matrix)
lam <- quantile(mod$lambda, probs = c(0.25, 0.5, 0.75))
fit.some <- predict(mod, newdata = auto, type = "class", s = lam)
# compare misclassification rate for solutions
miss.path <- 1 - colMeans(auto$origin == fit.path)
miss.some <- 1 - colMeans(auto$origin == fit.some)
plot(log(mod$lambda), miss.path, cex = 0.5)
points(log(lam), miss.some, pch = 0, col = "red")
######***###### family = "multinomial" ######***######
# load data
data(auto)
# fit model (formula method, response = origin with 3 levels)
mod <- grpnet(origin ~ ., data = auto, family = "multinomial")
# get predicted classes for regularization path (output = 392 x 100 matrix)
fit.path <- predict(mod, newdata = auto, type = "class")
# get predicted classes at 3 particular points (output = 392 x 3 matrix)
lam <- quantile(mod$lambda, probs = c(0.25, 0.5, 0.75))
fit.some <- predict(mod, newdata = auto, type = "class", s = lam)
# compare misclassification rate for solutions
miss.path <- 1 - colMeans(auto$origin == fit.path)
miss.some <- 1 - colMeans(auto$origin == fit.some)
plot(log(mod$lambda), miss.path, cex = 0.5)
points(log(lam), miss.some, pch = 0, col = "red")
######***###### family = "ordinal" ######***######
# load data
data(auto)
# fit model (formula method, response = cylinders with 5 levels)
mod <- grpnet(cylinders ~ ., data = auto, family = "ordinal")
# get fitted values for regularization path (output = 392 x 100 matrix)
fit.path <- predict(mod, newdata = auto, type = "class")
# get fitted values at 3 particular points (output = 392 x 3 matrix)
lam <- quantile(mod$lambda, probs = c(0.25, 0.5, 0.75))
fit.some <- predict(mod, newdata = auto, type = "class", s = lam)
# compare misclassification rate for solutions
miss.path <- 1 - colMeans(auto$cylinders == fit.path)
miss.some <- 1 - colMeans(auto$cylinders == fit.some)
plot(log(mod$lambda), miss.path, cex = 0.5)
points(log(lam), miss.some, pch = 0, col = "red")
######***###### family = "poisson" ######***######
# load data
data(auto)
# fit model (formula method, response = horsepower)
mod <- grpnet(horsepower ~ ., data = auto, family = "poisson")
# get fitted values for regularization path (output = 392 x 100 matrix)
fit.path <- predict(mod, newdata = auto, type = "response")
# get fitted values at 3 particular points (output = 392 x 3 matrix)
lam <- quantile(mod$lambda, probs = c(0.25, 0.5, 0.75))
fit.some <- predict(mod, newdata = auto, type = "response", s = lam)
# compare rmse for solutions
rmse.path <- sqrt(colMeans((auto$horsepower - fit.path)^2))
rmse.some <- sqrt(colMeans((auto$horsepower - fit.some)^2))
plot(log(mod$lambda), rmse.path, cex = 0.5)
points(log(lam), rmse.some, pch = 0, col = "red")
######***###### family = "negative.binomial" ######***######
# load data
data(auto)
# fit model (formula method, response = horsepower)
mod <- grpnet(horsepower ~ ., data = auto, family = "negative.binomial")
# get fitted values for regularization path (output = 392 x 100 matrix)
fit.path <- predict(mod, newdata = auto, type = "response")
# get fitted values at 3 particular points (output = 392 x 3 matrix)
lam <- quantile(mod$lambda, probs = c(0.25, 0.5, 0.75))
fit.some <- predict(mod, newdata = auto, type = "response", s = lam)
# compare rmse for solutions
rmse.path <- sqrt(colMeans((auto$horsepower - fit.path)^2))
rmse.some <- sqrt(colMeans((auto$horsepower - fit.some)^2))
plot(log(mod$lambda), rmse.path, cex = 0.5)
points(log(lam), rmse.some, pch = 0, col = "red")
######***###### family = "Gamma" ######***######
# load data
data(auto)
# fit model (formula method, response = mpg)
mod <- grpnet(mpg ~ ., data = auto, family = "Gamma")
# get fitted values for regularization path (output = 392 x 100 matrix)
fit.path <- predict(mod, newdata = auto, type = "response")
# get fitted values at 3 particular points (output = 392 x 3 matrix)
lam <- quantile(mod$lambda, probs = c(0.25, 0.5, 0.75))
fit.some <- predict(mod, newdata = auto, type = "response", s = lam)
# compare rmse for solutions
rmse.path <- sqrt(colMeans((auto$mpg - fit.path)^2))
rmse.some <- sqrt(colMeans((auto$mpg - fit.some)^2))
plot(log(mod$lambda), rmse.path, cex = 0.5)
points(log(lam), rmse.some, pch = 0, col = "red")
######***###### family = "inverse.gaussian" ######***######
# load data
data(auto)
# fit model (formula method, response = mpg)
mod <- grpnet(mpg ~ ., data = auto, family = "inverse.gaussian")
# get fitted values for regularization path (output = 392 x 100 matrix)
fit.path <- predict(mod, newdata = auto, type = "response")
# get fitted values at 3 particular points (output = 392 x 3 matrix)
lam <- quantile(mod$lambda, probs = c(0.25, 0.5, 0.75))
fit.some <- predict(mod, newdata = auto, type = "response", s = lam)
# compare rmse for solutions
rmse.path <- sqrt(colMeans((auto$mpg - fit.path)^2))
rmse.some <- sqrt(colMeans((auto$mpg - fit.some)^2))
plot(log(mod$lambda), rmse.path, cex = 0.5)
points(log(lam), rmse.some, pch = 0, col = "red")
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.