PVR: Principal Variable Regression (PVR)
In plsVarSel: Variable Selection in Partial Least Squares

PVR	R Documentation

Principal Variable Regression (PVR)

Description

Greedy algorithm that extracts the most dominant variables (columns of X), judged jointly by the X-variance they explain and their squared correlation with Y.
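
The greedy idea can be illustrated with a small base R sketch. This is not the plsVarSel implementation: the helper `greedy_pv_sketch()` is hypothetical, it handles a single response only, it ignores the PC voting controlled by `ncomp`, and the rule for combining the two criteria (a simple product) is an assumption made for illustration. At each step it scores every remaining column by the X sum of squares it explains times its squared correlation with the residual response, picks the best one, and deflates X and y in Gram-Schmidt fashion.

```r
# Hypothetical helper for illustration; not part of plsVarSel.
greedy_pv_sketch <- function(X, y, nvar = 2) {
  X <- sweep(X, 2, colMeans(X))   # mean-center the columns of X
  y <- y - mean(y)                # mean-center the response
  total <- sum(X^2)               # total X sum of squares
  tol <- 1e-12 * total            # columns below this squared norm count as exhausted
  ids <- integer(0)
  exX <- numeric(0)
  for (k in seq_len(nvar)) {
    n2 <- colSums(X^2)            # squared norms of the residual columns
    # X sum of squares explained by projecting residual X onto each candidate
    ex <- colSums(crossprod(X)^2) / n2
    # squared correlation between each candidate and the residual response
    r2 <- drop(crossprod(X, y))^2 / (n2 * sum(y^2))
    score <- ex * r2              # assumed combination rule (product)
    score[n2 <= tol] <- -Inf      # skip already selected / numerically zero columns
    j <- which.max(score)
    q <- X[, j] / sqrt(n2[j])     # normalized score vector for the chosen column
    ids <- c(ids, j)
    exX <- c(exX, 100 * ex[j] / total)
    X <- X - outer(q, drop(crossprod(X, q)))  # deflate X
    y <- y - q * sum(q * y)                   # deflate y
  }
  list(ids = ids, exX = exX)      # selected indices, % X sum of squares per step
}

# Simulated data: the response depends mainly on columns 3 and 5
set.seed(1)
X_sim <- matrix(rnorm(40 * 6), 40, 6)
y_sim <- 2 * X_sim[, 3] - X_sim[, 5] + rnorm(40, sd = 0.1)
print(greedy_pv_sketch(X_sim, y_sim, nvar = 2))
```

Because each step works on the deflated data, the per-step percentages in `exX` can be summed to get a cumulative figure.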

Usage

PVR(X, Y, nvar = 2, ncomp = NULL)

Arguments

`X`	numeric predictor `matrix`.
`Y`	numeric response `vector` or `matrix` (single or multiple responses).
`nvar`	integer, the number of variables to select (default = 2).
`ncomp`	integer, the number of principal components to include in the voting process (default = `NULL`, which uses all PCs).

Value

A list containing:

`ids`	The indices of the selected variables.
`betas`	The regression coefficients (including the constant term) for prediction of Y from the selected variables.
`Q`	Orthonormal scores (associated with the selected variables).
`R`	Corresponding loadings. NOTE: `R[, vperm]` is upper triangular.
`vperm`	Variable indices ordered with the `nvar` selected variables first, followed by all non-selected variables.
`U`	The normalized PCA-scores.
`s`	Singular values of the mean centered X.
`ssEX`	The X-variances explained by the selected variables.
`ssEY`	The Y-variances explained by the selected variables.
`ni`	The norms of the (residual) selected variables before the score-normalization (Q).
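
The relationship between `Q`, `R` and `vperm` resembles a column-pivoted QR factorization, which the following base R sketch illustrates. It is only an analogy: base R's `qr(..., LAPACK = TRUE)` chooses pivots by residual column norm rather than by the PVR criterion, it factorizes all columns rather than just `nvar` of them, and the object names below are illustrative, not taken from the package.

```r
set.seed(42)
X <- matrix(rnorm(30 * 5), 30, 5)
X <- sweep(X, 2, colMeans(X))      # mean-center the columns

qrx   <- qr(X, LAPACK = TRUE)      # column-pivoted QR
Q     <- qr.Q(qrx)                 # orthonormal columns ("scores")
R_piv <- qr.R(qrx)                 # upper triangular, columns in pivot order
vperm <- qrx$pivot                 # column order chosen by the pivoting

# Put the columns of R back in the original variable order,
# so that X = Q %*% R and R[, vperm] is upper triangular.
R <- matrix(0, nrow(R_piv), ncol(X))
R[, vperm] <- R_piv

max(abs(crossprod(Q) - diag(ncol(Q))))        # close to 0: Q is orthonormal
max(abs(Q %*% R - X))                         # close to 0: X is reproduced
all(R[, vperm][lower.tri(R[, vperm])] == 0)   # TRUE: upper triangular
```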

Author(s)

Ulf Indahl, Kristian Hovde Liland.

References

Joakim Skogholt, Kristian Hovde Liland, Tormod Næs, Age K. Smilde, Ulf Geir Indahl, Selection of principal variables through a modified Gram–Schmidt process with and without supervision, Journal of Chemometrics, Volume 37, Issue 10, Pages e3510 (2023), https://doi.org/10.1002/cem.3510

Examples

library(pls)
library(plsVarSel)  # provides PVR()
data(gasoline, package = "pls")

# PVR: Select 10 variables using all PCs in voting
pvr_result <- PVR(gasoline$NIR, gasoline$octane, nvar = 10)

# Compare with PCR using all variables
pcr_result <- pcr(octane ~ NIR, ncomp = 10, data = gasoline, 
                  validation = "CV", scale = FALSE)

# Compare X-variance and Y-variance explained
par(mfrow = c(1, 2))
plot(cumsum(pvr_result$ssEX), type = "b", col = "blue", 
     xlab = "Number of Variables/Components", 
     ylab = "Cumulative % X-Variance",
     main = "X-Variance: PVR vs PCR",
     ylim = c(50, 100))
pcr_xvar <- 100 * cumsum(pcr_result$Xvar) / pcr_result$Xtotvar
lines(seq_along(pcr_xvar), pcr_xvar, type = "b", col = "red")
legend("bottomright", legend = c("PVR (10 vars)", "PCR (10 comps)"),
       col = c("blue", "red"), lty = 1, pch = 1)

plot(cumsum(pvr_result$ssEY), type = "b", col = "blue", 
     xlab = "Number of Variables/Components", 
     ylab = "Cumulative % Y-Variance",
     main = "Y-Variance: PVR vs PCR",
     ylim = c(0, 100))
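# Note: with validation = "CV", R2() defaults to the cross-validated estimate,
# whereas ssEY is computed on the training data
pcr_yvar <- 100 * R2(pcr_result)$val[1,1,-1]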
lines(seq_along(pcr_yvar), pcr_yvar, type = "b", col = "red")
legend("bottomright", legend = c("PVR (10 vars)", "PCR (10 comps)"),
       col = c("blue", "red"), lty = 1, pch = 1)
par(mfrow = c(1, 1))

# Predict using selected variables
X_selected <- gasoline$NIR[, pvr_result$ids]
y_pred_pvr <- cbind(1, X_selected) %*% pvr_result$betas[, ncol(pvr_result$betas)]
y_pred_pcr <- predict(pcr_result, ncomp = 10, newdata = gasoline)

# Compare RMSE (training error - same data used for fitting)
rmse_pvr <- sqrt(mean((gasoline$octane - y_pred_pvr)^2))
rmse_pcr <- sqrt(mean((gasoline$octane - y_pred_pcr)^2))
cat("RMSE - PVR:", round(rmse_pvr, 4), "\n")
cat("RMSE - PCR:", round(rmse_pcr, 4), "\n")
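
Since the RMSE values above are training errors, a held-out check may be more informative. The sketch below refits PVR on a subset of the samples and predicts the rest, reusing the coefficient layout from the example above (intercept first, last column of `betas`). The split into samples 1 to 50 and 51 to 60 is an arbitrary choice made here for illustration, and the code is skipped when the packages are not installed.

```r
if (requireNamespace("plsVarSel", quietly = TRUE) &&
    requireNamespace("pls", quietly = TRUE)) {
  data(gasoline, package = "pls")
  train <- 1:50    # arbitrary training subset (assumption)
  test  <- 51:60   # remaining samples held out for testing

  # Select variables and estimate coefficients on the training samples only
  fit <- plsVarSel::PVR(gasoline$NIR[train, ], gasoline$octane[train], nvar = 10)
  b   <- fit$betas[, ncol(fit$betas)]

  # Predict the held-out samples from the selected variables
  pred <- drop(cbind(1, gasoline$NIR[test, fit$ids]) %*% b)
  rmse_test <- sqrt(mean((gasoline$octane[test] - pred)^2))
  cat("Held-out RMSE - PVR:", round(rmse_test, 4), "\n")
}
```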

plsVarSel documentation built on Feb. 13, 2026, 9:07 a.m.