dfplsr_cov: R Documentation
Monte Carlo estimation of the model complexity df (number of degrees of freedom) of univariate PLSR models. See in particular Ye (1998) and Efron (2004).
- In dfplsr_cov, the covariances presented by Efron (2004, Eq. 2.16) are calculated by parametric bootstrap. The residual variance sigma^2 is estimated from a low-biased model.
- In dfplsr_div, the divergences dy_fit/dy presented by Ye (1998) and Efron (2004) are calculated by perturbation analysis. This is a Stein unbiased risk estimate (SURE) of df.
Both estimators are sketched below on a toy ordinary least-squares example.
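As a minimal illustration of these two ideas (this is not the code used by dfplsr_cov and dfplsr_div), the sketch below applies them to an ordinary least-squares fit, for which the true df is known (2: intercept + slope). All object names (fit, sigma2, div, ...) are illustrative only.
## Toy smoother: OLS of y on a single x (true df = 2)
set.seed(1)
n <- 50
x <- rnorm(n)
y <- 2 * x + rnorm(n)
fit <- function(y) fitted(lm(y ~ x))    ## the smoother y -> y_fit
yfit <- fit(y)
## (1) Covariance estimate (Efron 2004, Eq. 2.16):
## df = sum_i Cov(y_fit_i, y_i) / sigma^2, covariances by parametric bootstrap
sigma2 <- sum((y - yfit)^2) / (n - 2)   ## residual variance estimate
B <- 200
Yb <- matrix(yfit, n, B) + matrix(rnorm(n * B, sd = sqrt(sigma2)), n, B)
Fb <- apply(Yb, 2, fit)                 ## fitted values for each simulated response
covs <- sapply(seq_len(n), function(i) cov(Fb[i, ], Yb[i, ]))
sum(covs) / sigma2                      ## close to 2
## (2) Divergence (SURE) estimate (Ye 1998):
## df = sum_i dy_fit_i / dy_i, approximated by perturbing each y_i by eps
eps <- 1e-2
div <- numeric(n)
for (i in seq_len(n)) {
  yp <- y
  yp[i] <- yp[i] + eps
  div[i] <- (fit(yp)[i] - yfit[i]) / eps
}
sum(div)                                ## equal to 2 for a linear smoother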
dfplsr_cov(
X, Y, ncomp, algo = NULL,
maxlv = 50,
B = 30, seed = NULL,
print = TRUE,
...
)
dfplsr_div(
X, Y, ncomp, algo = NULL,
eps = 1e-2,
B = 30, samp = c("random", "syst"),
seed = NULL,
print = TRUE,
...
)
X: A matrix of the training X-data (observations x variables).
Y: A vector of length n of the training responses (univariate Y-data).
ncomp: The maximal number of PLS scores (= components = latent variables) to consider.
algo: A PLS algorithm. Default to NULL.
maxlv: For dfplsr_cov: the number of latent variables of the low-biased model used to estimate the residual variance sigma^2.
eps: For dfplsr_div: the value of the perturbation added to the response values.
B: For dfplsr_cov: the number of bootstrap replications. For dfplsr_div: the number of observations perturbed.
samp: For dfplsr_div: the type of sampling ("random" or systematic "syst") of the observations that are perturbed.
seed: An integer defining the seed for the random simulation, or NULL (default).
print: Logical. If TRUE, information is printed during the computations.
...: Optional arguments to pass in the function defined in algo.
A list of outputs (see examples), such as:
df: The model complexity for the models with 0, 1, ..., ncomp components (a vector of ncomp + 1 values).
Efron, B., 2004. The Estimation of Prediction Error. Journal of the American Statistical Association 99, 619-632. https://doi.org/10.1198/016214504000000692
Hastie, T., Tibshirani, R.J., 1990. Generalized Additive Models, Monographs on Statistics and Applied Probability. Chapman and Hall/CRC, New York, USA.
Hastie, T., Tibshirani, R., Friedman, J., 2009. The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd ed. Springer, New York.
Hastie, T., Tibshirani, R., Wainwright, M., 2015. Statistical Learning with Sparsity: The Lasso and Generalizations. CRC Press.
Kramer, N., Braun, M.L., 2007. Kernelizing PLS, degrees of freedom, and efficient model selection, in: Proceedings of the 24th International Conference on Machine Learning, ICML '07. Association for Computing Machinery, New York, NY, USA, pp. 441-448. https://doi.org/10.1145/1273496.1273552
Kramer, N., Sugiyama, M., 2011. The Degrees of Freedom of Partial Least Squares Regression. Journal of the American Statistical Association 106, 697-705. https://doi.org/10.1198/jasa.2011.tm10107
Kramer, N., Braun, M.L., 2019. plsdof: Degrees of Freedom and Statistical Inference for Partial Least Squares Regression. R package version 0.2-9. https://cran.r-project.org
Stein, C.M., 1981. Estimation of the Mean of a Multivariate Normal Distribution. The Annals of Statistics 9, 1135-1151.
Ye, J., 1998. On Measuring and Correcting the Effects of Data Mining and Model Selection. Journal of the American Statistical Association 93, 120-131. https://doi.org/10.1080/01621459.1998.10474094
Zou, H., Hastie, T., Tibshirani, R., 2007. On the "degrees of freedom" of the lasso. The Annals of Statistics 35, 2173-2192. https://doi.org/10.1214/009053607000000127
## The example below reproduces the numerical illustration
## given by Kramer & Sugiyama 2011 on the Ozone data (Fig. 1, center).
## Note that function "pls.model" used for df calculations
## in the R package "plsdof" v0.2-9 (Kramer & Braun 2019)
## automatically scales the X matrix before PLS.
data(datozone)
z <- datozone$X
u <- which(!is.na(rowSums(z))) ## Removing rows with NAs
X <- z[u, -4]
y <- z[u, 4]
dim(X)
zX <- scale(X) ## Scaling only for consistency with plsdof
ncomp <- 12
B <- 50 ## Should be increased for more stability
u <- dfplsr_cov(zX, y, ncomp = ncomp, B = B)
v <- dfplsr_div(zX, y, ncomp = ncomp, B = B)
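## (Not in the original example:) the outputs are lists; a quick look at them.
## The df component has ncomp + 1 values (models with 0, 1, ..., ncomp LVs).
names(u)
length(u$df)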
#library(plsdof)
#fm <- pls.model(
#  X, y, m = ncomp, compute.DoF = TRUE,
#  compute.jacobian = FALSE, ## Krylov
#  use.kernel = FALSE
#  )
#df.kramer <- fm$DoF
df.kramer <- c(1.000000, 3.712373, 6.456417, 11.633565, 12.156760, 11.715101, 12.349716,
12.192682, 13.000000, 13.000000, 13.000000, 13.000000, 13.000000)
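## (Not in the original example:) side-by-side numerical comparison of the
## three estimates for 0, ..., ncomp latent variables; column names are arbitrary.
round(cbind(lv = 0:ncomp, cov = u$df, div = v$df, kramer = df.kramer), 2)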
zncomp <- 0:ncomp
plot(zncomp, u$df, type = "l", col = "red",
ylim = c(0, 15),
xlab = "Nb components", ylab = "df")
lines(zncomp, v$df, col = "blue")
lines(zncomp, zncomp + 1, col = "grey40") ## Naive df
points(zncomp, df.kramer, pch = 16)
abline(h = 1, lty = 2, col = "grey")
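## (Not in the original example:) an optional legend identifying the curves/points.
legend("bottomright",
  legend = c("dfplsr_cov", "dfplsr_div", "Naive df", "plsdof (Kramer & Braun 2019)"),
  col = c("red", "blue", "grey40", "black"),
  lty = c(1, 1, 1, NA), pch = c(NA, NA, NA, 16), bty = "n")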