dfplsr_cov: Degrees of freedom of PLSR1 Models
In mlesnoff/rnirs: Dimension reduction, Regression and Discrimination for Chemometrics

dfplsr_cov

R Documentation

Degrees of freedom of PLSR1 Models

Description

Monte Carlo estimation of the model complexity df (number of degrees of freedom) of univariate PLSR models. See in particular Ye 1998 and Efron 2004.

- In dfplsr_cov, the covariances presented by Efron 2004 (Eq. 2.16) are calculated by parametric bootstrap. The residual variance sigma^2 is estimated from a low-biased model.

- In dfplsr_div, the divergencies dy_fit/dy presented by Ye (1998) and Efron (2004) are calculated by perturbation analysis. This is a Stein unbiased risk estimation (SURE) of df.

Usage


dfplsr_cov(
    X, Y, ncomp, algo = NULL,
    maxlv = 50, 
    B = 30, seed = NULL, 
    print = TRUE, 
    ...
    )

dfplsr_div(
    X, Y, ncomp, algo = NULL,
    eps = 1e-2,
    B = 30, samp = c("random", "syst"), 
    seed = NULL, 
    print = TRUE, 
    ...
    )

Arguments

`X`	A `n x p` matrix or data frame of training observations.
`Y`	A vector of length `n` of training responses.
`ncomp`	The maximal number of PLS scores (= components = latent variables) to consider.
`algo`	a PLS algorithm. Default to `NULL` (`pls_kernel` is used).
`maxlv`	For `dfplsr_cov`: dimenson of the PLSR model used for parametric bootstrap.
`eps`	For `dfplsr_div`. The `epsilon` quantity used for scaling the perturbation analysis.
`B`	For `dfplsr_cov`: number of bootstrap replications. For `dfplsr_div`: number of observations in the data receiving perturbation (the maximum is `n`).
`samp`	For `dfplsr_div`. When `ns < n`, method used for sampling the observations receiving a perturbation. Possible values are "random" (default) ore "syst" (observations are sampled regularly over `y`).
`seed`	An integer defining the seed for the random simulation, or `NULL` (default). See `set.seed`.
`print`	Logical. If `TRUE`, fitting information are printed.
`...`	Optionnal arguments to pass in the function defined in `algo`.

Value

A list of outputs (see examples), such as:

`df`	The model complexity for the models with `a = 0, 1, ..., ncomp` components.

References

Efron, B., 2004. The Estimation of Prediction Error. Journal of the American Statistical Association 99, 619â632. https://doi.org/10.1198/016214504000000692

Hastie, T., Tibshirani, R.J., 1990. Generalized Additive Models, Monographs on statistics and applied probablity. Chapman and Hall/CRC, New York, USA.

Hastie, T., Tibshirani, R., Friedman, J., 2009. The elements of statistical learning: data mining, inference, and prediction, 2nd ed. Springer, NewYork.

Hastie, T., Tibshirani, R., Wainwright, M., 2015. Statistical Learning with Sparsity: The Lasso and Generalizations. CRC Press

Kramer, N., Braun, M.L., 2007. Kernelizing PLS, degrees of freedom, and efficient model selection, in: Proceedings of the 24th International Conference on Machine Learning, ICML 07. Association for Computing Machinery, New York, NY, USA, pp. 441-448. https://doi.org/10.1145/1273496.1273552

Kramer, N., Sugiyama, M., 2011. The Degrees of Freedom of Partial Least Squares Regression. Journal of the American Statistical Association 106, 697-705. https://doi.org/10.1198/jasa.2011.tm10107

Kramer, N., Braun, M. L. 2019. plsdof: Degrees of Freedom and Statistical Inference for Partial Least Squares Regression. R package version 0.2-9. https://cran.r-project.org

Stein, C.M., 1981. Estimation of the Mean of a Multivariate Normal Distribution. The Annals of Statistics 9, 1135â1151.

Ye, J., 1998. On Measuring and Correcting the Effects of Data Mining and Model Selection. Journal of the American Statistical Association 93, 120â131. https://doi.org/10.1080/01621459.1998.10474094

Zou, H., Hastie, T., Tibshirani, R., 2007. On the âdegrees of freedomâ of the lasso. The Annals of Statistics 35, 2173â2192. https://doi.org/10.1214/009053607000000127

Examples


## The example below reproduces the numerical illustration
## given by Kramer & Sugiyama 2011 on the Ozone data (Fig. 1, center).
## Note that function "pls.model" used for df calculations
## in the R package "plsdof" v0.2-9 (Kramer & Braun 2019)
## automatically scales the X matrix before PLS.

data(datozone)

z <- datozone$X
u <- which(!is.na(rowSums(z)))    ## Removing rows with NAs
X <- z[u, -4]
y <- z[u, 4]
dim(X)

zX <- scale(X)    ## Scaling only for consistency with plsdof

ncomp <- 12
B <- 50   ## Should be increased for more stability
u <- dfplsr_cov(zX, y, ncomp = ncomp, B = B)
v <- dfplsr_div(zX, y, ncomp = ncomp, B = B)

#library(plsdof)
#fm <- pls.model(
#  Xr, yr, m = ncomp, compute.DoF = TRUE,
#  compute.jacobian = FALSE,                   ## Krylov
#  use.kernel = FALSE
#  )
#df.kramer <- fm$DoF
df.kramer <- c(1.000000, 3.712373, 6.456417, 11.633565, 12.156760, 11.715101, 12.349716,
  12.192682, 13.000000, 13.000000, 13.000000, 13.000000, 13.000000)

zncomp <- 0:ncomp
plot(zncomp, u$df, type = "l", col = "red",
     ylim = c(0, 15),
     xlab = "Nb components", ylab = "df")
lines(zncomp, v$df, col = "blue")                 
lines(zncomp, zncomp + 1, col = "grey40")   ## Naive df
points(zncomp, df.kramer, pch = 16)
abline(h = 1, lty = 2, col = "grey")

mlesnoff/rnirs documentation built on April 24, 2023, 4:17 a.m.

mlesnoff/rnirs index

README.md

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

mlesnoff/rnirs
Dimension reduction, Regression and Discrimination for Chemometrics

dfplsr_cov: Degrees of freedom of PLSR1 Models
In mlesnoff/rnirs: Dimension reduction, Regression and Discrimination for Chemometrics

Degrees of freedom of PLSR1 Models

Description

Usage

Arguments

Value

References

Examples

Related to dfplsr_cov in mlesnoff/rnirs...

R Package Documentation

Browse R Packages

We want your feedback!

mlesnoff/rnirs Dimension reduction, Regression and Discrimination for Chemometrics

dfplsr_cov: Degrees of freedom of PLSR1 Models In mlesnoff/rnirs: Dimension reduction, Regression and Discrimination for Chemometrics

Degrees of freedom of PLSR1 Models

Description

Usage

Arguments

Value

References

Examples

Related to dfplsr_cov in mlesnoff/rnirs...

R Package Documentation

Browse R Packages

We want your feedback!

mlesnoff/rnirs
Dimension reduction, Regression and Discrimination for Chemometrics

dfplsr_cov: Degrees of freedom of PLSR1 Models
In mlesnoff/rnirs: Dimension reduction, Regression and Discrimination for Chemometrics