selwold: Heuristic selection of the dimension of a latent variable...
In mlesnoff/rnirs: Dimension reduction, Regression and Discrimination for Chemometrics

selwold

R Documentation

Heuristic selection of the dimension of a latent variable model with Wold's criterion

Description

The function helps select the dimension (i.e. number of components) of latent variable models (such as PCR, PLSR, PCDA, PLSDA, ...) using the "Wold criterion".

The criterion is the "precision gain ratio" R = 1 - r(a+1) / r(a), where r is the observed error rate quantifying the model performance (RMSEP, MSEP, classification error rate, etc.) and a is the model dimension (= number of components).
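As a minimal sketch (not the rnirs source), the ratio can be computed for every dimension at once by aligning each r(a) with its successor r(a+1). The helper name `gain_ratio` is hypothetical, the vector `r` holds made-up values, and r[1] is assumed to correspond to dimension a = 0.

```r
# Hypothetical helper (not part of rnirs): precision gain ratio
# R(a) = 1 - r(a + 1) / r(a) for every dimension a.
gain_ratio <- function(r) {
  n <- length(r)
  # r[-1] holds r(a + 1) and r[-n] holds r(a); the last dimension
  # has no successor, so its ratio is NA
  c(1 - r[-1] / r[-n], NA)
}

# Made-up error rates for dimensions a = 0, 1, ..., 4
r <- c(10, 5, 4, 3.9, 3.9)
R <- gain_ratio(r)
R   # approximately 0.500 0.200 0.025 0.000 NA
```

A large R means the added component still reduces the error substantially; R close to 0 means the curve has flattened.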

R represents the relative gain in efficiency obtained when a new dimension is added to the model. The iterations continue until R falls below a threshold value alpha. The default alpha = 0.01 (i.e. 1%) is given only as an indication; users should choose the value according to their data and parsimony objective.
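The stopping rule can be sketched as "keep the first dimension a whose ratio R(a) is below alpha". This assumes that R(a) < alpha means adding component a+1 is no longer worth it, so a is retained; `select_dim` is a hypothetical helper and the input values are made up.

```r
# Hypothetical helper (not part of rnirs): position in r of the first
# dimension whose gain ratio R falls below the threshold alpha.
select_dim <- function(r, alpha = 0.01) {
  n <- length(r)
  R <- c(1 - r[-1] / r[-n], NA)   # R(a) = 1 - r(a + 1) / r(a)
  pos <- which(R < alpha)          # which() ignores NA values
  if (length(pos) == 0) NA_integer_ else pos[1]
}

r <- c(10, 5, 4, 3.9, 3.9)      # made-up values, r[1] is dimension 0
select_dim(r, alpha = 0.01)     # position 4, i.e. dimension a = 3
select_dim(r, alpha = 0.05)     # position 3, i.e. dimension a = 2
```

A larger alpha stops the iterations earlier and therefore yields a more parsimonious model.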

In the original article, Wold (1978; see also Bro et al. 2008) selected the dimension using the ratio of cross-validated to training residual sums of squares (i.e. PRESS over SSR). In contrast, function selwold only compares homogeneous quantities (the elements of the input vector r), such as successive PRESS values, as in Li et al. (2002) and Andries et al. (2011). This last approach is equivalent to the "punish factor" described in Westad & Martens (2000).

In addition to R, function selwold also calculates the differences diff = r(a+1) - r(a).
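The differences are the absolute counterpart of R: since 1 - r(a+1)/r(a) = (r(a) - r(a+1))/r(a), the ratio is simply R = -diff / r(a). A small sketch on made-up values, laid out similarly to the documented output table (the column `a` is an assumption for display):

```r
# Sketch: diff = r(a + 1) - r(a), aligned on a like R
r <- c(10, 5, 4, 3.9, 3.9)        # made-up values, r[1] is dimension 0
n <- length(r)
d <- c(diff(r), NA)               # base R diff() gives r[i + 1] - r[i]
R <- c(1 - r[-1] / r[-n], NA)     # precision gain ratio
data.frame(a = 0:(n - 1), r = r, diff = d, R = R)
```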

In some cases, particularly for classification (discrimination), the observed error rate r is erratic, which makes the variations of R difficult to interpret. For such situations, selwold can calculate R and diff on two alternatives to the raw error rate r, chosen with argument type:

- type = "smooth": R and diff are calculated on a non-parametric smoothing of r, computed with function lowess.

- type = "integral": R and diff are calculated on the area under the observed error rate curve r. In this case, the ratio becomes R = c.r(a+1) / c.r(a) - 1, where c.r is the cumulated error rate (the "area" under the curve).
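Both alternative series can be sketched with base R, assuming the smoothing is a plain `lowess()` call on the dimension index and the "area" is the running sum `cumsum()`. The error rates below are invented for illustration. Note that for the integral, R(a) = r(a+1) / c.r(a), which stays positive and shrinks as the accumulated area grows.

```r
# Sketch of the two alternative series (made-up classification error rates);
# this is an assumption about the computations, not the rnirs source.
err <- c(0.40, 0.26, 0.22, 0.25, 0.19, 0.22, 0.18, 0.21, 0.19, 0.20)
a <- seq_along(err) - 1          # dimensions 0, 1, ..., 9
n <- length(err)

# type = "smooth": lowess() from package stats; f is the smoother span
sm <- lowess(a, err, f = 2/3)$y
R_smooth <- c(1 - sm[-1] / sm[-n], NA)

# type = "integral": cumulated error rate ("area" under the curve)
cr <- cumsum(err)
R_integral <- c(cr[-1] / cr[-n] - 1, NA)

round(cbind(a, err, sm, R_smooth, cr, R_integral), 3)
```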

Note that values other than error rates (e.g. eigenvalues returned by a PCA) can also be used as input r of selwold.
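For instance, the eigenvalues of a PCA fitted with base R `prcomp()` on the built-in `USArrests` data can serve as r; the choice of data set is only for illustration. Because eigenvalues are sorted in decreasing order, each R lies between 0 and 1 and measures the relative drop to the next eigenvalue, in the spirit of a scree plot.

```r
# Sketch: PCA eigenvalues as input r (USArrests ships with R)
pca <- prcomp(USArrests, scale. = TRUE)
ev <- pca$sdev^2                   # eigenvalues, in decreasing order
n <- length(ev)
R <- c(1 - ev[-1] / ev[-n], NA)    # relative drop between successive eigenvalues
round(R, 3)
```

With rnirs loaded, the same vector could be passed directly, e.g. `selwold(ev, start = 1, xlab = "PC")`, so that the index starts at the first component.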

Usage

selwold(r, start = 0, 
  type = c("raw", "smooth", "integral"), 
  alpha = .01, digits = 3,
  plot = c("R", "diff", "none"),
  xlab = "Index", ylab = "Value", main = "r",
  ...
  )

Arguments

`r`	A vector of a given error rate `r` (or any other value).
`start`	Starting value for indexing the elements of vector `r` (default 0, e.g. when `r[1]` corresponds to a model with 0 components). The index is returned in the output data.frame and shown in the plots.
`type`	Type of value used for calculating `R` and `diff`. Possible values are `"raw"` (default; calculations on `r`), `"smooth"` (on the smoothing of `r`) and `"integral"` (on the area under `r`).
`alpha`	Proportion `alpha` used as threshold for `R`.
`digits`	The number of digits for `R`.
`plot`	Output plotted on the right side of the graphic window. Possible values are `"R"` (default), `"diff"` or `"none"` (no plot).
`xlab`	x-axis label of the plot of `r` (left side of the graphic window).
`ylab`	y-axis label of the plot of `r` (left side of the graphic window).
`main`	Title of the plot of `r` (left side of the graphic window).
`...`	Other arguments passed to function `lowess` (used when `type = "smooth"`).
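The effect of `start` can be sketched as a simple shift of the positions of `r`; `index_for` is a hypothetical helper and the values are made up.

```r
# Sketch: how 'start' maps positions of r to the returned index
r <- c(10, 5, 4, 3.9, 3.9)                  # made-up values
index_for <- function(r, start = 0) start + seq_along(r) - 1
index_for(r)                # 0 1 2 3 4 (r[1] = model with 0 components)
index_for(r, start = 1)     # 1 2 3 4 5 (e.g. eigenvalues of PC1, PC2, ...)
index_for(r)[which.min(r)]  # index of the minimum of r: 3
```

`which.min()` returns the first position of the minimum, so ties are resolved in favour of the smaller dimension.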

Value

A list of outputs (see examples), such as:

`res`	Data.frame with variables: `r` (raw input `r`), `val` (possibly transformed `r`, depending on `type`), `diff` (calculated on `val`) and `R` (calculated on `val`).
`opt`	The index of the minimum of `r`.
`sel`	The index selected from the `R` threshold (usually a parsimonious number of components).
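To tie the outputs together, here is a hypothetical re-implementation without plots, `selwold_sketch`, that returns a list with the same three components. It is an illustration under stated assumptions, not the rnirs source: selection uses the unrounded R, and `digits` only rounds the `R` column of `res`.

```r
# Hypothetical re-implementation for illustration (no plots); it is
# NOT the rnirs source, only a sketch of the documented outputs.
selwold_sketch <- function(r, start = 0,
                           type = c("raw", "smooth", "integral"),
                           alpha = 0.01, digits = 3, ...) {
  type <- match.arg(type)
  n <- length(r)
  index <- start + seq_len(n) - 1
  # val: raw r, its lowess smoothing, or its running sum
  val <- switch(type,
    raw      = r,
    smooth   = lowess(seq_len(n), r, ...)$y,
    integral = cumsum(r)
  )
  nxt <- c(val[-1], NA)                  # val(a + 1), aligned on a
  d <- nxt - val                         # diff
  R <- if (type == "integral") nxt / val - 1 else 1 - nxt / val
  pos <- which(R < alpha)[1]             # NA if R never drops below alpha
  res <- data.frame(index = index, r = r, val = val,
                    diff = d, R = round(R, digits))
  list(res = res,
       opt = index[which.min(r)],        # index of the minimum of r
       sel = index[pos])                 # index selected by the threshold
}

u <- selwold_sketch(c(10, 5, 4, 3.9, 3.9))   # made-up values
u$res
u$opt   # 3
u$sel   # 3
```

Here `opt` and `sel` coincide; on a long, slowly decreasing error curve, `sel` is typically smaller than `opt`.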

References

Andries, J.P.M., Vander Heyden, Y., Buydens, L.M.C., 2011. Improved variable reduction in partial least squares modelling based on Predictive-Property-Ranked Variables and adaptation of partial least squares complexity. Analytica Chimica Acta 705, 292-305. https://doi.org/10.1016/j.aca.2011.06.037

Bro, R., Kjeldahl, K., Smilde, A.K., Kiers, H.A.L., 2008. Cross-validation of component models: A critical look at current methods. Analytical and Bioanalytical Chemistry 390, 1241-1251. https://doi.org/10.1007/s00216-007-1790-1

Li, B., Morris, J., Martin, E.B., 2002. Model selection for partial least squares regression. Chemometrics and Intelligent Laboratory Systems 64, 79-89. https://doi.org/10.1016/S0169-7439(02)00051-5

Westad, F., Martens, H., 2000. Variable selection in near infrared spectroscopy based on significance testing in partial least squares regression. Journal of Near Infrared Spectroscopy 8, 117-124.

Wold, S., 1978. Cross-validatory estimation of the number of components in factor and principal components models. Technometrics 20(4), 397-405.

Examples


library(rnirs)
data(datcass)

Xr <- datcass$Xr
yr <- datcass$yr

n <- nrow(Xr)
segm <- segmkf(n = n, K = 5, typ = "random", nrep = 1)
fm <- cvfit(
    Xr, yr,
    fun = plsr,
    ncomp = 20,
    segm = segm,
    print = TRUE
    )

z <- mse(fm, ~ ncomp)
head(z)
z[z$msep == min(z$msep), ]
plotmse(z)

u <- selwold(z$msep, alpha = .01,
    xlab = "Nb. components")
u$res
u$opt
u$sel

u <- selwold(z$msep, plot = "diff",
    xlab = "Nb. components")

u <- selwold(z$msep, type = "smooth", f = 1/3,  ## Smoothing not useful here
    xlab = "Nb. components")   
u$res
u$opt
u$sel

u <- selwold(z$msep, type = "integral", alpha = .05,
    xlab = "Nb. components")
u$res
u$opt
u$sel

mlesnoff/rnirs documentation built on April 24, 2023, 4:17 a.m.
