pls_rob: Robust PLS1 algorithm

pls_rob    R Documentation

Description

A robust PLS1 algorithm combining PCA outlyingness measures, PLS y-residuals and weighted PLS (WPLS). X- and y-outliers are detected in a PCA space and a PLS space, respectively, each built with ncompw components (a number set by the user). These outliers receive a weight of 0 in a final weighted PLS.

In detail, the three steps are:

- Step 1: A robust PCA is implemented (using pca_rob) with ncompw components. The SD-OD outlyingness (outsdod) computed on the robust score space is used for detecting X-outliers. The p.rm * n observations (where p.rm is a proportion) that have the highest outlyingness receive a weight wx = 0 (the others receive a weight wx = 1).

- Step 2: A weighted PLS with ncompw components is implemented with weights wx. The y-residuals are robustly centered and scaled by the median and the MAD, respectively. Observations with residuals beyond a given cutoff (parametric or nonparametric; argument typcut) receive a weight wy = 0 (the others receive a weight wy = 1).

- Step 3: The final PLS is a weighted PLS with ncomp components and weights wx * wy.
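The way the 0/1 weights of the three steps combine can be sketched in base R. This is an illustration under stated assumptions, not the package's code: the vectors d (standing in for the SD-OD outlyingness) and r (standing in for the y-residuals of the Step 2 weighted PLS) are simulated, the rounding of p.rm * n is assumed, and the median and MAD are assumed to be computed over all observations with the default constant of mad().

```r
# Illustrative sketch only: simulated inputs, base R, not the rnirs implementation.
set.seed(1)
n <- 20
p.rm <- 0.30

# Step 1: 'd' is a hypothetical outlyingness vector (one value per observation).
d <- abs(rnorm(n))
nrm <- round(p.rm * n)                  # assumption: how p.rm * n is rounded
wx <- rep(1, n)
wx[order(d, decreasing = TRUE)[seq_len(nrm)]] <- 0   # highest outlyingness -> weight 0

# Step 2: 'r' is a hypothetical vector of y-residuals from the weighted PLS.
r <- rnorm(n)
r[3] <- 15                              # one gross y-outlier, for illustration
z <- (r - median(r)) / mad(r)           # robust centering and scaling (mad() uses constant 1.4826)
cutoff <- qnorm(0.975)                  # parametric cutoff, as for typcut = "param"
wy <- as.numeric(abs(z) <= cutoff)      # residuals beyond the cutoff -> weight 0

# Step 3: weights used in the final weighted PLS.
w <- wx * wy
```

An observation is kept in the final PLS only if it passes both screens, since the product wx * wy is 0 as soon as either weight is 0.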

Matrix X is centered before the analyses, but X is not column-wise scaled (there is no argument scale available). If scaling is needed, the user has to scale X before calling the function.
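A minimal sketch of such a pre-scaling in base R follows. Dividing by the column standard deviations is an assumption chosen for illustration; with suspected outliers, a robust scale such as mad may be preferable. The call to pls_rob is left as a comment so that the block runs without the package.

```r
# Illustrative sketch only: column-wise scaling of X before the robust PLS.
set.seed(1)
X <- matrix(rnorm(8 * 6, mean = 10), ncol = 6)
X[, 4:6] <- 5 * X[, 4:6]                 # give some columns a larger spread

# Divide each column by its standard deviation; centering is left to pls_rob.
Xs <- scale(X, center = FALSE, scale = apply(X, 2, sd))

# The scaled matrix would then replace X in the call, e.g.:
# pls_rob(Xs, y, ncomp = 3)
```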

Row observations can receive additional a priori weights (using argument weights).

Usage


pls_rob(X, Y, ncomp, ncompw = 10, p.rm = .30,
        typcut = c("param", "mad"), weights = NULL, ...)

Arguments

X

An n x p matrix or data frame of variables.

Y

An n x 1 matrix or data frame, or vector of length n, of responses.

ncomp

The maximal number of scores (= components = latent variables) to be calculated in the final PLS.

ncompw

The number of scores used for computing the X- and y-outliers (Steps 1 and 2).

p.rm

Proportion p.rm of the data used as hard rejection of X-outliers in Step 1 (see pca_rob). Defaults to p.rm = .30, i.e. 30% of the observations are rejected.

typcut

Type of cutoff used for the y-residuals (centered and scaled) in Step 2. Possible values are "param" (default; cutoff = -/+ the .975 Gaussian quantile) or "mad" (cutoff = 2.5).

weights

A vector of length n defining a priori weights to apply to the observations. Internally, weights are "normalized" to sum to 1. Defaults to NULL (weights are set to 1 / n).

...

Optional arguments to pass to function pca_rob.
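As an illustration of the arguments typcut and weights, the numeric cutoff and the normalized weights can be reproduced in base R. This is a sketch of what the descriptions above state, not the internal code of pls_rob; the weight values are hypothetical.

```r
# Illustrative sketch only: what typcut and weights amount to numerically.
typcut <- "param"
cutoff <- switch(typcut,
                 param = qnorm(0.975),   # .975 Gaussian quantile, about 1.96
                 mad = 2.5)

n <- 5
weights <- c(1, 1, 2, 2, 4)              # hypothetical a priori weights
w <- if (is.null(weights)) rep(1 / n, n) else weights / sum(weights)
```

After normalization, only the relative sizes of the a priori weights matter: multiplying weights by a constant leaves w unchanged.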

Value

A list of outputs, including:

T

The X-score matrix (n x ncomp).

P

The X-loadings matrix (p x ncomp).

W

The X-loading weights matrix (p x ncomp).

C

The Y-loading weights matrix (C = t(Beta), where Beta is the scores regression coefficients matrix).

R

The PLS projection matrix (p x ncomp).

xmeans

The centering vector of X (length p).

ymeans

The centering vector of Y (length q, the number of responses; q = 1 for PLS1).
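How these outputs relate to each other can be illustrated with a small unweighted PLS1 written in base R. This is a textbook NIPALS-style sketch, not the algorithm used by pls_rob: it applies no observation weights, and the object names (Tx, P, W, C, R, xmeans, ymeans) are chosen only to mirror the list above. It shows the usual identities: the scores are the centered X times the projection matrix R, and the fitted responses are ymeans plus the scores times t(C).

```r
# Illustrative sketch only: plain (unweighted) PLS1, not the rnirs implementation.
set.seed(1)
n <- 20; p <- 5; ncomp <- 2
X <- matrix(rnorm(n * p), n, p)
y <- rnorm(n)

xmeans <- colMeans(X)
ymeans <- mean(y)
Xc <- sweep(X, 2, xmeans)                # centered X
yc <- y - ymeans                         # centered y

Tx <- matrix(0, n, ncomp)                # X-scores
P <- W <- matrix(0, p, ncomp)            # X-loadings and X-loading weights
C <- matrix(0, 1, ncomp)                 # y-loading weights (q = 1 row)

Xd <- Xc                                 # deflated X
for (a in seq_len(ncomp)) {
  w <- crossprod(Xd, yc)
  w <- w / sqrt(sum(w^2))                # unit-norm loading weights
  sc <- Xd %*% w                         # score vector
  ss <- sum(sc^2)
  pa <- crossprod(Xd, sc) / ss           # loadings
  C[1, a] <- sum(yc * sc) / ss           # regression of y on the score
  Xd <- Xd - sc %*% t(pa)                # deflation
  Tx[, a] <- sc; P[, a] <- pa; W[, a] <- w
}

R <- W %*% solve(crossprod(P, W))        # projection matrix: Tx = Xc %*% R
fit <- ymeans + Tx %*% t(C)              # fitted responses from the scores
```

With such an R, scores for new rows would be obtained by centering them with xmeans and multiplying by R, which is why xmeans and ymeans are returned alongside the matrices.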

Examples


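library(rnirs)

# Simulate a small data set: n = 8 observations, p = 6 variables
n <- 8
p <- 6
set.seed(1)
X <- matrix(rnorm(n * p, mean = 10), ncol = p, byrow = TRUE)
y <- 100 * rnorm(n)
set.seed(NULL)  # re-initialize the random number generator

# Robust PLS1 with 3 components; the other arguments keep their defaults
pls_rob(X, y, ncomp = 3)

# Same algorithm called through plsr, with ncompw = 2 components for the
# outlier detection; only the fit element of the result is extracted
plsr(X, y, X, ncomp = 3, algo = pls_rob, ncompw = 2)$fit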



mlesnoff/rnirs documentation built on April 24, 2023, 4:17 a.m.