daglm: DA using GLIM Regression on the Y-Dummy table In mlesnoff/rnirs: Dimension reduction, Regression and Discrimination for Chemometrics

 daglm R Documentation

DA using GLIM Regression on the Y-Dummy table

Description

DA-GLM

1- The class membership `y` (a unidimensional variable) of the reference (= training) observations is first transformed (with function `dummy`) into a table `Ydummy` of `nclas` dummy variables, where `nclas` is the number of classes in `y`.

2- Then, a generalized linear regression model (GLIM, using function `glm`) is fitted between the `X`-data and each of the dummy variables (i.e. columns of the dummy table `Ydummy`).

3- For a given new observation, the final prediction (a class) corresponds to the dummy variable for which the prediction is the highest.
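The three steps above can be sketched with base R only. This is a minimal illustration, not the rnirs implementation: the helper name `daglm_sketch` and its output names (`pred`, `fit`) are hypothetical, the dummy table is built by hand instead of with `dummy`, and no observation weights are used.

```r
# Hypothetical helper (NOT the rnirs code): one GLM per dummy column, then argmax.
daglm_sketch <- function(Xr, yr, Xu, family = binomial(link = "logit")) {
  Xr <- as.data.frame(Xr)
  Xu <- as.data.frame(Xu)
  yr <- as.factor(yr)
  lev <- levels(yr)

  # Step 1: dummy table with one 0/1 column per class
  Ydummy <- sapply(lev, function(k) as.numeric(yr == k))

  # Step 2: fit one GLM per dummy column and predict the new observations.
  # Warnings are silenced because well-separated classes (e.g. setosa in iris)
  # trigger the usual glm separation warnings.
  pred <- sapply(seq_along(lev), function(j) {
    dat <- cbind(Xr, .y = Ydummy[, j])
    fm <- suppressWarnings(glm(.y ~ ., data = dat, family = family))
    suppressWarnings(predict(fm, newdata = Xu, type = "response"))
  })
  pred <- matrix(pred, nrow = nrow(Xu), dimnames = list(NULL, lev))

  # Step 3: the predicted class is the column with the highest prediction
  fit <- factor(lev[max.col(pred, ties.method = "first")], levels = lev)
  list(pred = pred, fit = fit)
}

set.seed(1)
s <- sample(nrow(iris), 38)
res <- daglm_sketch(iris[-s, 1:4], iris$Species[-s], iris[s, 1:4])
table(predicted = res$fit, observed = iris$Species[s])
```

Ties between columns are broken here by taking the first class, which is an arbitrary choice made for the sketch.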

When the number of classes is higher than two, this method can be affected by a masking effect (see e.g. Hastie et al. 2009, section 4.2): one or more classes can be masked (and therefore poorly predicted) if more than two classes are aligned in the `X`-space. Results should therefore be checked for such possible masking effects.
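The masking effect can be reproduced on simulated data. The sketch below assumes three classes aligned along a single predictor `x` and uses the Gaussian family (a plain linear regression on each dummy variable); the simulated data and all object names are illustrative only.

```r
set.seed(1)
n <- 100
# Three classes aligned on one axis, centered at -4, 0 and 4
x <- c(rnorm(n, -4), rnorm(n, 0), rnorm(n, 4))
y <- factor(rep(c("a", "b", "c"), each = n))
lev <- levels(y)

# One linear model per dummy variable (Gaussian family, identity link)
pred <- sapply(lev, function(k) {
  fm <- glm(as.numeric(y == k) ~ x, family = gaussian)
  fitted(fm)
})

# Predicted class = dummy variable with the highest fitted value
fit <- factor(lev[max.col(pred, ties.method = "first")], levels = lev)
table(predicted = fit, observed = y)
```

With a single predictor and equal class sizes, the three fitted lines all pass through the point (mean of `x`, 1/3), so the line of the middle class `b` is never the highest one and `b` is never predicted, even though the classes are well separated.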

Usage

``````
daglm(Xr, Yr, Xu, Yu = NULL, family = binomial(link = "logit"), weights = NULL)
``````

Arguments

 `Xr` An `n x p` matrix or data frame of reference (= training) observations.

 `Yr` A vector of length `n`, or an `n x 1` matrix, of reference (= training) responses (class membership).

 `Xu` An `m x p` matrix or data frame of new (= test) observations to be predicted.

 `Yu` A vector of length `m`, or an `m x 1` matrix, of the true responses (class membership) of the test observations. Defaults to `NULL`.

 `family` Specifies the GLIM model used by function `glm`. See function `family`. By default, a logistic binomial regression model is fitted.

 `weights` A vector of length `n` defining a priori weights to apply to the training observations. Internally, weights are "normalized" to sum to 1. Defaults to `NULL` (all weights are set to `1 / n`).
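As a small illustration of the `weights` and `family` arguments, the sketch below shows what normalizing weights to sum to 1 amounts to, and what a `family` object contains. The weight values are hypothetical, and the normalization is written out by hand here; it is not taken from the package code.

```r
n <- 5
w <- c(1, 1, 2, 2, 4)         # hypothetical a priori weights, one per training observation
w_norm <- w / sum(w)          # normalized weights, summing to 1
w_default <- rep(1 / n, n)    # uniform weights, as described for weights = NULL

sum(w_norm)
sum(w_default)

# A family object, as accepted by glm(), carries the distribution and the link
fam <- binomial(link = "logit")
fam$family
fam$link
```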

Value

A list of outputs, such as:

 `y` Responses for the test data.

 `fit` Predictions for the test data.

 `r` Residuals for the test data.

Examples

``````
library(rnirs)  # provides daglm() and err()

data(iris)

X <- iris[, 1:4]
y <- iris[, 5]
N <- nrow(X)

m <- round(.25 * N) # Number of test observations
n <- N - m          # Number of training observations
s <- sample(1:N, m) # Random indexes of the test observations
Xr <- X[-s, ]
yr <- y[-s]
Xu <- X[s, ]
yu <- y[s]

## Binomial model with logit link (logistic regression)

fm <- daglm(Xr, yr, Xu, yu)
names(fm)
fm$ni

err(fm)

## Gaussian model with identity link (= usual linear model)

fm <- daglm(Xr, yr, Xu, yu, family = gaussian)
err(fm)

``````

mlesnoff/rnirs documentation built on April 24, 2023, 4:17 a.m.