daglm: DA using GLIM Regression on the Y-Dummy table

View source: R/daglm.R


Description

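DA-GLM: discriminant analysis (DA) based on generalized linear models (GLIM). The method proceeds in three steps: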

1- The class membership y (a univariate variable) of the reference (= training) observations is first transformed (with function dummy) into a table Ydummy of nclas dummy variables, where nclas is the number of classes in y.

2- Then, a generalized linear model (GLIM, function glm) is fitted separately to each dummy variable (i.e. each column of Ydummy), with the X-data as predictors.

3- For a given new observation, the predicted class is the one whose dummy variable receives the highest prediction.
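The three steps above can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the rnirs implementation: daglm_sketch is a hypothetical name, the dummy table is built directly with a comparison instead of the rnirs function dummy, the weights argument is ignored, and the predictions are assumed to be taken on the response scale.

```r
## Hypothetical sketch of DA-GLM (not the rnirs code); weights are ignored here
daglm_sketch <- function(Xr, yr, Xu, family = binomial(link = "logit")) {
  if (is.matrix(yr)) yr <- yr[, 1]            # accept an n x 1 matrix
  yr <- as.factor(yr)
  lev <- levels(yr)
  # Step 1: dummy table Ydummy, one 0/1 column per class (n x nclas)
  Ydummy <- sapply(lev, function(k) as.numeric(yr == k))
  # Same column names in Xr and Xu so that predict() finds the predictors
  Xr <- as.data.frame(Xr)
  Xu <- as.data.frame(Xu)
  nam <- paste0("x", seq_len(ncol(Xr)))
  names(Xr) <- nam
  names(Xu) <- nam
  # Step 2: one GLIM per dummy column, predictions for the new observations
  Yhat <- sapply(seq_along(lev), function(j) {
    d <- data.frame(ydum = Ydummy[, j], Xr)
    fm <- glm(ydum ~ ., family = family, data = d)
    predict(fm, newdata = Xu, type = "response")
  })
  Yhat <- matrix(Yhat, nrow = nrow(Xu), dimnames = list(NULL, lev))
  # Step 3: predicted class = dummy variable with the highest prediction
  factor(lev[apply(Yhat, 1, which.max)], levels = lev)
}
```

With the iris split from the Examples section, daglm_sketch(Xr, yr, Xu) returns one predicted species per test row. Because setosa is linearly separable from the other species, the logistic fits may emit glm warnings about fitted probabilities of 0 or 1; passing family = gaussian() gives the linear-model variant.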

When there are more than two classes, this method can be affected by a masking effect (see e.g. Hastie et al. 2009, section 4.2): some classes can be masked (and therefore poorly predicted) when more than two classes are aligned in the X-space. Caution should therefore be taken regarding possible masking effects.
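The masking effect can be reproduced on a small artificial example. The sketch below assumes hypothetical toy data (three classes placed along a single predictor x, class 2 in the middle) and uses the gaussian family with identity link, i.e. a linear model fitted to each dummy variable, which is the setting discussed by Hastie et al.

```r
## Hypothetical toy data: three classes aligned along one predictor x
x <- c(seq(-3, -2, length.out = 10),     # class 1
       seq(-0.5, 0.5, length.out = 10),  # class 2, between classes 1 and 3
       seq(2, 3, length.out = 10))       # class 3
cls <- rep(1:3, each = 10)

## Dummy table, then one linear (gaussian, identity link) GLIM per column
Ydummy <- sapply(1:3, function(k) as.numeric(cls == k))
Yhat <- sapply(1:3, function(k) fitted(glm(Ydummy[, k] ~ x, family = gaussian())))

## Predicted class = column with the highest fitted value
pred <- apply(Yhat, 1, which.max)
table(true = cls, pred = pred)
```

Here the fitted line for the middle dummy is flat at about 1/3, while the lines for classes 1 and 3 rise above it on either side of x = 0. Class 2 is therefore never predicted: its five points with x < 0 are assigned to class 1 and its five points with x > 0 to class 3.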

Usage

daglm(Xr, Yr, Xu, Yu = NULL, family = binomial(link = "logit"), weights = NULL)

Arguments

Xr

An n x p matrix or data frame of reference (= training) observations.

Yr

A vector of length n, or an n x 1 matrix, of reference (= training) responses (class membership).

Xu

An m x p matrix or data frame of new (= test) observations to be predicted.

Yu

A vector of length m, or an m x 1 matrix, of the true responses (class membership) of the test observations. Defaults to NULL.

family

The GLIM family (error distribution and link function) passed to function glm. See function family. By default, a binomial model with logit link (logistic regression) is fitted.
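For reference, a family in R is an object that bundles an error distribution with a link function and its inverse. The base-R lines below only inspect such objects (the default binomial/logit family and the gaussian family used in the Examples); they make no claim about how daglm uses them internally.

```r
## Default family of daglm: binomial error distribution with logit link
fam <- binomial(link = "logit")
fam$family        # "binomial"
fam$link          # "logit"
fam$linkinv(0)    # inverse logit: a linear predictor of 0 maps to 0.5

## Gaussian family with identity link (usual linear model)
fam2 <- gaussian()
fam2$link         # "identity"
```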

weights

A vector of length n defining a priori weights to apply to the training observations. Internally, weights are "normalized" to sum to 1. Defaults to NULL (all weights set to 1 / n).
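The default and the normalization described for weights amount to the arithmetic below. normalize_weights is a hypothetical helper written for illustration only; it is not a function of rnirs.

```r
## Hypothetical helper (not part of rnirs): a priori weights normalized to sum to 1
normalize_weights <- function(weights, n) {
  if (is.null(weights)) weights <- rep(1, n)  # default: equal weights
  weights / sum(weights)
}

normalize_weights(NULL, 4)           # returns c(0.25, 0.25, 0.25, 0.25), i.e. 1 / n
normalize_weights(c(1, 1, 2, 4), 4)  # returns c(0.125, 0.125, 0.25, 0.5)
```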

Value

A list of outputs, such as:

y

Responses for the test data.

fit

Predictions for the test data.

r

Residuals for the test data.

Examples


library(rnirs)  # provides daglm, headm and err
data(iris)

X <- iris[, 1:4]
y <- iris[, 5]
N <- nrow(X)

m <- round(.25 * N) # Test
n <- N - m          # Training
set.seed(1)         # makes the random split reproducible
s <- sample(1:N, m)
Xr <- X[-s, ]
yr <- y[-s]
Xu <- X[s, ]
yu <- y[s]

## Binomial model with logit link (logistic regression)

fm <- daglm(Xr, yr, Xu, yu)
names(fm)
headm(fm$y)
headm(fm$fit)
headm(fm$r)
fm$ni

err(fm)

## Gaussian model with identity link (= usual linear model)

fm <- daglm(Xr, yr, Xu, yu, family = gaussian)
err(fm)


mlesnoff/rnirs documentation built on April 24, 2023, 4:17 a.m.