# Linear Discriminant Analysis (LDA) with the Moore-Penrose Pseudo-Inverse

### Description

Given a set of training data, this function builds the Linear Discriminant Analysis (LDA) classifier, which assumes that each class follows a multivariate normal distribution and that all classes share a common covariance matrix. When the pooled sample covariance matrix is singular, it has no inverse, so the linear discriminant function cannot be computed. A common way to overcome this issue is to replace the inverse of the pooled sample covariance matrix with the Moore-Penrose pseudo-inverse, which is unique and always exists. Note that when the pooled sample covariance matrix is nonsingular, its inverse is equal to the pseudo-inverse.
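
As a sketch of what this replacement means, the textbook form of the linear discriminant score with the pseudo-inverse substituted for the inverse can be written as below. The notation is an assumption introduced here for illustration: $\hat{\Sigma} = \sum_j \lambda_j v_j v_j^{\top}$ is the eigendecomposition of the pooled sample covariance matrix, $\bar{x}_k$ and $\pi_k$ are the sample mean and prior probability of class $k$, and eigenvalues at or below `tol` are treated as zero. The function's internal computation may use an equivalent but differently arranged expression.

$$
\hat{\Sigma}^{+} = \sum_{j:\, \lambda_j > \mathrm{tol}} \lambda_j^{-1} v_j v_j^{\top},
\qquad
\hat{\delta}_k(x) = x^{\top} \hat{\Sigma}^{+} \bar{x}_k - \tfrac{1}{2}\, \bar{x}_k^{\top} \hat{\Sigma}^{+} \bar{x}_k + \log \pi_k
$$

An observation $x$ is assigned to the class with the largest score $\hat{\delta}_k(x)$. When every eigenvalue exceeds `tol`, the sum runs over all $j$ and $\hat{\Sigma}^{+}$ reduces to $\hat{\Sigma}^{-1}$.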

### Usage

```r
lda_pseudo(x, ...)

## Default S3 method:
lda_pseudo(x, y, prior = NULL, tol = 1e-08, ...)

## S3 method for class 'formula'
lda_pseudo(formula, data, prior = NULL, tol = 1e-08, ...)

## S3 method for class 'lda_pseudo'
predict(object, newdata, ...)
```

### Arguments

| Argument | Description |
|---|---|
| `x` | matrix containing the training data. The rows are the sample observations, and the columns are the features. |
| `...` | additional arguments |
| `y` | vector of class labels for each training observation |
| `prior` | vector with prior probabilities for each class. If NULL (default), the priors are estimated from the training data. See Details. |
| `tol` | tolerance value below which eigenvalues are considered numerically equal to 0 |
| `formula` | A formula of the form |
| `data` | data frame from which variables specified in |
| `object` | trained lda_pseudo object |
| `newdata` | matrix of observations to predict. Each row corresponds to a new observation. |
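
To illustrate the role of `tol`, the following is a minimal sketch of a Moore-Penrose pseudo-inverse computed from an eigendecomposition. The helper name `pinv_sym` is hypothetical and not part of the package, and the sketch assumes `tol` is an absolute threshold on the eigenvalues, as the argument description suggests; the package's actual implementation may differ.

```r
# Hypothetical helper (not part of the package API): Moore-Penrose
# pseudo-inverse of a symmetric positive semidefinite matrix.
pinv_sym <- function(S, tol = 1e-08) {
  eig <- eigen(S, symmetric = TRUE)
  keep <- eig$values > tol                 # eigenvalues treated as nonzero
  V <- eig$vectors[, keep, drop = FALSE]
  # Scale row i of t(V) by 1 / lambda_i, then recombine:
  # sum over retained i of v_i v_i^T / lambda_i
  V %*% (t(V) / eig$values[keep])
}

set.seed(1)
x <- matrix(rnorm(5 * 10), nrow = 5)       # 5 observations, 10 features
S <- cov(x)                                # rank at most 4, hence singular
S_pinv <- pinv_sym(S)

# One of the Moore-Penrose conditions: S %*% S_pinv %*% S recovers S
all.equal(S %*% S_pinv %*% S, S)

# For a nonsingular matrix the result agrees with the ordinary inverse
A <- crossprod(matrix(rnorm(20 * 3), nrow = 20))
all.equal(pinv_sym(A), solve(A))
```

With more features than observations, as in the first case, `solve(S)` would fail, while the pseudo-inverse is still defined.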

### Details

The matrix of training observations is given in `x`. The rows of `x` contain the sample observations, and the columns contain the features for each training observation.

The vector of class labels given in `y` is coerced to a `factor`. The length of `y` should match the number of rows in `x`.

An error is thrown if any class has fewer than 2 observations, because the within-class variance of each feature cannot be estimated from a single observation.
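
The rule can be sketched as follows. `check_class_sizes` is a hypothetical helper written for illustration only; the package's own check and error message may look different.

```r
# A single observation yields no variance estimate
is.na(var(1.3))

# Hypothetical check mirroring the rule described above
check_class_sizes <- function(y) {
  y <- as.factor(y)
  counts <- table(y)
  too_small <- names(counts)[counts < 2]
  if (length(too_small) > 0) {
    stop("Each class needs at least 2 observations; too few in: ",
         paste(too_small, collapse = ", "))
  }
  invisible(counts)
}

check_class_sizes(iris$Species)            # passes silently

# Class "b" has only one observation, so this call raises an error
res <- try(check_class_sizes(c("a", "a", "b")), silent = TRUE)
inherits(res, "try-error")
```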

The vector `prior` contains the *a priori* class membership probabilities for each class. If `prior` is NULL (default), the class membership probabilities are estimated as the sample proportion of observations belonging to each class. Otherwise, `prior` should be a vector with the same length as the number of classes in `y`. The `prior` probabilities should be nonnegative and sum to one.
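
A minimal sketch of these rules is shown below. `resolve_prior` is a hypothetical helper, not a function from the package, and it assumes every factor level of `y` is present in the data.

```r
# Hypothetical helper illustrating the rules above; not the package's code
resolve_prior <- function(y, prior = NULL) {
  y <- as.factor(y)
  if (is.null(prior)) {
    # Default: sample proportion of observations in each class
    return(as.vector(table(y)) / length(y))
  }
  stopifnot(length(prior) == nlevels(y),        # one entry per class
            all(prior >= 0),                    # nonnegative
            isTRUE(all.equal(sum(prior), 1)))   # sums to one
  prior
}

# iris has 50 observations per species, so the three proportions are equal
resolve_prior(iris$Species)

# User-supplied priors are validated and returned unchanged
resolve_prior(iris$Species, prior = c(0.5, 0.3, 0.2))
```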

### Value

`lda_pseudo` returns an `lda_pseudo` object that contains the trained classifier.

`predict` returns a list with the predicted class membership of each row in `newdata`.

### Examples

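```r
library(sparsediscrim)  # assumed to be the package that provides lda_pseudo
set.seed(42)            # make the random train/test split reproducible
n <- nrow(iris)
train <- sample(seq_len(n), n / 2)

# Formula interface
lda_pseudo_out <- lda_pseudo(Species ~ ., data = iris[train, ])
predicted <- predict(lda_pseudo_out, iris[-train, -5])$class

# Matrix interface; both fits should give the same predictions
lda_pseudo_out2 <- lda_pseudo(x = iris[train, -5], y = iris[train, 5])
predicted2 <- predict(lda_pseudo_out2, iris[-train, -5])$class
all.equal(predicted, predicted2)
```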