classifSample.lda: Classificatory Discriminant Analysis
In MorphoTools2: Multivariate Morphometric Analysis

Description

These functions compute a discriminant function from an independent training set and use it to classify the observations in a sample set. A linear discriminant function (classifSample.lda), a quadratic discriminant function (classifSample.qda), or the nonparametric k-nearest neighbour classification method (classifSample.knn) can be used.

Usage

classifSample.lda(sampleData, trainingData)

classifSample.qda(sampleData, trainingData)

classifSample.knn(sampleData, trainingData, k)

Arguments

`sampleData`	observations to be classified. An object of class `morphodata`.
`trainingData`	observations used to compute the discriminant function. An object of class `morphodata`.
`k`	number of neighbours considered.

Details

classifSample.lda and classifSample.qda perform classification with linear and quadratic discriminant functions, using the lda and qda functions from the package MASS. The nonparametric classification method classifSample.knn (k-nearest neighbours) uses the knn function from the package class. The classifSample functions are designed to classify hybrid populations, type herbarium specimens, atypical samples, entirely new data, etc. The discriminant criterion is developed from the original (training) dataset and applied to the specific sample (set).
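
The train-then-apply workflow described above can be sketched directly with MASS::lda. This is a minimal illustration and not the internal code of classifSample.lda: it assumes the built-in iris data as a stand-in for morphometric characters, with Species playing the role of Taxon, and the split into a training set and a sample set is arbitrary.

```r
library(MASS)  # provides lda()

# Assumption: iris stands in for a morphometric dataset.
set.seed(1)
sampleRows <- sample(nrow(iris), 20)   # hypothetical "sample set" rows
training   <- iris[-sampleRows, ]
samp       <- iris[sampleRows, ]

# Develop the discriminant criterion from the training set only.
fit <- lda(Species ~ ., data = training)

# Apply the criterion to the sample set.
pred <- predict(fit, newdata = samp)
head(pred$class)                 # predicted group for each sample row
head(round(pred$posterior, 3))   # posterior probability of each group
```

Swapping lda for qda in the sketch gives the quadratic counterpart; both return a predicted class and a matrix of posterior probabilities with one column per group.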

LDA and QDA have some requirements: (1) no character can be a linear combination of any other characters; (2) no pair of characters can be highly correlated; (3) no character can be invariant in any taxon (group); (4) the number of taxa (g), the number of characters (p), and the total number of samples (n) must satisfy 0 < p < (n - g); and (5) there must be at least two groups (taxa), each containing at least two objects. Violating some of these requirements may result in warnings or error messages (rank deficiency).
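
The requirements above can be checked before running the analysis. The helper below is a hypothetical base-R sketch, not a MorphoTools2 function: it takes a plain numeric matrix or data frame of characters and a grouping vector rather than a morphodata object, and the correlation threshold of 0.95 is an arbitrary assumption, since the text does not define "highly correlated".

```r
# Hypothetical helper (not part of MorphoTools2).
# x: numeric matrix or data frame of characters; taxon: group of each row.
checkDiscrimRequirements <- function(x, taxon, maxCor = 0.95) {
  x <- as.matrix(x)
  taxon <- factor(taxon)
  n <- nrow(x); p <- ncol(x); g <- nlevels(taxon)

  # (3) variance of every character within every taxon (p x g matrix)
  groupVar <- do.call(cbind, lapply(split(as.data.frame(x), taxon),
                                    function(d) vapply(d, var, numeric(1))))
  invariant <- names(which(rowSums(groupVar == 0, na.rm = TRUE) > 0))

  # (2) pairs of characters whose absolute correlation exceeds maxCor
  cm  <- suppressWarnings(cor(x))
  idx <- which(abs(cm) > maxCor & upper.tri(cm), arr.ind = TRUE)
  correlated <- data.frame(char1 = rownames(cm)[idx[, 1]],
                           char2 = colnames(cm)[idx[, 2]])

  list(
    dimensionsOK = p > 0 && p < n - g,                     # (4)
    groupsOK     = g >= 2 && all(table(taxon) >= 2),       # (5)
    fullRank     = qr(scale(x, scale = FALSE))$rank == p,  # (1)
    invariant    = invariant,
    correlated   = correlated
  )
}

# Usage on the built-in iris data (a stand-in for morphometric characters)
checkDiscrimRequirements(iris[, 1:4], iris$Species)
```

Characters reported in invariant or correlated are candidates for removal or adjustment before calling classifSample.lda or classifSample.qda.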

Value

An object of class classifdata with the following elements:

`ID`	IDs of each row.
`Population`	population membership of each row.
`Taxon`	taxon membership of each row.
`classif`	classification from discriminant analysis.
`prob`	posterior probabilities of classification into each taxon (if calculated by `classifSample.lda` or `classifSample.qda`), or the proportion of votes for the winning class (if calculated by `classifSample.knn`).
`correct`	logical, correctness of classification.

Examples

data(centaurea)

# remove NAs and linearly dependent characters (characters with unique contributions
#                  can be identified by stepwise discriminant analysis.)
centaurea = naMeanSubst(centaurea)
centaurea = removePopulation(centaurea, populationName = c("LIP", "PREL"))
centaurea = keepCharacter(centaurea, c("MLW", "ML", "IW", "LS", "IV", "MW", "MF",
                                    "AP", "IS", "LBA", "LW", "AL", "ILW", "LBS",
                                    "SFT", "CG", "IL", "LM", "ALW", "AW", "SF") )
# add a small constant to characters which are invariant within taxa
centaurea$data[ centaurea$Taxon == "hybr", "LM" ][1] =
             centaurea$data[ centaurea$Taxon == "hybr", "LM" ][1] + 0.000001
centaurea$data[ centaurea$Taxon == "ph", "IV" ][1] =
             centaurea$data[ centaurea$Taxon == "ph", "IV" ][1] + 0.000001
centaurea$data[ centaurea$Taxon == "st", "LBS"][1] =
             centaurea$data[ centaurea$Taxon == "st", "LBS"][1] + 0.000001


trainingSet = removePopulation(centaurea, populationName = "LES")
LES = keepPopulation(centaurea, populationName = "LES")


# classification by linear discriminant function
classifSample.lda(LES, trainingSet)

# classification by quadratic discriminant function
classifSample.qda(LES, trainingSet)

# classification by nonparametric k-nearest neighbour method
# use knn.select to find the optimal K.
knn.select(trainingSet)
classifSample.knn(LES, trainingSet, k = 12)

MorphoTools2 documentation built on Oct. 2, 2024, 5:07 p.m.