classifSample.lda: Classificatory Discriminant Analysis

View source: R/classifSample.lda.R

classifSample.ldaR Documentation

Classificatory Discriminant Analysis

Description

These functions compute discriminant function based on an independent training set and classify observations in sample set. Linear discriminant function (classifSample.lda), quadratic discriminant function (classifSample.qda), or nonparametric k-nearest neighbour classification method (classifSample.knn) can be used.

Usage

classifSample.lda(sampleData, trainingData)

classifSample.qda(sampleData, trainingData)

classifSample.knn(sampleData, trainingData, k)

Arguments

sampleData

observations which should be classified. An object of class morphodata.

trainingData

observations for computing discriminant function. An object of class morphodata.

k

number of neighbours considered.

Details

The classifSample.lda and classifSample.qda performs classification using linear and quadratic discriminant function using the lda and qda functions from the package MASS. Nonparametric classification method classifSample.knn (k-nearest neighbours) is performed using the knn functions from the package class. The classifSample functions are designed to classify hybrid populations, type herbarium specimens, atypical samples, entirely new data, etc. Discriminant criterion is developed from the original (training) dataset and applied to the specific sample (set).

LDA and QDA analyses have some requirements: (1) no character can be a linear combination of any other character; (2) no pair of characters can be highly correlated; (3) no character can be invariant in any taxon (group); (4) for the number of taxa (g), characters (p) and total number of samples (n) should hold: 0 < p < (n - g), and (5) there must be at least two groups (taxa), and in each group there must be at least two objects. Violation of some of these assumptions may result in warnings or error messages (rank deficiency).

Value

an object of class classifdata with the following elements:

ID

IDs of each row.

Population

population membership of each row.

Taxon

taxon membership of each row.

classif

classification from discriminant analysis.

prob

posterior probabilities of classification into each taxon (if calculated by classif.lda or classif.qda), or proportion of the votes for the winning class (calculated by classif.knn)

correct

logical, correctness of classification.

See Also

classif.lda, classif.matrix, knn.select

Examples

data(centaurea)

# remove NAs and linearly dependent characters (characters with unique contributions
#                  can be identified by stepwise discriminant analysis.)
centaurea = naMeanSubst(centaurea)
centaurea = removePopulation(centaurea, populationName = c("LIP", "PREL"))
centaurea = keepCharacter(centaurea, c("MLW", "ML", "IW", "LS", "IV", "MW", "MF",
                                    "AP", "IS", "LBA", "LW", "AL", "ILW", "LBS",
                                    "SFT", "CG", "IL", "LM", "ALW", "AW", "SF") )
# add a small constant to characters witch are invariant within taxa
centaurea$data[ centaurea$Taxon == "hybr", "LM" ][1] =
             centaurea$data[ centaurea$Taxon == "hybr", "LM" ][1] + 0.000001
centaurea$data[ centaurea$Taxon == "ph", "IV" ][1] =
             centaurea$data[ centaurea$Taxon == "ph", "IV" ][1] + 0.000001
centaurea$data[ centaurea$Taxon == "st", "LBS"][1] =
             centaurea$data[ centaurea$Taxon == "st", "LBS"][1] + 0.000001


trainingSet = removePopulation(centaurea, populationName = "LES")
LES = keepPopulation(centaurea, populationName = "LES")


# classification by linear discriminant function
classifSample.lda(LES, trainingSet)

# classification by quadratic discriminant function
classifSample.qda(LES, trainingSet)

# classification by nonparametric k-nearest neighbour method
# use knn.select to find the optimal K.
knn.select(trainingSet)
classifSample.knn(LES, trainingSet, k = 12)

MorphoTools2 documentation built on March 7, 2023, 6:18 p.m.