LinearDA: Cross-validated Linear Discriminant Analysis

Description Usage Arguments Details Value Author(s) References Examples

View source: R/LinearDA.R

Description

A simple function to perform cross-validated Linear Discriminant Analysis

Usage

1
2
3
4
LinearDA(Data, classCol, selectedCols, cvType, nTrainFolds,
  ntrainTestFolds, modelTrainFolds, foldSep, CV = FALSE, cvFraction,
  extendedResults = FALSE, SetSeed = TRUE, silent = FALSE,
  NewData = NULL, ...)

Arguments

Data

(dataframe) Data dataframe

classCol

(numeric or string) column number that contains the variable to be predicted

selectedCols

(optional) (numeric or string) all the columns of data that would be used either as predictor or as feature

cvType

(optional) (string) which type of cross-validation scheme to follow; One of the following values:

  • folds = (default) k-fold cross-validation

  • LOSO = Leave-one-subject-out cross-validation

  • holdout = holdout Crossvalidation. Only a portion of data (cvFraction) is used for training.

  • LOTO = Leave-one-trial out cross-validation.

nTrainFolds

= (optional) (parameter for only k-fold cross-validation) No. of folds in which to further divide Training dataset

ntrainTestFolds

= (optional) (parameter for only k-fold cross-validation) No. of folds for training and testing dataset

modelTrainFolds

= (optional) (parameter for only k-fold cross-validation) specific folds from the first train/test split (ntrainTestFolds) to use for training

foldSep

(numeric) (parameter for only Leave-One_subject Out) mandatory column number for Leave-one-subject out cross-validation.

CV

(optional) (logical) perform Cross validation of training dataset? If TRUE, posterior probabilites are present with the model

cvFraction

(optional) (numeric) Fraction of data to keep for training data

extendedResults

(optional) (logical) Return extended results with model and other metrics

SetSeed

(optional) (logical) Whether to setseed or not. use SetSeed to seed the random number generator to get consistent results; set false only for permutation tests

silent

(optional) (logical) whether to print messages or not

NewData

(optional) (dataframe) New Data frame features for which the class membership is requested

...

(optional) additional arguments for the function

Details

The function implements Linear Disciminant Analysis, a simple algorithm for classification based analyses .LDA builds a model composed of a number of discriminant functions based on linear combinations of data features that provide the best discrimination between two or more conditions/classes. The aim of the statistical analysis in LDA is thus to combine the data features scores in a way that a single new composite variable, the discriminant function, is produced (for details see Fisher, 1936; Rao, 1948)).

Value

Depending upon extendedResults. extendedResults = FALSE outputs Test accuracy accTest of discrimination; extendedResults = TRUE outputs Test accuracy accTest of discrimination, ConfusionMatrixResults Overall cross-validated confusion matrix results,ConfMatrix Confusion matrices and fitLDA the fit cross-validated LDA model. If CV = TRUE , Posterior probabilities are generated and stored in the model.

Author(s)

Atesh Koul, C'MON unit, Istituto Italiano di Tecnologia

atesh.koul@gmail.com

References

Fisher, R. A. (1936). The Use of Multiple Measurements in Taxonomic Problems. Annals of Eugenics, 7(2), 179-188.

Rao, C. (1948). The Utilization of Multiple Measurements in Problems of Biological Classification. In Journal of the Royal Statistical Society. Series B (Methodological) (Vol. 10, pp. 159-203).

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
# simple model with holdout data partition of 80% and no extended results 
LDAModel <- LinearDA(Data = KinData, classCol = 1, 
selectedCols = c(1,2,12,22,32,42,52,62,72,82,92,102,112),cvType="holdout")
# Output:
#
# Performing Linear Discriminant Analysis
#
#
# Performing holdout Cross-validation
# 
# cvFraction was not specified,
#  Using default value of 0.8 (80%) fraction for training (cvFraction = 0.8)
# 
# Proportion of Test/Train Data was :  0.2470588 
# Predicted
# Actual  1  2
# 1 51 32
# 2 40 45
# [1] "Test holdout Accuracy is 0.57"
# holdout LDA Analysis: 
# cvFraction : 0.8 
# Test Accuracy 0.57
# *Legend:
# cvFraction = Fraction of data to keep for training data 
# Test Accuracy = mean accuracy from the Testing dataset

# alt uses:
# holdout cross-validation with 80% training data
LDAModel <- LinearDA(Data = KinData, classCol = 1,
selectedCols = c(1,2,12,22,32,42,52,62,72,82,92,102,112),
CV=FALSE,cvFraction = 0.8,extendedResults = TRUE,cvType="holdout")

# For a 10 fold cross-validation without outputting messages 
LDAModel <-  LinearDA(Data = KinData, classCol = 1,
selectedCols = c(1,2,12,22,32,42,52,62,72,82,92,102,112),
extendedResults = FALSE,cvType = "folds",nTrainFolds=10,silent = TRUE)

PredPsych documentation built on July 23, 2019, 9:04 a.m.