Robda-methods: Robust Discriminant Analysis of Interval Data

Robda-methods    R Documentation

Robust Discriminant Analysis of Interval Data

Description

Roblda and Robqda perform, respectively, linear and quadratic discriminant analysis of Interval Data based on robust estimates of location and scatter.

Usage


## S4 method for signature 'IData'
Roblda( x, grouping, prior="proportions", CVtol=1.0e-5, egvtol=1.0e-10,
  subset=1:nrow(x), CovCase=1:4, SelCrit=c("BIC","AIC"), silent=FALSE, 
  CovEstMet=c("Pooled","Globdev"), SngDMet=c("fasttle","fulltle"), k2max=1e6,
  Robcontrol=RobEstControl(), ... )

## S4 method for signature 'IData'
Robqda( x, grouping, prior="proportions", CVtol=1.0e-5, 
  subset=1:nrow(x), CovCase=1:4, SelCrit=c("BIC","AIC"), silent=FALSE,
  SngDMet=c("fasttle","fulltle"), k2max=1e6, Robcontrol=RobEstControl(), ... )

Arguments

x

An object of class IData with the original Interval Data.

grouping

Factor specifying the class for each observation.

prior

The prior probabilities of class membership. If unspecified, the class proportions for the training set are used. If present, the probabilities should be specified in the order of the factor levels.
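As a rough illustration of the default and of the ordering requirement, the class proportions can be computed from a grouping factor in base R. The factor `grp` and the vectors `prop_prior` and `equal_prior` below are hypothetical names used only for this sketch; they are not part of the package.

```r
# Hypothetical grouping factor, for illustration only
grp <- factor(c("N", "S", "S", "E", "S", "N"))

# Factor levels are sorted alphabetically by default, not by first appearance
levels(grp)

# Class proportions of the training set, in factor-level order
# (the quantity that prior = "proportions" is documented to use)
prop_prior <- as.vector(table(grp)) / length(grp)
names(prop_prior) <- levels(grp)
prop_prior

# A user-supplied alternative: equal priors, one entry per factor level
equal_prior <- rep(1 / nlevels(grp), nlevels(grp))
names(equal_prior) <- levels(grp)
equal_prior
```

A numeric vector such as `equal_prior` would then be passed through the prior argument, with its entries matching `levels(grp)` position by position.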

CVtol

Tolerance level for the absolute value of the coefficient of variation of non-constant variables. When the absolute value of the within-groups coefficient of variation of a MidPoint or LogRange is below CVtol, that variable is considered to be constant.
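The following base R sketch shows one possible reading of this check. It assumes that each interval is given by a lower and an upper bound, that the MidPoint is the interval centre and the LogRange is the logarithm of the interval length, and that the coefficient of variation is computed separately within each group. The data and the helper `abs_cv` are made up for illustration, and the computation performed inside the package may differ.

```r
# Hypothetical lower and upper bounds of one interval-valued variable
lb  <- c(1.0, 1.2, 0.9, 5.0, 5.0, 5.0)
ub  <- c(3.0, 3.1, 3.2, 7.0, 7.0, 7.0)
grp <- factor(rep(c("A", "B"), each = 3))

mid_point <- (lb + ub) / 2   # interval centre
log_range <- log(ub - lb)    # logarithm of the interval length

# Absolute coefficient of variation: standard deviation over |mean|
abs_cv <- function(v) sd(v) / abs(mean(v))

CVtol  <- 1.0e-5
cv_mid <- tapply(mid_point, grp, abs_cv)
cv_lr  <- tapply(log_range, grp, abs_cv)

# TRUE marks a variable that would count as constant within that group
cv_mid < CVtol
cv_lr  < CVtol
```

In this toy data set all intervals of group B are identical, so both its MidPoint and its LogRange fall below the tolerance, while group A varies and does not.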

egvtol

Tolerance level for the eigenvalues of the product of the inverse of the within-groups covariance matrix and the between-groups covariance matrix. When an eigenvalue has an absolute value below egvtol, it is considered to be zero.
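To make the role of this tolerance concrete, the sketch below computes the eigenvalues of the inverse within-groups matrix times the between-groups matrix for the built-in `iris` data and counts those that exceed the tolerance. It uses classical (non-robust) scatter matrices on ordinary numeric data, so it only illustrates the thresholding step and is not the package's own computation; the names `W`, `B` and `ev` are introduced here for illustration.

```r
X   <- as.matrix(iris[, 1:4])
grp <- iris$Species

grand_mean <- colMeans(X)
W <- matrix(0, ncol(X), ncol(X))  # within-groups scatter
B <- matrix(0, ncol(X), ncol(X))  # between-groups scatter
for (g in levels(grp)) {
  Xg <- X[grp == g, , drop = FALSE]
  mg <- colMeans(Xg)
  W  <- W + crossprod(sweep(Xg, 2, mg))             # deviations from group mean
  B  <- B + nrow(Xg) * tcrossprod(mg - grand_mean)  # group mean vs grand mean
}

# Eigenvalues of W^{-1} B; Re() drops any imaginary round-off
ev <- Re(eigen(solve(W) %*% B, only.values = TRUE)$values)

egvtol <- 1.0e-10
n_nonzero <- sum(abs(ev) >= egvtol)  # eigenvalues not treated as zero
n_nonzero
```

With three groups the between-groups matrix has rank at most two, so the remaining eigenvalues are pure round-off and are the ones a tolerance of this kind is meant to discard.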

subset

An index vector specifying the cases to be used in the analysis.

CovCase

Configuration of the variance-covariance matrix: a set of integers between 1 and 4.

SelCrit

The model selection criterion: “BIC” (default) or “AIC”.

silent

A boolean flag indicating whether a warning message should be printed if the method fails.

CovEstMet

Method used to estimate the common covariance matrix in Roblda (Robust linear discriminant analysis). Alternatives are “Pooled” (default) for a pooled average of the robust within-groups covariance estimates, and “Globdev” for a global estimate based on all deviations from the groups' multivariate l_1 medians. See Todorov and Filzmoser (2009) for details.
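The difference between the two alternatives can be sketched in base R on the built-in `iris` data. This is only an illustration under loud assumptions: `cov()` is a classical stand-in for the robust covariance estimator, and coordinate-wise medians stand in for the multivariate l_1 median, which base R does not provide. The names `pooled`, `dev` and `globdev` are hypothetical.

```r
X   <- as.matrix(iris[, 1:4])
grp <- iris$Species
n   <- nrow(X)
k   <- nlevels(grp)

# "Pooled"-style estimate: weighted average of within-group covariances.
# cov() is a classical stand-in for a robust estimator.
pooled <- matrix(0, ncol(X), ncol(X))
for (g in levels(grp)) {
  Xg     <- X[grp == g, , drop = FALSE]
  pooled <- pooled + (nrow(Xg) - 1) * cov(Xg)
}
pooled <- pooled / (n - k)

# "Globdev"-style estimate: centre every group at its own location,
# then estimate a single covariance from all deviations together.
# Coordinate-wise medians stand in for the l_1 median.
dev <- X
for (g in levels(grp)) {
  idx        <- grp == g
  centre     <- apply(X[idx, , drop = FALSE], 2, median)
  dev[idx, ] <- sweep(X[idx, , drop = FALSE], 2, centre)
}
globdev <- cov(dev)

round(pooled, 3)
round(globdev, 3)
```

Both approaches yield one common covariance matrix for all groups; they differ in whether the groups are combined after or before the covariance estimation step.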

SngDMet

Algorithm used to find the robust estimates of location and scatter. Alternatives are “fasttle” (default) and “fulltle”.

k2max

Maximal allowed l2-norm condition number for correlation matrices. Correlation matrices with condition number above k2max are considered to be numerically singular, leading to degenerate results.
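For intuition, the l2-norm condition number of a correlation matrix can be computed in base R with `kappa(..., exact = TRUE)`. The sketch below builds a hypothetical data set in which two variables are nearly identical and compares the result with the default threshold; it illustrates the criterion only and does not reproduce the package's internal check.

```r
# Two nearly collinear hypothetical variables plus an independent one
set.seed(1)
x1 <- rnorm(50)
x2 <- x1 + rnorm(50, sd = 1e-5)  # almost an exact copy of x1
x3 <- rnorm(50)
R  <- cor(cbind(x1, x2, x3))

# 2-norm condition number: largest over smallest singular value
k2 <- kappa(R, exact = TRUE)

k2max <- 1e6
k2 > k2max  # the near-duplicate column pushes k2 far above the limit

# A well-conditioned reference: the identity has condition number 1
kappa(diag(3), exact = TRUE)
```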

Robcontrol

A control object (S4) of class RobEstControl-class containing estimation options, the same as those provided in the function specification. If the control object is supplied, its parameters are used. Parameters that are also passed in the invocation statement override the corresponding elements of the control object.

...

Other named arguments.

References

Duarte Silva, A.P. and Brito, P. (2015), Discriminant analysis of interval data: An assessment of parametric and distance-based approaches. Journal of Classification 32(3), 516–541.

Duarte Silva, A.P., Filzmoser, P. and Brito, P. (2017), Outlier detection in interval data. Advances in Data Analysis and Classification, 1–38.

See Also

lda, qda, snda, IData, RobEstControl, ConfMat

Examples

# Create an Interval-Data object containing the intervals for 899 observations 
# on the temperatures by quarter in 60 Chinese meteorological stations.

ChinaT <- IData(ChinaTemp[1:8],VarNames=c("T1","T2","T3","T4"))

#Robust Linear Discriminant Analysis

## Not run: 

ChinaT.rlda <- Roblda(ChinaT,ChinaTemp$GeoReg)
cat("Temperatures of China -- robust lda discriminant analysis results:\n")
print(ChinaT.rlda)
cat("Resubstitution confusion matrix:\n")
ConfMat(ChinaTemp$GeoReg,predict(ChinaT.rlda,ChinaT)$class)

#Estimate error rates by ten-fold cross-validation with 5 replications 

CVrlda <- DACrossVal(ChinaT,ChinaTemp$GeoReg,TrainAlg=Roblda,CovCase=CovCase(ChinaT.rlda),
   CVrep=5)
summary(CVrlda[,,"Clerr"])

#Robust Quadratic Discriminant Analysis

ChinaT.rqda <- Robqda(ChinaT,ChinaTemp$GeoReg)
cat("Temperatures of China -- robust qda discriminant analysis results:\n")
print(ChinaT.rqda)
cat("Resubstitution confusion matrix:\n")
ConfMat(ChinaTemp$GeoReg,predict(ChinaT.rqda,ChinaT)$class)

#Estimate error rates by ten-fold cross-validation with 5 replications 

CVrqda <- DACrossVal(ChinaT,ChinaTemp$GeoReg,TrainAlg=Robqda,CovCase=CovCase(ChinaT.rqda),
   CVrep=5)
summary(CVrqda[,,"Clerr"])


## End(Not run)


MAINT.Data documentation built on April 4, 2023, 9:09 a.m.