disqual: Discriminant Analysis on Qualitative Variables

Description

Implementation of the DISQUAL methodology. Disqual performs a Fisher's Discriminant Analysis on components from a Multiple Correspondence Analysis.

Usage

  disqual(variables, group, validation = NULL,
    learn = NULL, test = NULL, autosel = TRUE, prob = 0.05)

Arguments

variables

data frame with qualitative explanatory variables (coded as factors)

group

vector or factor with group memberships

validation

type of validation, either "crossval" or "learntest". Default NULL

learn

optional vector of indices for a learn-set. Only used when validation="learntest". Default NULL

test

optional vector of indices for a test-set. Only used when validation="learntest". Default NULL

autosel

logical indicating automatic selection of MCA components. Default TRUE

prob

probability level for automatic selection of MCA components. Default prob = 0.05
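
Since variables must contain factors, columns stored as character strings need converting before the call. A minimal sketch in base R, assuming a made-up data frame (raw and its columns are hypothetical and not part of DiscriMiner):

```r
# Hypothetical raw data: qualitative variables stored as character strings
raw <- data.frame(
  use    = c("private", "professional", "private", "companies"),
  gender = c("female", "male", "male", "female"),
  stringsAsFactors = FALSE
)

# Convert every column to a factor, the coding that disqual() asks for
variables <- as.data.frame(lapply(raw, factor))

# Check that all columns are now factors
all_factors <- all(vapply(variables, is.factor, logical(1)))
print(all_factors)
```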

Details

When validation=NULL, no validation is performed.
When validation="crossval", cross-validation is performed by randomly separating the observations into ten groups.
When validation="learntest", validation is performed using the learn-set and test-set of observations supplied through learn and test.
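
The learn-set and test-set are passed as index vectors. The sketch below shows one way to build them and pass them to disqual. The helper split_indices, the 70/30 proportion and the seed are assumptions made for illustration; they are not part of DiscriMiner. The call itself runs only if the package is installed.

```r
# Hypothetical helper (not part of DiscriMiner): split 1..n into
# disjoint learn and test index vectors
split_indices <- function(n, learn_frac = 0.7) {
  learn <- sort(sample(n, size = round(learn_frac * n)))
  list(learn = learn, test = setdiff(seq_len(n), learn))
}

set.seed(123)  # arbitrary seed so the split is reproducible

# Run the analysis only when DiscriMiner is available
if (requireNamespace("DiscriMiner", quietly = TRUE)) {
  library(DiscriMiner)
  data(insurance)
  idx <- split_indices(nrow(insurance))
  my_disq3 <- disqual(insurance[, -1], insurance[, 1],
                      validation = "learntest",
                      learn = idx$learn, test = idx$test)
  print(my_disq3$error_rate)
}
```

A random split is only one option; any two non-overlapping vectors of row indices could be supplied instead.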

Value

An object of class "disqual", basically a list with the following elements:

raw_coefs

raw coefficients of discriminant functions

norm_coefs

normalized coefficients of discriminant functions, ranging from 0 to 1000

confusion

confusion matrix

scores

discriminant scores for each observation

classification

assigned class

error_rate

misclassification error rate
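
As an illustration of how confusion and error_rate relate, the sketch below rebuilds the confusion matrix printed in the first example output on this page (the run with no validation) and computes the share of off-diagonal counts. This is plain arithmetic on the printed numbers, not a description of the package internals.

```r
# Confusion matrix copied from the first example output
# (rows: original class, columns: predicted class)
confusion <- matrix(c(467, 77, 83, 479), nrow = 2,
                    dimnames = list(original  = c("bad", "good"),
                                    predicted = c("bad", "good")))

# Misclassified observations are the off-diagonal cells
misclassified <- sum(confusion) - sum(diag(confusion))

# Error rate = misclassified / total = 160 / 1106
error_rate <- misclassified / sum(confusion)
print(round(error_rate, 7))  # 0.1446655, matching $error_rate in that output
```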

Author(s)

Gaston Sanchez

References

Lebart L., Piron M., Morineau A. (2006) Statistique Exploratoire Multidimensionnelle. Dunod, Paris.

Saporta G. (2006) Probabilités, analyse des données et statistique. Editions Technip, Paris.

Saporta G., Niang N. (2006) Correspondence Analysis and Classification. In Multiple Correspondence Analysis and Related Methods, Eds. Michael Greenacre and Jörg Blasius, 371-392. Chapman and Hall/CRC.

See Also

easyMCA, classify, binarize

Examples

## Not run: 
  # load insurance dataset
  data(insurance)

  # disqual analysis with no validation
  my_disq1 = disqual(insurance[,-1], insurance[,1], validation=NULL)
  my_disq1

  # disqual analysis with cross-validation
  my_disq2 = disqual(insurance[,-1], insurance[,1], validation="crossval")
  my_disq2
  
## End(Not run)

Example output

Discriminant Analysis on Qualitative Variables
----------------------------------------------
$raw_coefs        raw coeffcients
$norm_coefs       normalized coefficients
$confusion        confusion matrix
$scores           scores
$classification   assigned class
$error_rate       error rate
----------------------------------------------

$raw_coefs
               bad       good    
private        -0.05135   0.05080
professional    0.25565  -0.25289
companies       0.12490  -0.12355
female          0.00364  -0.00360
male           -0.01226   0.01213
flemish        -0.15579   0.15411
french          0.05332  -0.05274
BD_1890_1949   -0.01594   0.01577
BD_1950_1973    0.64884  -0.64184
BD_unknown     -0.39455   0.39029
Brussels        0.37897  -0.37488
Other_regions  -0.18820   0.18617
BM_minus        0.96468  -0.95427
BM_plus        -0.97874   0.96818
YS<86          -0.12340   0.12207
YS>=86          0.16273  -0.16097
HP<=39         -0.34695   0.34320
HP>=40          0.08469  -0.08377
YC_33_89       -0.19634   0.19422
YC_90_91        0.57098  -0.56482


$norm_coefs
               bad     good  
private          0.00   47.39
professional    50.44    0.00
companies       22.53    0.00
female           2.61   18.72
male             0.00   21.17
flemish          0.00   32.28
french          34.35    0.00
BD_1890_1949    62.20  102.62
BD_1950_1973   171.42    0.00
BD_unknown       0.00  161.06
Brussels        93.18    0.00
Other_regions    0.00   87.55
BM_minus       319.28    0.00
BM_plus          0.00  299.99
YS<86            0.00   44.17
YS>=86          47.01    0.00
HP<=39           0.00   66.63
HP>=40          70.91    0.00
YC_33_89         0.00  118.44
YC_90_91       126.06    0.00


$confusion
        predicted
original  bad  good
    bad   467    83
    good   77   479


$error_rate
[1] 0.1446655


$scores
            bad      good
[1,]   34.35353  846.3900
[2,]   70.91234  812.0407
[3,]    0.00000  878.6674
[4,]  152.27392  735.5963
[5,]   96.55433  787.9484
[6,]  105.26587  779.7634
...

$classification
[1] good good good good good good
Levels: bad good
...

Discriminant Analysis on Qualitative Variables
----------------------------------------------
$raw_coefs        raw coeffcients
$norm_coefs       normalized coefficients
$confusion        confusion matrix
$scores           scores
$classification   assigned class
$error_rate       error rate
----------------------------------------------

$raw_coefs
               bad       good    
private        -0.05135   0.05080
professional    0.25565  -0.25289
companies       0.12490  -0.12355
female          0.00364  -0.00360
male           -0.01226   0.01213
flemish        -0.15579   0.15411
french          0.05332  -0.05274
BD_1890_1949   -0.01594   0.01577
BD_1950_1973    0.64884  -0.64184
BD_unknown     -0.39455   0.39029
Brussels        0.37897  -0.37488
Other_regions  -0.18820   0.18617
BM_minus        0.96468  -0.95427
BM_plus        -0.97874   0.96818
YS<86          -0.12340   0.12207
YS>=86          0.16273  -0.16097
HP<=39         -0.34695   0.34320
HP>=40          0.08469  -0.08377
YC_33_89       -0.19634   0.19422
YC_90_91        0.57098  -0.56482


$norm_coefs
               bad     good  
private          0.00   47.39
professional    50.44    0.00
companies       22.53    0.00
female           2.61   18.72
male             0.00   21.17
flemish          0.00   32.28
french          34.35    0.00
BD_1890_1949    62.20  102.62
BD_1950_1973   171.42    0.00
BD_unknown       0.00  161.06
Brussels        93.18    0.00
Other_regions    0.00   87.55
BM_minus       319.28    0.00
BM_plus          0.00  299.99
YS<86            0.00   44.17
YS>=86          47.01    0.00
HP<=39           0.00   66.63
HP>=40          70.91    0.00
YC_33_89         0.00  118.44
YC_90_91       126.06    0.00


$confusion
        predicted
original  bad  good
    bad   467    83
    good   77   479


$error_rate
[1] 0.1573237


$scores
            bad      good
[1,]   34.35353  846.3900
[2,]   70.91234  812.0407
[3,]    0.00000  878.6674
[4,]  152.27392  735.5963
[5,]   96.55433  787.9484
[6,]  105.26587  779.7634
...

$classification
[1] good good good good good good
Levels: bad good
...

DiscriMiner documentation built on May 1, 2019, 10:32 p.m.