vtest: Description of a Classification with Test Values


Description

Facilitates the description (Morineau, 1984) of the classes of a partition (e.g., after an automatic classification). Test values are calculated for each continuous variable or each category of a qualitative variable. They measure the distance between the within-class value and the overall value.

Usage

vtest(formula = NULL, data)

## S3 method for class 'vtest'
plot(x, group = TRUE, conf = NULL,
  main = if(group) "test-values by groups" else "test values by variables",
  xlab = "test value", bg = "black", ...)

Arguments

formula

A two-sided formula with the numerical or categorical variable(s) to describe on the left-hand side and a single variable (factor) indicating group membership on the right-hand side. When several variables are included, they must be combined with cbind (e.g., cbind(y1, y2, y3)).

data

A data frame containing all the variables in the left-hand side of formula.

x

An object of class “vtest”.

group

A logical indicating whether test values should be ordered by groups (TRUE) or by variables (FALSE). Defaults to TRUE.

conf

Confidence level, in percent, for plotting the confidence region. Defaults to NULL (no region is plotted).

main

Title of the plot. The default depends on group.

xlab

Label for the x axis. Defaults to “test value”.

bg

Background color for the points. Defaults to “black”.

...

Other graphical parameters to be passed to the plotting function.

Details

For a continuous variable X, test values compare mean(Xk), the mean of X in group k, with the overall mean mean(X), accounting for the within-group variance Sk(X)^2. The test value for a variable X and a group k is:

Tk(X) = (mean(Xk) - mean(X))/Sk(X)

with Sk(X)^2 = [(n - Nk)/(n - 1)] * (S(X)^2/Nk), where S(X)^2 is the overall variance of X, n is the total number of observations and Nk is the number of observations in group k.

Under the null hypothesis that mean(Xk) has expectation equal to mean(X), the test value Tk(X) is asymptotically distributed as N(0, 1). The table pval contains the P-values of the tests.
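The formula above can be checked by hand with base R. The following is only a sketch, not the package's internal code: it assumes that S(X)^2 is the sample variance returned by var() and that the P-value is two-sided, neither of which is stated on this page. It uses Sepal.Width in the setosa group of the iris data.

```r
# Hand computation of the test value for one variable and one group,
# following the formula given in this section (base R only).
x <- iris$Sepal.Width          # continuous variable X
g <- iris$Species              # factor giving group membership
k <- "setosa"                  # group of interest

n  <- length(x)                # total number of observations
Nk <- sum(g == k)              # number of observations in group k

# Assumption: S(X)^2 is the sample variance computed by var().
s2_k <- (n - Nk) / (n - 1) * var(x) / Nk

# Test value: deviation of the group mean from the overall mean,
# divided by Sk(X).
t_k <- (mean(x[g == k]) - mean(x)) / sqrt(s2_k)

# Assumption: two-sided P-value from the N(0, 1) approximation.
p_k <- 2 * pnorm(-abs(t_k))

print(c(test_value = t_k, p_value = p_k))
```

If vtest uses a different variance estimator (for instance a divisor of n instead of n - 1), its output will differ slightly from this sketch.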

For a qualitative variable, test values compare the proportion of observations with category j in group k with the proportion of observations with category j in the whole population (of size n). A normal approximation is used for the hypergeometric distribution of these counts.

The number N of observations with category j in group k is given by the observed count Nkj, and Nj denotes the number of observations with category j in the whole population.

The test-value is:

Tk(N) = (N - E(N)) / Sk(N)

with the expectation value of N:

E(N) = Nk * Nj / n

and the variance of N:

Sk(N)^2 = Nk * (n - Nk)/(n - 1) * Nj / n * (1 - Nj / n)
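As an illustration, the expectation and variance above can be computed by hand and compared with the exact moments of the hypergeometric distribution. This is a sketch in base R, not the package's implementation. The choice of the mtcars data (groups given by cyl, category given by am) is arbitrary, and the test value is obtained by dividing the deviation N - E(N) by the standard deviation Sk(N), mirroring the continuous case.

```r
# Hand computation of the test value for one category and one group.
grp     <- factor(mtcars$cyl)  # grouping factor (number of cylinders)
cat_var <- factor(mtcars$am)   # qualitative variable (transmission type)
k <- "4"                       # group of interest
j <- "1"                       # category of interest

n   <- length(grp)                        # total number of observations
Nk  <- sum(grp == k)                      # observations in group k
Nj  <- sum(cat_var == j)                  # observations with category j
Nkj <- sum(grp == k & cat_var == j)       # observations with both

e_N  <- Nk * Nj / n                                        # E(N)
s2_N <- Nk * (n - Nk) / (n - 1) * (Nj / n) * (1 - Nj / n)  # Sk(N)^2
t_kj <- (Nkj - e_N) / sqrt(s2_N)                           # test value

# Cross-check: exact mean and variance of the hypergeometric count
# (Nk draws without replacement from n items, Nj of which have category j).
counts     <- 0:min(Nk, Nj)
probs      <- dhyper(counts, m = Nj, n = n - Nj, k = Nk)
hyper_mean <- sum(counts * probs)
hyper_var  <- sum(counts^2 * probs) - hyper_mean^2

print(c(test_value = t_kj, E_N = e_N, var_N = s2_N))
print(c(hyper_mean = hyper_mean, hyper_var = hyper_var))
```

The two sets of moments should agree up to floating-point error, which is what justifies standardizing the count before applying the normal approximation.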

P-values for the test values are only meaningful if the variables on the left-hand side of the formula were not used to build the partition. When they were used to build it, test values may only be used as similarity indices between variables and groups (Lebart et al., 1995).

When several test values are computed, P-values should be adjusted for multiple comparisons.
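One possible way to make such an adjustment is the base R function p.adjust. The sketch below uses a small matrix of hypothetical test values (the names y1 to y3 and g1 to g3 are invented for illustration), derives two-sided P-values from the N(0, 1) approximation, and applies the Holm correction. The choice of the Holm method is an assumption; this page does not recommend a particular one.

```r
# Hypothetical test values (3 variables x 3 groups), for illustration only.
tv <- matrix(c( 2.9, -1.2, -1.7,
               -0.4,  2.1, -1.8,
                3.4, -2.6, -0.9),
             nrow = 3, byrow = TRUE,
             dimnames = list(c("y1", "y2", "y3"), c("g1", "g2", "g3")))

# Two-sided P-values under the N(0, 1) approximation.
p_raw <- 2 * pnorm(-abs(tv))

# Holm adjustment over all tests at once; the vector returned by
# p.adjust() is reshaped back into the layout of the original table.
p_holm <- matrix(p.adjust(as.vector(p_raw), method = "holm"),
                 nrow = nrow(tv), dimnames = dimnames(tv))

print(round(p_raw, 4))
print(round(p_holm, 4))
```

With a fitted object, the same call could presumably be applied to the values stored in the pval component after converting them to a numeric vector.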

Value

An object of class “vtest”, with 3 components:

CALL

The call which produced the result.

vtest

A data frame with the test values.

pval

A data frame with P-values of corresponding test values.

References

Morineau, A., 1984. Note sur la caractérisation statistique d'une classe et les valeurs tests. Bulletin Technique du Centre de Statistique et d'Informatique Appliqués 1, 9:12.
Lebart, L., Morineau, A., Piron, M., 1995. Statistique exploratoire multidimensionnelle. Dunod. 439 p.

See Also

aggstat

Examples

  f <- vtest(cbind(Sepal.Width, Petal.Length, Petal.Width) ~ Species,
             data = iris)
  plot(f)
  # with 95 per cent confidence region
  plot(f, conf = 95)
  # display test values ordered by variables
  plot(f, group = FALSE)
  

tdisplay documentation built on May 2, 2019, 4:46 p.m.