A correlation-like matrix useful for detecting (hierarchical) relationships between the levels of factor variables.

x | Input matrix or data frame containing the variables

The output is a square matrix resembling a correlation matrix, with one row and one column per variable.

Here ni denotes the number of present levels of variable i (the number of unique elements), and nij denotes the number of present levels obtained by crossing variable i and variable j (the number of unique rows of x[,c(i,j)]).
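As a minimal sketch of these two counts, the following R snippet computes ni and nij on a small toy data frame. The data and the helper names n_levels and n_cross are hypothetical, chosen only for illustration; they are not part of any package.

```r
# Hypothetical toy data: 'county' is nested within 'region',
# while 'sex' is fully crossed with 'region'.
x <- data.frame(
  region = c("A", "A", "A", "A", "B", "B", "B", "B"),
  county = c("a1", "a1", "a2", "a2", "b1", "b1", "b2", "b2"),
  sex    = c("f", "m", "f", "m", "f", "m", "f", "m")
)

# ni: number of present levels (unique elements) of one variable
n_levels <- function(v) length(unique(v))

# nij: number of unique rows of x[, c(i, j)]
n_cross <- function(x, i, j) nrow(unique(x[, c(i, j)]))

n_region        <- n_levels(x$region)  # 2
n_county        <- n_levels(x$county)  # 4
n_region_county <- n_cross(x, 1, 2)    # 4, equal to max(2, 4)
n_region_sex    <- n_cross(x, 1, 3)    # 4, equal to 2 * 2
```

In this toy data, region and county reach the lower bound max(ni,nj), while region and sex reach the upper bound ni*nj.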

The diagonal elements of the output matrix contain the number of present levels of each variable (=ni).

The absolute values of off-diagonal elements:

0 | when nij = ni*nj

1 | when nij = max(ni,nj)

Otherwise | computed as (ni*nj-nij)/(ni*nj-max(ni,nj))

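Thus 0 means that all possible level combinations exist in the data, and 1 means that the two variables are hierarchically related (each level of the variable with more levels occurs together with exactly one level of the other).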
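The diagonal and the absolute off-diagonal values described above can be sketched in R as follows. This is an illustrative reimplementation under stated assumptions, not the documented function itself: the name abs_lev_corr_sketch and the toy data are hypothetical, and the sign of the off-diagonal elements is deliberately left out.

```r
# Sketch: diagonal = ni, off-diagonal = (ni*nj - nij) / (ni*nj - max(ni, nj)).
# Signs are NOT applied here; only absolute values are computed.
abs_lev_corr_sketch <- function(x) {
  x <- as.data.frame(x)
  p <- ncol(x)
  # ni for every variable
  n <- vapply(x, function(v) length(unique(v)), integer(1))
  z <- matrix(0, p, p, dimnames = list(names(x), names(x)))
  diag(z) <- n
  for (i in seq_len(p)) {
    for (j in seq_len(p)) {
      if (i != j) {
        # nij: number of unique rows of x[, c(i, j)]
        nij <- nrow(unique(x[, c(i, j)]))
        # Note: if either variable has a single level, this is 0/0 = NaN;
        # the sketch makes no attempt to handle that case.
        z[i, j] <- (n[i] * n[j] - nij) / (n[i] * n[j] - max(n[i], n[j]))
      }
    }
  }
  z
}

# Hypothetical toy data: 'county' nested in 'region', 'sex' crossed with both.
x <- data.frame(
  region = c("A", "A", "A", "A", "B", "B", "B", "B"),
  county = c("a1", "a1", "a2", "a2", "b1", "b1", "b2", "b2"),
  sex    = c("f", "m", "f", "m", "f", "m", "f", "m")
)
z <- abs_lev_corr_sketch(x)
```

For this toy data the region/county entry is (8-4)/(8-4) = 1, since county is nested in region, and the entries involving sex are 0, since every combination with sex is present.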

The sign of off-diagonal elements:

positive | when ni<nj

negative | when ni>nj

When ni=nj, elements are positive above the diagonal and negative below it.

