A number of interesting packages are available to perform Correspondence Analysis (CA) in R. To the best of my knowledge, however, they lack tools that help users eyeball some critical aspects of CA (e.g., the contribution of row/column categories to the principal axes, the quality of the display, the correlation of row/column categories with the dimensions, etc.). Besides providing those facilities, this package allows calculating the significance of the CA dimensions by means of the 'average rule', the Malinvaud test, and a permutation test. It also allows calculating the permuted significance of the CA total inertia.
The package comes with a dataset (greenacre_data) after Greenacre 2007 (p. 90, exhibit 12.1).
to load the sample dataset:
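A minimal sketch, assuming the package is already installed and that the dataset is exposed through R's standard `data()` mechanism:

```r
library(CAinterprTools)   # assumes the package has been installed beforehand

data(greenacre_data)      # dataset name as given above
head(greenacre_data)      # quick look at the contingency table
```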
to display a bar plot of the strength of the correlation between rows and columns of the input contingency table:
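As a rough illustration of the kind of quantity involved, the square root of the total inertia is commonly read as a correlation coefficient between rows and columns. The base-R sketch below computes it on a made-up table; the table is hypothetical, and whether the package's chart is built on exactly this quantity is an assumption.

```r
# Hypothetical 3x3 contingency table (not the package's dataset)
tab <- matrix(c(20, 10,  5,
                10, 15, 10,
                 5, 10, 20), nrow = 3, byrow = TRUE)

n <- sum(tab)
expected <- outer(rowSums(tab), colSums(tab)) / n   # counts expected under independence
chi2 <- sum((tab - expected)^2 / expected)          # Pearson chi-square statistic

total_inertia <- chi2 / n             # CA total inertia
corr_coef <- sqrt(total_inertia)      # strength of the row/column association

corr_coef
```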
to calculate the significance of the CA total inertia via permutation test; a density curve of the permuted total inertia is displayed along with the observed total inertia and the 95th percentile of the permuted total inertia. The latter can be regarded as a 0.05 alpha threshold for the observed total inertia's significance:
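The logic of such a test can be sketched in base R. This is an illustration on a made-up table, not the package's own code: random tables with the observed margins are drawn with `stats::r2dtable()` as a stand-in for permutation under independence, and the names `total_inertia`, `B`, and `threshold` are hypothetical.

```r
set.seed(1)

# Hypothetical 3x3 contingency table (not the package's dataset)
tab <- matrix(c(20, 10,  5,
                10, 15, 10,
                 5, 10, 20), nrow = 3, byrow = TRUE)

# Total inertia = Pearson chi-square divided by the grand total
total_inertia <- function(m) {
  expected <- outer(rowSums(m), colSums(m)) / sum(m)
  sum((m - expected)^2 / expected) / sum(m)
}

B <- 999                                   # number of random tables
observed <- total_inertia(tab)
permuted <- vapply(r2dtable(B, rowSums(tab), colSums(tab)),
                   total_inertia, numeric(1))

threshold <- quantile(permuted, 0.95)      # 0.05 alpha threshold
p_value <- (sum(permuted >= observed) + 1) / (B + 1)

# Density of the permuted inertia with observed value and threshold
plot(density(permuted), xlim = range(c(permuted, observed)),
     main = "Permuted total inertia", xlab = "total inertia")
abline(v = observed, lty = 1)              # observed total inertia
abline(v = threshold, lty = 2)             # 95th percentile
```

An observed total inertia to the right of the dashed line would be read as significant at alpha 0.05.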
to return a chart suggesting which CA dimension is important for data structure interpretation, according to the so-called 'average rule':
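For readers unfamiliar with the rule, here is a base-R sketch on a made-up table (not the package's code). It assumes the usual reading of the average rule: a dimension matters if its share of inertia exceeds the average share, i.e. 100 divided by the number of dimensions.

```r
# Hypothetical 4x4 contingency table (not the package's dataset)
tab <- matrix(c(20, 10,  5,  8,
                10, 15, 10,  6,
                 5, 10, 20, 12,
                 7,  9, 11, 25), nrow = 4, byrow = TRUE)

P  <- tab / sum(tab)                       # correspondence matrix
rm <- rowSums(P)                           # row masses
cm <- colSums(P)                           # column masses
S  <- (P - outer(rm, cm)) / sqrt(outer(rm, cm))   # standardized residuals

k   <- min(dim(tab)) - 1                   # number of CA dimensions
eig <- svd(S)$d[seq_len(k)]^2              # principal inertias (eigenvalues)
pct <- 100 * eig / sum(eig)                # percentage of inertia per dimension

threshold <- 100 / k                       # average percentage per dimension
important <- which(pct > threshold)        # dimensions above the average

barplot(pct, names.arg = seq_len(k),
        xlab = "dimension", ylab = "% of inertia")
abline(h = threshold, lty = 2)             # average-rule reference line
```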
to perform the Malinvaud test and to print on screen the test's result (among which the significance of the CA dimensions); a plot is also provided, wherein a reference line (in RED) indicates the 0.05 threshold:
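The test itself can be sketched in base R as follows. This is an illustration on a made-up table under the standard formulation of the Malinvaud test, not the package's code: with K dimensions already retained, the statistic is n times the sum of the remaining eigenvalues, compared against a chi-square distribution with (I - K - 1)(J - K - 1) degrees of freedom. The name `result` is hypothetical.

```r
# Hypothetical 4x4 contingency table (not the package's dataset)
tab <- matrix(c(20, 10,  5,  8,
                10, 15, 10,  6,
                 5, 10, 20, 12,
                 7,  9, 11, 25), nrow = 4, byrow = TRUE)

n  <- sum(tab)
I  <- nrow(tab)
J  <- ncol(tab)
P  <- tab / n
rm <- rowSums(P)
cm <- colSums(P)
S  <- (P - outer(rm, cm)) / sqrt(outer(rm, cm))   # standardized residuals

kmax <- min(I, J) - 1
eig  <- svd(S)$d[seq_len(kmax)]^2                 # CA eigenvalues

K    <- 0:(kmax - 1)                              # dimensions already retained
stat <- vapply(K, function(i) n * sum(eig[(i + 1):kmax]), numeric(1))
df   <- (I - K - 1) * (J - K - 1)
p    <- pchisq(stat, df, lower.tail = FALSE)

result <- data.frame(K = K, chi_square = stat, df = df, p_value = p)
print(result)

plot(result$K, result$p_value, type = "b",
     xlab = "dimensions retained (K)", ylab = "p value")
abline(h = 0.05, col = "red")                     # 0.05 threshold
```

A p value below 0.05 in row K suggests that the dimensions beyond the first K still carry significant structure.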
to calculate the significance of the 1st and 2nd CA dimensions via permutation test, and to display the results as a scatterplot; reference lines provide information about the significance of the selected dimensions:
to display the contribution of the row categories to the 1st CA dimension; a reference line indicates the threshold above which a contribution can be considered important for the determination of the dimension. Setting the 'cti' parameter to TRUE specifies that the categories' contribution to the total inertia is also shown (hollow circle):
to display a scatterplot of the row categories' contribution to dimensions 1 and 2:
to chart the quality of the display of the row categories on the sub-space determined by, say, the 1st and 2nd CA dimensions:
to display the correlation of the row categories with the 1st CA dimension:
to display a scatterplot of the row categories' correlation with dimensions 1 and 2:
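By analogy with the column calls listed below, the row-side calls for the last five steps would presumably look like this. The function names appear elsewhere in this document; the argument forms are assumptions mirrored from the column versions and from the descriptions above, so check each help page (e.g. `?rows.qlt`) before relying on them.

```r
library(CAinterprTools)   # assumes the package is installed
data(greenacre_data)

# contribution of row categories to dimension 1, plus contribution to total inertia
rows.cntr(greenacre_data, 1, cti = TRUE, sort = TRUE)

# scatterplot of row contributions to dimensions 1 and 2
rows.cntr.scatter(greenacre_data, 1, 2)

# quality of the display on the sub-space of dimensions 1 and 2
# (passing the pair of dimensions as two arguments is an assumption)
rows.qlt(greenacre_data, 1, 2)

# correlation of row categories with dimension 1
rows.corr(greenacre_data, 1)

# scatterplot of row correlations with dimensions 1 and 2
rows.corr.scatter(greenacre_data, 1, 2)
```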
The column equivalents of the last five functions:

    cols.cntr(greenacre_data, 1, cti=TRUE, sort=TRUE)
    cols.cntr.scatter(greenacre_data, 1, 2)
    cols.qlt(greenacre_data, 3)
    cols.corr(greenacre_data, 1)
    cols.corr.scatter(greenacre_data, 1, 2)
New in version 0.5: ca.scatter(): produces different types of CA scatterplots. It is just a wrapper for functions from the 'ca' and 'FactoMineR' packages.
ca.plus(): plots Correspondence Analysis scatterplots modified to help interpret the results of the analysis. In particular, the function aims to make it easier to see, in the same visual context, (a) which (say, column) categories actually contribute to the definition of a given pair of dimensions, and (b) which (say, row) categories are more correlated with which dimension.
sig.dim.perm.scree(): tests the significance of the CA dimensions by permuting the input contingency table. The number of permutations is set by the user. The function returns a scree plot displaying, for each dimension, the observed eigenvalue and the 95th percentile of the permuted distribution of the corresponding eigenvalue. Observed eigenvalues larger than the corresponding 95th percentile are significant at alpha 0.05.
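The comparison described for sig.dim.perm.scree() can be sketched in base R. This is an illustration on a made-up table, not the function's actual code: random tables with the observed margins (via `stats::r2dtable()`) stand in for the permutations, and `ca_eigen`, `B`, and `q95` are hypothetical names.

```r
set.seed(1)

# Hypothetical 4x4 contingency table (not the package's dataset)
tab <- matrix(c(20, 10,  5,  8,
                10, 15, 10,  6,
                 5, 10, 20, 12,
                 7,  9, 11, 25), nrow = 4, byrow = TRUE)

# CA eigenvalues from the SVD of the standardized residuals
ca_eigen <- function(m) {
  P  <- m / sum(m)
  rm <- rowSums(P)
  cm <- colSums(P)
  S  <- (P - outer(rm, cm)) / sqrt(outer(rm, cm))
  svd(S)$d[seq_len(min(dim(m)) - 1)]^2
}

B   <- 999                                  # number of random tables
obs <- ca_eigen(tab)
# one column per random table, one row per dimension
perm <- vapply(r2dtable(B, rowSums(tab), colSums(tab)),
               ca_eigen, numeric(length(obs)))

q95 <- apply(perm, 1, quantile, probs = 0.95)     # per-dimension threshold
p_values <- (rowSums(perm >= obs) + 1) / (B + 1)  # per-dimension p values
significant <- obs > q95

# Scree plot: observed eigenvalues against the permuted 95th percentiles
plot(seq_along(obs), obs, type = "b", ylim = range(c(obs, q95)),
     xlab = "dimension", ylab = "eigenvalue")
lines(seq_along(obs), q95, type = "b", lty = 2)
```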
New in version 0.6: 'ggplot2' and 'ggrepel' are used to produce the charts returned by the functions cols.cntr.scatter(), rows.cntr.scatter(), cols.corr.scatter(), and rows.corr.scatter(). The two packages have been preferred over base R plotting facilities for their ability to plot non-overlapping point labels, so that complex charts have far less cluttered labels.
New in version 0.7: 'ca.percept' has been added to the package. The 'brand_coffee' dataset has also been included. The dataset is after Kennedy et al., Practical Applications of Correspondence Analysis to Categorical Data in Market Research, Journal of Targeting, Measurement and Analysis for Marketing, 1996. Minor corrections have been made to the help documentation of a handful of commands.
New in version 0.8: the rows.cntr() and cols.cntr() functions can now sort the categories in descending order of contribution to the inertia of the selected dimension. Minor corrections have been made to the help documentation of a handful of commands.
New in version 0.9: the sig.dim.perm.scree() function can now display p values directly in the chart.
New in version 0.10: the rows.corr(), cols.corr(), rows.qlt(), and cols.qlt() functions can now sort the categories in descending order of, respectively, correlation with the selected dimension and quality of the representation on the subspace defined by the selected pair of dimensions. Minor corrections have been made to the help documentation of a handful of commands.
To install the package in R, just follow the few steps listed below:
1) install the 'devtools' package:
2) load that package:
3) download the 'CAinterprTools' package from GitHub using the 'devtools' install command:
4) load the package:
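The four steps above can be sketched as follows. `install.packages()` and `devtools::install_github()` are standard calls; the repository path is an assumption and should be replaced if the package lives elsewhere.

```r
# 1) install the 'devtools' package
install.packages("devtools")

# 2) load it
library(devtools)

# 3) download and install 'CAinterprTools' from GitHub
#    (repository path is an assumption; adjust owner/repo as needed)
install_github("gianmarcoalberti/CAinterprTools")

# 4) load the package
library(CAinterprTools)
```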