cabootcrs: Calculate category point variances using bootstrapping
In cabootcrs: Bootstrap Confidence Regions for Simple and Multiple Correspondence Analysis

cabootcrs

R Documentation

Calculate category point variances using bootstrapping

Description

cabootcrs performs simple or multiple correspondence analysis and uses bootstrap resampling to construct confidence ellipses for each appropriate category point, printing and plotting the results; for help on the package see cabootcrs-package.

Usage

cabootcrs(
  xobject = NULL,
  datafile = NULL,
  datasetname = NULL,
  nboots = 999,
  resampledistn = "Poisson",
  multinomialtype = "whole",
  printdims = 4,
  lastaxis = 4,
  maxrearrange = 6,
  rearrangemethod = "lpassign",
  usebootcrits = TRUE,
  groupings = NULL,
  grouplabels = NULL,
  varnames = NULL,
  plotsymbolscolours = c(19, "inferno", 18, "inferno"),
  othersmonochrome = "grey",
  crpercent = 95,
  catype = "sca",
  scainput = "CT",
  mcainput = "nbyp",
  mcatype = "Burt",
  mcavariant = "mca",
  mcasupplementary = "offdiag",
  mcaadjustinertias = TRUE,
  mcauseadjustinertiasum = FALSE,
  mcaadjustcoords = TRUE,
  mcaadjustmassctr = FALSE,
  mcaoneploteach = TRUE,
  mcashowindividuals = FALSE,
  mcavariablecolours = FALSE,
  mcacategorycolours = FALSE,
  Jk = NULL,
  varandcat = TRUE,
  likertarrows = FALSE,
  mcastoreindicator = TRUE,
  mcaindividualboot = FALSE,
  mcalikertnoise = 0.1,
  poissonzeronewmean = 0,
  newzeroreset = 0,
  bootstdcoords = FALSE,
  reflectonly = FALSE,
  showresults = TRUE,
  eps = 1e-15
)

Arguments

`xobject`	Name of data object (data frame or similar class that can be coerced to data frame). For simple CA (SCA) the default is contingency table format, recommended that rows >= columns. For multiple CA (MCA) the default is an n individuals by p variables matrix of category values (numbers or text).
`datafile`	Name of a text file (in " ") containing the data, same defaults as xobject, ignored if xobject is non-null
`datasetname`	A string to use as the name of the data set in the plots, defaults to name of xobject or datafile
`nboots`	Number of boostrap replicate matrices used, default and recommended minimum is 999, but 9999 is recommended if machine and data set size allows; the calculated variances will sometimes differ around the third decimal place, but the pictures should look the same. If nboots=0 then correspondence analysis is performed as usual with no variances calculated
`resampledistn`	Poisson resampling is the default for SCA, nonparametric is the default for MCA (Poisson and multinomial will both default to nonparametric) "Poisson" resampled matrices constructed using Poisson resampling on each cell separately (only SCA) "multinomial" resampled matrices constructed using multinomial resampling, treating the cells as defining one or more multinomial distributions (only SCA) "nonparametric" non-parametric resampling of the rows of the n individuals by p variables matrix, equivalent to multinomial (only MCA) "balanced" multinomial resampling balanced so that each data point occurs equally often over the resamples (only MCA) "myresample" resampling algorithm is contained in a file called myresample.R
`multinomialtype`	Only relevant for multinomial sampling in SCA, otherwise ignored: "whole" all cells define a single multinomial distribution "rowsfixed" row sums fixed, each row defines a separate multinomial distribution "columnsfixed" column sums fixed, each column defines a separate multinomial distribution
`printdims`	Print full correspondence analysis coordinates, contributions, correlations etc for all output dimensions up to and including this one
`lastaxis`	Calculate variances and covariances for all output axes (dimensions) up to this one (or the number of dimensions in the solution if smaller). Recommended maximum is maxrearrange-1 as variances for those above this axis may be inaccurate
`maxrearrange`	The maximum number of axes to consider when rearranging
`rearrangemethod`	The method used to rearrange the axes: "lpassign" The Hungarian algorithm in the lpSolve package anything else The embarrassingly slow direct comparison method used in version 1.0 - don't use it Option is only included in case something weird goes wrong with lpSolve.
`usebootcrits`	To be passed to the plot routine, see `plotca` for details
`groupings`	To be passed to the plot routine, see `plotca` for details
`grouplabels`	To be passed to the plot routine, see `plotca` for details
`varnames`	Character p-vector naming the variables, defaults to c("Rows","Columns") in sca
`plotsymbolscolours`	To be passed to the plot routine, see `plotca` for details
`othersmonochrome`	To be passed to the plot routine, see `plotca` for details
`crpercent`	To be passed to the plot routine, see `plotca` for details
`catype`	Type of correspondence analysis: "sca" Simple (classical) correspondence analysis of a contingency table "mca" Multiple correspondence analysis of a Burt, indicator or doubled matrix
`scainput`	Format of input data, only applies for SCA: "CT" Contingency table of counts, preferably with rows >= columns "nbyp" An n individuals/objects/data points by p=2 categorical variables matrix, where each row is a different data point and each column contains the category for that data point on that variable, where these categories can be numbers, strings or factors "nbypcounts" Similar to the above, but each row represents all of the data points taking the same combination of categories, and the first column contains the count for this combination (hence the name used here is a bit of a misnomer, but it emphasises the similarities to an n by p=2)
`mcainput`	Format of input data, only applies for MCA: "nbyp" An n individuals/objects/data points by p categorical variables matrix, where each row is a different data point and each column contains the category for that data point on that variable, where these categories can be numbers, strings or factors "nbypcounts" Similar to the above, but each row represents all of the data points taking the same combination of categories, and the now first column contains the count for this combination, so that the input matrix is n by p+1 (hence the name used here is a bit of a misnomer, but it emphasises the similarities to an n by p) "indicator" An n by sum-of-distinct-variable-categories (i.e. sum of elements of Jk) indicator matrix
`mcatype`	Format of data matrix analysed, only applies for MCA: "Burt" Analyse the Burt matrix. Output will be given for the column (variable category) points but not for the (identical) row points. NOTE: it is highly recommended that this version is used "indicator" Analyse the indicator matrix. Output will be given for the column (variable category) points but not usually for the row (individual) points. NOTE: the bootstrap method used in this case is highly experimental and very slow, see Details section part (2) "doubled" Analyse the doubled matrix. Output will be given for the column points (two points, high and low, for each variable) but not usually for the row (individual) points. NOTE: the bootstrap method used in this case is highly experimental and very slow, see Details section part (2)
`mcavariant`	Currently must be "mca", placeholder for future updates
`mcasupplementary`	How the sample points are projected as supplementary points onto the bootstrap axes when calculating the variances in MCA of a Burt matrix, see Details section for full explanation "offdiag" Only the off-diagonal parts of the Burt matrix are used in the projection "all" Projection is calculated in the usual way If "offdiag" then when p=2 the variances will be very similar to those from SCA. If "all" then the fact that the diagonal elements of the Burt matrix are by definition the same for both the sample and bootstrap matrices means that the projected differences between sample and population points, and hence the variances, will be artificially small. NOTE: if mcaadjustinertias is FALSE then mcasupplementary will be set to "all", because adjusting the coordinates in the calculation of the variances but not adjusting the inertias makes no sense
`mcaadjustinertias`	Whether to adjust inertias to allow for the meaningless inertia terms induced by the diagonal of the Burt matrix in MCA: TRUE Analysing Burt matrix: subtract 1/p from all singular values, then positive singular values are multiplied by p/(p-1) while negative ones are ignored Analysing indicator matrix: same applies but to the squared singular values FALSE No adjustment If TRUE then when p=2 the inertias will agree with those from SCA. NOTE: inertias are the square (Burt) or fourth power (indicator) of the (adjusted) singular values. NOTE: if mcaadjustinertias is FALSE then mcaadjustcoords will also be set to FALSE and mcasupplementary set to "all", as adjusting the coordinates but not the inertias makes no sense
`mcauseadjustinertiasum`	How to define the total inertia in MCA, whether to just use the sum of the adjusted inertias: TRUE The inertias are expressed as a percentage of the sum of the adjusted inertias (Benzecri) FALSE The inertias are expressed as a percentage of the average of off-diagonal inertias (Greenacre), note that this will be incorrect if Jk includes categories that are not observed in the data If TRUE then when p=2 the inertias will agree with those from SCA If TRUE then the percentage inertias will sum to 100% If FALSE then the percentage inertias will usually sum to less than 100%
`mcaadjustcoords`	Whether to adjust the principal coordinates in MCA using the adjusted inertias above, as in Greenacre and Blasius, p68: TRUE Adjust coordinates, but only for column points FALSE No adjustment If TRUE then when p=2 the coordinates will agree with those from SCA. NOTE: if mcaadjustinertias is FALSE then mcaadjustcoords will also be set to FALSE, as adjusting the coordinates but not the inertias makes no sense
`mcaadjustmassctr`	Whether to adjust the point masses and column contributions in MCA so that the masses and contributions are with respect to each variable (as in SCA) rather than with respect to all variables together: TRUE Multiply point masses and contributions by p so that they sum to p over all variables FALSE No adjustment If TRUE then when p=2 the CTR will agree with those from SCA, though when p>2 the contributions can be >1
`mcaoneploteach`	Parameter passed to `plotca` for MCA. A flag saying whether to produce one plot for each variable, where confidence ellipses are shown for that variable but not others: TRUE p plots are produced, each showing confidence regions for the category points for just one variable FALSE only one plot produced, with confidence regions shown for each category point of each variable, which could be very "busy"
`mcashowindividuals`	Parameter passed to `plotca` for MCA. For MCA on an indicator matrix only, a flag saying whether to plot the individuals: TRUE plot the individuals, which could be very "busy". NOTE: if mcaindividualboot=TRUE this also plots the CRs constructed using the experimental method, see Details section part (2) FALSE don't plot the individuals
`mcavariablecolours`	Parameter passed to `plotca` for MCA: TRUE In MCA each variable has its own colour, and all category points and ellipses for that variable have the same colour FALSE Colours chosen in the default way
`mcacategorycolours`	Parameter passed to `plotca` for MCA: TRUE In MCA each category number has its own colour, and all points and ellipses for that category number have the same colour, for all variables (intended for Likert type data so that all category 1 points are the same colour etc) FALSE Colours chosen in the default way
`Jk`	The number of classes for each variable in MCA, as a list or vector, which only needs specifying when inputting an indicator matrix, as in other cases it can be derived from the input matrix
`varandcat`	Flag for how to construct variable category names: TRUE names are varname:catname, to be used if many variables have the same categories, e.g. Likert FALSE names are just category names, to be used if variables all have distinct categories
`likertarrows`	Parameter passed to `plotca` for MCA. If TRUE then, for likert-type ordered categorical data, draw arrows connecting the category points for each variable, with the arrow drawn from a category point to the next higher category point
`mcastoreindicator`	If TRUE then store the indicator matrix created for MCA
`mcaindividualboot`	If TRUE then use the experimental method to bootstrap an indicator or doubled matrix, see Details section part (2) for full explanation
`mcalikertnoise`	The "noise" value to use in the experimental method (above) to bootstrap an indicator or doubled matrix, see Details section part (2) for full explanation
`poissonzeronewmean`	Experimental method for SCA to deal with contingency tables where zero cells could have been non-zero, i.e. they are not structural zeros. Only relevant for Poisson sampling in SCA, otherwise ignored: 0 : no effect, method as described in paper 1 : cells which are zero in the data are instead resampled from a Poisson distribution with this mean, which should be very small (say 0.1); this is for situations where rare cases did not occur but could have done, so that it might be appropriate for zero cells in the sample to be occasionally non-zero in resamples
`newzeroreset`	Experimental method for SCA to deal with sparse contingency tables. Only relevant for SCA, otherwise ignored: 0 : no effect, method as described in paper 1 : if a cell value is non-zero in the sample but zero in the resample then it is reset to 1 in the resample, so that the sparsity structure of the sample is maintained in the resample. This can be useful with sparse data sets and, in effect, conditions on the sample sparsity structure
`bootstdcoords`	If TRUE then produce bootstrap variances for points in standard coordinates instead of principal coordinates Note: intended only for experiments with the methodology
`reflectonly`	If TRUE then just allow for axis reflections and not axis reorderings Note: intended only for experiments with the methodology
`showresults`	If TRUE then output the results using `summaryca` and `plotca`, otherwise output suppressed
`eps`	Any value less than this is treated as zero in some calculations and comparisons

Details

This routine performs all of the usual Correspondence Analysis calculations while also using bootstrapping to estimate the variance of the difference between the sample and population point when both are projected onto the sample axes in principal coordinates. This is done for each row and column category on each dimension of the solution, allowing for sampling variation in both the points and the axes.

It hence constructs confidence ellipses for each category point, plots the results by a call to plotca and prints the usual Correspondence Analysis summary output and the calculated standard deviations through a call to summaryca. Use printca for more detailed numerical results.

For further examples and help on the package as a whole see cabootcrs-package.

(1) Corrections for Burt diagonal

It is well-known that in multiple CA (MCA) the results are distorted by the diagonal elements of the Burt matrix. As well as the standard methods to correct for this, here we propose and implement a new method to correct for this when bootstrapping. If bootstrapping is applied in a naive way then, even when the standard corrections are used, the estimated variances will be much too small because diagonal elements of the standardised Burt matrix are the same in every bootstrap replicate, thus underestimating the true variation in the data.

All bootstrapping is performed on the indicator matrix (or equivalently the n by p matrix) and the resampled Burt matrix is then constructed from the resampled indicator matrix in the usual way.

Included here are the usual corrections to the inertias (mcaadjustinertias=TRUE, the default) and the coordinates (mcaadjustcoords=TRUE, the default). In addition you can choose to use, as the total inertia, either the sum of these adjusted inertias (mcauseadjustinertiasum=TRUE) as proposed by Benzecri or the average of the off-diagonal inertias (mcauseadjustinertiasum=FALSE, the default) as proposed by Greenacre. You can adjust (multiply by p) the Contribution figures in MCA so that they sum to p over all variables, i.e. an average of 1 for each variable as in SCA (mcaadjustmassctr=TRUE), rather than a total of 1 over all variables, as usually in MCA (mcaadjustmassctr=FALSE, the default). Note that when p=2 this will be the same as in SCA, but when p>3 you can get contributions greater than 1, so use with caution. This also adjusts (multiplies by p) the point masses so that they sum to 1 for each variable, rather than over all variables, same caveat applies.

The fundamental problem with MCA is that in a Burt matrix each diagonal element is the value of a variable category cross-classified with itself, so it is always equal to the number of times that category appears. Hence if a category appears k times then the row (or column) in the Burt matrix consists of p blocks each of which sum to k, so its row (and column) sum is kp.

Therefore when the row (or column) profile matrix is calculated the diagonal elements of the Burt matrix are always all 1/p while the elements for the categories in the offdiagonal blocks sum to 1/p in each block. Hence when the projected difference between the sample and resample row (or column) profile matrices is calculated this is artificially small because the diagonals of the two matrices are always the same, no matter how different the off-diagonal elements are.

The new method to correct for the diagonal elements of the Burt matrix when calculating variances therefore works by re-expressing the coordinates purely in terms of the interesting and variable off-diagonal elements, excluding the uninteresting and constant diagonal elements.

First calculate the Burt principal coordinates (PC), but with the diagonal elements of the profile matrix ignored (or set to zero). It is easy to verify that the usual Burt principal coordinates can be re-expressed, using the singular values (SV), as

Burt PC = ( Burt SV / (Burt SV - (1/p)) ) x Burt PC without diagonal element

Hence in the bootstrapping the sample and resample points are re-expressed in this way and their differences when projected onto the bootstrap axes are calculated as usual.

Similarly the adjusted principal coordinates are calculated as

adj Burt PC = (p/(p-1)) x ( (Burt SV - (1/p)) / Burt SV ) x Burt PC

So, when using the usual adjusted coordinates (and adjusted inertias), both of the above will be used, hence ending up with just a correction of (p/(p-1)). The unadjusted coordinates can be used, but this is not recommended.

One consequence of this correction is that when mcaadjustinertias=TRUE, mcaadjustcoords=TRUE, mcauseadjustinertiasum=TRUE, mcaadjustmassctr=TRUE and mcasupplementary="offdiag" then when p=2 the bootstrap variances for MCA are almost the same as those for SCA, while all other results for MCA are the same as those for SCA. The package author regards this as a good thing, making MCA more of a proper generalisation of MCA, but recognises that some people regard MCA as a fundamentally different method to SCA, linked only by the common algebra.

Note that if this adjustment is not made then in the p=2 case it is easy to see that the projected differences are half those in the SCA case and the standard deviations are a quarter the size.

The new method will be written up for publication once this update to the package is finished.

(2) Experimental method for bootstrapping indicator matrices

A highly experimental method is included for bootstrapping with an indicator or doubled matrix in the case of ordered categorical (e.g. Likert scale) data. This has not been studied or optimised extensively, and is currently very slow, so use is very much at the user's discretion and at user's risk. To try it, choose:

mcatype="indicator" or mcatype="doubled" with nboots>0

If CRs are required for the individual points then also choose:

mcaindividualboot=TRUE

The bootstrap methodology used here relies on the comparison of bootstrap to sample points when both are projected onto bootstrap axes. In SCA this is fine because when looking at column points the axes are given by the rows and vice versa, and row i and column j each represent the same category in all bootstrap replicates. Similarly in MCA with a Burt matrix when looking at column points the axes are given by the rows, which again each represent the same category in all bootstrap replicates.

However, with an indicator or doubled matrix, row i of the matrix does not represent the same individual in each replicate, it just represents the i-th individual drawn in the resampling for that particular replicate. In order for this type of bootstrapping to work with an indicator or doubled matrix we would need a resampling method whereby the i-th row represents the i-th individual in all bootstrap replicates. The resampled row would need to represent the answers of the same individual "on another day". This might make sense with questionnaire data where an individual's answers have uncertainty attached, in that if asked the same questions on multiple occasions they would give different answers due to random variation rather than temporal change. If a believable model for the sampling, and hence the resampling, could be derived (CUB models perhaps) then this could lead to CRs for an individual point, representing the uncertainty in what that person actually thinks.

The method here uses the same idea of treating the sample row as representing an individual, and bootstrap replicates of that row as representing the variability in how that individual might have answered the questionnaire (or similar). However, instead of an explicit model to represent this variability, it is generated by the data themselves. The bootstrap replicate indicator (or doubled) matrix is generated as usual, by either non-parametric (multinomial) or balanced resampling, and then its rows are "matched" to the rows of the sample indicator (or doubled) matrix. The rows of the resampled matrix are reordered so that the rows of the resampled matrix are, overall, as similar as possible to the same rows of the sample matrix. Hence the resampled rows can reasonably be viewed as representing the same individual in each bootstrap replicate, and hence variances for the column points (categories) and row points (individuals) can be produced.

The matching again uses the Hungarian algorithm from lpSolve. This only makes sense if all variables are ordered categorical. Applying this to (ordered) categorical data results in a very large number of ties, so the mcalikertnoise parameter defines the standard deviation of white noise added to each of the sample and resample category numbers for the purpose of the matching (only).

Note that this is very slow for all but small data sets, and if all of the CRs for individuals are shown then the plot is impossibly busy. Hence it is recommended that this experimental method only be used for fairly small data sets where CRs are wanted for only a few example individuals. A better approach might be to average the CRs of all of the individuals who give the same results in the sample, but this is not implemented yet.

Note that supplementary row points in indicator matrix MCA are usually regarded as different people answering the same questions, whereas in this case for CRs to make any sense we need to regard them as the same people answering the same questions on different days.

(3) Critical values

Bootstrap critical values are calculated by re-using the bootstrap replicates used to calculate the variances, with a critical value calculated for each ellipse. The projected differences between bootstrap and sample points are ordered and the appropriate percentile value picked. These are usually slightly larger than the χ^2 critical values. Only 90%, 95% (default) and 99% critical values are calculated.

Alternatively use χ^2 critical values, usually with df=2, but with df=1 if only 2 non-zero cells in row/col.

The experimental method to construct ellipses for individuals in MCA always uses bootstrap critical values

Value

An object of class cabootcrsresults

Examples

# Simple CA (SCA) of a 5 by 4 contingency table, using all SCA defaults:
# 999 bootstraps, Poisson resampling, variances for up to first four axes,
# usual output for up to the first 4 axes,
# one biplot with CRs for rows in principal coordinates and another with
# CRs for columns in principal coordinates

bd <- cabootcrs(DreamData)

## Not run: 

# Same data set with a completely random three-category third variable added,
# analysed with MCA but with standardisations which mimic SCA as much as possible

bd3 <- cabootcrs(DreamData223by3, catype="mca")

Explicitly stating what the rows and columns represent, often needed for a contingency table

bd <- cabootcrs(DreamData, datasetname="Maxwell's dream data",
                varnames=c("What the rows are","What the columns are"))

# Multiple CA (MCA) of 3 categorical variables with all defaults:
# non-parametric resampling, Burt matrix analysed,
# each variable has one plot with it in colour with CRs shown, other variables in monochrome.
# Same data set but now as 223 by 3 matrix, with random 3rd column (with 3 categories) added.

bd3 <- cabootcrs(DreamData223by3, catype="mca")

# Comparison of SCA to MCA with p=2, by converting contingency table to 223 by 2 matrix.
# Note that the coordinates and inertias etc are the same while the standard deviations
# and hence the ellipses are very similar but not identical.

bd <- cabootcrs(DreamData)
DreamData223by2 <- convert(DreamData,input="CT",output="nbyp")$result
bdmca <- cabootcrs(DreamData223by2, catype="mca", varandcat=FALSE)

# Not adjusting inertias, which means that coordinates will also not be adjusted and
# the bootstrapping will use the Burt diagonal.
# Note how the coordinates are larger but the inertias and ellipses are smaller.

bdmcaunadj <- cabootcrs(DreamData223by2, catype="mca", varandcat=FALSE, mcaadjustinertias=FALSE)

# Applying the standard adjustments to inertias and coordinates, but with
# the bootstrapping still using the Burt diagonal.
# Note how inertias and coordinates are now the same as SCA, but ellipses are smaller.

bdmcaadjbutall <- cabootcrs(DreamData223by2, catype="mca", varandcat=FALSE, mcasupplementary="all")


# Effect of sample size in SCA:

bdx4 <- cabootcrs(4*DreamData)
bdx9 <- cabootcrs(9*DreamData)

ba <- cabootcrs(AttachmentData)

bs <- cabootcrs(SuicideData)

bas <- cabootcrs(AsbestosData)


# Options for SCA:

# SCA with multinomial resampling, with the matrix treated as a single multinomial distribution

bdm <- cabootcrs(DreamData, resampledistn="multinomial")

# Fix the row sums, i.e. keep sum of age group constant

bdmrf <- cabootcrs(DreamData, resampledistn="multinomial", multinomialtype="rowsfixed")

# Use chi-squared critical values for the CRs

bdchisq <- cabootcrs(DreamData, usebootcrits=FALSE)

# Just perform correspondence analysis, without bootstrapping

bdnb0 <- cabootcrs(DreamData, nboots=0)


# Effect of sample size in MCA:

bn <- cabootcrs(NishData, catype="mca")


# Options for MCA

# Using default settings the SCA and MCA standard results are the same when p=2,
# bootstrap standard deviations (multinomial/nonparametric) are similar but not identical

bdsca <- cabootcrs(DreamData,resampledistn="multinomial")
bdmca <- cabootcrs(convert(DreamData,input="CT",output="nbyp")$result, catype="mca")

# Row A can be labelled A rather than R:A
# because the three variables have all different category names

bd3l <- cabootcrs(DreamData223by3, catype="mca", varandcat=FALSE)

# Balanced resampling, each of the 223 rows occurs 999 times in the 999 resamples

bd3b <- cabootcrs(DreamData223by3, catype="mca", resampledistn="balanced")

# Do not adjust inertias, coordinates or contributions
# (if inertias are not adjusted then coordinates are also not adjusted)

bd3unadj <- cabootcrs(DreamData223by3, catype="mca",mcaadjustinertias=FALSE)

## Comparisons to ellipses from FactoMineR

# Generate some completely random uniform categorical data, construct ellipses.
# The cabootcrs ellipses are very large and overlap extensively, as you would expect
# from completely random data.
# The FactoMineR ellipses are much smaller, often with minimal overlaps,
# giving a completely false impression of genuine differences between categories.

library(FactoMineR)
p <- 4
maxcat <- 5
n <- 100
Xnpr <- apply( as.data.frame( matrix( round(runif(n*p,0.5,maxcat+0.5)), n, p)), 2, factor )
fr <- MCA(Xnpr, method="Burt")
plotellipses(fr)
br <- cabootcrs(Xnpr, catype="mca", showresults=FALSE)
plotca(br, mcacategorycolours = TRUE, showcolumnlabels=FALSE)

## Comparisons to results in ca and FactoMineR

Summary: If using unadjusted inertias, coordinates the packages produce identical results,
apart from differences in presentation (rounding off, the naming of rep/cor/cos2).

Summary: When using adjusted inertias and coordinates (not an option in FactoMineR::MCA)
the correlations in ca::mjca no longer sum to 1 over all dimensions, in cabootcrs they do.
Ratios are the same for each dimension, but not each point, they are standardised differently.

# Example comparisons with random data
library(FactoMineR)
library(ca)
p <- 4
maxcat <- 5
n <- 100
Xnpdf <- as.data.frame( matrix( round(runif(n*p,0.5,maxcat+0.5)), n, p))
Xnpr <- apply( Xnpdf, 2, factor )

# Note that ca::mjca only accepts the data as numerical,
# FactoMineR::MCA only acccepts the data as characters
rbun <- cabootcrs(Xnpr, catype="mca", nboots=0, mcaadjustinertias = FALSE)
rcun <-  mjca(Xnpdf,lambda="Burt")
summary(rcun)
rfm <- MCA(Xnpr,method="Burt", graph=FALSE)
summary(rfm)
rb <- cabootcrs(Xnpr, catype="mca", nboots=0)
rc <- mjca(Xnpdf)
summary(rc)
realr <- rb@br@realr
rb@ColREP[,1:realr]
rc$colcor[,1:realr]
apply(rb@ColREP[,1:realr],1,"sum")
apply(rc$colcor[,1:realr],1,"sum")


## End(Not run)

cabootcrs documentation built on March 18, 2022, 7:50 p.m.

cabootcrs index

Package overview README.md

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

cabootcrs
Bootstrap Confidence Regions for Simple and Multiple Correspondence Analysis

cabootcrs: Calculate category point variances using bootstrapping
In cabootcrs: Bootstrap Confidence Regions for Simple and Multiple Correspondence Analysis

Calculate category point variances using bootstrapping

Description

Usage

Arguments

Details

Value

See Also

Examples

Related to cabootcrs in cabootcrs...

R Package Documentation

Browse R Packages

We want your feedback!

cabootcrs Bootstrap Confidence Regions for Simple and Multiple Correspondence Analysis

cabootcrs: Calculate category point variances using bootstrapping In cabootcrs: Bootstrap Confidence Regions for Simple and Multiple Correspondence Analysis

Calculate category point variances using bootstrapping

Description

Usage

Arguments

Details

Value

See Also

Examples

Related to cabootcrs in cabootcrs...

R Package Documentation

Browse R Packages

We want your feedback!

cabootcrs
Bootstrap Confidence Regions for Simple and Multiple Correspondence Analysis

cabootcrs: Calculate category point variances using bootstrapping
In cabootcrs: Bootstrap Confidence Regions for Simple and Multiple Correspondence Analysis