R/cabootcrs.R

Defines functions cabootcrs

Documented in cabootcrs

#'
#' Calculate category point variances using bootstrapping
#'
#' \code{cabootcrs} performs simple or multiple correspondence analysis
#' and uses bootstrap resampling to
#' construct confidence ellipses for each appropriate category point,
#' printing and plotting the results;
#' for help on the package see \code{\link{cabootcrs-package}}.
#'
#' This routine performs all of the usual Correspondence Analysis calculations while also using bootstrapping
#' to estimate the variance of the difference between the sample and population point when both are projected
#' onto the sample axes in principal coordinates.
#' This is done for each row and column category on each dimension of the solution,
#' allowing for sampling variation in both the points and the axes.
#'
#' It hence constructs confidence ellipses for each category point, plots the results
#' by a call to \code{\link{plotca}} and prints the usual Correspondence Analysis summary
#' output and the calculated standard deviations through a call to \code{\link{summaryca}}.
#' Use \code{\link{printca}} for more detailed numerical results.
#'
#' For further examples and help on the package as a whole see \code{\link{cabootcrs-package}}.
#'
#' \strong{(1) Corrections for Burt diagonal}
#'
#' It is well-known that in multiple CA (MCA) the results are distorted by the diagonal elements of
#' the Burt matrix. As well as the standard methods to correct for this, here we propose
#' and implement a new method to correct for this when bootstrapping.
#' If bootstrapping is applied in a naive way then, even when the standard corrections are used,
#' the estimated variances will be much too small because diagonal elements of the standardised Burt matrix are
#' the same in every bootstrap replicate, thus underestimating the true variation in the data.
#'
#' All bootstrapping is performed on the indicator matrix (or equivalently the n by p matrix)
#' and the resampled Burt matrix is then constructed from the resampled indicator matrix in the usual way.
#'
#' Included here are the usual corrections to the
#' inertias (mcaadjustinertias=TRUE, the default)
#' and the coordinates (mcaadjustcoords=TRUE, the default).
#' In addition you can choose to use, as the total inertia, either
#' the sum of these adjusted inertias (mcauseadjustinertiasum=TRUE) as proposed by Benzecri or
#' the average of the off-diagonal inertias (mcauseadjustinertiasum=FALSE, the default)
#' as proposed by Greenacre.
#' You can adjust (multiply by p) the Contribution figures in MCA so that they sum to p over all variables,
#' i.e. an average of 1 for each variable as in SCA (mcaadjustmassctr=TRUE),
#' rather than a total of 1 over all variables, as usually in MCA (mcaadjustmassctr=FALSE, the default).
#' Note that when p=2 this will be the same as in SCA, but when p>3 you can get contributions
#' greater than 1, so use with caution.
#' This also adjusts (multiplies by p) the point masses so that they sum to
#' 1 for each variable, rather than over all variables, same caveat applies.
#'
#' The fundamental problem with MCA is that in a Burt matrix each diagonal element is the
#' value of a variable category cross-classified with itself, so it is always equal
#' to the number of times that category appears.
#' Hence if a category appears k times then the row (or column) in the Burt matrix
#' consists of p blocks each of which sum to k, so its row (and column) sum is kp.
#'
#' Therefore when the row (or column) profile matrix
#' is calculated the diagonal elements of the Burt matrix are always all 1/p while
#' the elements for the categories in the offdiagonal blocks sum to 1/p in each block.
#' Hence when the projected difference between the sample and resample row (or column)
#' profile matrices is calculated this is artificially small because the
#' diagonals of the two matrices are always the same, no matter how different
#' the off-diagonal elements are.
#'
#' The new method to correct for the diagonal elements of the Burt matrix when
#' calculating variances therefore works by re-expressing the coordinates
#' purely in terms of the interesting and variable off-diagonal elements,
#' excluding the uninteresting and constant diagonal elements.
#'
#' First calculate the Burt principal coordinates (PC), but with the
#' diagonal elements of the profile matrix ignored (or set to zero).
#' It is easy to verify that the usual Burt principal coordinates can be re-expressed,
#' using the singular values (SV), as
#'
#' Burt PC = ( Burt SV / (Burt SV - (1/p)) ) x Burt PC without diagonal element
#'
#' Hence in the bootstrapping the sample and resample points are re-expressed in this way
#' and their differences when projected onto the bootstrap axes are calculated as usual.
#'
#' Similarly the adjusted principal coordinates are calculated as
#'
#' adj Burt PC = (p/(p-1)) x ( (Burt SV - (1/p)) / Burt SV ) x Burt PC
#'
#' So, when using the usual adjusted coordinates (and adjusted inertias),
#' both of the above will be used, hence ending up with just a correction of (p/(p-1)).
#' The unadjusted coordinates can be used, but this is not recommended.
#'
#' One consequence of this correction is that when mcaadjustinertias=TRUE, mcaadjustcoords=TRUE,
#' mcauseadjustinertiasum=TRUE, mcaadjustmassctr=TRUE and mcasupplementary="offdiag"
#' then when p=2 the bootstrap
#' variances for MCA are almost the same as those for SCA, while all other results
#' for MCA are the same as those for SCA. The package author regards this as a good thing,
#' making MCA more of a proper generalisation of MCA, but recognises that some people
#' regard MCA as a fundamentally different method to SCA, linked only by the common algebra.
#'
#' Note that if this adjustment is not made then in the p=2 case it is easy to see that the
#' projected differences are half those in the SCA case and the standard deviations
#' are a quarter the size.
#'
#' The new method will be written up for publication once this update to the package is finished.
#'
#' \strong{(2) Experimental method for bootstrapping indicator matrices}
#'
#' A highly experimental method is included for bootstrapping with an indicator or doubled matrix
#' in the case of ordered categorical (e.g. Likert scale) data.
#' This has not been studied or optimised extensively, and is currently very slow, so use is very
#' much at the user's discretion and at user's risk. To try it, choose:\cr\cr
#' mcatype="indicator" or mcatype="doubled" with nboots>0\cr\cr
#' If CRs are required for the individual points then also choose:\cr\cr
#' mcaindividualboot=TRUE
#'
#' The bootstrap methodology used here relies on the comparison of bootstrap to sample points when
#' both are projected onto bootstrap axes. In SCA this is fine because when looking at column points
#' the axes are given by the rows and vice versa, and row i and column j each represent the same
#' category in all bootstrap replicates. Similarly in MCA with a Burt matrix when looking at column
#' points the axes are given by the rows, which again each represent the same category in all bootstrap replicates.
#'
#' However, with an indicator or doubled matrix, row i of the matrix does not represent the same
#' individual in each replicate, it just represents the i-th individual drawn in the resampling
#' for that particular replicate.
#' In order for this type of bootstrapping to work with an indicator or doubled matrix we would
#' need a resampling method whereby the i-th row represents the i-th individual in all bootstrap replicates.
#' The resampled row would need to represent the answers of the same individual "on another day".
#' This might make sense with questionnaire data where an individual's answers have uncertainty
#' attached, in that if asked the same questions on multiple occasions they would give different answers
#' due to random variation rather than temporal change. If a believable model for the sampling,
#' and hence the resampling, could be derived (CUB models perhaps) then this could lead to CRs
#' for an individual point, representing the uncertainty in what that person actually thinks.
#'
#' The method here uses the same idea of treating the sample row as
#' representing an individual, and bootstrap replicates of that row as representing the variability
#' in how that individual might have answered the questionnaire (or similar).
#' However, instead of an explicit model to represent this variability, it is generated
#' by the data themselves. The bootstrap replicate indicator (or doubled) matrix is generated as usual,
#' by either non-parametric (multinomial) or balanced resampling, and then its rows are "matched"
#' to the rows of the sample indicator (or doubled) matrix. The rows of the resampled matrix
#' are reordered so that the rows of the resampled matrix are, overall, as similar as possible
#' to the same rows of the sample matrix. Hence the resampled rows can reasonably be viewed
#' as representing the same individual in each bootstrap replicate, and hence variances for
#' the column points (categories) and row points (individuals) can be produced.
#'
#' The matching again uses the Hungarian algorithm from lpSolve. This only makes sense if
#' all variables are ordered categorical. Applying this to (ordered) categorical data results
#' in a very large number of ties, so the mcalikertnoise parameter defines the standard deviation
#' of white noise added to each of the sample and resample category numbers for the purpose
#' of the matching (only).
#'
#' Note that this is very slow for all but small data sets, and if all of the CRs for individuals
#' are shown then the plot is impossibly busy. Hence it is recommended that this experimental
#' method only be used for fairly small data sets where CRs are wanted for only a few example individuals.
#' A better approach might be to average the CRs of all of the individuals who give the
#' same results in the sample, but this is not implemented yet.
#'
#' Note that supplementary row points in indicator matrix MCA are usually regarded as different people answering
#' the same questions, whereas in this case for CRs to make any sense we need to regard them
#' as the same people answering the same questions on different days.
#'
#' \strong{(3) Critical values}
#'
#' Bootstrap critical values are calculated by re-using the bootstrap replicates used to calculate
#' the variances, with a critical value calculated for each ellipse.
#' The projected differences between bootstrap and sample points are ordered and the appropriate
#' percentile value picked. These are usually slightly larger than the \eqn{\chi^2} critical values.
#' Only 90\%, 95\% (default) and 99\% critical values are calculated.
#'
#' Alternatively use \eqn{\chi^2} critical values, usually with df=2, but with df=1 if only 2 non-zero cells in row/col.
#'
#' The experimental method to construct ellipses for individuals in MCA always uses bootstrap critical values
#'
#' @param xobject Name of data object (data frame or similar class that can be coerced to data frame).\cr
#' For simple CA (SCA) the default is contingency table format, recommended that rows >= columns. \cr
#' For multiple CA (MCA) the default is an n individuals by p variables matrix of category values (numbers or text).
#'
#' @param datafile Name of a text file (in " ") containing the data, same defaults as xobject, ignored if xobject is non-null
#'
#' @param datasetname A string to use as the name of the data set in the plots, defaults to name of xobject or datafile
#'
#' @param nboots Number of boostrap replicate matrices used, default and recommended minimum is 999,
#' but 9999 is recommended if machine and data set size allows;
#' the calculated variances will sometimes differ
#' around the third decimal place, but the pictures should look the same. \cr
#' If nboots=0 then correspondence analysis is performed as usual with no variances calculated
#'
#' @param resampledistn
#' Poisson resampling is the default for SCA,\cr
#' nonparametric is the default for MCA (Poisson and multinomial will both default to nonparametric)
#' \describe{
#' \item{"Poisson"}{resampled matrices constructed using Poisson resampling on each cell separately (only SCA)}
#' \item{"multinomial"}{resampled matrices constructed using multinomial resampling, treating the
#' cells as defining one or more multinomial distributions (only SCA)}
#' \item{"nonparametric"}{non-parametric resampling of the rows of the n individuals by p variables matrix, equivalent to multinomial (only MCA)}
#' \item{"balanced"}{multinomial resampling balanced so that each data point occurs equally often over the resamples (only MCA)}
#' \item{"myresample"}{resampling algorithm is contained in a file called myresample.R}
#' }
#'
#' @param multinomialtype Only relevant for multinomial sampling in SCA, otherwise ignored:
#' \describe{
#' \item{"whole"}{all cells define a single multinomial distribution}
#' \item{"rowsfixed"}{row sums fixed, each row defines a separate multinomial distribution}
#' \item{"columnsfixed"}{column sums fixed, each column defines a separate multinomial distribution}
#' }
#'
#' @param printdims Print full correspondence analysis coordinates, contributions, correlations etc for all output
#' dimensions up to and including this one
#'
#' @param lastaxis Calculate variances and covariances for all output axes (dimensions) up to this one
#' (or the number of dimensions in the solution if smaller). \cr
#' Recommended maximum is maxrearrange-1 as variances for those above this axis may be inaccurate
#'
#' @param maxrearrange The maximum number of axes to consider when rearranging
#'
#' @param rearrangemethod The method used to rearrange the axes:
#' \describe{
#' \item{"lpassign"}{The Hungarian algorithm in the lpSolve package}
#' \item{anything else}{The embarrassingly slow direct comparison method used in version 1.0 - don't use it}
#' }
#' Option is only included in case something weird goes wrong with lpSolve.
#'
#' @param usebootcrits To be passed to the plot routine, see \code{\link{plotca}} for details
#'
#' @param groupings To be passed to the plot routine, see \code{\link{plotca}} for details
#'
#' @param grouplabels To be passed to the plot routine, see \code{\link{plotca}} for details
#'
#' @param varnames Character p-vector naming the variables, defaults to c("Rows","Columns") in sca
#'
#' @param plotsymbolscolours To be passed to the plot routine, see \code{\link{plotca}} for details
#'
#' @param othersmonochrome To be passed to the plot routine, see \code{\link{plotca}} for details
#'
#' @param crpercent To be passed to the plot routine, see \code{\link{plotca}} for details
#'
#' @param catype Type of correspondence analysis:
#' \describe{
#' \item{"sca"}{Simple (classical) correspondence analysis of a contingency table}
#' \item{"mca"}{Multiple correspondence analysis of a Burt, indicator or doubled matrix}
#' }
#'
#' @param scainput Format of input data, only applies for SCA:
#' \describe{
#' \item{"CT"}{Contingency table of counts, preferably with rows >= columns}
#' \item{"nbyp"}{An n individuals/objects/data points by p=2 categorical variables matrix,
#'   where each row is a different data point and each column contains the category
#'   for that data point on that variable, where these categories can be numbers, strings or factors}
#' \item{"nbypcounts"}{Similar to the above, but each row represents all of the data points taking
#'   the same combination of categories, and the first column contains the count for this combination
#'   (hence the name used here is a bit of a misnomer, but it emphasises the similarities to an n by p=2)}
#' }
#'
#' @param mcainput Format of input data, only applies for MCA:
#' \describe{
#' \item{"nbyp"}{An n individuals/objects/data points by p categorical variables matrix,
#'   where each row is a different data point and each column contains the category
#'   for that data point on that variable, where these categories can be numbers, strings or factors}
#' \item{"nbypcounts"}{Similar to the above, but each row represents all of the data points taking
#'   the same combination of categories, and the now first column contains the count for this combination,
#'   so that the input matrix is n by p+1
#'   (hence the name used here is a bit of a misnomer, but it emphasises the similarities to an n by p)}
#' \item{"indicator"}{An n by sum-of-distinct-variable-categories (i.e. sum of elements of Jk) indicator matrix}
#' }
#'
#' @param mcatype Format of data matrix analysed, only applies for MCA:
#' \describe{
#' \item{"Burt"}{Analyse the Burt matrix. Output will be given for the column (variable category) points but
#'   not for the (identical) row points. \cr
#'   NOTE: it is highly recommended that this version is used}
#' \item{"indicator"}{Analyse the indicator matrix. Output will be given for the column (variable category)
#'   points but not usually for the row (individual) points. \cr
#'   NOTE: the bootstrap method used in this case is highly experimental and very slow, see Details section part (2)}
#' \item{"doubled"}{Analyse the doubled matrix. Output will be given for the column
#'   points (two points, high and low, for each variable) but not usually for the row (individual) points. \cr
#'   NOTE: the bootstrap method used in this case is highly experimental and very slow, see Details section part (2)}
#' }
#'
#' @param mcavariant Currently must be "mca", placeholder for future updates
#'
#' @param mcasupplementary How the sample points are projected as supplementary points onto the bootstrap axes
#' when calculating the variances in MCA of a Burt matrix, see Details section for full explanation
#' \describe{
#' \item{"offdiag"}{Only the off-diagonal parts of the Burt matrix are used in the projection}
#' \item{"all"}{Projection is calculated in the usual way}
#' }
#' If "offdiag" then when p=2 the variances will be very similar to those from SCA. \cr\cr
#' If "all" then the fact that the diagonal elements of the Burt matrix are by definition
#' the same for both the sample and bootstrap matrices
#' means that the projected differences between sample and population points,
#' and hence the variances, will be artificially small. \cr\cr
#' NOTE: if mcaadjustinertias is FALSE then mcasupplementary will be set to "all",
#' because adjusting the coordinates in the calculation of the variances but not adjusting
#' the inertias makes no sense
#'
#' @param mcaadjustinertias Whether to adjust inertias to allow for the meaningless inertia terms
#'   induced by the diagonal of the Burt matrix in MCA:
#' \describe{
#' \item{TRUE}{Analysing Burt matrix: subtract 1/p from all singular values, then positive singular values are
#'   multiplied by p/(p-1) while negative ones are ignored \cr
#'   Analysing indicator matrix: same applies but to the squared singular values }
#' \item{FALSE}{No adjustment}
#' }
#' If TRUE then when p=2 the inertias will agree with those from SCA. \cr\cr
#' NOTE: inertias are the square (Burt) or fourth power (indicator) of the (adjusted) singular values.\cr\cr
#' NOTE: if mcaadjustinertias is FALSE then mcaadjustcoords will also be set to FALSE and mcasupplementary
#' set to "all", as adjusting the coordinates but not the inertias makes no sense
#'
#' @param mcauseadjustinertiasum How to define the total inertia in MCA,
#'   whether to just use the sum of the adjusted inertias:
#' \describe{
#' \item{TRUE}{The inertias are expressed as a percentage of the sum of the adjusted inertias (Benzecri)}
#' \item{FALSE}{The inertias are expressed as a percentage of the average of off-diagonal inertias (Greenacre),
#'   note that this will be incorrect if Jk includes categories that are not observed in the data }
#' }
#' If TRUE then when p=2 the inertias will agree with those from SCA \cr
#' If TRUE then the percentage inertias will sum to 100\% \cr
#' If FALSE then the percentage inertias will usually sum to less than 100\%
#'
#' @param mcaadjustcoords Whether to adjust the principal coordinates in MCA using the adjusted
#' inertias above, as in Greenacre and Blasius, p68:
#' \describe{
#' \item{TRUE}{Adjust coordinates, but only for column points}
#' \item{FALSE}{No adjustment}
#' }
#' If TRUE then when p=2 the coordinates will agree with those from SCA. \cr\cr
#' NOTE: if mcaadjustinertias is FALSE then mcaadjustcoords will also be set to FALSE, as adjusting the coordinates
#' but not the inertias makes no sense
#'
#' @param mcaadjustmassctr Whether to adjust the point masses and column contributions in MCA so that the masses and contributions are
#'   with respect to each variable (as in SCA) rather than with respect to all variables together:
#' \describe{
#' \item{TRUE}{Multiply point masses and contributions by p so that they sum to p over all variables}
#' \item{FALSE}{No adjustment}
#' }
#' If TRUE then when p=2 the CTR will agree with those from SCA,
#' though when p>2 the contributions can be >1
#'
#' @param mcaoneploteach Parameter passed to \code{\link{plotca}} for MCA.\cr\cr
#' A flag saying whether to produce one plot for each variable,
#' where confidence ellipses are shown for that variable but not others:
#' \describe{
#' \item{TRUE}{p plots are produced, each showing confidence regions for the category points
#' for just one variable}
#' \item{FALSE}{only one plot produced, with confidence regions shown for each category
#' point of each variable, which could be very "busy"}
#' }
#'
#' @param mcashowindividuals Parameter passed to \code{\link{plotca}} for MCA.\cr\cr
#' For MCA on an indicator matrix only, a flag saying whether to plot the individuals:
#' \describe{
#' \item{TRUE}{plot the individuals, which could be very "busy". \cr
#' NOTE: if mcaindividualboot=TRUE this also plots the CRs constructed using the experimental method,
#' see Details section part (2)}
#' \item{FALSE}{don't plot the individuals}
#' }
#'
#' @param mcavariablecolours Parameter passed to \code{\link{plotca}} for MCA:
#' \describe{
#' \item{TRUE}{In MCA each variable has its own colour, and all category
#' points and ellipses for that variable have the same colour}
#' \item{FALSE}{Colours chosen in the default way}
#' }
#'
#' @param mcacategorycolours Parameter passed to \code{\link{plotca}} for MCA:
#' \describe{
#' \item{TRUE}{In MCA each category number has its own colour,
#' and all points and ellipses for that category number have the same colour,
#' for all variables (intended for Likert type data so that all category 1 points are
#' the same colour etc)}
#' \item{FALSE}{Colours chosen in the default way}
#' }
#'
#' @param Jk The number of classes for each variable in MCA, as a list or vector, which only needs
#' specifying when inputting an indicator matrix, as in other cases it can be derived from the input matrix
#'
#' @param varandcat Flag for how to construct variable category names:
#' \describe{
#' \item{TRUE}{names are varname:catname, to be used if many variables have the same categories, e.g. Likert}
#' \item{FALSE}{names are just category names, to be used if variables all have distinct categories}
#' }
#'
#' @param likertarrows Parameter passed to \code{\link{plotca}} for MCA.\cr\cr
#' If TRUE then, for likert-type ordered categorical data,
#' draw arrows connecting the category points for each variable,
#' with the arrow drawn from a category point to the next higher category point
#'
#' @param mcastoreindicator If TRUE then store the indicator matrix created for MCA
#'
#' @param mcaindividualboot If TRUE then use the experimental method to bootstrap an indicator or doubled matrix,
#' see Details section part (2) for full explanation
#'
#' @param mcalikertnoise The "noise" value to use in the experimental method (above)
#' to bootstrap an indicator or doubled matrix,
#' see Details section part (2) for full explanation
#'
#' @param poissonzeronewmean Experimental method for SCA to deal with contingency tables where zero cells
#' could have been non-zero, i.e. they are not structural zeros.\cr
#' Only relevant for Poisson sampling in SCA, otherwise ignored:\cr
#' 0 : no effect, method as described in paper \cr
#' 1 : cells which are zero in the data are instead resampled from a
#' Poisson distribution with this mean, which should be very small (say 0.1);
#' this is for situations where rare cases did not occur but could have done,
#' so that it might be appropriate for zero cells in the sample to be
#' occasionally non-zero in resamples
#'
#' @param newzeroreset Experimental method for SCA to deal with sparse contingency tables.\cr
#' Only relevant for SCA, otherwise ignored:\cr
#' 0 : no effect, method as described in paper\cr
#' 1 : if a cell value is non-zero in the sample but zero in the resample
#' then it is reset to 1 in the resample, so that the sparsity structure
#' of the sample is maintained in the resample. This can be useful with
#' sparse data sets and, in effect, conditions on the sample sparsity structure
#'
#' @param bootstdcoords If TRUE then produce bootstrap variances for points in standard coordinates
#' instead of principal coordinates\cr
#' Note: intended only for experiments with the methodology
#'
#' @param reflectonly If TRUE then just allow for axis reflections and not axis reorderings\cr
#' Note: intended only for experiments with the methodology
#'
#' @param showresults If TRUE then output the results using \code{\link{summaryca}}
#' and \code{\link{plotca}}, otherwise output suppressed
#'
#' @param eps Any value less than this is treated as zero in some calculations and comparisons
#'
#' @return An object of class \code{\linkS4class{cabootcrsresults}}
#'
#' @examples
#' # Simple CA (SCA) of a 5 by 4 contingency table, using all SCA defaults:
#' # 999 bootstraps, Poisson resampling, variances for up to first four axes,
#' # usual output for up to the first 4 axes,
#' # one biplot with CRs for rows in principal coordinates and another with
#' # CRs for columns in principal coordinates
#'
#' bd <- cabootcrs(DreamData)
#'
#' \dontrun{
#'
#' # Same data set with a completely random three-category third variable added,
#' # analysed with MCA but with standardisations which mimic SCA as much as possible
#'
#' bd3 <- cabootcrs(DreamData223by3, catype="mca")
#'
#' Explicitly stating what the rows and columns represent, often needed for a contingency table
#'
#' bd <- cabootcrs(DreamData, datasetname="Maxwell's dream data",
#'                 varnames=c("What the rows are","What the columns are"))
#'
#' # Multiple CA (MCA) of 3 categorical variables with all defaults:
#' # non-parametric resampling, Burt matrix analysed,
#' # each variable has one plot with it in colour with CRs shown, other variables in monochrome.
#' # Same data set but now as 223 by 3 matrix, with random 3rd column (with 3 categories) added.
#'
#' bd3 <- cabootcrs(DreamData223by3, catype="mca")
#'
#' # Comparison of SCA to MCA with p=2, by converting contingency table to 223 by 2 matrix.
#' # Note that the coordinates and inertias etc are the same while the standard deviations
#' # and hence the ellipses are very similar but not identical.
#'
#' bd <- cabootcrs(DreamData)
#' DreamData223by2 <- convert(DreamData,input="CT",output="nbyp")$result
#' bdmca <- cabootcrs(DreamData223by2, catype="mca", varandcat=FALSE)
#'
#' # Not adjusting inertias, which means that coordinates will also not be adjusted and
#' # the bootstrapping will use the Burt diagonal.
#' # Note how the coordinates are larger but the inertias and ellipses are smaller.
#'
#' bdmcaunadj <- cabootcrs(DreamData223by2, catype="mca", varandcat=FALSE, mcaadjustinertias=FALSE)
#'
#' # Applying the standard adjustments to inertias and coordinates, but with
#' # the bootstrapping still using the Burt diagonal.
#' # Note how inertias and coordinates are now the same as SCA, but ellipses are smaller.
#'
#' bdmcaadjbutall <- cabootcrs(DreamData223by2, catype="mca", varandcat=FALSE, mcasupplementary="all")
#'
#'
#' # Effect of sample size in SCA:
#'
#' bdx4 <- cabootcrs(4*DreamData)
#' bdx9 <- cabootcrs(9*DreamData)
#'
#' ba <- cabootcrs(AttachmentData)
#'
#' bs <- cabootcrs(SuicideData)
#'
#' bas <- cabootcrs(AsbestosData)
#'
#'
#' # Options for SCA:
#'
#' # SCA with multinomial resampling, with the matrix treated as a single multinomial distribution
#'
#' bdm <- cabootcrs(DreamData, resampledistn="multinomial")
#'
#' # Fix the row sums, i.e. keep sum of age group constant
#'
#' bdmrf <- cabootcrs(DreamData, resampledistn="multinomial", multinomialtype="rowsfixed")
#'
#' # Use chi-squared critical values for the CRs
#'
#' bdchisq <- cabootcrs(DreamData, usebootcrits=FALSE)
#'
#' # Just perform correspondence analysis, without bootstrapping
#'
#' bdnb0 <- cabootcrs(DreamData, nboots=0)
#'
#'
#' # Effect of sample size in MCA:
#'
#' bn <- cabootcrs(NishData, catype="mca")
#'
#'
#' # Options for MCA
#'
#' # Using default settings the SCA and MCA standard results are the same when p=2,
#' # bootstrap standard deviations (multinomial/nonparametric) are similar but not identical
#'
#' bdsca <- cabootcrs(DreamData,resampledistn="multinomial")
#' bdmca <- cabootcrs(convert(DreamData,input="CT",output="nbyp")$result, catype="mca")
#'
#' # Row A can be labelled A rather than R:A
#' # because the three variables have all different category names
#'
#' bd3l <- cabootcrs(DreamData223by3, catype="mca", varandcat=FALSE)
#'
#' # Balanced resampling, each of the 223 rows occurs 999 times in the 999 resamples
#'
#' bd3b <- cabootcrs(DreamData223by3, catype="mca", resampledistn="balanced")
#'
#' # Do not adjust inertias, coordinates or contributions
#' # (if inertias are not adjusted then coordinates are also not adjusted)
#'
#' bd3unadj <- cabootcrs(DreamData223by3, catype="mca",mcaadjustinertias=FALSE)
#'
#' ## Comparisons to ellipses from FactoMineR
#'
#' # Generate some completely random uniform categorical data, construct ellipses.
#' # The cabootcrs ellipses are very large and overlap extensively, as you would expect
#' # from completely random data.
#' # The FactoMineR ellipses are much smaller, often with minimal overlaps,
#' # giving a completely false impression of genuine differences between categories.
#'
#' library(FactoMineR)
#' p <- 4
#' maxcat <- 5
#' n <- 100
#' Xnpr <- apply( as.data.frame( matrix( round(runif(n*p,0.5,maxcat+0.5)), n, p)), 2, factor )
#' fr <- MCA(Xnpr, method="Burt")
#' plotellipses(fr)
#' br <- cabootcrs(Xnpr, catype="mca", showresults=FALSE)
#' plotca(br, mcacategorycolours = TRUE, showcolumnlabels=FALSE)
#'
#' ## Comparisons to results in ca and FactoMineR
#'
#' Summary: If using unadjusted inertias, coordinates the packages produce identical results,
#' apart from differences in presentation (rounding off, the naming of rep/cor/cos2).
#'
#' Summary: When using adjusted inertias and coordinates (not an option in FactoMineR::MCA)
#' the correlations in ca::mjca no longer sum to 1 over all dimensions, in cabootcrs they do.
#' Ratios are the same for each dimension, but not each point, they are standardised differently.
#'
#' # Example comparisons with random data
#' library(FactoMineR)
#' library(ca)
#' p <- 4
#' maxcat <- 5
#' n <- 100
#' Xnpdf <- as.data.frame( matrix( round(runif(n*p,0.5,maxcat+0.5)), n, p))
#' Xnpr <- apply( Xnpdf, 2, factor )
#'
#' # Note that ca::mjca only accepts the data as numerical,
#' # FactoMineR::MCA only acccepts the data as characters
#' rbun <- cabootcrs(Xnpr, catype="mca", nboots=0, mcaadjustinertias = FALSE)
#' rcun <-  mjca(Xnpdf,lambda="Burt")
#' summary(rcun)
#' rfm <- MCA(Xnpr,method="Burt", graph=FALSE)
#' summary(rfm)
#' rb <- cabootcrs(Xnpr, catype="mca", nboots=0)
#' rc <- mjca(Xnpdf)
#' summary(rc)
#' realr <- rb@br@realr
#' rb@ColREP[,1:realr]
#' rc$colcor[,1:realr]
#' apply(rb@ColREP[,1:realr],1,"sum")
#' apply(rc$colcor[,1:realr],1,"sum")
#'
#' }
#'
#' @seealso \code{\link{cabootcrs-package}}, \code{\linkS4class{cabootcrsresults}},
#' \code{\link{plotca}}, \code{\link{printca}}, \code{\link{summaryca}},
#' \code{\link{covmat}}, \code{\link{allvarscovs}}
#'
#' @import lpSolve
#'
#' @rawNamespace import(colorspace, except=plot)
#'
#' @import methods
#'
#' @import grDevices
#'
#' @import graphics
#'
#' @import stats
#'
#' @import utils
#'
#' @export
cabootcrs <- function(
  xobject=NULL, datafile=NULL, datasetname=NULL,
  nboots=999, resampledistn="Poisson", multinomialtype="whole",
  printdims=4, lastaxis=4,
  maxrearrange=6, rearrangemethod="lpassign", usebootcrits=TRUE,
  groupings=NULL, grouplabels=NULL, varnames=NULL,
  plotsymbolscolours=c(19,"inferno",18,"inferno"),
  othersmonochrome="grey", crpercent=95,
  catype="sca", scainput="CT", mcainput="nbyp", mcatype="Burt",  mcavariant="mca",
  mcasupplementary="offdiag", mcaadjustinertias=TRUE, mcauseadjustinertiasum=FALSE,
  mcaadjustcoords=TRUE, mcaadjustmassctr=FALSE,
  mcaoneploteach=TRUE, mcashowindividuals=FALSE,
  mcavariablecolours=FALSE, mcacategorycolours=FALSE,
  Jk=NULL, varandcat=TRUE, likertarrows=FALSE,
  mcastoreindicator=TRUE, mcaindividualboot=FALSE, mcalikertnoise=0.1,
  poissonzeronewmean=0, newzeroreset=0, bootstdcoords=FALSE, reflectonly=FALSE,
  showresults=TRUE, eps=1e-15 ) {

if (!(is.null(xobject))) { if ( !(is.matrix(xobject)) & !(is.data.frame(xobject)) ) stop(paste("Input must be matrix or data frame\n\n")) }
if (nboots==0) {
  cat(paste("\n Standard Correspondence Analysis results only, no confidence regions\n\n"))
} else {
if (nboots<999) cat(paste("\n WARNING", nboots, "is too few bootstrap replicates for reliable results\n\n"))
if ( (catype=="sca") & !any(resampledistn==c("Poisson","multinomial","myresample")) ) stop(paste("Resampling must be Poisson or multinomial"))
if (catype=="mca") {
  # in mca multinomial is same as nonparametric
  if ( (any(resampledistn==c("Poisson","multinomial"))) ) { resampledistn <- "nonparametric" }
  if (mcaindividualboot) {
    if (!(resampledistn=="balanced")) {
      cat(paste("\n WARNING balanced resampling used \n\n"))
      resampledistn <- "balanced"
    }
  } else {
    if (!any(resampledistn==c("nonparametric","balanced","myresample"))) stop(paste("Resampling must be nonparametric or balanced \n\n"))
  }
} # mca
if (!any(multinomialtype==c("whole","rowsfixed","columnsfixed")))
    stop(paste("Multinomial resampling is either whole, rowsfixed or columnsfixed\n\n"))
if ( !(resampledistn=="multinomial") & any(multinomialtype==c("rowsfixed","columnsfixed")) )
    cat(paste("\n WARNING can only fix rows or columns with multinomial resampling\n\n"))
if (poissonzeronewmean<0) stop(paste("Poisson mean for zero entries must be non-negative\n\n"))
if (!any(newzeroreset==c(0,1))) stop(paste("Zeros in sample can only be set to zero or one in bootstrap\n\n"))
} # if bootstrapping
if (printdims<2) stop(paste("number of dims for output must be at least 2\n\n"))
if (lastaxis<2) stop(paste("last axis must be at least 2\n\n"))
if (!any(catype==c("sca","mca"))) stop(paste("Must be sca or mca"))
if (!any(scainput==c("nbyp","nbypcounts","CT"))) stop(paste("Must be CT, nbyp or nbypcounts"))
if (!any(mcatype==c("Burt","indicator","doubled"))) stop(paste("Must be Burt, indicator or doubled"))
if ((mcainput=="doubled")) stop(paste("For doubled analysis please supply data as indicator, nbyp or nbypcounts"))
if (!any(mcainput==c("nbyp","nbypcounts","indicator"))) stop(paste("Must be nbyp, nbypcounts or indicator, can't be Burt as this loses the\n multi-way associations used in the bootstrapping,\n where resampling is performed on the indicator matrix"))
if (!any(mcavariant==c("mca"))) stop(paste("Must be mca")) # eventually add omca
if (!any(mcasupplementary==c("offdiag","all"))) stop(paste("Must be all or offdiag"))
if (!is.null(Jk)) { if ( !(any( is(Jk,"numeric"), is(Jk,"integer") )) ) stop(paste("Must be rep(maximum category,number of variables) or similar")) }
if ((is.null(Jk))&(mcainput=="indicator")) stop(paste("Must specify Jk when input is indicator matrix"))
if ((mcaadjustinertias==FALSE)&(mcaadjustcoords==TRUE)) {
  mcaadjustcoords <- FALSE
  cat(paste("\n WARNING Not adjusting inertias so mcaadjustcoords set to FALSE\n"))
}
if ((mcaadjustinertias==FALSE)&(mcasupplementary=="offdiag")) {
  mcasupplementary <- "all"
  cat(paste("\n WARNING Not adjusting inertias so mcasupplementary set to all\n"))
}
if ((mcaadjustcoords==TRUE)&(mcasupplementary=="all")) {
  cat(paste("\n WARNING Using adjusted coordinates in the results but not correcting for the Burt diagonal in the bootstrapping is not recommended \n\n"))
}
if ((mcaadjustcoords==FALSE)&(mcasupplementary=="offdiag")) {
  cat(paste("\n WARNING Correcting for the Burt diagonal in the bootstrapping but not using adjusted coordinates in the usual results is not recommended \n\n"))
}
if (mcalikertnoise<0) stop(paste("Must be non-negative"))
if ((catype=="mca")&(any(mcatype==c("indicator","doubled")))&(mcavariant=="mca")&(nboots>1)) {
  if (mcaindividualboot) {
    cat(paste("\n WARNING bootstrap standard deviations calculated from indicator/doubled matrix \n are based on highly experimental methods and should not be relied upon\n"))
  } else {
    cat(paste("\n WARNING bootstrap standard deviations calculated from indicator/doubled matrix \n do not make sense because rows of sample and bootstrap matrices \n do not correspond to each other\n"))
  }
}

IndicatorMatrix <- matrix(0)

# READ DATA FILE OR DATA OBJECT
# add row and column names if needed, or if currently has default "1" and "V1" etc

if (catype=="sca") {

if (is.null(xobject)) { # data file

Xtable <- read.table(file = datafile)
if ((scainput=="CT")&(rownames(Xtable)[[1]]=="1")&(names(Xtable)[[1]]=="V1")) {
  for (i in 1:dim(Xtable)[1]) rownames(Xtable)[i] <- paste("r",i,sep="")
  for (i in 1:dim(Xtable)[2]) colnames(Xtable)[i] <- paste("c",i,sep="")
}
if (is.null(datasetname)) datasetname <- as.character(datafile)

} else { # data object

# horrible bodge because if xobject was created with table() then as.data.frame
# will undo the tabulation for no good reason, and as.array() does not work
#if (class(xobject)=="table") {
#  class(xobject) <- "array"
#  cat("Note: input class has been changed from table to array so that as.data.frame will not undo the tabulation")
#}

Xtable <- as.data.frame.matrix(xobject) # seems to coerce to data frame without undoing table()
if (scainput=="CT") {
  if ((is.null(rownames(xobject)))|(rownames(Xtable)[[1]]=="1")) {
    for (i in 1:dim(Xtable)[1]) rownames(Xtable)[i] <- paste("r",i,sep="")
  }
  if ((is.null(colnames(xobject)))|(names(Xtable)[[1]]=="V1")) {
    for (i in 1:dim(Xtable)[2]) colnames(Xtable)[i] <- paste("c",i,sep="")
  }
}
if (is.null(datasetname)) datasetname <- deparse(substitute(xobject))

} # input file or object

if (scainput=="CT") {
  X <- as.matrix(Xtable)
  p <- 2
  needtrans <- dim(X)[1] < dim(X)[2]
  rowlabels <- rownames(Xtable)
  collabels <- colnames(Xtable)
  if (is.null(varnames)) { varnames <- c("Rows","Columns") }
} else { # n by p
  Xlist <- convert(Xtable)
  Xind <- Xlist$result
  if (is.null(varnames)) { varnames <- Xlist$varnames }
  cats <- Xlist$catnames
  Jk <- Xlist$Jk
  p <- Xlist$p
  if (p>2) stop(paste("More than 2 categorical variables, need to use mca not sca\n\n"))
  X <- t(Xind) %*% Xind
  if (Jk[1]>=Jk[2]) {
    rowlabels <- cats[1]
    collabels <- cats[2]
  } else {
    X <- t(X)
    rowlabels <- cats[2]
    collabels <- cats[1]
  }
}

} # sca

if (catype=="mca") {

if (is.null(xobject)) { # data file
  Xtable <- read.table(file = datafile)
  if (is.null(datasetname)) datasetname <- as.character(datafile)
} else { # data object
  Xtable <- as.data.frame(xobject)
  if (is.null(datasetname)) datasetname <- deparse(substitute(xobject))
}

if ((mcainput=="nbypcounts")&(names(Xtable)[[1]]=="V1")&(names(Xtable)[[2]]=="V2")) {
  names(Xtable)[[1]] <- "Counts"
  for (i in 2:dim(Xtable)[2]) colnames(Xtable)[i] <- paste("V",i-1,sep="")
}

if (any(mcainput==c("nbyp","nbypcounts"))) {
  if (mcainput=="nbypcounts") {
    Xnp <- convert(Xtable,input=mcainput,output="nbyp",varandcat=varandcat)$result
  }
  Xindlist <- convert(Xtable,input=mcainput,output="indicator",Jk=Jk,varandcat=varandcat)
  Xind <- Xindlist$result
  if (is.null(varnames)) { varnames <- Xindlist$varnames }
  cats <- Xindlist$catnames
  Jk <- Xindlist$Jk
  p <- Xindlist$p
  if (mcatype=="Burt") {
    X <- t(Xind) %*% Xind
    rowlabels <- colnames(Xind)
    collabels <- colnames(Xind)
  } else {
    if (mcatype=="indicator") {
      X <- Xind
      rowlabels <- rownames(Xtable)
      collabels <- colnames(Xind)
    }
  }
}

if (mcainput=="indicator") {
  Xind <- as.matrix(Xtable)
  if (any(Xind>1)) stop(paste("Not an indicator matrix\n\n"))
  p <- length(Jk)
  if (is.null(varnames)) {
    varnames <- character(p)
    for (i in 1:p) varnames[i] <- paste("V",i,sep="")
  }
  #cats <- vector("list", length=p)
  #for (k in 1:p) { cats[[k]] <- 1:Jk[k] }
  if (mcatype=="Burt") {
    X <- t(Xind) %*% Xind
    rowlabels <- colnames(Xtable)
    collabels <- colnames(Xtable)
  } else {
    if (mcatype=="indicator") {
      X <- Xind
      rowlabels <- rownames(Xtable)
      collabels <- colnames(Xtable)
    }
    if (mcaindividualboot) {
      Xnp <- convert(Xind,input="indicator",output="nbyp",Jk=Jk,varandcat=varandcat)$result
    }
  }
}

if (mcatype=="doubled") {
  X <- getdoubled(Xind,input="indicator",Jk=Jk)
  rowlabels <- rownames(Xtable)
  collabels <- colnames(X)
}

if ((mcastoreindicator==TRUE)) { IndicatorMatrix <- Xind }

  if (mcaindividualboot) {
    for (k in 1:p) {
      if (!(is.numeric(Xnp[,k]))) stop(paste("Only numerical categories make sense for bootstrapping individuals\n  If input is an indicator matrix then categories will be treated as numeric\n\n"))
    }
  }

} # mca

rows <- dim(X)[1]
cols <- dim(X)[2]
if (catype=="sca") {
  Jk = c(rows,cols)
  n <- sum(X)
}

S <- switch(catype, "sca"=sca(X,needtrans=needtrans), "mca"=sca(X,catype,mcatype,p) )

# most axes for which variances can be calculated
if ( (catype=="mca") & (mcatype=="Burt") & ( (mcasupplementary=="offdiag") | (mcaadjustinertias==TRUE) ) ) {
  axisvariances <- min(lastaxis,sum(S@mu>1/p))
} else {
  axisvariances <- min(lastaxis,rows-1,cols-1)
}

if ((lastaxis >= maxrearrange)&(cols >=maxrearrange+2))
  cat(paste("\n WARNING variances for the ", maxrearrange, "th axis and above may be unreliable\n\n"))
# if (lastaxis>axisvariances)
#   cat(paste("\n WARNING there are only", axisvariances, "axes in the solution\n\n"))

# if (mcavariant=="omca") { hdS <- ht(Xind,Jk) }

# zero rows/cols to be omitted from bootstrapping and summaries
# rows/cols with one non-zero element, will have variance zero
# rows/cols with two non-zero elements, will use chi^2 with df 1

zerorowS <- rowSums(X>0)==0
zerocolS <- colSums(X>0)==0
onerowS <- rowSums(X>0)==1
onecolS <- colSums(X>0)==1
tworowS <- rowSums(X>0)==2
twocolS <- colSums(X>0)==2


# BOOTSTRAPPING

# Initialise sums of squares matrices
#  (computationally poor and may cause problems if large,
#   but should be OK as values will tend to be close to zero)

RSBsum <- matrix(0,rows,axisvariances)
RSBsumsq <- matrix(0,rows,axisvariances)
RSBsumcp <- array(0,c(rows,axisvariances,axisvariances))
CSBsum <- matrix(0,cols,axisvariances)
CSBsumsq <- matrix(0,cols,axisvariances)
CSBsumcp <- array(0,c(cols,axisvariances,axisvariances))
rownB <- matrix(nboots,rows,axisvariances)
colnB <- matrix(nboots,cols,axisvariances)
#rownB <- rep(nboots,rows)
#colnB <- rep(nboots,cols)
RSBvar <- RSBsum
CSBvar <- CSBsum
RSBcov <- RSBsumcp
CSBcov <- CSBsumcp
sameaxisorder <- 0

# in fact cb will be changed for rowzeros/colzeros and/or when not all boot matrices full rank
# sorting for boot crit vals will give lots of zeros - start at first real value
alphas = c(0.9,0.95,0.99)
cb = (nboots+1) * alphas;
bootcritR <- array(dim=c(rows,axisvariances,axisvariances,length(alphas)))
bootcritC <- array(dim=c(cols,axisvariances,axisvariances,length(alphas)))


if (nboots>0) {

RSBB <- array(dim=c(rows,axisvariances,nboots))
CSBB <- array(dim=c(cols,axisvariances,nboots))
RowsBB <- array(0,c(rows,axisvariances,axisvariances,nboots))
ColsBB <- array(0,c(cols,axisvariances,axisvariances,nboots))

# Used when bootstrapping indicator matrix (equivalent to using n by p) in MCA
# note that bootstrapping for Burt is still done on indicator matrix

if (catype=="mca") { # bootstrap indicator matrix
  Xindrows <- dim(Xind)[1]
  if (resampledistn=="nonparametric") {
    bno <- matrix( sample( rep(1:Xindrows,nboots), replace=TRUE ), Xindrows, nboots)
  } else {
    if (resampledistn=="balanced") {
      bno <- matrix( sample( rep(1:Xindrows,nboots) ), Xindrows, nboots)
    }
  }
}

for (b in 1:nboots) {

if (catype=="sca") {

  if (resampledistn=="multinomial") {
    Xr <- vector("numeric",rows*cols)
    if (multinomialtype=="whole") { Xr <- rmultinom(1,n,X) }
    if (multinomialtype=="rowsfixed") { for (i in 1:rows) { Xr[ seq(i,(cols-1)*rows+i,by=rows) ] <- rmultinom(1,sum(X[i,]),X[i,]) } }
    if (multinomialtype=="columnsfixed") { for (i in 1:cols) { Xr[((i-1)*rows+1):(i*rows)] <- rmultinom(1,sum(X[,i]),X[,i]) } }
  } else {
    if (resampledistn=="Poisson") {
      Xr <- rpois(rows*cols,X+poissonzeronewmean*(X==0))
    } else {
      if (resampledistn=="myresample") { Xr <- myresamplefn(X) }
    }
  }

} else { # mca

  # resample rows, so row i in boot does not correspond to row i in sample
  if (any(mcatype==c("indicator","doubled"))) {
    Xrinitial <- X[bno[,b],] # standard version, balanced or not
    Xr <- Xrinitial
  } else { # Burt
    Xrinitial <- Xind[bno[,b],] # standard version, balanced or not
    Xr <- t(Xrinitial) %*% Xrinitial
  }
  if ( (any(mcatype==c("indicator","doubled")))&(mcaindividualboot) ) { # match boot rows to sample rows
    # i,j element is cost of source i to destination j, solution same
    # source i = i-th row in sample; destination j = j-th row in bootstrap
    Xrnp <- Xnp[bno[,b],]
    val <- matrix(0,Xindrows,Xindrows)
    if (mcalikertnoise>0) { # add noise/jitter to both to avoid tied matches
      Xrnpnoisy <- Xrnp + matrix(rnorm(rows*p,0,mcalikertnoise),nrow=rows,ncol=p)
      Xnpnoisy <- Xnp + matrix(rnorm(rows*p,0,mcalikertnoise),nrow=rows,ncol=p)
      for (i in 1:Xindrows) { for (j in 1:Xindrows) { val[i,j] <- sum((Xnpnoisy[i,]-Xrnpnoisy[j,])^2) } }
    } else {
      for (i in 1:Xindrows) { for (j in 1:Xindrows) { val[i,j] <- sum((Xnp[i,]-Xrnp[j,])^2) } }
    }
    # Want i-th row in reordered boot to correspond to i-th row in sample
    # 1 at (1,2) allocates 2nd boot row to correspond with 1st sample row
    assign <- lp.assign(val, direction="min")
    Xr <- (assign$solution) %*% Xrinitial
  } # likert reordering

} # mca

XB <- matrix(data=Xr,nrow=rows,ncol=cols)

if (newzeroreset==1) { XB <- (XB==0 & X>0)+XB } # set new zeros to 1

B <- switch(catype, "sca"=sca(XB), "mca"=sca(XB,catype,mcatype,p) )

# may have problem if X B more sparse than X S, as will then have different Jk and some zeros
# if (mcavariant=="omca") { hdB <- hd(Xrind,Jk) }

# check for "genuine" bootstrap axes so don't include meaningless ones
if ( (catype=="mca") & (mcatype=="Burt") & ( (mcasupplementary=="offdiag")|(mcaadjustinertias==TRUE) ) ) {
  bootaxes <- sum(B@mu>1/p)
} else {
  bootaxes <- axisvariances
}
fakeaxes <- rep(FALSE,axisvariances)
if (bootaxes < axisvariances) { fakeaxes[(bootaxes+1):axisvariances] <- TRUE }

# Check for zero rows/cols
zerorowB <- rowSums(XB>0)==0
zerocolB <- colSums(XB>0)==0

# reduce effective number of boots by 1 if zero row/col or meaningless axis
rownB <- rownB - 1 + as.matrix(1-zerorowB) %*% (1-fakeaxes)
colnB <- colnB - 1 + as.matrix(1-zerocolB) %*% (1-fakeaxes)

# Rearrange bootstrap axes if needed

if (rearrangemethod=="lpassign") {
  Re <- rearrange( S@Raxes, B@Raxes, S@Caxes, B@Caxes, B@realr, reflectonly, catype, mcatype, mcaindividualboot, maxrearrange)
} else {
  Re <- rearrange_old( S@Raxes, B@Raxes, S@Caxes, B@Caxes, B@realr)
}

RaxesBRe <- B@Raxes
CaxesBRe <- B@Caxes

# T is such that 1 at (2,1) means that 2nd boot axis is matched to 1st sample axis
#                1 at (1,2) means that 1st boot axis is matched to 2nd sample axis
# i.e. i,j-th element matches i-th boot axis with j-th sample axis
# Note that it is transposed from lp.assign solution
RaxesBRe[ ,1:Re$numre] <- RaxesBRe[ ,1:Re$numre] %*% Re$T
CaxesBRe[ ,1:Re$numre] <- CaxesBRe[ ,1:Re$numre] %*% Re$T

RSB <- ( B@Rprofile - S@Rprofile ) %*% B@Rweights %*% RaxesBRe[ ,1:axisvariances]
CSB <- ( B@Cprofile - S@Cprofile ) %*% B@Cweights %*% CaxesBRe[ ,1:axisvariances]

# Only use off-diagonal elements of Burt to calculate coordinates and hence supplementary points.
# Use singular values to get back to usual or adjusted MCA coordinates.
#  - using the difference between profiles already eliminates the diagonal element.
# currently only corrects for Burt, not Indicator

if ( (catype=="mca") & (mcatype=="Burt") & (mcasupplementary=="offdiag") ) {
  truebsvs <- B@mu[B@mu>1/p]
  if (mcaadjustcoords==TRUE) { # use offdiag only and adjust coords
    svcorrection <- (p/(p-1))*diag(1,max(S@realr,Re$numre))
  } else { # use offdiag only but usual MCA coords
    svcorrection <- diag(truebsvs/(truebsvs-1/p))
  }
  svcorrection[1:Re$numre,1:Re$numre] <- svcorrection[1:Re$numre,1:Re$numre] %*% Re$T
  # If B has fewer >1/p than sample then add extra zeros to fill size
  dimsvc <- dim(svcorrection)[1]
  if (dimsvc<axisvariances) {
    svc <- diag(0,axisvariances)
    svc[1:dimsvc,1:dimsvc] <- svcorrection
    svcorrection <- svc
  }
  # If meaningless axes then last RSB and CSB will be set to zero as they are also meaningless
  RSB <- RSB %*% svcorrection[1:axisvariances,1:axisvariances]
  CSB <- CSB %*% svcorrection[1:axisvariances,1:axisvariances]
}

# if ( (catype=="mca") & (mcatype=="nbyptoburt") & (mcaadjustcoords==TRUE) ) {
# adjust principal coordinates as in Greenacre and Blasius, p68, but only for "columns"
# (not needed here as adjustment done below)
# adjsvs <- (p/(p-1)) * (B@mu[B@mu>(1/p)]-(1/p))
# svcorrection <- diag(adjsvs/(B@mu[B@mu>(1/p)]))
# svcorrection[1:Re$numre,1:Re$numre] <- svcorrection[1:Re$numre,1:Re$numre] %*% Re$T
# CSB <- CSB %*% svcorrection[1:axisvariances,1:axisvariances]
# }

# Might also need to set to zero when Burt and mcaadjustinertias but not offdiag,
# but shouldn't do that in the first place
RSB <- RSB * (1-zerorowS) * (1-zerorowB)
CSB <- CSB * (1-zerocolS) * (1-zerocolB)
sameaxisorder <- sameaxisorder + Re$same

# Unlikely to want to bootstrap standard coordinates in practice, but may be useful in research

if (bootstdcoords==TRUE) {
dmum1 <- diag( 1/(B@mu + (B@mu==0)) * (1-(B@mu==0)) )
dmum1[1:Re$numre,1:Re$numre] <- dmum1[1:Re$numre,1:Re$numre] %*% Re$T
dmum1 <- dmum1[1:axisvariances,1:axisvariances]
RSB <- RSB %*% dmum1
CSB <- CSB %*% dmum1
}

# bootstrap-derived critical values, re-using bootstraps
RSBB[ , , b] <- RSB
CSBB[ , , b] <- CSB

RSBsum <- RSBsum + RSB
RSBsumsq <- RSBsumsq + RSB*RSB
CSBsum <- CSBsum + CSB
CSBsumsq <- CSBsumsq + CSB*CSB
if (axisvariances>1) {
for (a1 in 1:(axisvariances-1)) {
  for (a2 in (a1+1):axisvariances) {
    RSBsumcp[ ,a1,a2] <- RSBsumcp[ ,a1,a2] + RSB[ ,a1]*RSB[ ,a2]
    CSBsumcp[ ,a1,a2] <- CSBsumcp[ ,a1,a2] + CSB[ ,a1]*CSB[ ,a2]
} } }

} # boots

RSBmean <- ( 1/( rownB + (rownB==0) ) ) * (1-(rownB==0)) * RSBsum
CSBmean <- ( 1/( colnB + (colnB==0) ) ) * (1-(colnB==0)) * CSBsum

rbm1 <- ( 1/( rownB-1 + 2*(rownB<=1) ) ) * (1-(rownB<=1))
cbm1 <- ( 1/( colnB-1 + 2*(colnB<=1) ) ) * (1-(colnB<=1))

RSBvar <- rbm1 * ( RSBsumsq - rownB * RSBmean * RSBmean )
CSBvar <- cbm1 * ( CSBsumsq - colnB * CSBmean * CSBmean )

rownames(RSBvar) <- rowlabels
rownames(CSBvar) <- collabels

if (axisvariances>1) {
for (a1 in 1:(axisvariances-1)) {
  for (a2 in (a1+1):axisvariances) { # later axis gives effective sample size
  RSBcov[ ,a1,a2] <- rbm1[ ,a2] * ( RSBsumcp[ ,a1,a2] - rownB[ ,a2] * RSBmean[ ,a1] * RSBmean[ ,a2] )
  CSBcov[ ,a1,a2] <- cbm1[ ,a2] * ( CSBsumcp[ ,a1,a2] - colnB[ ,a2] * CSBmean[ ,a1] * CSBmean[ ,a2] )
} } }

# bootstrap-derived critical values, re-using bootstraps
# must correct cb when not all boots used due to rowzeros or not full rank
# start at first genuine value and then replace nboots with rownB and round
for (a1 in 1:(axisvariances-1)) {
  for (a2 in (a1+1):axisvariances) {

for (i in 1:rows) {
  detcovmat <- RSBvar[i,a1]*RSBvar[i,a2]-RSBcov[i,a1,a2]^2
  if (detcovmat > eps) {
    Sm1 <- (1/detcovmat)*matrix(c(RSBvar[i,a2],-RSBcov[i,a1,a2],-RSBcov[i,a1,a2],RSBvar[i,a1]),nrow=2)
    for (bb in 1:nboots) {
      RowsBB[i,a1,a2,bb] <- RSBB[i,c(a1,a2),bb] %*% Sm1 %*% as.matrix(RSBB[i,c(a1,a2),bb])
    }
  } else {
    for (bb in 1:nboots) { RowsBB[i,a1,a2,bb] <- 0 }
  }
}
for (i in 1:rows) {
  RowsBB[i,a1,a2,] <- sort(RowsBB[i,a1,a2,])
  cb = round( nboots - rownB[i,a2] + (rownB[i,a2]+1) * alphas )
  if (cb[3]>nboots) { cb <- cb-1 } # above gives nboots+1 if rownB=0
  bootcritR[i,a1,a2,] <- RowsBB[i,a1,a2,cb]
}

for (i in 1:cols) {
  detcovmat <- CSBvar[i,a1]*CSBvar[i,a2]-CSBcov[i,a1,a2]^2
  if (detcovmat > eps) {
    Sm1 <- (1/detcovmat)*matrix(c(CSBvar[i,a2],-CSBcov[i,a1,a2],-CSBcov[i,a1,a2],CSBvar[i,a1]),nrow=2)
    for (bb in 1:nboots) {
      ColsBB[i,a1,a2,bb] <- CSBB[i,c(a1,a2),bb] %*% Sm1 %*% as.matrix(CSBB[i,c(a1,a2),bb])
    }
  } else {
    for (bb in 1:nboots) { ColsBB[i,a1,a2,bb] <- 0 }
  }
}
for (i in 1:cols) {
  ColsBB[i,a1,a2,] <- sort(ColsBB[i,a1,a2,])
  cb = round( nboots - colnB[i,a2] + (colnB[i,a2]+1) * alphas )
  if (cb[3]>nboots) { cb <- cb-1 } # above gives nboots+1 if colnB=0
  bootcritC[i,a1,a2,] <- ColsBB[i,a1,a2,cb]
}

} } # a1, a2

} # checking boots>0


# OTHER CALCULATIONS

Fmat <- S@Rprofile %*% S@Rweights %*% S@Raxes
Gmat <- S@Cprofile %*% S@Cweights %*% S@Caxes

rownames(Fmat) <- rowlabels
rownames(Gmat) <- collabels

dmum1 <- diag( 1/(S@mu + (S@mu==0)) * (1-(S@mu==0)) )
Gbi <- Gmat %*% dmum1
Fbi <- Fmat %*% dmum1

# Calc inertia sum

# Calculate adjusted inertias if mcaadjustinertias is true (Greenacre and Blasius, p68),
# also adjust principal coordinates if mcaadjustcoords is true.
# Use sum of adjusted inertias if mcauseadjustinertiasum is true (Benzecri),
# otherwise use average of off-diagonal inertias (Greenacre)
#  - this will be wrong if Jk is used to include zero categories

if ( (catype=="mca") & (mcaadjustinertias==TRUE) & (!(mcatype=="doubled")) ) {
  genuinesvs <- S@mu[1:S@realr]
  if (mcatype=="Burt") {
    adjsvs <- (p/(p-1)) * (genuinesvs-(1/p))
  } else { # indicator
    adjsvs <- (p/(p-1)) * (genuinesvs^2-(1/p))
  }
  if (mcaadjustcoords==TRUE) {
    svcorrection <- diag(adjsvs/genuinesvs)
    Gmat[,1:S@realr] <- Gmat[,1:S@realr] %*% svcorrection
    Fmat[,1:S@realr] <- Fmat[,1:S@realr] %*% svcorrection
  }
  Fsq <- Fmat[,1:S@realr]^2
  Gsq <- Gmat[,1:S@realr]^2
  inertia <- adjsvs^2
  if (mcauseadjustinertiasum==TRUE) { # sum of adjusted inertias
    inertiasum <- sum(inertia)
  } else { # average of off-diagonal inertias
    if (mcatype=="Burt") {
      inertiasum <- (p/(p-1)) * ( sum(S@mu^2) - (cols-p)/p^2 )
    } else { # indicator
      inertiasum <- (p/(p-1)) * ( sum(S@mu^4) - (cols-p)/p^2 )
    }
  }
} else {
  Fsq <- Fmat^2
  Gsq <- Gmat^2
  inertia <- (S@mu)^2
  inertiasum <- sum(inertia)
}

dmum2 <- diag( 1/(inertia + (inertia==0)) * (1-(inertia==0)) )
inertiapc <- 100*inertia/inertiasum
cuminertiapc <- cumsum(inertiapc)
inertiapc <- round(100*inertiapc)/100
cuminertiapc <- round(100*cuminertiapc)/100
inertias <- cbind(inertia,inertiapc,cuminertiapc)

# Calc contributions and correlations
# if mcaadjustmassctr is true then adjust ColCTR by *p so sum to p overall (i.e. average 1 each)

Xstd <- X/sum(X)
dr <- diag( as.vector(rowSums(Xstd)) )
dc <- diag( as.vector(colSums(Xstd)) )

RowCTR <- dr %*% Fsq %*% dmum2
Frs <- diag(1/rowSums(Fsq))
RowREP <- Frs %*% Fsq

ColCTR <- dc %*% Gsq %*% dmum2
Grs <- diag(1/rowSums(Gsq))
ColREP <- Grs %*% Gsq

if ( (catype=="mca")&(mcaadjustmassctr==TRUE) ) { massmult<-p } else { massmult<-1 }
rowmasses <- massmult*1000*rowSums(X)/sum(X)
colmasses <- massmult*1000*colSums(X)/sum(X)

rownames(RowCTR) <- rowlabels
rownames(RowREP) <- rowlabels
rownames(ColCTR) <- collabels
rownames(ColREP) <- collabels

# If inertias and coords adjusted, and mcaadjustmassctr then total CTR should sum to p
# if only inertias adjusted, use adjusted inertia for decomposition
# Should agree with SCA for p=2
if ( (catype=="mca") & (mcaadjustinertias==TRUE) & (!(mcatype=="doubled")) ) { # allows for sum being over all p variables
  if (mcaadjustmassctr==TRUE) { ColCTR <- ColCTR * p }
  if (mcaadjustcoords==FALSE) { # correct for coordinates not being adjusted
    ColCTR <- ColCTR %*% diag(inertia) %*% diag( 1/genuinesvs^2 )
  }
}

bootca <- new("cabootcrsresults", br=S,
  DataMatrix=X, datasetname=datasetname, rows=rows, columns=cols,
  rowlabels=rowlabels, collabels=collabels, varnames=varnames,
  Rowprinccoord=Fmat, Colprinccoord=Gmat, Rowstdcoord=Fbi, Colstdcoord=Gbi,
  RowCTR=RowCTR, RowREP=RowREP, ColCTR=ColCTR, ColREP=ColREP,
  RowVar=RSBvar, RowCov=RSBcov, ColVar=CSBvar, ColCov=CSBcov,
  inertiasum=inertiasum, inertias=inertias, rowmasses=rowmasses, colmasses=colmasses,
  nboots=nboots, resampledistn=resampledistn,
  multinomialtype=multinomialtype, sameaxisorder=sameaxisorder,
  poissonzeronewmean=poissonzeronewmean, newzeroreset=newzeroreset,
  printdims=printdims, axisvariances=axisvariances,
  bootcritR=bootcritR, bootcritC=bootcritC, usebootcrits=usebootcrits,
  catype=catype, mcatype=mcatype, mcaindividualboot=mcaindividualboot,
  IndicatorMatrix=IndicatorMatrix, Jk=Jk, p=p,
  mcalikertnoise=mcalikertnoise,
  mcaadjustinertias=mcaadjustinertias, mcauseadjustinertiasum=mcauseadjustinertiasum,
  mcaadjustcoords=mcaadjustcoords, mcaadjustmassctr=mcaadjustmassctr, mcasupplementary=mcasupplementary )

if (showresults==TRUE) {
summaryca(bootca, datasetname=datasetname)
plotca(bootca, datasetname=datasetname,
       groupings=groupings, grouplabels=grouplabels,
       plotsymbolscolours=plotsymbolscolours, picsize=NULL, likertarrows=likertarrows,
       othersmonochrome=othersmonochrome, crpercent=crpercent, usebootcrits=usebootcrits,
       mcaoneploteach=mcaoneploteach, mcashowindividuals=mcashowindividuals,
       mcavariablecolours=mcavariablecolours, mcacategorycolours=mcacategorycolours, eps=eps)
}

bootca

}

Try the cabootcrs package in your browser

Any scripts or data that you put into this service are public.

cabootcrs documentation built on March 18, 2022, 7:50 p.m.