R/table_collapse.R

#' Collapse rows and columns of a table on the basis of hierarchical clustering
#'
#' This function allows to collapse the rows and columns of the input
#' contingency table on the basis of the results of a hierarchical
#' clustering.\cr
#'
#' The function returns a list containing the input table, the rows-collapsed
#' table, the columns-collapsed table, and a table with both rows and columns
#' collapsed. It optionally returns two dendrograms (one for the row profiles,
#' one for the column profiles) representing the clusters.\cr 
#' 
#' The hierarchical clustering is obtained using the FactoMineR's 'HCPC()' function.\cr
#' Rationale: clustering rows and/or columns of a table could interest the users
#' who want to know where a "significant association is concentrated" by
#' "collecting together similar rows (or columns) in discrete groups" (Greenacre
#' M, Correspondence Analysis in Practice, Boca Raton-London-New York,
#' Chapman&Hall/CRC 2007, pp. 116, 120). Rows and/or columns are progressively
#' aggregated in a way in which every successive merging produces the smallest
#' change in the table's inertia. The underlying logic lies in the fact that
#' rows (or columns) whose merging produces a small change in table's inertia
#' have similar profiles. This procedure can be thought of as maximizing the
#' between-group inertia and minimizing the within-group inertia.\cr 
#' A method essentially similar is that provided by the 'FactoMineR' package (Husson F,
#' Le S, Pages J, Exploratory Multivariate Analysis by Example Using R, Boca
#' Raton-London-New York, CRC Press, pp. 177-185). The cluster solution is based
#' on the following rationale: a division into Q (i.e., a given number of)
#' clusters is suggested when the increase in between-group inertia attained
#' when passing from a Q-1 to a Q partition is greater than that from a Q to a
#' Q+1 clusters partition. In other words, during the process of rows (or
#' columns) merging, if the following aggregation raises highly the
#' within-group inertia, it means that at the further step very different
#' profiles are being aggregated.
#'
#' @param data Name of the dataset (must be in dataframe format)
#' @param graph Logical (TRUE/FALSE); it takes TRUE if the user wants the row
#'   and colum profiles dendrograms to be produced.
#' @keywords table.collapse
#' @export
#' @examples
#' data(greenacre_data)
#'
#' #collapse the table, store the results into an object called 'res', and return 2 dendrograms
#' res <- table.collapse(greenacre_data, graph=TRUE)
#' 
#' @seealso \code{\link[FactoMineR]{HCPC}} , \code{\link[FactoMineR]{plot.CA}}
#' 
table.collapse <- function (data, graph=FALSE) {
  
  clust=NULL
  
  clst.rows <- HCPC(data, nb.clust=-1, cluster.CA="rows", graph=FALSE)
  clst.cols <- HCPC(as.data.frame(t(data)), nb.clust=-1, cluster.CA="rows", graph=FALSE)
  rows.clust <- clst.rows$data.clust
  cols.clust <- clst.cols$data.clust
  
  row.collaps.table <- aggregate(. ~ clust, data=rows.clust, sum)
  row.nms <- tapply(rownames(data), rows.clust$clust, paste, collapse = "-")
  rownames(row.collaps.table) <- row.nms
  final.row.table <- row.collaps.table[-c(1)]
  
  col.collaps.table <- aggregate(. ~ clust, data=cols.clust, sum)
  col.nms <- tapply(colnames(data), cols.clust$clust, paste, collapse = "-")
  rownames(col.collaps.table) <- col.nms
  pre.final.table <- subset(col.collaps.table, select = -c(clust))
  final.col.table <- as.data.frame(t(pre.final.table))
  
  final.col.table$clust <- rows.clust$clust
  collaps.table.all <- aggregate(. ~ clust, data=final.col.table, sum)
  rownames(collaps.table.all) <- row.nms
  collaps.table.all <- collaps.table.all[-c(1)]  
  
  if(graph==TRUE){
    plot(clst.rows, choice="tree") | plot(clst.cols, choice="tree")
  } else {}
  
  results <- list("original.table"=data,
                  "rows.collaps.table"=final.row.table, 
                  "cols.collaps.table" =subset(final.col.table, select = -c(clust)), 
                  "all.collaps.table"=collaps.table.all)
  return(results)
}

Try the CAinterprTools package in your browser

Any scripts or data that you put into this service are public.

CAinterprTools documentation built on July 8, 2020, 5:15 p.m.