assoc_matrix: Association matrix
In bryancquach/omixjutsu: Utilities for omics analysis tasks

Description

Creates an association matrix based on pairwise measures of association between variables.

Usage

assoc_matrix(
  data,
  var_names = NULL,
  factor_vars = NULL,
  method = c("pearson", "spearman", "kendall", "eta_squared", "cramers_v"),
  use = c("pairwise.complete.obs", "everything", "all.obs", "complete.obs",
    "na.or.complete"),
  bias_correction = F
)

Arguments

`data`	A data frame with columns from which to retrieve variables to compute associations.
`var_names`	A vector of variable names, taken from the columns of `data`, to consider.
`factor_vars`	A vector of names of variables that should be converted to factors. Each name must be a column of `data`, or an element of `var_names` if that argument is specified.
`method`	The type of association to calculate. One of `pearson` (default), `spearman`, `kendall`, `eta_squared`, `cramers_v`.
`use`	A string giving a method for computing covariances in the presence of missing values. This must be one of the strings `everything`, `all.obs`, `complete.obs`, `na.or.complete`, or `pairwise.complete.obs`. See the `cor` function documentation for details on what each value specifies. Only relevant for correlation metrics.
`bias_correction`	A boolean indicating whether bias correction for Cramer's V should be applied. Only relevant when `method` is `cramers_v`.
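
The `use` values are described above as following the `cor` function, so their effect can be previewed with base R alone. The sketch below calls `stats::cor` directly rather than `assoc_matrix`; the data frame `df` and its values are made up for illustration.

```r
# Toy data with missing values (illustrative only).
df <- data.frame(
  a = c(1, 2, 3, 4, 5, NA),
  b = c(2, 4, 6, 8, NA, 12),
  c = c(1, 3, 2, 5, 4, 6)
)

# Pairwise deletion: each pair of columns uses every row where both are observed.
r_pairwise <- cor(df, method = "pearson", use = "pairwise.complete.obs")

# Listwise deletion: only rows with no missing value in any column are used.
r_complete <- cor(df, method = "pearson", use = "complete.obs")

# "everything": any entry involving a column with missing values becomes NA.
r_everything <- cor(df, method = "pearson", use = "everything")

r_pairwise["a", "c"]    # based on rows 1 to 5
r_complete["a", "c"]    # based on rows 1 to 4 only
r_everything["a", "b"]  # NA
```

With pairwise deletion, different entries of the matrix can be based on different subsets of rows, which is worth keeping in mind when comparing them.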

Details

Calculates pairwise associations for a set of variables. Depending on the measure of association specified, variables are excluded when their type (e.g., nominal or continuous) is not appropriate for the calculation. Cramer's V is calculated only between two nominal variables. Eta-squared is applied only to nominal-continuous variable pairs. Pearson, Spearman, and Kendall correlations exclude nominal variables with more than two values.

Variable exclusions are based on the variable types as defined in `data`, so these should be verified before calling the function. Categorical variables with values coded as integers can be mistakenly treated as continuous variables. Nominal variables should be of class factor (not character). Numeric variables should be of class integer or numeric.
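
The type checks described here can be done in base R before computing any associations. The following is a minimal sketch, not the package's internal code: the data frame, the column names, and the helper `eta_squared()` are hypothetical, and eta-squared is computed with the standard definition (between-group sum of squares divided by total sum of squares), which may differ in detail from what `assoc_matrix` does.

```r
df <- data.frame(
  batch = c(1L, 1L, 2L, 2L, 3L, 3L),      # categorical, but coded as integers
  sex   = c("F", "M", "F", "M", "F", "M"), # character, not yet a factor
  expr  = c(5.1, 4.9, 6.2, 6.0, 7.1, 6.9),
  stringsAsFactors = FALSE
)

# Inspect the class of each column; `batch` would be read as continuous here.
classes_before <- vapply(df, function(x) class(x)[1], character(1))

# Convert the nominal variables to factors (hypothetical choice of columns).
nominal <- c("batch", "sex")
df[nominal] <- lapply(df[nominal], factor)
classes_after <- vapply(df, function(x) class(x)[1], character(1))

# Hypothetical helper: eta-squared for a nominal-continuous pair.
eta_squared <- function(g, y) {
  keep <- !is.na(g) & !is.na(y)
  g <- droplevels(g[keep])
  y <- y[keep]
  group_means <- ave(y, g)                        # each value's group mean
  ss_between <- sum((group_means - mean(y))^2)    # between-group sum of squares
  ss_total <- sum((y - mean(y))^2)                # total sum of squares
  ss_between / ss_total
}

eta2 <- eta_squared(df$batch, df$expr)
```

In this toy example almost all of the variance in `expr` lies between batches, so `eta2` is close to 1.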

For Cramer's V calculations, the strength of association can be inflated when nominal variables have many categories or when the sample size is small. A bias correction can be applied as detailed in Bergsma (2013).
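
To illustrate the effect of the correction, here is a sketch of Cramer's V with and without the Bergsma (2013) adjustment, written in base R. The function `cramers_v()` and the toy vectors are hypothetical and are not taken from the package source; the `assoc_matrix` implementation may differ.

```r
# Hypothetical helper: Cramer's V, optionally with the Bergsma (2013) correction.
cramers_v <- function(x, y, bias_correction = FALSE) {
  tab <- table(x, y)
  n <- sum(tab)
  r <- nrow(tab)
  k <- ncol(tab)
  # Pearson chi-squared statistic without continuity correction.
  # Warnings about small expected counts are suppressed for this toy example.
  chi2 <- unname(suppressWarnings(chisq.test(tab, correct = FALSE)$statistic))
  phi2 <- chi2 / n
  if (!bias_correction) {
    return(sqrt(phi2 / min(r - 1, k - 1)))
  }
  # Bias-corrected phi-squared and table dimensions.
  phi2_adj <- max(0, phi2 - (r - 1) * (k - 1) / (n - 1))
  r_adj <- r - (r - 1)^2 / (n - 1)
  k_adj <- k - (k - 1)^2 / (n - 1)
  sqrt(phi2_adj / min(r_adj - 1, k_adj - 1))
}

# Small sample: 10 observations, a 3 x 2 table.
x <- factor(c("a", "a", "b", "b", "c", "c", "a", "b", "c", "a"))
y <- factor(c("u", "v", "u", "v", "u", "v", "u", "u", "v", "v"))

v_raw <- cramers_v(x, y)
v_adj <- cramers_v(x, y, bias_correction = TRUE)
```

In this small sample the uncorrected value is about 0.26, while the corrected value shrinks to 0, which shows how much of the apparent association can come from sample size alone.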

Bergsma, W. (2013). A bias-correction for Cramer's V and Tschuprow's T. Journal of the Korean Statistical Society, 42(3), 323-328.