assoc_matrix: Association matrix

View source: R/model_selection.R

assoc_matrix R Documentation

Association matrix

Description

Creates an association matrix based on pairwise measures of association between variables.

Usage

assoc_matrix(
  data,
  var_names = NULL,
  factor_vars = NULL,
  method = c("pearson", "spearman", "kendall", "eta_squared", "cramers_v"),
  use = c("pairwise.complete.obs", "everything", "all.obs", "complete.obs",
    "na.or.complete"),
  bias_correction = F
)

Arguments

data

A data frame with columns from which to retrieve variables to compute associations.

var_names

A vector of variable names, taken from the columns of data, to consider.

factor_vars

A vector of names of variables that should be converted to factors. Each name must be in data, or in var_names if var_names is specified.

method

The type of association to calculate. One of pearson (default), spearman, kendall, eta_squared, cramers_v.

use

A string giving a method for computing covariances in the presence of missing values. This must be one of the strings everything, all.obs, complete.obs, na.or.complete, or pairwise.complete.obs. See the cor function documentation for details on what each value specifies. Only relevant for correlation metrics.

bias_correction

A logical value indicating whether the bias correction for Cramer's V should be applied. Only relevant when method is cramers_v.
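The effect of the use argument can be previewed with base R's cor function, which the argument description points to. The following is a minimal sketch on a hypothetical data frame x; it illustrates how three of the options treat missing values and is not taken from the package's own code.

```r
# Hypothetical data with one missing value in each of two columns.
x <- data.frame(
  a = c(1, 2, 3, 4, NA),
  b = c(2, 4, 6, NA, 10),
  c = c(5, 3, 4, 1, 2)
)

# "everything": a pair involving a column with NA yields NA.
cor_everything <- cor(x, use = "everything")

# "complete.obs": rows with any NA are dropped before all pairs are computed.
cor_complete <- cor(x, use = "complete.obs")

# "pairwise.complete.obs": each pair uses every row where both values are present.
cor_pairwise <- cor(x, use = "pairwise.complete.obs")

# The a-c correlation differs because the two options keep different rows.
cor_complete["a", "c"]
cor_pairwise["a", "c"]
```

With pairwise deletion, different cells of the matrix can be based on different subsets of rows, which is worth keeping in mind when comparing values across pairs.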

Details

Calculates pairwise associations for a set of variables. Depending on the measure of association specified, variables are excluded when their type (e.g., nominal or continuous) is not meaningful for that measure. Cramer's V is calculated only between two nominal variables. Eta-squared is applied only to nominal-continuous variable pairs. Pearson, Spearman, and Kendall correlations exclude nominal variables with more than two levels.

Exclusions are based on the variable types as defined in data, so these types should be verified beforehand. Categorical variables with values coded as integers can be mistakenly treated as continuous. Nominal variables should be of class factor (not character). Numeric variables should be of class integer or numeric.
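The type check described above, and the eta-squared measure for a nominal-continuous pair, can be sketched in base R. The data frame df and its columns stage and expr are hypothetical, and the calculation shown is the textbook one-way ANOVA definition of eta-squared (between-group sum of squares over total sum of squares); the package's internal computation may differ.

```r
# Hypothetical data: `stage` is categorical but stored as integer codes.
df <- data.frame(
  stage = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L),
  expr  = c(1, 2, 3, 4, 5, 6, 7, 8, 9)
)

# As stored, `stage` has class integer and would look continuous.
sapply(df, class)

# Convert the nominal column to a factor before computing associations.
df$stage <- factor(df$stage)

# Eta-squared for the nominal-continuous pair: between-group sum of
# squares divided by total sum of squares from a one-way ANOVA.
fit <- aov(expr ~ stage, data = df)
ss <- summary(fit)[[1]][["Sum Sq"]]  # between-group SS, then residual SS
eta_squared <- ss[1] / sum(ss)
eta_squared
```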

For Cramer's V, the estimated strength of association can be inflated when nominal variables have many categories or when the sample size is small. A bias correction can be applied as detailed in Bergsma (2013).

Bergsma, W. (2013). A bias-correction for Cramer's V and Tschuprow's T. Journal of the Korean Statistical Society, 42(3), 323-328.
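The correction can be sketched in base R as follows. The helper cramers_v_sketch is hypothetical and is not the package's implementation; it follows the formulas in Bergsma (2013), where phi-squared is reduced by (r - 1)(c - 1)/(n - 1), floored at zero, and the row and column counts are adjusted before taking the ratio. The sketch assumes both inputs are factors with no empty levels.

```r
# Cramer's V with an optional Bergsma (2013) bias correction.
# `cramers_v_sketch` is a hypothetical helper, not part of the package.
cramers_v_sketch <- function(x, y, bias_correction = FALSE) {
  tab <- table(x, y)
  n <- sum(tab)
  r <- nrow(tab)
  k <- ncol(tab)
  # Pearson chi-squared statistic without continuity correction.
  chi2 <- suppressWarnings(chisq.test(tab, correct = FALSE)$statistic)
  phi2 <- as.numeric(chi2) / n
  if (!bias_correction) {
    return(sqrt(phi2 / min(r - 1, k - 1)))
  }
  # Bias-corrected phi-squared (floored at zero) and adjusted dimensions.
  phi2_adj <- max(0, phi2 - (r - 1) * (k - 1) / (n - 1))
  r_adj <- r - (r - 1)^2 / (n - 1)
  k_adj <- k - (k - 1)^2 / (n - 1)
  sqrt(phi2_adj / min(r_adj - 1, k_adj - 1))
}

# Hypothetical 2 x 2 example with 20 observations (cell counts 7, 3, 3, 7).
x <- factor(rep(c("a", "b"), each = 10))
y <- factor(c(rep("u", 7), rep("v", 3), rep("u", 3), rep("v", 7)))

v_raw <- cramers_v_sketch(x, y)
v_adj <- cramers_v_sketch(x, y, bias_correction = TRUE)
c(uncorrected = v_raw, corrected = v_adj)
```

In this small sample the corrected value is lower than the uncorrected one, which is the direction of adjustment the correction is meant to provide.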

Value

A data frame with association metric values.
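A call might look like the sketch below. It assumes the package is installed under the name omixjutsu and exports assoc_matrix; the data frame df is hypothetical. The call is guarded so the snippet still runs when the package is unavailable, and the exact layout of the returned data frame is not described in this page.

```r
# Hypothetical data with one nominal and two continuous variables.
df <- data.frame(
  group = factor(c("a", "a", "b", "b", "c", "c")),
  x = c(1.2, 0.8, 2.5, 2.9, 4.1, 3.7),
  y = c(10, 12, 9, 7, 4, 5)
)

# Assumption: the package is installed as "omixjutsu" and exports
# assoc_matrix(); the call is skipped otherwise.
if (requireNamespace("omixjutsu", quietly = TRUE)) {
  res <- omixjutsu::assoc_matrix(
    data = df,
    var_names = c("group", "x", "y"),
    method = "eta_squared"
  )
  print(class(res))  # documented to be a data frame
  print(res)
}
```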


bryancquach/omixjutsu documentation built on Jan. 29, 2023, 3:47 p.m.