calculate_cors: Calculate correlation coefficients
In corrgrapher: Explore Correlations Between Variables in a Machine Learning Model

Description Usage Arguments Value X argument Default functions Custom functions See Also Examples

View source: R/calculate_cors.R

Calculate correlation coefficients between variables in a data.frame, matrix or table using 3 different functions for 3 different possible pairs of vairables:

numeric - numeric
numeric - categorical
categorical - categorical

calculate_cors(
  x,
  num_num_f = NULL,
  num_cat_f = NULL,
  cat_cat_f = NULL,
  max_cor = NULL
)

## S3 method for class 'explainer'
calculate_cors(
  x,
  num_num_f = NULL,
  num_cat_f = NULL,
  cat_cat_f = NULL,
  max_cor = NULL
)

## S3 method for class 'matrix'
calculate_cors(
  x,
  num_num_f = NULL,
  num_cat_f = NULL,
  cat_cat_f = NULL,
  max_cor = NULL
)

## S3 method for class 'table'
calculate_cors(
  x,
  num_num_f = NULL,
  num_cat_f = NULL,
  cat_cat_f = NULL,
  max_cor = NULL
)

## Default S3 method:
calculate_cors(
  x,
  num_num_f = NULL,
  num_cat_f = NULL,
  cat_cat_f = NULL,
  max_cor = NULL
)

`x`	object used to select method. See more below.
`num_num_f`	A `function` used to determine correlation coefficient between a pair of numeric variables
`num_cat_f`	A `function` used to determine correlation coefficient between a pair of numeric and categorical variable
`cat_cat_f`	A `function` used to determine correlation coefficient between a pair of categorical variables
`max_cor`	A number used to indicate absolute correlation (like 1 in `cor`). Must be supplied if any of `*_f` arguments is supplied.

A symmetrical matrix A of size n x n, where n - amount of columns in x (or dimensions for table). The value at A(i,j) is the correlation coefficient between ith and jth variable. On the diagonal, values from max_cor are set.

When x is a data.frame, all columns of numeric type are treated as numeric variables and all columns of factor type are treated as categorical variables. Columns of other types are ignored.

When x is a matrix, it is converted to data.frame using as.data.frame.matrix.

When x is a explainer, the tests are performed on its data element.

When x is a table, it is treated as contingency table. Its dimensions must be named, but none of them may be named Frequency.

By default, the function calculates p_value of statistical tests ( cor.test for 2 numeric, chisq.test for factor and kruskal.test for mixed).

Then, the correlation coefficients are calculated as -log10(p_value). Any results above 100 are treated as absolute correlation and cut to 100.

The results are then divided by 100 to fit inside [0,1].

If only numeric data was supplied, the function used is cor.test.

Creating consistent measures for correlation coefficients, which are comparable for different kinds of variables, is a non-trivial task. Therefore, if user wishes to use custom function for calculating correlation coefficients, he must provide all necessary functions. Using a custom function for one case and a default for the other is consciously not supported. Naturally, user may supply copies of default functions at his own responsibility.

Function calculate_cors chooses, which parameters of *_f are required based on data supported. For example, for a matrix with numeric data only num_num_f is required. On the other hand, for a table only cat_cat_f is required.

All *_f parameters must be functions, which accept 2 parameters (numeric or factor vectors respectively) and return a single number from [0,max_num]. The num_cat_f must accept numeric argument as first and factor argument as second.

cor.test, chisq.test, kruskal.test

data(mtcars)
# Make sure, that categorical variables are factors
mtcars$vs <- factor(mtcars$vs, labels = c('V-shaped', 'straight'))
mtcars$am <- factor(mtcars$am, labels = c('automatic', 'manual'))
calculate_cors(mtcars)

# For a table:
data(HairEyeColor)
calculate_cors(HairEyeColor)

# Custom functions:
num_mtcars <- mtcars[,-which(colnames(mtcars) %in% c('vs', 'am'))]
my_f <- function(x,y) cor.test(x, y, method = 'spearman', exact=FALSE)$estimate
calculate_cors(num_mtcars, num_num_f = my_f, max_cor = 1)