
View source: R/correlation.R

correlation    R Documentation

Correlation Coefficients: Pearson, Spearman, Kendall, Chatterjee, and Biweight Midcorrelation

Description

Computes a correlation coefficient between a specified response variable and each of the remaining variables in a data frame or tibble, using one of several methods. The available methods are Pearson's product-moment correlation (parametric), Spearman's rank correlation and Kendall's tau (both non-parametric), Chatterjee's rank-based correlation coefficient (xi), and the biweight midcorrelation (a robust correlation measure).
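A minimal base-R sketch of the one-versus-rest computation described above, limited to the three methods built into `stats::cor()`. The helper name `cor_one_vs_rest()` is hypothetical and is not part of specProc. It returns a plain `data.frame` with `variable`, `correlation`, and `method` columns in place of a tibble, skips non-numeric columns, and drops incomplete pairs (`use = "complete.obs"`); these are assumptions, not documented behavior of `correlation()`.

```r
# Hypothetical helper, not specProc's implementation: correlate one response
# column against every other numeric column using base R's cor().
cor_one_vs_rest <- function(x, var, method = c("pearson", "spearman", "kendall")) {
  method <- match.arg(method)                      # validate the method string
  stopifnot(is.data.frame(x), var %in% names(x))

  # All columns except the response, keeping only numeric ones (assumption)
  predictors <- setdiff(names(x), var)
  predictors <- predictors[vapply(x[predictors], is.numeric, logical(1))]

  # One coefficient per predictor; incomplete pairs are dropped (assumption)
  values <- vapply(
    predictors,
    function(col) cor(x[[col]], x[[var]], method = method, use = "complete.obs"),
    numeric(1)
  )

  data.frame(
    variable = predictors,
    correlation = unname(values),
    method = rep(method, length(predictors)),
    stringsAsFactors = FALSE
  )
}
```

For example, `cor_one_vs_rest(mtcars, "mpg", method = "spearman")` returns one row for each of the ten remaining columns of `mtcars`.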

Usage

correlation(
  x,
  var,
  method = "pearson",
  plot = FALSE,
  color = "#111D71",
  interactive = FALSE
)

Arguments

x

A data frame or tibble containing the variables of interest.

var

A character string specifying the name of the response variable.

method

A character string indicating the correlation method to use. Allowed values are "pearson", "spearman", "kendall", "chatterjee", or "bicor" (for biweight midcorrelation). The default is "pearson".

plot

A logical value indicating whether to produce a visualization of the correlations. Default is FALSE (no plot).

color

A character string specifying the color to use for the plot. Default is "#111D71".

interactive

A logical value indicating whether to create an interactive plot using plotly. Default is FALSE (static ggplot2 plot).

Details

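The Pearson correlation coefficient measures the linear relationship between two continuous variables. Its usual significance tests assume that the data follow a bivariate normal distribution, and it is sensitive to outliers.

The Spearman and Kendall correlations are non-parametric, rank-based measures of monotonic association. They capture non-linear relationships as long as these are monotonic, are unaffected by monotonic transformations of either variable, and do not require normality.

The Chatterjee correlation coefficient (xi) is a recently proposed rank-based measure designed to detect general dependence, including non-monotonic (e.g., oscillatory) relationships that the Pearson, Spearman, and Kendall coefficients can miss. As the sample size grows, xi(X, Y) tends to 0 if and only if X and Y are independent, and to 1 if and only if Y is a measurable function of X. Unlike the other coefficients it is not symmetric in its two arguments, and because it relies on ranks it is less affected by heavy-tailed distributions and outliers than the Pearson coefficient.

The biweight midcorrelation is a robust correlation measure, built on medians and median absolute deviations, that downweights the influence of outliers. It is recommended when the data contain extreme values or deviate markedly from normality.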
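Chatterjee's coefficient and the biweight midcorrelation are not available in `stats::cor()`, so the following base-R sketch spells out their standard definitions. The names `xi_cor()` and `bicor_sketch()` are hypothetical, and this is not specProc's implementation. Breaking ties in x by original order and using the tuning constant 9 with the unscaled MAD are assumptions taken from common textbook forms. This page also does not state which argument order `correlation()` uses for the asymmetric xi.

```r
# Hypothetical helpers (not specProc's code), written from the published
# definitions of Chatterjee's xi and the biweight midcorrelation.

# Chatterjee's xi_n(x, y): how strongly y behaves like a function of x.
# Ties in x keep their original order here (order() is stable);
# Chatterjee (2021) breaks them uniformly at random instead.
xi_cor <- function(x, y) {
  n <- length(x)
  y_sorted <- y[order(x)]                      # y rearranged by increasing x
  r <- rank(y_sorted, ties.method = "max")     # r_i = #{j : y_j <= y_(i)}
  l <- rank(-y_sorted, ties.method = "max")    # l_i = #{j : y_j >= y_(i)}
  1 - n * sum(abs(diff(r))) / (2 * sum(l * (n - l)))
}

# Biweight midcorrelation with tuning constant 9 and the unscaled median
# absolute deviation, mad(v, constant = 1). Fails if the MAD is zero.
bicor_sketch <- function(x, y) {
  biweight_scores <- function(v) {
    m <- median(v)
    u <- (v - m) / (9 * mad(v, constant = 1))
    w <- (1 - u^2)^2 * (abs(u) < 1)            # points with |u| >= 1 get weight 0
    a <- (v - m) * w
    a / sqrt(sum(a^2))                          # scale scores to unit length
  }
  sum(biweight_scores(x) * biweight_scores(y))
}
```

Two quick checks illustrate the design. With `x <- seq(-1, 1, length.out = 201)` and `y <- x^2`, `cor(x, y)` is essentially 0 while `xi_cor(x, y)` is close to 1, because y is an exact but non-monotonic function of x. With `x <- c(1:9, 100)` and `y <- c(1:9, -100)`, the single outlying pair drives the Pearson coefficient negative, whereas `bicor_sketch()` gives that pair zero weight and stays strongly positive.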

Value

A list containing:

  • correlation: A tibble with columns for the variable name, correlation value, and method used.

  • plot: If plot = TRUE, a ggplot2 object (or a plotly object if interactive = TRUE).

Author(s)

Christian L. Goueguel

References

  • Chatterjee, S. (2021). A new coefficient of correlation. Journal of the American Statistical Association, 116(536), 2009-2022.

  • Wilcox, R. (2012). Introduction to robust estimation and hypothesis testing (3rd ed.). Academic Press. ISBN 978-0123869838.


ChristianGoueguel/specProc documentation built on Nov. 9, 2024, 3:23 p.m.