correlation_threshold: Remove redundant variables.

View source: R/correlation_threshold.R

correlation_threshold    R Documentation

Remove redundant variables.

Description

correlation_threshold returns the variables to exclude so that no two of the remaining variables have a correlation greater than a specified threshold.

Usage

correlation_threshold(variables, sample, cutoff = 0.9, method = "pearson")

Arguments

variables

character vector specifying observation variables.

sample

tbl containing sample used to estimate parameters.

cutoff

threshold in [0, 1] giving the maximum correlation allowed between any two retained variables; variables in pairs that exceed it are candidates for exclusion.

method

optional character string specifying method for calculating correlation. This must be one of the strings "pearson" (default), "kendall", "spearman".

Details

correlation_threshold is a wrapper for caret::findCorrelation.
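
To make concrete what "redundant" means here, the following base-R sketch lists the variable pairs whose correlation exceeds the cutoff. It is only an illustration under stated assumptions: it compares absolute correlations to the cutoff, it builds its own toy data, and it does not reproduce how caret::findCorrelation decides which member of a pair to drop. The names cor_abs, high and pairs are introduced for this example only.

```r
# Toy data mirroring the Examples section: z is a noisy copy of x.
set.seed(42)
x <- rnorm(30)
sample <- data.frame(
  x = x,
  y = rnorm(30) / 1000,
  z = x + rnorm(30) / 10
)
variables <- c("x", "y", "z")
cutoff <- 0.9

# Absolute pairwise correlations between the selected variables.
# Assumption: magnitudes, not signed values, are compared to the cutoff.
cor_abs <- abs(stats::cor(sample[, variables], method = "pearson"))

# Keep each pair once by looking only at the upper triangle.
high <- which(cor_abs > cutoff & upper.tri(cor_abs), arr.ind = TRUE)

# One row per pair above the cutoff.
pairs <- data.frame(
  var1 = rownames(cor_abs)[high[, "row"]],
  var2 = colnames(cor_abs)[high[, "col"]],
  correlation = cor_abs[high]
)
print(pairs)
```

With this data only the pair x and z is listed, so one of the two would be a candidate for exclusion; which one is chosen is up to caret::findCorrelation.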

Value

character vector specifying observation variables to be excluded.
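
Because the return value names the variables to drop rather than the ones to keep, a typical next step is to subtract it from the original variable list. The sketch below shows one way to do that in base R. The vector excluded is a hypothetical stand-in for the function's result, not actual output, and the data frame is made up for illustration.

```r
variables <- c("x", "y", "z")

# Hypothetical stand-in for the value returned by correlation_threshold().
excluded <- c("z")

# Variables that survive the filter, in their original order.
kept <- setdiff(variables, excluded)

# Made-up data; subset its columns to the retained variables.
sample <- data.frame(x = 1:3, y = 4:6, z = 7:9)
reduced <- sample[, kept, drop = FALSE]

print(kept)
print(names(reduced))
```

Using drop = FALSE keeps the result a data frame even if only one variable is retained.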

Examples


suppressMessages(suppressWarnings(library(magrittr)))
sample <- tibble::tibble(
  x = rnorm(30),
  y = rnorm(30) / 1000
)

sample %<>% dplyr::mutate(z = x + rnorm(30) / 10)
variables <- c("x", "y", "z")

head(sample)
cor(sample)

# `x` and `z` are highly correlated; one of them will be removed

correlation_threshold(variables, sample)

cytomining/cytominer documentation built on July 5, 2023, 3:34 a.m.