correlation_threshold: Remove redundant variables.


View source: R/correlation_threshold.R

Description

correlation_threshold identifies redundant variables: it returns the variables to exclude so that no two of the remaining variables have a correlation greater than a specified threshold.

Usage

correlation_threshold(variables, sample, cutoff = 0.9, method = "pearson")

Arguments

variables

character vector specifying observation variables.

sample

tbl containing sample used to estimate parameters.

cutoff

threshold in [0, 1]. A variable becomes a candidate for exclusion when its absolute pairwise correlation with another variable exceeds this value.

method

optional character string specifying the method for calculating correlation. This must be one of the strings "pearson" (default), "kendall", or "spearman".
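
To make the role of cutoff and method concrete, the following base-R sketch lists the variable pairs whose absolute correlation exceeds the cutoff. It is only an illustration under stated assumptions: the data, the seed, and the names cor_matrix, flagged, and flagged_pairs are made up here, and the snippet does not call correlation_threshold itself.

```r
# Illustrative data (assumption): z is built from x, y is independent noise.
set.seed(42)
x <- rnorm(30)
sample <- data.frame(x = x, y = rnorm(30) / 1000, z = x + rnorm(30) / 10)

cutoff <- 0.9

# Absolute pairwise correlations; `method` may also be "kendall" or "spearman".
cor_matrix <- abs(stats::cor(sample, method = "pearson"))

# Look only at the upper triangle so that each pair is reported once.
flagged <- which(cor_matrix > cutoff & upper.tri(cor_matrix), arr.ind = TRUE)

flagged_pairs <- data.frame(
  first = rownames(cor_matrix)[flagged[, "row"]],
  second = colnames(cor_matrix)[flagged[, "col"]]
)
print(flagged_pairs)
```

Raising cutoff towards 1 flags fewer pairs, so fewer variables end up excluded; lowering it flags more.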

Details

correlation_threshold is a wrapper for caret::findCorrelation.
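
A wrapper of this kind could look roughly like the sketch below. It is not the cytominer source: correlation_threshold_sketch is a hypothetical name used for illustration, and the sketch assumes that the caret package is installed and that sample is a data frame or tbl whose selected columns are numeric.

```r
# Hypothetical sketch, not the cytominer source: compute the correlation
# matrix of the chosen columns, then let caret suggest the columns to drop.
correlation_threshold_sketch <- function(variables, sample, cutoff = 0.9,
                                         method = "pearson") {
  # Correlation matrix restricted to the requested variables.
  cor_matrix <- stats::cor(
    as.matrix(sample[, variables, drop = FALSE]),
    method = method
  )

  # findCorrelation returns the indices of the columns suggested for removal.
  excluded_indexes <- caret::findCorrelation(cor_matrix, cutoff = cutoff)

  # Map the indices back to variable names.
  variables[excluded_indexes]
}

# Illustrative data (assumption): x and z are strongly correlated.
set.seed(42)
x <- rnorm(30)
sample <- data.frame(x = x, y = rnorm(30) / 1000, z = x + rnorm(30) / 10)

excluded <- correlation_threshold_sketch(c("x", "y", "z"), sample)
print(excluded)
```

In this sketch the result is a character vector of names to exclude, matching the Value section; when no pair exceeds the cutoff, the vector is empty.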

Value

character vector specifying observation variables to be excluded.

Examples

suppressMessages(suppressWarnings(library(magrittr)))
sample <- tibble::tibble(
  x = rnorm(30),
  y = rnorm(30) / 1000
)

sample %<>% dplyr::mutate(z = x + rnorm(30) / 10)
variables <- c("x", "y", "z")

head(sample)
cor(sample)

# `x` and `z` are highly correlated; one of them will be removed

correlation_threshold(variables, sample)

cytominer documentation built on July 8, 2020, 5:08 p.m.