correlation_threshold: Remove redundant variables.


View source: R/correlation_threshold.R

Description

correlation_threshold returns the variables to exclude so that no two of the remaining variables have an absolute correlation greater than a specified threshold.
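To illustrate that guarantee, here is a simplified greedy pruning in base R. `prune_correlated` is a hypothetical helper written for this page, not part of cytominer, and it is not caret's exact algorithm. It assumes a data frame of numeric columns, repeatedly finds the most strongly correlated remaining pair, drops the member of that pair with the higher mean absolute correlation, and stops once no pair exceeds the cutoff.

```r
# Hypothetical helper (not cytominer or caret API): simplified greedy pruning
# so that no two kept columns have |correlation| > cutoff.
prune_correlated <- function(df, cutoff = 0.9, method = "pearson") {
  keep <- names(df)
  while (length(keep) >= 2) {
    # Absolute pairwise correlations among the columns still kept
    r <- abs(stats::cor(df[, keep, drop = FALSE], method = method))
    diag(r) <- 0  # ignore each column's correlation with itself

    # Stop once no remaining pair exceeds the cutoff
    if (max(r) <= cutoff) break

    # Row/column indices of the most strongly correlated pair
    pair <- which(r == max(r), arr.ind = TRUE)[1, ]

    # Drop the member of the pair that is, on average, more correlated
    # with the other kept columns
    mean_r <- rowMeans(r)
    to_drop <- if (mean_r[pair[1]] >= mean_r[pair[2]]) pair[1] else pair[2]
    keep <- keep[-to_drop]
  }
  # Return the excluded columns, mirroring the documented return value
  setdiff(names(df), keep)
}
```

With `x`, an independent `y`, and `z` defined as `x` plus small noise, this helper flags exactly one of `x` or `z`, and the columns it keeps satisfy the cutoff.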

Usage

correlation_threshold(variables, sample, cutoff = 0.9, method = "pearson")

Arguments

variables

character vector specifying observation variables.

sample

tbl containing the sample used to estimate parameters.

cutoff

threshold in [0, 1] on the pairwise absolute correlation. Variables are flagged for exclusion until no two remaining variables have an absolute correlation above this value.

method

optional character string specifying the method used to calculate correlation. This must be one of the strings "pearson" (default), "kendall", or "spearman".
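The choice of method can change which pairs cross the cutoff. A small base-R illustration on toy data (assumed for illustration, not from cytominer): for a monotonic but strongly nonlinear relationship, the rank-based Spearman correlation exceeds the default cutoff of 0.9 while the Pearson correlation stays well below it.

```r
set.seed(42)
# Toy data: y is a monotonic but strongly nonlinear function of x
x <- runif(200, min = 0, max = 10)
y <- exp(x)

pearson  <- cor(x, y, method = "pearson")   # linear association
spearman <- cor(x, y, method = "spearman")  # rank (monotonic) association

# The ranks of x and y are identical, so Spearman is (numerically) 1;
# Pearson is lower because the relationship is not linear
c(pearson = pearson, spearman = spearman)

# Which values would cross the default cutoff of 0.9
c(pearson = pearson, spearman = spearman) > 0.9
```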

Details

correlation_threshold is a wrapper for caret::findCorrelation.
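A minimal sketch of what such a wrapper might look like, assuming it selects `variables` from `sample`, builds a correlation matrix with `stats::cor()`, and passes that matrix to `caret::findCorrelation()`. The name `correlation_threshold_sketch` is hypothetical and this is not the package's actual source; it requires the caret package.

```r
# Hypothetical sketch, not cytominer's source: wrap caret::findCorrelation()
correlation_threshold_sketch <- function(variables, sample,
                                         cutoff = 0.9, method = "pearson") {
  # Keep only the requested columns as a plain data frame
  x <- as.data.frame(sample)[, variables, drop = FALSE]

  # Pairwise correlation matrix for the chosen method
  cor_matrix <- stats::cor(x, method = method)

  # findCorrelation() returns the column indices it suggests removing
  excluded <- caret::findCorrelation(cor_matrix, cutoff = cutoff)

  # Translate indices back into variable names
  variables[excluded]
}
```

Mapping the returned indices through `variables` (rather than through all column names of `sample`) keeps the result aligned with the columns that were actually correlated.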

Value

character vector specifying observation variables to be excluded.
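Because the value names the variables to drop rather than the ones to keep, it is usually combined with `setdiff()` or column subsetting. A base-R sketch on toy data; `excluded` is a hard-coded stand-in for a return value, not real output.

```r
# Toy sample with three observation variables
sample <- data.frame(x = 1:5, y = c(2, 4, 1, 5, 3), z = (1:5) * 2)
variables <- c("x", "y", "z")

# Stand-in for the value returned by correlation_threshold(variables, sample)
excluded <- c("z")

# Observation variables that remain after exclusion
kept <- setdiff(variables, excluded)

# Drop the excluded columns while keeping any other columns of the sample
pruned <- sample[, setdiff(names(sample), excluded), drop = FALSE]
```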

Examples

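suppressMessages(suppressWarnings(library(magrittr)))
suppressMessages(library(cytominer))  # provides correlation_threshold()

set.seed(123)  # make the random example reproducible

sample <- tibble::tibble(
  x = rnorm(30),
  y = rnorm(30) / 1000
)

# `z` is `x` plus a small amount of noise
sample %<>% dplyr::mutate(z = x + rnorm(30) / 10)
variables <- c("x", "y", "z")

head(sample)
cor(sample)

# `x` and `z` are highly correlated; one of them will be flagged for exclusion

correlation_threshold(variables, sample)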

cytominer documentation built on Sept. 18, 2017, 1:03 a.m.