treatment_corr: Diagnosis and removal of highly correlated variables

View source: R/preprocess.R

treatment_corr R Documentation

Diagnosis and removal of highly correlated variables

Description

treatment_corr() diagnoses pairs of highly correlated variables and, optionally, removes one variable from each pair.

Usage

treatment_corr(.data, corr_thres = 0.8, treat = TRUE, verbose = TRUE)

Arguments

.data

a data.frame or a tbl_df.

corr_thres

numeric. Correlation threshold; variables whose correlation is greater than this value are detected. The default is 0.8.

treat

logical. Whether to remove the detected variables. If FALSE, the variables are only diagnosed.

verbose

logical. Set whether to echo information to the console at runtime.

Details

The Pearson correlation coefficient is computed for continuous variables and the Spearman correlation coefficient for categorical variables.
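The two coefficients can be illustrated with base R alone. The sketch below is not the alookr implementation: find_corr_pairs() is a hypothetical helper, and both the use of the absolute correlation and the conversion of factors to integer codes before applying Spearman are assumptions made for illustration.

```r
# Hypothetical helper (not part of alookr): list variable pairs whose
# absolute correlation exceeds a threshold.
find_corr_pairs <- function(df, method = "pearson", thres = 0.8) {
  m <- cor(df, method = method)
  # upper.tri() keeps each pair once and skips the diagonal
  idx <- which(abs(m) > thres & upper.tri(m), arr.ind = TRUE)
  data.frame(
    var1 = rownames(m)[idx[, "row"]],
    var2 = colnames(m)[idx[, "col"]],
    corr = m[idx],
    stringsAsFactors = FALSE
  )
}

# Continuous variables: Pearson correlation
num <- data.frame(
  a = 1:10,
  b = (1:10)^2,
  c = c(5, 3, 8, 1, 9, 2, 7, 4, 10, 6)
)
num_pairs <- find_corr_pairs(num, method = "pearson")

# Categorical variables: Spearman correlation on integer codes
# (assumption: factors are compared through their level codes)
cats <- data.frame(
  f1 = factor(rep(letters[1:5], times = 4)),
  f2 = factor(rep(letters[2:6], times = 4)),
  f3 = factor(rep(c("x", "y"), times = 10))
)
cat_codes <- data.frame(lapply(cats, as.integer))
cat_pairs <- find_corr_pairs(cat_codes, method = "spearman")

print(num_pairs)
print(cat_pairs)
```

Because Spearman works on ranks, the result for factors depends on the order of the factor levels, so it is best read as a rough measure of association.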

Value

A data.frame or train_df. The return value has the same type as the .data argument, but some variables may be excluded because of their correlation with other variables.
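The shape of the return value can be sketched in base R as follows. drop_correlated() is a hypothetical helper, not the alookr code: it handles numeric columns only, uses the absolute Pearson correlation, and drops the later variable of each correlated pair, all of which are assumptions. The point it shows is that the result keeps the class of the input while losing some columns.

```r
# Hypothetical helper (not part of alookr): drop the later variable of each
# pair of numeric columns whose absolute Pearson correlation exceeds thres.
drop_correlated <- function(df, thres = 0.8) {
  num_cols <- names(df)[vapply(df, is.numeric, logical(1))]
  if (length(num_cols) < 2) return(df)
  m <- cor(df[num_cols])
  m[!upper.tri(m)] <- 0  # keep each pair once and ignore the diagonal
  # A column is dropped if it is correlated with any earlier column
  drop <- colnames(m)[apply(abs(m) > thres, 2, any)]
  df[, setdiff(names(df), drop), drop = FALSE]
}

d <- data.frame(
  a = 1:10,
  b = (1:10)^2,
  c = c(5, 3, 8, 1, 9, 2, 7, 4, 10, 6),
  g = factor(rep(c("x", "y"), times = 5))
)
out <- drop_correlated(d)

names(out)
class(out)
```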

Examples

library(alookr)

# numerical variables
x1 <- 1:100
set.seed(12L)
x2 <- sample(1:3, size = 100, replace = TRUE) * x1 + rnorm(1)
set.seed(1234L)
x3 <- sample(1:2, size = 100, replace = TRUE) * x1 + rnorm(1)

# categorical variables
x4 <- factor(rep(letters[1:20], times = 5))
set.seed(100L)
x5 <- factor(rep(letters[1:20 + sample(1:6, size = 20, replace = TRUE)], times = 5))
set.seed(200L)
x6 <- factor(rep(letters[1:20 + sample(1:3, size = 20, replace = TRUE)], times = 5))
set.seed(300L)
x7 <- factor(sample(letters[1:5], size = 100, replace = TRUE))

exam <- data.frame(x1, x2, x3, x4, x5, x6, x7)
str(exam)
head(exam)

# default case
treatment_corr(exam)

# diagnose only, without removing variables
treatment_corr(exam, treat = FALSE)

# detect variables whose correlation is greater than 0.9
treatment_corr(exam, corr_thres = 0.9, treat = FALSE)

# non-verbose mode
treatment_corr(exam, verbose = FALSE)


alookr documentation built on May 29, 2024, 10:38 a.m.