treatment_corr: Diagnosis and removal of highly correlated variables
In alookr: Model Classifier for Binary Classification

treatment_corr

R Documentation

Description

treatment_corr() diagnoses pairs of highly correlated variables, or removes one variable from each such pair.

Usage

treatment_corr(.data, corr_thres = 0.8, treat = TRUE, verbose = TRUE)

Arguments

`.data`	a data.frame or a `tbl_df`.
`corr_thres`	numeric. Threshold for detecting variables. A pair of variables is flagged when its correlation is greater than the threshold.
`treat`	logical. Whether to remove the detected variables.
`verbose`	logical. Set whether to echo information to the console at runtime.

Details

The Pearson correlation coefficient is computed for continuous variables, and the Spearman correlation coefficient for categorical variables.
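The two coefficients named above can be illustrated with base R's `cor()`. This is only a minimal sketch, not alookr's internal code: it assumes that factors are converted to their integer level codes before the Spearman coefficient is computed, and the data and the `thres` variable are hypothetical.

```r
# Illustrative only: uses stats::cor() from base R, not alookr internals.
set.seed(1L)
num <- data.frame(a = 1:50)
num$b <- 2 * num$a + rnorm(50)   # nearly collinear with a
num$c <- rnorm(50)               # unrelated noise

# Pearson correlation for continuous variables
pearson <- cor(num, method = "pearson")

# Spearman correlation for categorical variables, ASSUMING each factor is
# first converted to its integer level codes
f1 <- factor(rep(letters[1:10], times = 5))
f2 <- factor(rep(letters[2:11], times = 5))  # same ordering, shifted labels
cat_codes <- data.frame(f1 = as.integer(f1), f2 = as.integer(f2))
spearman <- cor(cat_codes, method = "spearman")

# Pairs whose absolute correlation exceeds a hypothetical threshold;
# upper.tri() keeps each pair once and drops the diagonal
thres <- 0.8
which(abs(pearson) > thres & upper.tri(pearson), arr.ind = TRUE)
```

Whether treatment_corr() compares the absolute value or the signed coefficient against `corr_thres` is not stated on this page. The sketch uses the absolute value as an assumption.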

Value

A data.frame or train_df. The return value is an object of the same type as the .data argument, but some variables may be excluded because of their correlation with other variables.

Examples

library(alookr)

# numerical variables
x1 <- 1:100
set.seed(12L)
x2 <- sample(1:3, size = 100, replace = TRUE) * x1 + rnorm(1)
set.seed(1234L)
x3 <- sample(1:2, size = 100, replace = TRUE) * x1 + rnorm(1)

# categorical variables
x4 <- factor(rep(letters[1:20], times = 5))
set.seed(100L)
x5 <- factor(rep(letters[1:20 + sample(1:6, size = 20, replace = TRUE)], times = 5))
set.seed(200L)
x6 <- factor(rep(letters[1:20 + sample(1:3, size = 20, replace = TRUE)], times = 5))
set.seed(300L)
x7 <- factor(sample(letters[1:5], size = 100, replace = TRUE))

exam <- data.frame(x1, x2, x3, x4, x5, x6, x7)
str(exam)
head(exam)

# default case
treatment_corr(exam)

# diagnose only, without removing variables
treatment_corr(exam, treat = FALSE)

# flag variables only when the correlation is greater than 0.9
treatment_corr(exam, corr_thres = 0.9, treat = FALSE)

# non-verbose mode
treatment_corr(exam, verbose = FALSE)
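
As a follow-up, the returned object can be compared with the input to see which variables, if any, were excluded. This is a hedged sketch that rebuilds the same `exam` data and assumes the alookr package is installed. The name `kept` is introduced here for illustration, and no particular result is implied.

```r
# Runs only when alookr is available; otherwise the block is skipped.
if (requireNamespace("alookr", quietly = TRUE)) {
  # Same data as in the examples on this page
  x1 <- 1:100
  set.seed(12L)
  x2 <- sample(1:3, size = 100, replace = TRUE) * x1 + rnorm(1)
  set.seed(1234L)
  x3 <- sample(1:2, size = 100, replace = TRUE) * x1 + rnorm(1)
  x4 <- factor(rep(letters[1:20], times = 5))
  set.seed(100L)
  x5 <- factor(rep(letters[1:20 + sample(1:6, size = 20, replace = TRUE)], times = 5))
  set.seed(200L)
  x6 <- factor(rep(letters[1:20 + sample(1:3, size = 20, replace = TRUE)], times = 5))
  set.seed(300L)
  x7 <- factor(sample(letters[1:5], size = 100, replace = TRUE))
  exam <- data.frame(x1, x2, x3, x4, x5, x6, x7)

  # `kept` is a hypothetical name for the treated data
  kept <- alookr::treatment_corr(exam, verbose = FALSE)

  class(kept)                         # compare with class(exam)
  setdiff(names(exam), names(kept))   # variables that were removed, if any
}
```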

alookr documentation built on May 29, 2024, 10:38 a.m.