AutoSpearman: AutoSpearman - an automated variable selection approach based...

View source: R/AutoSpearman.R

Description

AutoSpearman is an automated metric selection approach based on a Spearman rank correlation test and a Variance Inflation Factor (VIF) analysis. The approach consists of two steps:

Step 1 - Automatically select non-correlated metrics based on a Spearman rank correlation test. We start from the pair of metrics with the strongest correlation. Since these two metrics can be linearly predicted from each other, one of them must be removed and the other kept. We keep the metric that has the lowest Spearman correlation coefficient with the other metrics that are not in the pair. We repeat this process until all remaining metrics have their Spearman correlation coefficients below a threshold value (default = 0.7).

Step 2 - Automatically select non-correlated metrics based on a VIF analysis. We exclude the metric that has the highest VIF score above a threshold value (default = 5), since that metric is the one most predictable from the others. We repeat the VIF analysis on the remaining metrics until all of them have VIF scores below the threshold and are therefore free from multicollinearity.

Finally, AutoSpearman selects only the non-correlated metrics, producing a simpler, non-correlated subset that is representative of all metrics.
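
The two steps described above can be sketched in base R as follows. This is only an illustrative sketch, not the Rnalytica source: the helper names spearman_prune, vif_scores and vif_prune and the toy data are hypothetical, and three details are assumptions. Correlations are compared by absolute value, "lowest correlation with the other metrics" is scored as the mean absolute coefficient, and VIF is computed with the standard definition 1 / (1 - R^2) from a linear regression of each metric on the remaining ones.

```r
# Illustrative sketch only; NOT the Rnalytica implementation.
# All helper names below are hypothetical.

# Step 1: repeatedly drop one metric from the most strongly correlated pair.
spearman_prune <- function(data, metrics, threshold = 0.7) {
  while (length(metrics) > 1) {
    # Assumption: correlation strength is compared by absolute value.
    rho <- abs(cor(data[, metrics, drop = FALSE], method = "spearman"))
    diag(rho) <- 0
    if (max(rho) < threshold) break
    # Locate the most strongly correlated pair.
    idx <- which(rho == max(rho), arr.ind = TRUE)[1, ]
    pair <- metrics[idx]
    others <- setdiff(metrics, pair)
    # Assumption: each member of the pair is scored by its mean absolute
    # correlation with the metrics outside the pair; the lower one is kept.
    score <- if (length(others) > 0) {
      sapply(pair, function(m) mean(rho[m, others]))
    } else {
      c(0, 0)
    }
    metrics <- setdiff(metrics, pair[which.max(score)])
  }
  metrics
}

# VIF of each metric: 1 / (1 - R^2) from regressing it on the other metrics.
# Assumes metric names are syntactically valid R names.
vif_scores <- function(data, metrics) {
  sapply(metrics, function(m) {
    others <- setdiff(metrics, m)
    fit <- lm(reformulate(others, response = m), data = data)
    1 / (1 - summary(fit)$r.squared)
  })
}

# Step 2: repeatedly drop the metric with the highest VIF at or above the threshold.
vif_prune <- function(data, metrics, threshold = 5) {
  while (length(metrics) > 1) {
    v <- vif_scores(data, metrics)
    if (max(v) < threshold) break
    metrics <- setdiff(metrics, names(v)[which.max(v)])
  }
  metrics
}

# Toy data (made up for illustration): b is nearly a copy of a.
set.seed(1)
n <- 200
a <- rnorm(n)
toy <- data.frame(a = a, b = a + rnorm(n, sd = 0.1), c = rnorm(n), d = rnorm(n))

kept <- vif_prune(toy, spearman_prune(toy, names(toy)))
print(kept)
```

On this toy data, one of the two near-duplicate columns a and b is removed in Step 1, while the independent columns c and d are retained.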

Usage

AutoSpearman(dataset, metrics, spearman.threshold = 0.7,
  vif.threshold = 5, groups = FALSE, verbose = F)

Arguments

dataset

a data frame containing the data

metrics

a character string or a character vector naming the independent variables

spearman.threshold

a numeric threshold for the Spearman rank correlation coefficient (default = 0.7)

vif.threshold

a numeric threshold for the VIF score (default = 5)

verbose

TRUE to enable printing (default = FALSE)

Examples

library(Rnalytica)

# Load a defect dataset, then run AutoSpearman with the default thresholds
Data <- loadDefectDataset('groovy-1_5_7', 'jira')
AutoSpearman(dataset = Data$data, metrics = Data$indep)

software-analytics/Rnalytica documentation built on Aug. 16, 2020, 9:38 p.m.