WOEClust_hclust: Variable Clustering


View source: R/Rprofet.R

Description

Performs hierarchical clustering of the variables, for use as a form of variable selection.
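To illustrate the general idea of clustering variables rather than observations, here is a minimal base R sketch. It is not taken from the Rprofet source: the toy data, the correlation-based distance `1 - abs(cor(...))`, and all object names are assumptions made for illustration, and the package may compute distances differently.

```r
## Toy data (made up): two pairs of strongly related variables
set.seed(1)
n <- 200
base1 <- rnorm(n)
base2 <- rnorm(n)
toy <- data.frame(
  a1 = base1 + rnorm(n, sd = 0.1),
  a2 = base1 + rnorm(n, sd = 0.1),
  b1 = base2 + rnorm(n, sd = 0.1),
  b2 = base2 + rnorm(n, sd = 0.1)
)

## Assumed dissimilarity between variables: 1 - |correlation|
var_dist <- as.dist(1 - abs(cor(toy)))

## Hierarchical clustering of the variables, then cut into 2 groups
tree <- hclust(var_dist, method = "ward.D")
groups <- cutree(tree, k = 2)
groups
```

Variables that land in the same group carry similar information, so one representative per group is often enough for modeling.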

Usage

WOEClust_hclust(object, id, target, num_clusts, method = "ward.D")

Arguments

object

A WOEProfet object containing dataframes with binned and WOE values.

id

ID variable.

target

A binary target variable.

num_clusts

Number of desired clusters.

method

Clustering method to be used. This should be one of "ward.D", "ward.D2", "single", "average", "mcquitty", "median", or "centroid".

Value

A dataframe with the name of all the variables to be clustered, the corresponding cluster and the information value for each variable.
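One common way to use such a result for variable selection is to keep the variable with the highest information value in each cluster. The sketch below assumes this workflow and uses a hand-made data frame. The column names `variable`, `cluster`, and `iv` and the values are hypothetical, so check the names of the data frame actually returned before adapting it.

```r
## Hypothetical result: column names and values are made up for illustration
toy_result <- data.frame(
  variable = c("x1", "x2", "x3", "x4"),
  cluster  = c(1, 1, 2, 2),
  iv       = c(0.10, 0.30, 0.05, 0.20),
  stringsAsFactors = FALSE
)

## Sort by cluster, then by descending information value
ordered <- toy_result[order(toy_result$cluster, -toy_result$iv), ]

## Keep the first (highest-IV) row within each cluster
best <- ordered[!duplicated(ordered$cluster), ]
best
```

Here `best` holds one row per cluster, which can serve as a shortlist of candidate predictors.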

Examples

mydata <- ISLR::Default
mydata$ID <- seq_len(nrow(mydata)) ## make the ID variable
mydata$default <- ifelse(mydata$default == "Yes", 1, 0) ## create a numeric binary target variable

## create two new variables from bivariate normal
sigma <- matrix(c(45000,-3000,-3000, 55000), nrow = 2)
set.seed(10)
newvars <- MASS::mvrnorm(nrow(mydata),
                         mu=c(1000,200), Sigma=sigma)

mydata$newvar1 <- newvars[,1]
mydata$newvar2 <- newvars[,2]

binned <- BinProfet(mydata, id= "ID", target= "default", num.bins = 5) ## Binning variables

WOE_dat <- WOEProfet(binned, "ID","default")

## Cluster variables by WOEClust_hclust
clusters <- WOEClust_hclust(WOE_dat, id="ID", target="default", num_clusts=3)
clusters

Rprofet documentation built on April 1, 2020, 5:11 p.m.