View source: R/04_RF_CLUSTERING.R
rf.clustering | R Documentation |
rf.clustering
implements correlation-based clustering of risk factors.
The clustering procedure is based on hclust from the stats
package.
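The general idea of correlation-based clustering with hclust can be sketched in base R as follows. This is a minimal illustration, not the internals of rf.clustering: the toy data, the distance formula (1 - correlation) / 2, the average linkage and the fixed k = 2 are all assumptions made for the example.

```r
# Toy data: five numeric risk factors (hypothetical names) built from
# two latent variables, so two correlated groups are expected
set.seed(1)
n <- 200
z1 <- rnorm(n)
z2 <- rnorm(n)
db <- data.frame(
  rf1 = z1 + rnorm(n, sd = 0.3),
  rf2 = z1 + rnorm(n, sd = 0.3),
  rf3 = z2 + rnorm(n, sd = 0.3),
  rf4 = z2 + rnorm(n, sd = 0.3),
  rf5 = z2 + rnorm(n, sd = 0.3)
)

# Correlation-based distance (assumed form): 0 for perfectly correlated
# risk factors, 1 for perfectly anti-correlated ones
cor.mat <- cor(db, method = "spearman")
d <- as.dist((1 - cor.mat) / 2)

# Hierarchical clustering with stats::hclust (linkage is an assumption)
hc <- hclust(d, method = "average")

# Cut the tree into a fixed number of clusters
clusters <- cutree(hc, k = 2)
res <- data.frame(rf = names(clusters), clusters = unname(clusters))
res
```

Risk factors that move together end up in the same cluster, which is what makes a per-cluster selection step meaningful afterwards.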
rf.clustering(db, metric, k = NA)
db |
Data frame of risk factors supplied for clustering analysis. |
metric |
Correlation metric used for distance calculation. Available options are:
|
k |
Number of clusters. If default value ( |
The function rf.clustering
returns a data frame containing the risk factors, their assigned clusters and
the distance to centroid (ordered from smallest to largest).
The last column (distance to centroid) can be used to select one or more risk factors per
cluster.
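Selecting more than one risk factor per cluster could look like the base R sketch below. The data frame is hypothetical and only mimics the shape of the returned value; the column name rf and the cut-off n.keep are assumptions introduced for illustration.

```r
# Hypothetical result shaped like the data frame described above
cr.toy <- data.frame(
  rf = c("a", "b", "c", "d", "e", "f"),
  clusters = c(1, 1, 1, 2, 2, 3),
  dist.to.centroid = c(0.10, 0.25, 0.40, 0.05, 0.30, 0.00)
)

# Keep up to n.keep risk factors per cluster, closest to the centroid first
n.keep <- 2
cr.toy <- cr.toy[order(cr.toy$clusters, cr.toy$dist.to.centroid), ]

# Rank of each risk factor within its cluster (1 = closest to centroid)
rank.in.cluster <- ave(cr.toy$dist.to.centroid, cr.toy$clusters, FUN = seq_along)

selected <- cr.toy[rank.in.cluster <= n.keep, ]
selected
```

Clusters with fewer than n.keep members simply contribute all of their risk factors.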
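suppressMessages(library(PDtoolkit))
library(rpart)
#dplyr provides %>%, group_by and slice used at the end of the example
suppressMessages(library(dplyr))
data(loans)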
#clustering using common spearman metric
#first we need to categorize numeric risk factors
num.rf <- sapply(loans, is.numeric)
num.rf <- names(num.rf)[!names(num.rf)%in%"Creditability" & num.rf]
loans[, num.rf] <- sapply(num.rf, function(x)
	sts.bin(x = loans[, x], y = loans[, "Creditability"])[[2]])
#replace categories with WoE values so that all risk factors are numeric
loans.woe <- replace.woe(db = loans, target = "Creditability")[[1]]
cr <- rf.clustering(db = loans.woe[, -which(names(loans.woe)%in%"Creditability")],
			  metric = "common spearman",
			  k = NA)
cr
#select one risk factor per cluster with min distance to centroid
cr %>% group_by(clusters) %>%
	 slice(which.min(dist.to.centroid))