UnsupRF: Unsupervised random forest clustering with fpc.


Description

Unsupervised random forest clustering. A random forest (RF) classifier is trained to distinguish the observed data, labeled as class "True.Data", from a synthetic data set labeled as class "Synthetic.Data". The synthetic data are generated by sampling each dimension of the true data independently, with or without replacement (see RFdist). The dissimilarity matrix from RFdist is then passed to the clustering algorithms in the fpc ("Flexible Procedures for Clustering") package, which cluster the data and can select the optimal number of clusters through the bootstrap cluster-wise stability method.
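The synthetic-data step described above can be sketched in base R. This is only an illustration of the idea, not the code used by RFdist; the helper name make_synthetic and its replace argument are hypothetical.

```r
# Sketch only: base-R illustration of the synthetic-data idea, not the RFdist source.
set.seed(1)
true_data <- iris[, -5]

# Hypothetical helper: sample every column independently, which keeps each
# marginal distribution but breaks the dependence between columns.
make_synthetic <- function(x, replace = FALSE) {
  as.data.frame(lapply(x, function(col) sample(col, length(col), replace = replace)))
}

synthetic_data <- make_synthetic(true_data)

# Stack both sets and label them; a classifier would then be trained on `label`.
combined <- rbind(true_data, synthetic_data)
label <- factor(rep(c("True.Data", "Synthetic.Data"), each = nrow(true_data)))

table(label)
```

With replace = FALSE each column is a permutation of the original, so the marginals are preserved exactly; with replace = TRUE they are preserved only approximately.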

Usage

UnsupRF(data, ...)

## Default S3 method:
UnsupRF(data, RFdist, B = 10,
  clustermethod = pamkCBI, classification = "centroid", krange = 2:5,
  kopt = 2, run.boot = FALSE, fun = "sum", ...)

## S3 method for class 'UnsupRF'
print(x, ...)

Arguments

data

data.frame or matrix

...

further arguments passed to or from other methods.

RFdist

RF distance matrix computed from RFdist.

B

number of bootstraps

clustermethod

clustering method; options are pamkCBI, claraCBI, or hclustCBI (see the fpc package). pamkCBI is recommended for an RF dissimilarity matrix. We are less sure about hclustCBI, but we have found that standard hclust in base R works well with Ward's minimum variance criterion.

classification

type of prediction used when finding the optimal number of clusters; see nselectboot.

krange

integer vector; numbers of clusters to be tried

kopt

user-provided optimal number of clusters

run.boot

(logical) whether to run the bootstrap cluster-wise stability procedure.

fun

function used to determine medoids; should be mean, median, or sum. See mediod.

x

object of class UnsupRF

Value

A list with elements:

  1. cluster.model: The cluster model

  2. cluster: cluster memberships

  3. kopt: optimal number of clusters

  4. mediods: a mediod object
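One plausible reading of the fun argument is that each cluster's medoid is the member whose aggregated dissimilarity (sum, mean, or median) to the other members is smallest. The sketch below illustrates that reading in base R; find_medoids is a hypothetical helper, not the package's mediod function, and a Euclidean distance stands in for the RF dissimilarity.

```r
# Sketch only: illustrates one reading of `fun`; not the package's `mediod` code.
dat <- iris[, -5]
d_mat <- as.matrix(dist(dat))  # stand-in for the RF dissimilarity matrix
cl <- cutree(hclust(dist(dat), method = "ward.D2"), k = 2)

# Hypothetical helper: for each cluster, return the row index of the member
# with the smallest aggregated dissimilarity to its cluster mates.
find_medoids <- function(d_mat, cl, fun = sum) {
  sapply(sort(unique(cl)), function(k) {
    idx <- which(cl == k)
    score <- apply(d_mat[idx, idx, drop = FALSE], 1, fun)
    idx[which.min(score)]
  })
}

medoid_rows <- find_medoids(d_mat, cl, fun = sum)
dat[medoid_rows, ]
```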

Examples

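## Not run: 
library(UnsupRF)
library(fpc)  # provides pamkCBI

set.seed(12345)
data(iris)
dat <- iris[, -5]  # drop the Species column

# RF dissimilarity between observations
RF.dist <- RFdist(data = dat, ntree = 10, no.rep = 20, syn.type = "permute",
                  importance = TRUE, oob.prox = TRUE, proxConver = FALSE)

# Cluster on the RF dissimilarity; run.boot = TRUE runs the bootstrap
# stability procedure over the values in krange
Clus.res <- UnsupRF(data = dat, RFdist = RF.dist$RFdist,
                    B = 5, clustermethod = pamkCBI, classification = "centroid",
                    krange = 2:4, kopt = 2, run.boot = TRUE)
print(Clus.res)
clusters <- Clus.res$clusters
kopt <- Clus.res$kopt  # optimal number of clusters

## End(Not run)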
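
The clustermethod argument notes that standard hclust with Ward's criterion also works well on the RF dissimilarity. A minimal base-R sketch of that alternative follows. It uses a Euclidean distance as a stand-in and assumes the RF dissimilarity could be supplied instead via as.dist(); "ward.D2" is one of the two Ward variants offered by hclust.

```r
# Sketch only: `d` is a stand-in; in practice it would be the RF dissimilarity,
# assumed here to be convertible with as.dist().
dat <- iris[, -5]
d <- dist(scale(dat))

# Ward's minimum variance criterion on the dissimilarities
hc <- hclust(d, method = "ward.D2")

# Cut the tree at a chosen number of clusters
hc_clusters <- cutree(hc, k = 2)
table(hc_clusters)
```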

nguforche/UnsupRF documentation built on May 5, 2019, 4:51 p.m.