rf.unsupervised: Unsupervised Random Forests

View source: R/rf.unsupervised.R

rf.unsupervised    R Documentation

Unsupervised Random Forests

Description

Performs unsupervised Random Forests clustering based on dissimilarity, optionally returning the neighbor distances.

Usage

rf.unsupervised(
  x,
  n = 2,
  proximity = FALSE,
  silhouettes = FALSE,
  clara = FALSE,
  ...
)

Arguments

x

A matrix or data.frame object to cluster

n

Number of clusters

proximity

(FALSE/TRUE) Return matrix of neighbor distances based on proximity

silhouettes

(FALSE/TRUE) Return adjusted silhouette values

clara

(FALSE/TRUE) Use clara partitioning, for large data

...

Additional Random Forests arguments
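
As a hedged illustration of the arguments above, the following sketch forwards a randomForest option through the ... argument. It assumes the rfUtilities package is installed. ntree is a standard randomForest argument, and the values n = 3 and ntree = 1000 are arbitrary choices made for this illustration.

```r
library(rfUtilities)

set.seed(42)  # random forests are stochastic; fix the seed for repeatability

# Three clusters with the default partitioning (clara = FALSE);
# ntree is passed on to randomForest through the ... argument
cl <- rf.unsupervised(iris[, 1:4], n = 3, ntree = 1000)

# With proximity = FALSE and silhouettes = FALSE, the documented
# return value is a vector of cluster labels
str(cl)
```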

Details

Clusters (k) are derived from the random forests proximity matrix, which is converted to a dissimilarity and treated as neighbor distances. The clusters are identified using Partitioning Around Medoids (PAM); observations with negative silhouette values are reassigned to the nearest neighbor.
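
The procedure described above can be sketched directly with the randomForest and cluster packages. This is a minimal illustration, not the package's internal code. The transform sqrt(1 - proximity) follows the Shi et al. reference and is an assumption here; the scaling that rf.unsupervised applies may differ. The object names (rf.dist, fit, sil, labels) and the choice k = 3 are hypothetical.

```r
library(randomForest)  # runs in unsupervised mode when no response is given
library(cluster)       # pam() and silhouette()

set.seed(42)
x <- iris[, 1:4]
k <- 3  # number of clusters (arbitrary choice for this sketch)

# Unsupervised random forest; proximity = TRUE returns the n x n
# proximity (similarity) matrix for the observations in x
rf <- randomForest(x, proximity = TRUE)

# Assumed conversion from proximity to dissimilarity
rf.dist <- as.dist(sqrt(1 - rf$proximity))

# Partitioning Around Medoids on the dissimilarities
fit <- pam(rf.dist, k = k)

# Silhouette matrix in observation order, with columns
# "cluster", "neighbor" and "sil_width"
sil <- silhouette(fit$clustering, rf.dist)

# Reassign observations with a negative silhouette width to the
# neighboring cluster reported by silhouette()
labels <- ifelse(sil[, "sil_width"] < 0, sil[, "neighbor"], sil[, "cluster"])
table(labels)
```

A single reassignment pass is shown; whether the package recomputes silhouettes after reassignment is not stated in this documentation.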

Value

A vector of cluster labels or a list object of class "unsupervised", containing the following components:

  • distances = [Scaled proximity matrix representing dissimilarity neighbor distances]

  • k = [Vector of cluster labels using adjusted silhouettes]

  • silhouette.values = [Adjusted silhouette cluster labels and silhouette values]

Author(s)

Jeffrey S. Evans <jeffrey_evans<at>tnc.org>

References

Rand, W.M. (1971) Objective Criteria for the Evaluation of Clustering Methods. Journal of the American Statistical Association, 66:846-850.

Shi, T., Seligson, D., Belldegrun, A.S., Palotie, A., and Horvath, S. (2005) Tumor Classification by Tissue Microarray Profiling: Random Forest Clustering Applied to Renal Cell Carcinoma. Modern Pathology, 18:547-557.

See Also

randomForest for ... options

pam for details on Partitioning Around Medoids (PAM)

clara for details on Clustering Large Applications (clara)

Examples

 library(rfUtilities)    # provides rf.unsupervised
 library(randomForest)
 data(iris)
 n <- 4
 clust.iris <- rf.unsupervised(iris[,1:4], n = n, proximity = TRUE,
                               silhouettes = TRUE)
 clust.iris$k

 # Classical multidimensional scaling of the dissimilarity matrix
 mds <- stats::cmdscale(clust.iris$distances, eig = TRUE, k = n)
 colnames(mds$points) <- paste("Dim", 1:n)

 # One color per cluster label (1..n)
 mds.col <- rainbow(n)[as.integer(clust.iris$k)]
 plot(mds$points[,1:2], col = mds.col, pch = 20)
 pairs(mds$points, col = mds.col, pch = 20)
  

jeffreyevans/rfUtilities documentation built on Nov. 12, 2023, 6:52 p.m.