rf.unsupervised: Unsupervised Random Forests
In jeffreyevans/rfUtilities: Random Forests Model Selection and Performance Evaluation

Description

Fits an unsupervised Random Forests model and returns a clustering based on the proximity-derived dissimilarity, optionally along with the neighbor distances.

Usage

rf.unsupervised(
  x,
  n = 2,
  proximity = FALSE,
  silhouettes = FALSE,
  clara = FALSE,
  ...
)

Arguments

`x`	A matrix or data.frame object to cluster
`n`	Number of clusters
`proximity`	(FALSE/TRUE) Return matrix of neighbor distances based on proximity
`silhouettes`	(FALSE/TRUE) Return adjusted silhouette values
`clara`	(FALSE/TRUE) Use clara partitioning, for large data
`...`	Additional Random Forests arguments

Details

Clusters (k) are derived from the Random Forests proximity matrix, which is treated as a matrix of dissimilarity (neighbor) distances. The clusters are identified using Partitioning Around Medoids (PAM), and observations with negative silhouette values are reassigned to their nearest neighbor cluster.
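The procedure described above can be approximated directly with randomForest and cluster. This is a minimal sketch, not the internal code of rf.unsupervised: the sqrt(1 - proximity) scaling, the choice of 3 clusters, and the object names (rf, d, fit, k.adj) are assumptions made for illustration.

```r
library(randomForest)  # randomForest() with no response runs in unsupervised mode
library(cluster)       # pam() and silhouette()

set.seed(42)
x <- iris[, 1:4]

# Unsupervised forest: proximity = TRUE returns an n x n proximity matrix
rf <- randomForest(x, ntree = 500, proximity = TRUE)

# Convert proximity (similarity) to a dissimilarity.
# sqrt(1 - proximity) is one common scaling (assumption, see text above).
d <- as.dist(sqrt(1 - rf$proximity))

# Partitioning Around Medoids on the dissimilarities
fit <- pam(d, k = 3, diss = TRUE)
k <- fit$clustering

# Silhouette widths; the "neighbor" column holds the nearest other cluster
sil <- silhouette(k, d)
neg <- sil[, "sil_width"] < 0

# Reassign observations with a negative silhouette to their neighbor cluster
k.adj <- k
k.adj[neg] <- sil[neg, "neighbor"]

# Cross-tabulate labels before and after the adjustment
table(original = k, adjusted = k.adj)
```

A negative silhouette width means an observation is, on average, closer to another cluster than to its own, which is why moving it to the neighbor cluster is a reasonable correction.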

Value

A vector of cluster labels, or a list object of class "unsupervised" containing the following components:

distances = [Scaled proximity matrix representing dissimilarity neighbor distances]
k = [Vector of cluster labels using adjusted silhouettes]
silhouette.values = [Adjusted silhouette cluster labels and silhouette values]

Author(s)

Jeffrey S. Evans <jeffrey_evans<at>tnc.org>

References

Rand, W.M. (1971) Objective Criteria for the Evaluation of Clustering Methods. Journal of the American Statistical Association, 66:846-850.

Shi, T., Seligson, D., Belldegrun, A.S., Palotie, A., and Horvath, S. (2005) Tumor Classification by Tissue Microarray Profiling: Random Forest Clustering Applied to Renal Cell Carcinoma. Modern Pathology, 18:547-557.

Examples

 library(randomForest)
 library(rfUtilities)
 data(iris)
 n <- 4
 clust.iris <- rf.unsupervised(iris[,1:4], n = n, proximity = TRUE,
                               silhouettes = TRUE)
 clust.iris$k

 # Classical MDS of the dissimilarity matrix, keeping n dimensions
 mds <- stats::cmdscale(clust.iris$distances, eig = TRUE, k = n)
   colnames(mds$points) <- paste("Dim", 1:n)
   # One color per cluster label (1..n)
   mds.col <- rainbow(n)[as.integer(clust.iris$k)]
 plot(mds$points[,1:2], col = mds.col, pch = 20)
 pairs(mds$points, col = mds.col, pch = 20)

jeffreyevans/rfUtilities documentation built on Nov. 12, 2023, 6:52 p.m.