splitters | R Documentation |
Functions to split a data set into two parts.
registerSplitter(tag, description, FUN)
availableSplitters()
findSplit(data, metric = "euclidean", splitter = names(splitters),
LR = c("L", "R"))
evalSplit(split, data, metric, tool = c("sw", "ssq"), N = 100)
tag |
A character string representing a short abbreviation used to identify and eventually call the associated "splitter" functoin that can dichtomize the data. |
description |
A character string containing the full name of the splitting algorithm. Potentially, this can be used in data summaries or plots. |
FUN |
A "splitter"; that is, a function that takes a distance matrix as input and returns a numeric vector that assigns samples in the data set to one of two classes labeled "1" or "2". |
data |
The data set to be split into two parts. |
metric |
A distance metric. This metric should be one of the values supported
by the |
splitter |
A tag that uniquely identifies one of the splitting functions that have been registered. Built-in registered "splitters", with their tags, include: (hc) agglomeratve hierarchical clustering; (km) k-means clustering; (dv) divisive heirarchical clustering, and (ap) affinity propagation clustering. |
LR |
A character vector of length two denoting the laels to be used for the two classes after splitting the data. |
split |
A clustering factor, typically the output of a call to the
|
tool |
The tool/splitter to be used to evalaute the quality of the split. Built-in splitters include the mean silhouette width (sw) or the within-group sum of square distances (ssq). |
N |
Number of random splits used to estimate emoirical p-value. |
The core idea of this package is to construct decision trees that simulataneouly learn how to cluster the items in a data set while also learning a classifier that can predict the cluster assignments of new data. We support four built-in clustering algortihms to learn how to split a single data set into two parts:
agglomerative hierarchical clustering as implemented in
the hclust
function, followed by
cutree
.
kmeans
clustering with k = 2.
divisive hierarchical clustering as implemented in the
diana
function.
affinity propagation clustering, as implemented in the
apcluster
function.
The registerSplitter
funciton invisibly returns the tag
assigned to the new splitter.
The findSplit
function returns a two-level factor with length
equal to the number of columns (samples) in the data set.
The evalSplit
returns a list of length two, containing a
stat
istic (either the men silhouette width or the within-group
sum of square distances, depending on the tool being used) along with
an estimated empirical pv
alue.
Kevin R. Coombes <krc@silicovore.com>
nr <- 200 # features
nc <- 60 # samples
set.seed(80123)
comat <- matrix(rnorm(nr*nc, 0, 1), nrow = nr)
dimnames(comat) <- list(paste0("F", 1:nr),
paste0("S", 1:nc))
splay <- rep(c(2, -2), each = nc/2)
for(J in 1:30) comat[J, ] <- comat[J, ] + splay
C.hc <- findSplit(comat, splitter = "hc")
evalSplit(C.hc, comat, "euclid", "sw")
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.