generateSyntheticClass: Generate Synthetic Second Class for Unsupervised Learning


Description

To use Random Forests for unsupervised learning, the training set x is treated as a single class. This function creates a synthetic second class for classification by sampling at random from the univariate distribution of each variable in the original data, independently of the other variables. The resulting two-class training set is useful, for example, for clustering.
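The sampling idea described above can be sketched in base R. The helper synthetic_class_sketch() below is hypothetical and is not the bigrf implementation. It assumes a data.frame or matrix input, draws as many synthetic rows as there are original rows, labels the classes "original" and "synthetic" (both assumptions), and always returns a data.frame, whereas generateSyntheticClass returns an object of the same type as x, including big.matrix.

```r
# Hypothetical helper (NOT part of bigrf): a sketch of building a two-class
# training set by appending a synthetic class drawn from each column's
# univariate (marginal) distribution.
synthetic_class_sketch <- function(x) {
  x <- as.data.frame(x)
  n <- nrow(x)
  synth <- x
  # Resample every column independently, with replacement. Each variable keeps
  # its univariate distribution, but the dependence between variables is lost.
  synth[] <- lapply(x, function(col) col[sample.int(n, n, replace = TRUE)])
  rownames(synth) <- NULL
  list(
    x = rbind(x, synth),                                   # original rows first
    y = factor(rep(c("original", "synthetic"), each = n))  # assumed labels
  )
}

# Usage on the numeric columns of iris (150 rows).
set.seed(1)
res <- synthetic_class_sketch(iris[, 1:4])
dim(res$x)    # 300 4
table(res$y)  # 150 rows in each class
```

Because only the joint structure differs between the two classes, a forest that separates them must rely on dependence between variables, which is what makes its proximities between original observations useful for clustering.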

Usage

generateSyntheticClass(x, ...)

Arguments

x

A big.matrix, matrix or data.frame containing the predictor variables of the original training set.

...

If x is a big.matrix, these arguments will be passed on to big.matrix to control how the big.matrix for the two-class training set is created.

Value

A list containing the following components:

x

The two-class training set, comprising the original training set and the synthesized second class. It will be an object of the same type as the argument x.

y

A factor vector that labels the two classes in x.

References

Breiman, L. (2001). Random forests. Machine Learning, 45(1), 5-32.

Breiman, L. & Cutler, A. (n.d.). Random Forests. Retrieved from http://www.stat.berkeley.edu/~breiman/RandomForests/cc_home.htm.

Examples

# Perform unsupervised learning on the Cars93 data set.

# Load the package and the data.
library(bigrf)
data(Cars93, package="MASS")

# Create second synthetic class for unsupervised learning.
newdata <- generateSyntheticClass(Cars93)

# Select the variables (columns 4 to 22) with which to train the model.
vars <- 4:22

# Run model, grow 30 trees.
forest <- bigrfc(newdata$x, newdata$y, ntree=30L, varselect=vars,
                 cachepath=NULL)

aloysius-lim/bigrf documentation built on May 11, 2019, 11:20 p.m.