prototypes-methods: Compute Class Prototypes
In aloysius-lim/bigrf: Big Random Forests: Classification and Regression Forests for Large Data Sets

Description Usage Arguments Details Value Methods References Examples

Compute the prototypes for each class in the training set, which provide a picture of how each variable relates to the classification. They are useful representations of a "typical" example of each class.

## S4 method for signature 'bigcforest,bigrfprox'
prototypes(forest, prox, nprot=1L, x=NULL,
    reuse.cache=FALSE, trace=0L)

`forest`	A random forest of class `"bigcforest"`.
`prox`	A proximity matrix of class `"bigrfprox"`.
`nprot`	The number of prototypes to compute for each class. Default: `1`.
`x`	A `big.matrix`, `matrix` or `data.frame` of predictor variables. This must be the same data that was used to build the forest; if the data has changed, unexpected modelling results may occur. If a `matrix` or `data.frame` is specified, it will be converted into a `big.matrix` for computation. Optional if `reuse.cache` is `TRUE`.
`reuse.cache`	`TRUE` to reuse disk caches of the `big.matrix` `x` from the initial building of the random forest, which may significantly reduce initialization time for large data sets. If `TRUE`, the user must ensure that the files ‘x’ and ‘x.desc’ in `forest@cachepath` have not been modified or deleted.
`trace`	`0` for no verbose output. `1` to print verbose output. Default: `0`.

Prototypes are computed using proximities, as follows. For the first prototype of class c, find the example i with the largest number of class c examples among its k nearest neighbors, where nearness is measured by proximity. Among these examples, find the 25th, 50th and 75th percentiles of the numeric variables, and the most frequent level of the categorical variables. For the second prototype, repeat the procedure, considering only examples that are not among the k examples used to compute the first prototype, and so on.
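The procedure above can be sketched in base R. This is an illustration under stated assumptions, not the package's implementation: the function `find_prototypes`, its arguments, and the toy data are hypothetical, the neighborhood size `k` is passed in explicitly, and "these examples" is read as the class c examples among the k nearest neighbors.

```r
# Illustrative sketch only; `find_prototypes` is a hypothetical helper and
# is not part of bigrf. Assumes `prox` is a dense square matrix in which
# larger values mean "closer".
find_prototypes <- function(x, y, prox, cls, k, nprot = 1L) {
    is_num <- vapply(x, is.numeric, logical(1))
    available <- rep(TRUE, nrow(x))   # examples not yet used by a prototype
    protos <- list()
    for (p in seq_len(nprot)) {
        cand <- which(available)
        best_count <- 0L
        best_nbrs <- integer(0)
        for (i in cand) {
            # The k available examples with the highest proximity to i.
            nbrs <- cand[order(prox[i, cand], decreasing = TRUE)]
            nbrs <- nbrs[seq_len(min(k, length(nbrs)))]
            count <- sum(y[nbrs] == cls)
            if (count > best_count) {
                best_count <- count
                best_nbrs <- nbrs
            }
        }
        if (best_count == 0L) break   # no examples of this class remain
        # Assumption: summarise only the class `cls` examples in the
        # winning neighborhood.
        members <- best_nbrs[y[best_nbrs] == cls]
        protos[[p]] <- list(
            members = members,
            # 25th, 50th and 75th percentiles of each numeric variable.
            numeric = sapply(x[members, is_num, drop = FALSE], quantile,
                             probs = c(0.25, 0.5, 0.75)),
            # Most frequent level of each categorical variable.
            categorical = sapply(x[members, !is_num, drop = FALSE],
                                 function(v) names(which.max(table(v))))
        )
        # Later prototypes ignore all k examples of this neighborhood.
        available[best_nbrs] <- FALSE
    }
    protos
}

# Toy data: two well-separated groups, proximity derived from distance.
x <- data.frame(size  = c(1, 2, 3, 10, 11, 12),
                shape = factor(c("a", "a", "b", "b", "b", "b")))
y <- factor(c("lo", "lo", "lo", "hi", "hi", "hi"))
prox <- 1 / (1 + as.matrix(dist(x$size)))

protos <- find_prototypes(x, y, prox, cls = "lo", k = 3L, nprot = 2L)
length(protos)   # fewer than nprot prototypes may be found
```

In this sketch a class can yield fewer prototypes than requested once its examples are used up, which is the situation the `nprotfound` component of the return value reports.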

A list with the following components:

`nprotfound`	Number of prototypes found for each class.
`clustersize`	A `forest@ynclass` by `nprot` matrix indicating the number of examples used to compute each prototype.
`prot`	A `forest@ynclass` by `nprot` by `length(forest@varselect)` by 3 array containing the raw prototype values. For numeric variables, the prototypes are represented by the medians, with the 25th and 75th percentiles given as estimates of the prototype stability. For categorical variables, the values are the most frequent level.
`prot.std`	A `forest@ynclass` by `nprot` by `length(forest@varselect)` by 3 array containing standardized prototype values. For numeric variables, the 5th percentile is subtracted from each prototype value, and the result is divided by the difference between the 95th and 5th percentiles. For categorical variables, prototype values are divided by the number of levels in that variable.
`levelsfreq`	A list of length `length(forest@varselect)` containing, for each categorical variable `v`, a `forest@ynclass` by `nprot` by `forest@varnlevels[v]` array that indicates the frequency of the levels used to compute the prototype level. These are useful for estimating prototype stability for categorical variables.
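The standardization described for `prot.std` can be sketched as follows. This is a minimal illustration, not the package's code: the helpers `standardize_numeric` and `standardize_level` are hypothetical, the percentiles are assumed to be taken over the observed values of the variable, and a categorical prototype is assumed to be stored as the integer code of its level.

```r
# Illustrative sketch only; both helpers are hypothetical and not part of
# bigrf.

# Numeric variable: subtract the 5th percentile, then divide by the spread
# between the 95th and 5th percentiles.
standardize_numeric <- function(proto, values) {
    q <- quantile(values, probs = c(0.05, 0.95), names = FALSE)
    (proto - q[1]) / (q[2] - q[1])
}

# Categorical variable: divide the level code by the number of levels.
standardize_level <- function(level_code, nlevels) {
    level_code / nlevels
}

# Toy numeric variable and its raw prototype (25th, 50th, 75th percentiles).
values <- 0:100
raw <- quantile(values, probs = c(0.25, 0.5, 0.75), names = FALSE)
std <- standardize_numeric(raw, values)

# Toy categorical variable: level 2 of a 4-level factor.
std_level <- standardize_level(2L, 4L)
```

With this scaling, most standardized numeric values fall between 0 and 1, so variables measured in different units can share one axis when plotted.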

signature(forest = "bigcforest", prox = "bigrfprox"): Compute prototypes for a classification random forest.

Breiman, L. (2001). Random forests. Machine Learning, 45(1), 5-32.

Breiman, L. & Cutler, A. (n.d.). Random Forests. Retrieved from http://www.stat.berkeley.edu/~breiman/RandomForests/cc_home.htm.

# Classify cars in the Cars93 data set by type (Compact, Large,
# Midsize, Small, Sporty, or Van).

# Load data.
data(Cars93, package="MASS")
x <- Cars93
y <- Cars93$Type

# Select variables with which to train model.
vars <- c(4:22)

# Run model, grow 30 trees.
forest <- bigrfc(x, y, ntree=30L, varselect=vars, cachepath=NULL)

# Calculate proximity matrix.
prox <- proximities(forest, cachepath=NULL)

# Compute prototypes.
prot <- prototypes(forest, prox, x=x)

# Plot first prototypes, using one colour for each class.
plot(seq_along(vars), prot$prot.std[1, 1, , 2], type="l", col=1,
     ylim=c(min(prot$prot.std[, 1, , 2]), max(prot$prot.std[, 1, , 2])))
for (i in 2:length(levels(y))) {
    lines(seq_along(vars), prot$prot.std[i, 1, , 2], type="l", col=i)
}

# Plot first prototype for class 1, including quartile values for numeric
# variables.
plot(seq_along(vars), prot$prot.std[1, 1, , 1], type="l", col=1,
     ylim=c(min(prot$prot.std[1, 1, , ]), max(prot$prot.std[1, 1, , ])))
for (i in 2:3) {
    lines(seq_along(vars), prot$prot.std[1, 1, , i], type="l", col=i)
}

aloysius-lim/bigrf documentation built on May 11, 2019, 11:20 p.m.
