Description
Compute prototypes for each class in the training set. Prototypes give a picture of how each variable relates to the classification, and they are useful representations of a "typical" example of each class.
Usage
## S4 method for signature 'bigcforest,bigrfprox'
prototypes(forest, prox, nprot=1L, x=NULL,
    reuse.cache=FALSE, trace=0L)
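Arguments
forest: A random forest of class "bigcforest".
prox: A proximity matrix of class "bigrfprox".
nprot: The number of prototypes to compute for each class. Default: 1.
x: The training data used to grow forest (a data frame in the example below). Default: NULL.
reuse.cache: Default: FALSE.
trace: Default: 0.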
Details
Prototypes are computed from proximities as follows. For the first prototype of class c, find the example i with the largest number of class c examples among its k nearest neighbors. Among these examples, compute the 25th, 50th and 75th percentiles of each numeric variable and the most frequent level of each categorical variable. For the second prototype, the procedure is repeated, considering only examples that are not among the k examples used to compute the first prototype, and so on.
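The procedure above can be sketched in base R as follows. This is a hypothetical, simplified illustration, not the bigrf implementation: prototype_sketch() is an invented name, prox is assumed to be an ordinary n by n numeric matrix rather than a "bigrfprox" object, k is chosen by the caller, neighbors are ranked by proximity among the examples not yet used (the example itself is included, since its self-proximity is maximal), and percentiles are taken over the class c examples among the k neighbors.

```r
# Hypothetical sketch of proximity-based prototypes; NOT bigrf's code.
# Assumptions: `prox` is a plain n x n numeric matrix, `k` is user-chosen,
# and quantiles use only the class-`cls` examples among the k neighbours.
prototype_sketch <- function(x, y, prox, cls, k = 5L, nprot = 1L) {
  available <- rep(TRUE, nrow(x))  # examples not yet used by a prototype
  result <- vector("list", nprot)
  for (p in seq_len(nprot)) {
    cand <- which(available)
    if (length(cand) < k) break    # too few unused examples left
    best_count <- -1L
    best_nbrs <- integer(0)
    for (i in cand) {
      # k nearest neighbours of i = available examples with highest proximity
      nbrs <- cand[order(prox[i, cand], decreasing = TRUE)[seq_len(k)]]
      count <- sum(y[nbrs] == cls)
      if (count > best_count) {    # ties keep the first example found
        best_count <- count
        best_nbrs <- nbrs
      }
    }
    members <- best_nbrs[y[best_nbrs] == cls]
    result[[p]] <- lapply(x[members, , drop = FALSE], function(v) {
      if (is.numeric(v)) {
        quantile(v, c(0.25, 0.5, 0.75), names = FALSE)  # quartiles
      } else {
        names(which.max(table(v)))                       # most frequent level
      }
    })
    available[best_nbrs] <- FALSE  # later prototypes skip these k examples
  }
  result
}

# Toy usage: two well-separated groups of three examples
x <- data.frame(a = c(1, 2, 3, 10, 11, 12),
                g = factor(c("u", "u", "v", "w", "w", "w")))
y <- factor(c("A", "A", "A", "B", "B", "B"))
# Proximity 1 within each group, 0 across groups
prox <- outer(1:6, 1:6, function(i, j) as.numeric((i <= 3) == (j <= 3)))
res <- prototype_sketch(x, y, prox, cls = "A", k = 3L)
res[[1]]$a  # 1.5 2.0 2.5
res[[1]]$g  # "u"
```

Selecting the center example by a strict ">" comparison is an arbitrary tie-breaking choice made for this sketch.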
Value
A list with the following components:
nprotfound: Number of prototypes found for each class.
clustersize: A forest@ynclass by nprot matrix giving the number of examples used to compute each prototype.
prot: A forest@ynclass by nprot by length(forest@varselect) by 3 array containing the raw prototype values. For numeric variables, the prototypes are represented by the medians, with the 25th and 75th percentiles given as estimates of prototype stability. For categorical variables, the values are the most frequent level.
prot.std: A forest@ynclass by nprot by length(forest@varselect) by 3 array containing standardized prototype values. For numeric variables, the 5th percentile is subtracted from each prototype value and the result is divided by the difference between the 95th and 5th percentiles. For categorical variables, prototype values are divided by the number of levels in the variable.
levelsfreq: A list of length length(forest@varselect) containing, for each categorical variable v, a forest@ynclass by nprot by forest@varnlevels[v] array that gives the frequency of each level among the examples used to compute the prototype level. These frequencies are useful for estimating prototype stability for categorical variables.
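The standardization used for prot.std can be illustrated with a small base R helper. This is a hypothetical sketch: standardize_prototype() is an invented name, the 5th and 95th percentiles are assumed to come from the training values of the variable using R's default quantile() definition (bigrf may compute percentiles differently), and categorical prototype values are assumed to be integer level codes.

```r
# Hypothetical helper mirroring the prot.std description; not bigrf's code.
# `value`: raw prototype value(s); `v`: the training values of that variable.
standardize_prototype <- function(value, v) {
  if (is.numeric(v)) {
    # Numeric: subtract the 5th percentile, divide by the 95th-5th spread
    q <- quantile(v, probs = c(0.05, 0.95), names = FALSE, na.rm = TRUE)
    (value - q[1]) / (q[2] - q[1])
  } else {
    # Categorical: divide the (assumed integer) level code by the number of levels
    value / nlevels(as.factor(v))
  }
}

# Quartile prototype values for a variable whose training values are 0..100
standardize_prototype(c(5, 50, 95), 0:100)            # 0.0 0.5 1.0
# Level 2 of a four-level factor
standardize_prototype(2, factor(c("a", "b", "c", "d")))  # 0.5
```

Scaling by the 5th to 95th percentile range rather than the full range keeps a few extreme training values from compressing the standardized prototypes.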
Methods
signature(forest = "bigcforest", prox = "bigrfprox"): Compute prototypes for a classification random forest.
References
Breiman, L. (2001). Random forests. Machine Learning, 45(1), 5-32.
Breiman, L. & Cutler, A. (n.d.). Random Forests. Retrieved from http://www.stat.berkeley.edu/~breiman/RandomForests/cc_home.htm.
Examples
# Classify cars in the Cars93 data set by type (Compact, Large,
# Midsize, Small, Sporty, or Van).
# Load package and data.
library(bigrf)
data(Cars93, package="MASS")
x <- Cars93
y <- Cars93$Type
# Select variables with which to train model.
vars <- c(4:22)
# Run model, grow 30 trees.
forest <- bigrfc(x, y, ntree=30L, varselect=vars, cachepath=NULL)
# Calculate proximity matrix.
prox <- proximities(forest, cachepath=NULL)
# Compute prototypes.
prot <- prototypes(forest, prox, x=x)
# Plot first prototypes, using one colour for each class.
plot(seq_along(vars), prot$prot.std[1, 1, , 2], type="l", col=1,
    ylim=range(prot$prot.std[, 1, , 2]))
for (i in 2:nlevels(y)) {
    lines(seq_along(vars), prot$prot.std[i, 1, , 2], type="l", col=i)
}
# Plot first prototype for class 1, including quartile values for numeric
# variables.
plot(seq_along(vars), prot$prot.std[1, 1, , 1], type="l", col=1,
    ylim=range(prot$prot.std[1, 1, , ]))
for (i in 2:3) {
    lines(seq_along(vars), prot$prot.std[1, 1, , i], type="l", col=i)
}