predict.randomUniformForest: Predict method for random Uniform Forests objects

View source: R/RandomUniformForestsCPP.R

predict.randomUniformForestR Documentation

Predict method for random Uniform Forests objects

Description

Prediction of test data with random Uniform Forests. Many options are allowed, the default one rendering exactly the same type of variable than the one of training labels.

Usage

## S3 method for class 'randomUniformForest'
predict(object, X, 
	type = c("response", "prob", "votes", "confInt", 
	"ranking", "quantile", "truemajority", "all"),
	classcutoff = c(0,0), 
	conf = 0.95,
	whichQuantile = NULL,
	rankingIDs = NULL,
	threads = "auto", 
	parallelpackage = "doParallel",
	...)

Arguments

object

an object of class randomUniformForest, as one created by the randomUniformForest( ) function.

X

a data frame or matrix containing new data (without response values).

type

one of response, prob, votes, prediction intervals, ranking, quantile, true majority or raw outputs, indicating respectively the type of output: 'predicted values', 'matrix of class probabilities', 'matrix of vote counts', 'prediction interval for each response', 'ranked responses' in case of recommendation scheme (note that it is currently experimental and it, first, requires classification modelling), 'quantile regression', 'predicted values based on nodes votes instead of tree votes', or lastly, 'raw outputs' of a random uniform forest. Note that "confInt" and "quantile" use an estimate of the conditional expectation for each tree parameter (the randomness), which contains many duplicates, due to bootstrap and the nature of ensemble learning, to produce estimates. These ones are usually loose since the forest replicates the distribution of Y (knowing X) to take decisions while one does not need all the X values to have one prediction. A more accurate option is to use the dedicated bCI function. Note also that option "quantile" is better handled when using default value of the 'nodesize' option in the randomUniformForest function.

classcutoff

see randomUniformForest.

conf

if type == "confInt", value of 'conf' (greater than 0 and lesser than 1) is the desired level of confidence for any prediction interval.

whichQuantile

if type == "quantile", value of 'whichQuantile' is the desired quantile (greater than 0 and lesser than 1).

rankingIDs

experimental. If 'type = "ranking"', vector of ID if one wants to rank responses instead of classify them. Ranking usually involves many same users (or objects, or items) whose choices have to be ranked, e.g., by a recommendation engine. For example, if 'user1' is present four times in test sample, its ID gives to the model a way to identify him (or it) and to order response values, beginning by the most relevant. Then, for many users, the process is repeated giving, finally, predictions from most to least relevant, taking into account all users. random Uniform Forests approach currently uses class probabilities to order predictions values. Note that optimizing AUC is one of the methods leading to good ranking methods if choices are binary, and NDCG is a way to assess model.

threads

see randomUniformForest.

parallelpackage

see randomUniformForest.

...

not used currently.

Value

response

predicted values. Default option that returns values in the same way than original training responses.

prob

for classification only. Matrix of class probabilities.

votes

matrix of vote counts. Each row is an observation and each column is tree output.

confInt

for regression only. Matrix where each row is an observation and each column one of the prediction interval bounds.

quantile

for regression only. Vector of predicted quantiles for 'conf' (value of the option) level of confidence.

ranking

a matrix or data frame. Description will be updated soon.

truemajority

predicted values, using raw outputs of trees (not majority vote). This option makes sense if one set 'nodesize' option greater than 1. Hence, aggregation is participative (at the leaf level) and not representative (at the tree level).

all

raw outputs of the model. Not useful, unless further computation is needed, for example in case of Post-processing.

Author(s)

Saip Ciss saip.ciss@wanadoo.fr

See Also

postProcessingVotes, bCI, model.stats, generic.cv

Examples

## same as randomForest example
#  data(iris)
# set.seed(111)
# ind <- sample(2, nrow(iris), replace = TRUE, prob = c(0.8, 0.2))

# iris.ruf <- randomUniformForest(Species ~ ., data = iris[ind == 1,], OOB = FALSE, 
# importance = FALSE, threads = 1)
# iris.pred <- predict(iris.ruf, iris[ind == 2,])

# table(observed = iris[ind == 2, "Species"], predicted = iris.pred)

## get all votes : note that aliases of classes are used internally and, for intermediate
## results, are not converted to their true values
# iris.all.votes <- predict(iris.ruf, iris[ind == 2,], type = "votes")

## get class probabilities
# iris.class.prob <- predict(iris.ruf, iris[ind == 2,], type = "prob")
# iris.class.prob

randomUniformForest documentation built on June 22, 2022, 1:05 a.m.