postProcessingVotes: Post-processing for Regression
In randomUniformForest: Random Uniform Forests for Classification, Regression and Unsupervised Learning

View source: R/RandomUniformForestsCPP.R

postProcessingVotes

R Documentation

Post-processing for Regression

Description

Post-processing use OOB votes and predicted values to build more accurate estimates of Response values. Note that post-processing can not ensure that new estimate will have a lower error. It works for many cases but not all.

Usage

postProcessingVotes(object,
	nbModels = 1,
	idx = 1,
	granularity = 1,
	predObject = NULL,
	swapPredictions = FALSE,
	X = NULL,
	Xtest = NULL,
	imbalanced = FALSE,
	OOB = FALSE,
	method = c("cutoff", "bias", "residuals"),
	keep2ndModel = FALSE, 
	largeData = FALSE,
	...)

Arguments

`object`	a randomUniformForest object with OOB data.
`nbModels`	how many models to build for new estimates. Usually one is enough.
`idx`	how many values to choose in OOB model for each new predicted value. Usually one is enough.
`granularity`	degree of precision needed for each old estimate value. Usually one is enough.
`predObject`	if current model is built with full sample, then using an old model 'predObject' (a randomUniformForest object) that have OOB data can help to reduce error. Must be used with 'swapPredictions = TRUE'
`swapPredictions`	set it to TRUE if two models, current one without OOB data and old one with OOB data, have to be used for trying to reduce prediction error.
`X`	not currently used.
`Xtest`	test data in the case of regression, for a more friendly output of the model.
`imbalanced`	if TRUE, may improve metrics in the case of imbalanced datasets.
`OOB`	if FALSE, does not use OOB informations.
`method`	for classification, if expected bias is enough high, one may use it as a method to improve AUC. Otherwise, use the default one, 'cutoff'. Both tend to get the same results, despite a few tests for now, but 'bias' method seems more robust. 'residuals' is used in regression only as a powerful but computationally intensive method and replaces the default internal one.
`keep2ndModel`	if TRUE, and for regression, keep the model based on residuals for further modelling and predictions.
`largeData`	if TRUE, and for regression, use `rUniformForest.big` to compute model for the residuals.
`...`	arguments to use for the computation of the model from the residuals.

Value

a vector of predicted values.

Author(s)

Saip Ciss saip.ciss@wanadoo.fr

References

Xu, Ruo, Improvements to random forest methodology (2013). Graduate Theses and Dissertations. Paper 13052.

Examples

# Note that post-processing works better with enough trees, at least 100, and enough data
n = 200;  p = 20
# Simulate 'p' gaussian vectors with random parameters between -10 and 10.
X <- simulationData(n,p)

# give names to features
X <- fillVariablesNames(X)

# Make a rule to create response vector
epsilon1 = runif(n,-1,1)
epsilon2 = runif(n,-1,1)

# a rule with many noise (only four variables are significant)
Y = 2*(X[,1]*X[,2] + X[,3]*X[,4]) + epsilon1*X[,5] + epsilon2*X[,6]

# randomize then  make train and test sample
twoSamples <- cut(sample(1:n,n), 2, labels = FALSE)

Xtrain = X[which(twoSamples == 1),]
Ytrain = Y[which(twoSamples == 1)]
Xtest = X[which(twoSamples == 2),]
Ytest = Y[which(twoSamples == 2)]

# compute an accurate model (in this case bagging and log data works best) and predict
rUF.model <- randomUniformForest(Xtrain, Ytrain, xtest = Xtest, ytest = Ytest, 
bagging = TRUE, logX = TRUE, ntree = 60, threads = 2)

# get mean squared error
rUF.model

# post process
newEstimate <- postProcessingVotes(rUF.model)

# get mean squared error
sum((newEstimate - Ytest)^2)/length(Ytest)

## regression do not use all data but sub-samples.
## Comparing, when using full sample (but then, we do not have OOB data)

# rUF.model.fullsample <- randomUniformForest(Xtrain, Ytrain, xtest = Xtest, ytest = Ytest,
# subsamplerate = 1, bagging = TRUE, logX = TRUE, ntree = 60, threads = 2)
# rUF.model.fullsample

## Nevertheless we can use old model with OOB data to fit a new estimate
## newEstimate.fullsample <- postProcessingVotes(rUF.model.fullsample,
# predObject = rUF.model, swapPredictions = TRUE)

## get mean squared error
# sum((newEstimate.fullsample - Ytest)^2)/length(Ytest)

randomUniformForest documentation built on June 22, 2022, 1:05 a.m.

randomUniformForest index

Package overview

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

randomUniformForest
Random Uniform Forests for Classification, Regression and Unsupervised Learning

postProcessingVotes: Post-processing for Regression
In randomUniformForest: Random Uniform Forests for Classification, Regression and Unsupervised Learning

Post-processing for Regression

Description

Usage

Arguments

Value

Author(s)

References

Examples

Related to postProcessingVotes in randomUniformForest...

R Package Documentation

Browse R Packages

We want your feedback!

randomUniformForest Random Uniform Forests for Classification, Regression and Unsupervised Learning

postProcessingVotes: Post-processing for Regression In randomUniformForest: Random Uniform Forests for Classification, Regression and Unsupervised Learning

Post-processing for Regression

Description

Usage

Arguments

Value

Author(s)

References

Examples

Related to postProcessingVotes in randomUniformForest...

R Package Documentation

Browse R Packages

We want your feedback!

randomUniformForest
Random Uniform Forests for Classification, Regression and Unsupervised Learning

postProcessingVotes: Post-processing for Regression
In randomUniformForest: Random Uniform Forests for Classification, Regression and Unsupervised Learning