predict.literanger: Literanger prediction

Description

'literanger' provides different types of prediction that may be used in multiple imputation algorithms with random forests. The usual prediction is the 'bagged' prediction: the most frequent value (or the mean) of the in-bag samples in a terminal node. Doove et al (2014) propose a prediction that better matches the predictive distribution needed for multiple imputation: for each predicted value, take a random draw from the observations in the terminal node of a randomly drawn tree in the forest. Alternatively, the usual most-frequent-value or mean of the in-bag responses can be used, as in missForest (Stekhoven et al, 2012), miceRanger https://cran.r-project.org/package=miceRanger, and missRanger https://cran.r-project.org/package=missRanger.

Usage

## S3 method for class 'literanger'
predict(
  object,
  newdata = NULL,
  prediction_type = c("bagged", "inbag", "nodes"),
  seed = 1L + sample.int(n = .Machine$integer.max - 1L, size = 1),
  n_thread = 0,
  verbose = FALSE,
  ...
)
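
The default for seed in the signature above is drawn from R's random number generator, so calling set.seed() beforehand should make that default repeatable. The following is a minimal base-R sketch that evaluates only the default expression; draw_default_seed is a hypothetical helper introduced for illustration and is not part of literanger.

```r
# Hypothetical helper: evaluates the default `seed` expression from the
# usage block on its own, without calling predict().
draw_default_seed <- function() {
    1L + sample.int(n = .Machine$integer.max - 1L, size = 1)
}

set.seed(2024)
seed_a <- draw_default_seed()
set.seed(2024)
seed_b <- draw_default_seed()

# The same R RNG state yields the same default seed.
identical(seed_a, seed_b)

# The drawn value is an integer no larger than .Machine$integer.max.
is.integer(seed_a) && seed_a <= .Machine$integer.max
```

Passing an explicit seed avoids any dependence on the state of R's generator at the time of the call.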

Arguments

object

A trained random forest literanger object.

newdata

Data of class data.frame, matrix, or dgCMatrix (Matrix). The latter two must have column names. All predictors named in object$predictor_names must be present.

prediction_type

Name of the prediction algorithm; "bagged" is the most-frequent value among in-bag samples for classification, or the mean of in-bag responses for regression; "inbag" predicts by drawing one in-bag response from a random tree for each row; "nodes" (currently unsupported) returns the node keys (ids) of the terminal node from every tree for each row.

seed

Random seed, an integer between 1 and .Machine$integer.max. The default generates the seed from R; set to 0 to ignore the R seed and use a C++ std::random_device.

n_thread

Number of threads. Default is determined by system, typically the number of cores.

verbose

Show computation status and estimated runtime.

...

Ignored.
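
The requirement on newdata can be checked before calling predict. The following base-R sketch assumes a character vector standing in for object$predictor_names; both that vector and the missing_predictors helper are hypothetical and only illustrate the check.

```r
# Hypothetical stand-in for object$predictor_names of a trained forest.
predictor_names <- c("Sepal.Length", "Sepal.Width",
                     "Petal.Length", "Petal.Width")

# A numeric matrix of new observations; as.matrix() keeps the column names
# of the data frame, which matrices passed as `newdata` must have.
newdata <- as.matrix(iris[1:5, 1:4])

# Hypothetical helper: required predictors that are absent from `newdata`.
missing_predictors <- function(newdata, predictor_names) {
    setdiff(predictor_names, colnames(newdata))
}

# All predictors present: an empty character vector is returned.
missing_predictors(newdata, predictor_names)

# Dropping the fourth column leaves "Petal.Width" unaccounted for.
missing_predictors(newdata[, 1:3], predictor_names)
```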

Details

Forests trained by literanger retain information about the in-bag responses in each terminal node, thus facilitating efficient predictions within a variation on multiple imputation proposed by Doove et al (2014). This type of prediction can be selected by setting prediction_type="inbag". The usual prediction for classification and regression forests, the most-frequent-value and mean of in-bag samples respectively, is given by setting prediction_type="bagged".
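
The difference between the two prediction types can be illustrated with a toy example in base R. This is a sketch only, not literanger's implementation: the list of terminal-node responses is invented, the draw_inbag helper is hypothetical, and averaging the per-tree means is assumed to follow the usual random forest aggregation for regression.

```r
# Hypothetical toy forest: for one new observation, the in-bag responses held
# in the terminal node that the observation reaches in each of three trees.
terminal_node_responses <- list(
    tree_1 = c(4.9, 5.1, 5.0),
    tree_2 = c(5.4, 5.2),
    tree_3 = c(4.8, 5.0, 5.3, 5.1)
)

# "bagged"-style prediction: each tree predicts the mean of its terminal
# node, and the per-tree predictions are averaged.
bagged_prediction <- mean(vapply(terminal_node_responses, mean, numeric(1)))

# "inbag"-style prediction: pick one tree at random, then draw one of the
# in-bag responses stored in its terminal node.
draw_inbag <- function(node_responses) {
    tree <- node_responses[[sample.int(length(node_responses), 1L)]]
    tree[sample.int(length(tree), 1L)]
}

set.seed(1)
inbag_draws <- replicate(5, draw_inbag(terminal_node_responses))

bagged_prediction
inbag_draws
```

Every in-bag draw is an observed response, so repeated draws retain the spread of the terminal nodes, whereas the bagged value is a single smoothed summary.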

A list is returned. The values item contains the predicted classes or values (for classification and regression forests, respectively). Predicted classes are returned as factors with the levels as per the original training data.

Compared to the original package ranger, literanger excludes certain features:

  • Probability, survival, and quantile regression forests.

  • Support for class gwaa.data.

  • Standard error estimation.

Value

Object of class literanger_prediction with elements:

values

Predicted (or drawn) classes for classification forests, or values for regression forests.

tree_type

The type of tree in the forest (classification or regression).

seed

The seed supplied to the C++ library.

Author(s)

stephematician stephematician@gmail.com, Marvin N Wright (original ranger package)

References

  • Doove, L. L., Van Buuren, S., & Dusseldorp, E. (2014). Recursive partitioning for missing data imputation in the presence of interaction effects. Computational Statistics & Data Analysis, 72, 92-104. doi:10.1016/j.csda.2013.10.025.

  • Stekhoven, D. J., & Buehlmann, P. (2012). MissForest–non-parametric missing value imputation for mixed-type data. Bioinformatics, 28(1), 112-118. doi:10.1093/bioinformatics/btr597.

  • Wright, M. N., & Ziegler, A. (2017). ranger: A Fast Implementation of Random Forests for High Dimensional Data in C++ and R. Journal of Statistical Software, 77, 1-17. doi:10.18637/jss.v077.i01.

See Also

train

Examples

library(literanger)

## Classification forest
# hold out one third of the rows for testing
train_idx <- sample(nrow(iris), 2/3 * nrow(iris))
iris_train <- iris[ train_idx, ]
iris_test  <- iris[-train_idx, ]
rf_iris <- train(data=iris_train, response_name="Species")
pred_iris_bagged <- predict(rf_iris, newdata=iris_test,
                            prediction_type="bagged")
pred_iris_inbag  <- predict(rf_iris, newdata=iris_test,
                            prediction_type="inbag")
# compare bagged vs actual test values
table(iris_test$Species, pred_iris_bagged$values)
# compare bagged prediction vs in-bag draw
table(pred_iris_bagged$values, pred_iris_inbag$values)
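
A regression forest can be handled in the same way. The sketch below assumes that literanger is installed and that train fits a regression forest when the response is numeric; it uses only the arguments shown in the examples and usage above, with the built-in mtcars data chosen purely for illustration.

```r
# Assumes the literanger package is installed; skipped otherwise.
if (requireNamespace("literanger", quietly = TRUE)) {
    library(literanger)

    ## Regression forest (numeric response, assumed to select regression)
    train_idx  <- sample(nrow(mtcars), 20)
    cars_train <- mtcars[ train_idx, ]
    cars_test  <- mtcars[-train_idx, ]
    rf_cars <- train(data=cars_train, response_name="mpg")

    # mean of in-bag responses
    pred_cars_bagged <- predict(rf_cars, newdata=cars_test,
                                prediction_type="bagged")
    # one in-bag draw per row, with an explicit seed
    pred_cars_inbag  <- predict(rf_cars, newdata=cars_test,
                                prediction_type="inbag", seed=42L)

    # actual test values next to both kinds of prediction
    print(cbind(actual=cars_test$mpg,
                bagged=pred_cars_bagged$values,
                inbag=pred_cars_inbag$values))
}
```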


literanger documentation built on Sept. 30, 2024, 9:15 a.m.