predict.rfArb: predict method for Rborst and rfArb result

predict.rfArbR Documentation

predict method for Rborst and rfArb result

Description

Prediction and test using Rborist.

Usage

## S3 method for class 'rfArb'
predict(object, newdata, yTest=NULL,
keyedFrame = FALSE, quantVec=NULL, quantiles = !is.null(quantVec),
ctgCensus = "votes", indexing = FALSE, trapUnobserved = FALSE,
bagging = FALSE, nThread = 0, verbose = FALSE, ...)

Arguments

object

an object of class rfArb, created from a previous invocation of the command Rborist or rfArb to train.

newdata

a design frame or matrix containing new data, with the same signature of predictors as in the training command.

yTest

a response vector against which to test the new predictions.

keyedFrame

whether the columns of newdata may appear in arbitrary order or as a superset of the predictors used to train.

quantVec

a vector of quantiles to predict.

quantiles

whether to predict quantiles.

ctgCensus

whether/how to summarize per-category predictions. "votes" specifies the number of trees predicting a given class. "prob" specifies a normalized, probabilistic summary. "probSample" specifies sample-weighted probabilities, similar to quantile histogramming.

indexing

whether to record the final node index, typically terminal, of tree traversal.

trapUnobserved

reports score for nonterminal upon encountering values not observed during training, such as missing data.

bagging

whether prediction is restricted to out-of-bag samples.

nThread

suggests ans OpenMP-style thread count. Zero denotes default processor setting.

verbose

whether to output progress of prediction.

...

not currently used.

Value

an object of one of two classes:

SummaryReg

summarizing regression, consisting of: predictionan object of class PredictReg consisting of: yPredthe estimated numerical response. qPredquantiles of prediction, if requested. qEstquantile of the estimate, if quantiles requested. indicesfinal index of prediction, if requested. validationif validation requested, an object of class ValidReg consisting of: msethe mean-square error of the estimate. rsqthe r-squared statistic of the estimate. maethe mean absolute error of the estimate. importanceif permution importance requested, an object of class importanceReg, consisting of: namesthe predictor names. msethe per-predictor mean-square error, under permutation.

SummaryCtg

an object of class PredictCtg consisting of: yPredestimated categorical response. censusfactor-valued matrix of the estimate, by category, if requested. probmatrix of estimate probabilities, by category, if requested. indicesfinal index of prediction, if requested. validationif validation requested, an object of class ValidCtg consisting of: confusionthe confusion matrix. mispredictionthe misprediction rate. oobErrorthe out-of-bag error. importanceif permution importance requested, an object of class importanceCtg, consisting of: mispredthe misprediction rate, by predictor. oobErrthe out-of-bag error, by predictor.

Author(s)

Mark Seligman at Suiji.

See Also

codeforestWeight

Examples

## Not run: 
  # Regression example:
  nRow <- 5000
  x <- data.frame(replicate(6, rnorm(nRow)))
  y <- with(x, X1^2 + sin(X2) + X3 * X4) # courtesy of S. Welling.
  rb <- Rborist(x,y)


  # Performs separate prediction on new data:
  xx <- data.frame(replacate(6, rnorm(nRow)))
  pred <- predict(rb, xx)
  yPred <- pred$yPred

  # As above, but also records final indices of each tree walk:
  #
  pred <- predict(rb, xx, indexing=TRUE)
  print(pred$indices[c(1:2), ])


  # As above, but predicts over \code{newdata} with unobserved values.
  # In the case of numerical data, only missing values are considered
  # unobserved.  Missing values are encoded as \code{NaN}, which are
  # incomparable, precipitating \code{false} on every test.  Prediction
  # therefore takes the \code{false} branch when encountering missing
  # values:
  #
  xxMissing <- xx
  xxMissing[6, c(15, 32, 87, 101)] <- NA
  pred <- predict(rb, xxMissing)
  

  # As above, but returns a nonterminal score upon encountering
  # unobserved values. Neither the true nor the false branch from the
  # testing node is taken.  Instead, the score returned is derived
  # from all leaf nodes (terminals) reached by the testing
  # (nonterminal) node.
  #
  pred <- predict(rb, xxMissing, trapUnobserved = TRUE)


  # Performs separate prediction, using original response as test
  # vector:
  pred <- predict(rb, xx, y)
  mse <- pred$mse
  rsq <- pred$rsq


  # Performs separate prediction with (default) quantiles:
  pred <- predict(rb, xx, quantiles="TRUE")
  qPred <- pred$qPred


  # Performs separate prediction with deciles:
  pred <- predict(rb, xx, quantVec = seq(0.1, 1.0, by = 0.10))
  qPred <- pred$qPred


  # Classification examples:
  data(iris)
  rb <- Rborist(iris[-5], iris[5])


  # Generic prediction using training set.
  # Census as (default) votes:
  pred <- predict(rb, iris[-5])
  yPred <- pred$yPred
  census <- pred$census

  # Using the \code{keyedFrame} option allows the columns of
  # \code{newdata} to appear in arbitrary order, so long as the
  # columns present during training appear as a subset:
  #
  pred <- predict(rb, iris[c(2, 4, 3, 1)], keyedFrame=TRUE)


  # As above, but validation census to report class probabilities:
  pred <- predict(rb, iris[-5], ctgCensus="prob")
  prob <- pred$prob


  # As above, but with training reponse as test vector:
  pred <- predict(rb, iris[-5], iris[5], ctgCensus = "prob")
  prob <- pred$prob
  conf <- pred$confusion
  misPred <- pred$misPred

  # As above, but predicts nonterminal when encountering categories
  # not observed during training.  That is, prediction returns a score
  # derived from all terminal nodes (leaves) reached from the
  # (nonterminal) testing node.
  #
  # In this case, "unobserved" refers to categories not present in
  # the subpartition over which a splitting is performed.  As training
  # partitions the data into smaller and smaller regions, a given
  # category becomes less likely to appear in a region.
  #
  # More generally, unobserved data can include missing predictors as
  # well as categories appearing in \code{newdata} which were not
  # present during training.
  #
  pred <- predict(rb, trapUnobserved=TRUE)

## End(Not run)

Rborist documentation built on July 26, 2023, 5:32 p.m.