dvr_ensemble: Dependent Variable Regression Ensemble


Description

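Computes outlier scores with an ensemble of regression models. Each of the n_predictions components fits the model given by formula (using method) on n_train_points rows of data, predicts the dependent variable, and records the squared prediction errors. Each point's errors are then combined across components with error_agg_fun. Points whose dependent variable is poorly predicted by the other variables receive high scores.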
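The core of the procedure can be sketched in base R. This is a minimal illustration, not the outsiders package's implementation. The helper dvr_error_matrix is hypothetical and supports only lm. It also assumes that each component trains on a simple random sample of rows drawn without replacement, which this page does not specify. The helper returns one row per observation and one column per component. Entries are NA where an observation was not scored.

```r
# Hypothetical sketch (not the outsiders package's code): squared prediction
# errors from an lm-based dependent variable regression ensemble.
dvr_error_matrix <- function(formula, data, n_predictions, n_train_points,
                             score_set = c("all", "test")) {
  score_set <- match.arg(score_set)
  data <- as.data.frame(data)
  n <- nrow(data)
  # Observed values of the dependent variable (left-hand side of formula)
  y <- model.response(model.frame(formula, data))
  err <- matrix(NA_real_, nrow = n, ncol = n_predictions)
  for (j in seq_len(n_predictions)) {
    # Assumption: simple random sample of training rows, without replacement
    train_idx <- sample.int(n, n_train_points)
    fit <- lm(formula, data = data[train_idx, , drop = FALSE])
    # "all": score every row; "test": score only rows held out of training
    score_idx <- if (score_set == "all") seq_len(n) else setdiff(seq_len(n), train_idx)
    pred <- predict(fit, newdata = data[score_idx, , drop = FALSE])
    err[score_idx, j] <- (y[score_idx] - pred)^2
  }
  err
}

set.seed(1)
err_test <- dvr_error_matrix(Sepal.Length ~ ., iris[, -5], n_predictions = 20,
                             n_train_points = 100, score_set = "test")
err_all <- dvr_error_matrix(Sepal.Length ~ ., iris[, -5], n_predictions = 20,
                            n_train_points = 100, score_set = "all")
dim(err_test)  # 150 rows (observations) x 20 columns (components)
```

In "test" mode, each column has NA entries for the 100 training rows. Those NAs must be skipped when the errors are aggregated into scores.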

Usage

dvr_ensemble(formula, data, method = "lm", n_predictions, n_train_points,
  score_set = c("all", "test"), error_agg_fun = mean, scores_only = TRUE,
  ...)

Arguments

formula

a model formula whose left-hand side is the dependent variable (see help("lm") for details)

data

a matrix or data.frame containing the variables in the model

method

the name of a model-fitting function, given as a character string (e.g. "lm", "randomForest")

n_predictions

an integer specifying the number of components in the ensemble. If score_set = "test", set this high enough that every point is predicted out of sample a sufficient number of times.

n_train_points

an integer or numeric value specifying the number of rows used to train each ensemble component

score_set

one of "all" or "test". If "all", all N points are scored in each training iteration. If "test", only the out-of-sample points (those not used for training) are scored in each iteration.

error_agg_fun

a function for combining each point's squared prediction errors across the ensemble components into a single score. Defaults to mean.

scores_only

logical. If TRUE, return a vector of outlier scores. If FALSE, return both the error matrix and the outlier scores.
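The advice under n_predictions for score_set = "test" can be made concrete with a little arithmetic. Suppose each component trains on a simple random sample of n_train_points out of N rows. This is an assumption, because the sampling scheme is not documented here. Under it, a given row is held out with probability p = (N - n_train_points) / N in each component. The number of times that row is scored is then Binomial(n_predictions, p). The helper test_coverage is hypothetical and not part of the package.

```r
# Hypothetical helper: how often a row is scored out of sample when
# score_set = "test", assuming each component draws its n_train_points
# training rows independently and without replacement from n rows.
test_coverage <- function(n, n_train_points, n_predictions) {
  p_out <- (n - n_train_points) / n  # chance a row is held out in one component
  list(
    expected_scores = n_predictions * p_out,           # mean number of scores per row
    p_never_scored  = (1 - p_out)^n_predictions,       # chance a row is never scored
    p_under_10      = pbinom(9, n_predictions, p_out)  # chance of fewer than 10 scores
  )
}

# Settings from the example below: N = 150 rows of iris, 100 training rows
coverage <- test_coverage(n = nrow(iris), n_train_points = 100, n_predictions = 100)
coverage$expected_scores  # 100 * 50 / 150, about 33.3
```

With these settings, the chance that a row is never scored is (2/3)^100, roughly 2.5e-18. Shrinking n_predictions or enlarging n_train_points raises it quickly.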

Value

If scores_only = TRUE, a vector of outlier scores. If FALSE, a list containing the outlier scores and the ensemble error matrix.
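The error matrix and the scores are linked through error_agg_fun. The toy sketch below assumes one row per observation and one column per ensemble component, with NA marking components in which an observation was not scored (as under score_set = "test"). Neither the matrix layout nor the NA convention is confirmed by this page, and aggregate_scores is a hypothetical helper rather than a package function.

```r
# Toy error matrix: 4 observations x 3 components; NA = not scored
err <- matrix(c(0.10, NA,   0.30,
                0.20, 0.40, NA,
                NA,   5.00, 4.00,
                0.05, 0.15, 0.10),
              nrow = 4, byrow = TRUE)

# Hypothetical helper: combine each observation's non-missing squared
# errors into a single outlier score with agg_fun (e.g. mean, median)
aggregate_scores <- function(err, agg_fun = mean) {
  apply(err, 1, function(e) agg_fun(e[!is.na(e)]))
}

scores_mean   <- aggregate_scores(err, mean)    # 0.20 0.30 4.50 0.10
scores_median <- aggregate_scores(err, median)  # 0.20 0.30 4.50 0.10
ranking <- order(scores_mean, decreasing = TRUE)  # most outlying first: 3 2 1 4
```

A row that is never scored has no errors to aggregate. mean then returns NaN and median returns NA, so n_predictions should be large enough to avoid such rows.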

References

Section 3.2.1 of C. C. Aggarwal. Outlier Analysis. Springer, 2017.

Examples

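library(outsiders)

# Score iris rows by how poorly Sepal.Length is predicted from the other
# numeric columns; the Species column (5) is dropped
dvr_ensemble(formula = Sepal.Length ~ ., data = iris[, -5],
             n_predictions = 100, n_train_points = 100,
             error_agg_fun = median, scores_only = TRUE)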

dannymorris/outsiders documentation built on May 13, 2019, 1:22 p.m.