mondrian_forest: Builds a MondrianForest.


View source: R/mondrian_forest.R

Description

mondrian_forest implements the Mondrian tree algorithms described in [Lakshminarayanan et al. 2014] (https://arxiv.org/abs/1406.2673), with a modification that allows a more space-efficient dummy-variable treatment of categorical variables.

Usage

mondrian_forest(X, y_col_num, lambda, f_scale = 1, ntree = 25,
  verbose = FALSE)

Arguments

X

Data (matrix or data frame) containing features and column of labels.

y_col_num

Numeric length 1 vector giving the column number of the label (defaults to the last column, ncol(X)).

lambda

Budget parameter that limits how far each tree is grown; larger values allow more splits. See [Lakshminarayanan et al. 2014].

f_scale

Numeric vector, either of length 1 (the same value for every categorical variable) or of length equal to the number of categorical variables. f_scale represents the implicit range of the dummy-encoded categorical variables.

ntree

Numeric length 1 vector giving the number of trees to build in the forest.

verbose

Logical length 1 vector. If TRUE, prints additional information while the algorithm is running (currently just the time taken to build the trees).
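
To give a feel for lambda and f_scale, here is a minimal base-R sketch of a single Mondrian split decision as described in the 2014 paper. It is not the package's internal code: sample_split and split_share are hypothetical helper names, and treating a dummy column as spanning 0 to f_scale is an assumption about what "implicit range" means.

```r
# Sketch of one Mondrian split decision (illustration only; hypothetical helper,
# not part of the mondrianforest package).
sample_split <- function(lower, upper, parent_time, lambda) {
  ranges <- upper - lower            # extent of the node's data in each dimension
  rate <- sum(ranges)                # rate of the exponential split time
  if (rate <= 0) {
    return(list(leaf = TRUE, time = lambda))   # nothing left to split on
  }
  split_time <- parent_time + rexp(1, rate = rate)
  if (split_time > lambda) {
    return(list(leaf = TRUE, time = lambda))   # budget exhausted: node is a leaf
  }
  # Split dimension is chosen with probability proportional to its range,
  # and the cut point is uniform within that range.
  dim <- sample.int(length(ranges), size = 1, prob = ranges)
  cut <- runif(1, min = lower[dim], max = upper[dim])
  list(leaf = FALSE, time = split_time, dim = dim, cut = cut)
}

# Assumption: dimension 1 is a numeric feature on [0, 1]; dimension 2 is a
# dummy column whose two levels sit at 0 and f_scale.
split_share <- function(f_scale, n = 2000) {
  dims <- replicate(n, {
    sample_split(lower = c(0, 0), upper = c(1, f_scale),
                 parent_time = 0, lambda = Inf)$dim
  })
  mean(dims == 2)                    # fraction of splits on the dummy column
}

set.seed(42)
split_share(1)   # expected share f_scale / (1 + f_scale) = 1/2
split_share(3)   # expected share 3/4
sample_split(c(0, 0), c(1, 1), parent_time = 0, lambda = 0)$leaf  # TRUE: no budget
```

Under these assumptions, a larger f_scale makes splits on the categorical column more likely relative to the numeric features, and a smaller lambda stops the tree earlier.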

Value

A mondrianforest object.

Examples

library(mondrianforest); library(dplyr); library(purrr); library(magrittr)
library(ggplot2)  # provides cut_number()
set.seed(1)
test <- data.frame(x1 = rnorm(1000),
                   x2 = runif(1000),
                   x3 = rbeta(n = 1000, shape1 = 3, shape2 = 8),  # x3 is noise
                   x4 = rbinom(n = 1000, size = 4, prob = 0.2)) %>%
  # Rescale every column to [0, 1]
  map_df(function(x) (x - min(x))/(max(x) - min(x))) %>%
  mutate(y = x1*x2^2 + exp(x1) - x4,
         x1 = cut_number(sin(x1), n = 10),  # turn x1 into a 10-level factor
         label = as.factor(case_when(y < 1 ~ "A",
                                     y >= 1 & y < 1.7 ~ "B",
                                     y >= 1.7 ~ "C"))) %>%
  select(-y)
# Train on the first 750 rows; the label is column 5
mf <- mondrian_forest(test[1:750, ], y_col_num = 5, lambda = 3)
# Confusion table on the held-out 250 rows
table(test$label[751:1000], predict(mf, test[751:1000, ], type = "class"))
# Compare to Random Forest if installed:
# rf <- randomForest::randomForest(test[1:750, -5],
#                                  y = test$label[1:750],
#                                  ntree = 1000)
# table(test$label[751:1000], predict(rf, test[751:1000, ]))

millerjoey/mondrianforest documentation built on May 25, 2019, 10:30 p.m.