extree: Extensible Trees

Description

Extensible trees provide the basic infrastructure to define tree algorithms via transformation, variable selection, and split point selection functions.

Usage

extree(data, trafo, control = extree_control(...), 
    converged = NULL, ...)

Arguments

data

an object of class extree_data, see extree_data.

trafo

a function with arguments subset, data, weights, info, estfun and object.

converged

an optional function with arguments subset, weights.

control

a list of control arguments generated by extree_control. Among other things, the variable and split point selection functions (selectfun, splitfun) are specified here.

...

Additional arguments passed on to extree_fit.

Details

This basic tree algorithm can be used to define your own tree algorithm variants.

trafo defines how the data are preprocessed for variable and split point selection. As an example, mob fits a model and returns information such as estfun (the empirical estimating functions, i.e. the score contribution matrix; see also estfun), objfun (the value of the minimized objective function, usually the negative log-likelihood), coef (the estimated model coefficients), and converged (a logical indicating whether the model has converged).
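To make the shape of such a return value concrete, here is a minimal base-R sketch that fits a linear model on a subset and assembles the four elements by hand. The function name trafo_lm_sketch is hypothetical and not part of the package, it works on a plain data.frame rather than an extree_data object, and it ignores weights. The scores are computed directly as residuals times the design matrix, which is an assumption made for illustration and not necessarily what mob does internally.

```r
## Hypothetical helper, NOT part of the package: mimics the kind of list
## a model-based trafo might return, using only base R and stats.
trafo_lm_sketch <- function(subset, data, weights = NULL) {
  ## restrict to the observations in the current node
  ## (weights are accepted but ignored in this sketch)
  d <- data[subset, , drop = FALSE]
  fit <- lm(Ozone ~ Temp, data = d)

  list(
    ## score contributions: one row per observation, one column per coefficient
    estfun = as.vector(residuals(fit)) * model.matrix(fit),
    ## negative log-likelihood of the fitted model
    objfun = -as.numeric(logLik(fit)),
    coef = coef(fit),
    ## lm() has no iterative fitting, so convergence is assumed here
    converged = TRUE
  )
}

airq <- subset(airquality, !is.na(Ozone))
res <- trafo_lm_sketch(subset = seq_len(nrow(airq)), data = airq)
str(res, max.level = 1)
```

The columns of estfun sum to (numerically) zero at the fitted coefficients, which is the property that score-based variable selection relies on.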

selectfun defines how the split variable is selected. splitfun defines how the split point is selected. See extree_control for details.
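The division of labour between the two steps can be sketched in plain base R, without any package infrastructure. The names select_sketch and split_sketch are hypothetical, their signatures do not match what extree_control expects, and the Spearman correlation test is an arbitrary choice made for illustration. The sketch only shows the idea: first rank candidate variables by a p-value, then choose a cut point in the winning variable.

```r
airq <- subset(airquality, !is.na(Ozone))

## Hypothetical selection step: one p-value per candidate variable,
## here from a Spearman correlation test against the response.
select_sketch <- function(y, candidates) {
  sapply(candidates, function(x) {
    cor.test(x, y, method = "spearman", exact = FALSE)$p.value
  })
}

## Hypothetical split step: cut the chosen variable at its median and
## return the cut point plus the row indices going left and right.
split_sketch <- function(x) {
  cut_point <- median(x)
  list(breaks = cut_point,
       left   = which(x <= cut_point),
       right  = which(x > cut_point))
}

p_values <- select_sketch(airq$Ozone, airq[c("Wind", "Temp")])
best <- names(which.min(p_values))   # variable with the smallest p-value
sp <- split_sketch(airq[[best]])

print(p_values)
print(best)
print(sp$breaks)
```

Keeping the two steps in separate functions means either one can be swapped out independently, for example a different test statistic with the same median split.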

Value

Currently a list with two elements: nodes (an object of class partynode) and trafo (the encapsulated transformation function). This return value is likely to change.

Examples

### Load the package (assumed to be installed under the name partykitx)
library("partykitx")

data(airquality, package = "datasets")
airq <- subset(airquality, !is.na(Ozone))
airq_dat <- extree_data(Ozone ~ Wind + Temp,
                        data = airq, yx = "matrix")

### Set up trafo function to preprocess data for variable and split point selection 
trafo_identity <- function(subset, data, weights = NULL, info = NULL, 
                           estfun = TRUE, object = TRUE) {
  
  ### Extract response and "subset"
  y <- extree_variable(data, i = 1, type = "original")  
  y[-subset] <- NA  
  
  ### Return list
  rval <- list(
    estfun = if (estfun) y else NULL,
    unweighted = TRUE,  
    converged = TRUE 
  )
  return(rval)
}

### Set up function to guide variable selection 
### Returns a list with values of test statistics and p-values 
var_select_guide <- function(model, trafo, data, subset, weights, j,
                             split_only = FALSE, control) {

  estfun <- model$estfun[subset]

  ### categorize estfun if not already a factor
  if(is.factor(estfun)) est_cat <- estfun else {
    breaks <- unique(quantile(estfun, c(0, 0.25, 0.5, 0.75, 1)))
    if(length(breaks) < 5) breaks <- c(min(estfun), mean(estfun), max(estfun))
    est_cat <- cut(estfun, breaks = breaks,
                   include.lowest = TRUE, right = TRUE)
  }

  ### get possible split variable
  sv_cat <- extree_variable(data, i = j, type = "index")[subset]

  ### independence test
  test <- chisq.test(x = est_cat, y = sv_cat)
  res <- list(statistic = test$statistic, p.value = test$p.value)

  return(res)
}


### Set up split selection
### The median of the split variable is used as the split point
split_select_median_numeric <- function(model, trafo, data, subset, weights, 
                                        whichvar, ctrl) {
  
  if (length(whichvar) == 0) return(NULL)
  
  ### split FIRST variable at median 
  j <- whichvar[1]
  x <- extree_variable(data, i = j, type = "original")[subset]
  ret <- partysplit(as.integer(j), breaks = median(x))
  
  return(ret)
}

### Set extree control 
ctrl1 <- extree_control(criterion = "p.value", # split variable selection criterion 
                        logmincriterion = log(1 - 0.05),
                        update = TRUE,
                        selectfun = var_select_guide,
                        splitfun = split_select_median_numeric,
                        svselectfun = NULL, 
                        svsplitfun = NULL,
                        minsplit = 50)


### Call extree 
tr1 <- extree(data = airq_dat, trafo = trafo_identity, 
              control = c(ctrl1, restart = TRUE))
print(tr1$nodes)
ptr1 <- party(tr1$nodes, data = airq_dat$data)
print(ptr1)
plot(ptr1)

partykitx documentation built on Sept. 3, 2020, 3:01 p.m.