Extensible trees provide the basic infrastructure to define tree algorithms via transformation, variable selection, and split point selection functions.
data | an object of class
trafo | a function with arguments
converged | an optional function with arguments
control | list of control arguments generated by
... | Additional arguments passed on to
This basic tree algorithm can be used to define your own tree algorithm variants.
trafo defines how you want to preprocess your data for variable and split point selection. As an example, mob computes a model and returns information such as estfun (the empirical estimating functions / score contribution matrix, see also estfun), objfun (value of the minimized objective function, usually the negative log-likelihood), coef (estimated model coefficients), and converged (logical, has the model converged?).
selectfun defines how to select the split variable.
splitfun defines how to select the split point.
Details in extree_control.
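To make the shape of a trafo return value more concrete, here is a minimal sketch in base R. It is not the code that mob or extree uses: the function name trafo_lm_sketch, the fixed formula Ozone ~ Temp, and the use of a plain data frame in place of an extree_data object are assumptions made purely for illustration. The element names follow the description above and should be checked against what extree actually expects.

```r
data(airquality, package = "datasets")
airq <- subset(airquality, !is.na(Ozone))

### Hypothetical trafo-style function: fits a linear model on the
### observations in 'subset' and returns the pieces described above.
### Assumption: 'data' is a plain data frame, not an extree_data object.
trafo_lm_sketch <- function(subset, data, weights = NULL, info = NULL,
    estfun = TRUE, object = TRUE) {
    d <- data[subset, , drop = FALSE]
    fit <- lm(Ozone ~ Temp, data = d)
    ### score contributions of least squares: residual times regressor
    scores <- residuals(fit) * model.matrix(fit)
    list(
        estfun = if (estfun) scores else NULL,
        objfun = -as.numeric(logLik(fit)),  # negative log-likelihood
        coef = coef(fit),
        converged = TRUE,                   # lm has a closed-form solution
        object = if (object) fit else NULL
    )
}

res <- trafo_lm_sketch(subset = seq_len(nrow(airq)), data = airq)
str(res[c("objfun", "coef", "converged")])
```

The columns of res$estfun sum to (numerically) zero, which is the first-order condition of the least-squares fit; variable selection functions typically test whether these scores vary systematically with a candidate split variable.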
Currently: A list of nodes (an object of class partynode) and trafo (the encapsulated transformation function).
This will likely change soon.
data(airquality, package = "datasets")
airq <- subset(airquality, !is.na(Ozone))
airq_dat <- extree_data(Ozone ~ Wind + Temp,
    data = airq, yx = "matrix")
### Set up trafo function to preprocess data for variable and split point selection
trafo_identity <- function(subset, data, weights = NULL, info = NULL,
    estfun = TRUE, object = TRUE) {
    ### Extract response and "subset"
    y <- extree_variable(data, i = 1, type = "original")
    y[-subset] <- NA
    ### Return list
    rval <- list(
        estfun = if (estfun) y else NULL,
        unweighted = TRUE,
        converged = TRUE
    )
    return(rval)
}
### Set up function to guide variable selection
### Returns a list with values of test statistics and p-values
var_select_guide <- function(model, trafo, data, subset, weights, j,
    split_only = FALSE, control) {
    estfun <- model$estfun[subset]
    ### categorize estfun if not already a factor
    if(is.factor(estfun)) est_cat <- estfun else {
        breaks <- unique(quantile(estfun, c(0, 0.25, 0.5, 0.75, 1)))
        if(length(breaks) < 5) breaks <- c(min(estfun), mean(estfun), max(estfun))
        est_cat <- cut(estfun, breaks = breaks,
            include.lowest = TRUE, right = TRUE)
    }
    ### get possible split variable
    sv_cat <- extree_variable(data, i = j, type = "index")[subset]
    ### independence test
    test <- chisq.test(x = est_cat, y = sv_cat)
    res <- list(statistic = test$statistic, p.value = test$p.value)
    return(res)
}
### Set up split selection
### The median of the split variable is used as the split point
split_select_median_numeric <- function(model, trafo, data, subset, weights,
    whichvar, ctrl) {
    if (length(whichvar) == 0) return(NULL)
    ### split FIRST variable at median
    j <- whichvar[1]
    x <- extree_variable(data, i = j, type = "original")[subset]
    ret <- partysplit(as.integer(j), breaks = median(x))
    return(ret)
}
### Set extree control
ctrl1 <- extree_control(criterion = "p.value", # split variable selection criterion
    logmincriterion = log(1 - 0.05),
    update = TRUE,
    selectfun = var_select_guide,
    splitfun = split_select_median_numeric,
    svselectfun = NULL,
    svsplitfun = NULL,
    minsplit = 50)
### Call extree
tr1 <- extree(data = airq_dat, trafo = trafo_identity,
    control = c(ctrl1, restart = TRUE))
print(tr1$nodes)
ptr1 <- party(tr1$nodes, data = airq_dat$data)
print(ptr1)
plot(ptr1)
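
The selection and split steps in the example above can also be traced by hand, without extree, to see what a single root-node decision might look like. The following is only a rough sketch in base R: the helper quartile_bins is a hypothetical name, and binning the candidate split variables at their quartiles is an assumption standing in for the index representation that extree_variable(type = "index") provides.

```r
data(airquality, package = "datasets")
airq <- subset(airquality, !is.na(Ozone))

### Hypothetical helper: bin a numeric vector at its quartiles,
### falling back to two bins when quartiles coincide
quartile_bins <- function(v) {
    breaks <- unique(quantile(v, c(0, 0.25, 0.5, 0.75, 1)))
    if (length(breaks) < 5) breaks <- c(min(v), mean(v), max(v))
    cut(v, breaks = breaks, include.lowest = TRUE, right = TRUE)
}

### Variable selection: chi-squared independence test of the binned
### response against each binned candidate split variable
y_cat <- quartile_bins(airq$Ozone)
candidates <- c("Wind", "Temp")
pvals <- vapply(candidates, function(v) {
    suppressWarnings(
        chisq.test(x = y_cat, y = quartile_bins(airq[[v]]))$p.value
    )
}, numeric(1))

### Split point selection: median of the variable with the smallest p-value
best <- names(which.min(pvals))
split_point <- median(airq[[best]])
print(pvals)
cat("split", best, "at", split_point, "\n")
```

This reproduces only the logic of one node. The recursion over child nodes, the stopping rules such as minsplit, and the p-value threshold are handled by extree and its control settings.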