Binary and Multiway Splits
Description
A class for representing multiway splits and functions for computing on splits.
Usage
partysplit(varid, breaks = NULL, index = NULL, right = TRUE,
    prob = NULL, info = NULL)
kidids_split(split, data, vmatch = 1:ncol(data), obs = NULL)
character_split(split, data = NULL,
    digits = getOption("digits") - 2)
varid_split(split)
breaks_split(split)
index_split(split)
right_split(split)
prob_split(split)
info_split(split)

Arguments
varid 
an integer specifying the variable to split in, i.e.,
a column number in data.
breaks 
a numeric vector of split points. 
index 
an integer vector containing a contiguous sequence
from one to the number of kid nodes. May contain NAs.
right 
a logical, indicating if the intervals defined by
breaks should be closed on the right (and open on the left) or vice versa.
prob 
a numeric vector representing a probability distribution over kid nodes. 
info 
additional information. 
split 
an object of class partysplit.
data 
a list or data.frame.
vmatch 
a permutation of the variable numbers in data.
obs 
a logical or integer vector indicating a subset of the
observations in data.
digits 
minimal number of significant digits. 
Details
A split is basically a function that maps data, more specifically a partitioning variable, to a set of integers indicating the kid nodes to send observations to. Objects of class partysplit describe such a function and can be set up via the partysplit() constructor.
The variables are available in a list or data.frame (here called data) and varid specifies the partitioning variable, i.e., the variable or list element to split in. The constructor partysplit() doesn't have access to the actual data, i.e., doesn't estimate splits.
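Because the constructor never sees data, a split can be built and inspected on its own. The following is a minimal sketch, assuming the partykit package is installed; the variable number 2L and the split point 3 are arbitrary values chosen for illustration.

```r
library("partykit")

## describe a split in variable number 2 at the value 3;
## no data set is involved at this point
sp <- partysplit(varid = 2L, breaks = 3)

## the accessor functions return what was passed to the constructor
varid_split(sp)
breaks_split(sp)
right_split(sp)
```

Note that varid is given as an integer (2L), in line with the argument description above.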
kidids_split(split, data) actually partitions the data data[obs, varid_split(split)] and assigns an integer (giving the kid node number) to each observation. If vmatch is given, the variable vmatch[varid_split(split)] is used. character_split() returns a character representation of its split argument. The remaining functions defined here are accessor functions for partysplit objects.
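The roles of obs and vmatch can be sketched as follows, assuming the partykit package is installed. The reordered data frame iris_rev and the vector vm are hypothetical helpers introduced only for this illustration.

```r
library("partykit")
data("iris", package = "datasets")

## binary split in Sepal.Length (column 1 of iris) at 5
sl5 <- partysplit(which(names(iris) == "Sepal.Length"), breaks = 5)

## kid node ids for the first ten observations only
kids10 <- kidids_split(sl5, data = iris, obs = 1:10)
kids10

## the same variables in reversed column order: vm[j] gives the
## position in iris_rev of the variable stored in column j of iris
iris_rev <- iris[, rev(names(iris))]
vm <- match(names(iris), names(iris_rev))
kids_rev <- kidids_split(sl5, data = iris_rev, vmatch = vm)
```

With vm supplied, the split still refers to Sepal.Length even though that variable is now the last column of iris_rev.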
The numeric vector breaks defines how the range of the partitioning variable (after coercing to a numeric via as.numeric) is divided into intervals (like in cut) and may be NULL. These intervals are represented by the numbers one to length(breaks) + 1.
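How breaks and right determine the interval numbers can be sketched on a small toy data set, assuming the partykit package is installed. The data frame d is hypothetical and exists only for this illustration.

```r
library("partykit")

## toy data with a single numeric variable
d <- data.frame(x = c(1, 2, 3, 4))

## two breaks define three intervals, numbered 1 to length(breaks) + 1
## right = TRUE (default): (-Inf, 2], (2, 3], (3, Inf)
s_right <- partysplit(1L, breaks = c(2, 3))
## right = FALSE: (-Inf, 2), [2, 3), [3, Inf)
s_left <- partysplit(1L, breaks = c(2, 3), right = FALSE)

k_right <- kidids_split(s_right, data = d)  # 1 1 2 3
k_left <- kidids_split(s_left, data = d)    # 1 2 3 3
```

The values 2 and 3 lie exactly on the split points, so they are the ones that change interval when right is switched.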
index assigns these length(breaks) + 1 intervals to one of at least two kid nodes. Thus, index is a vector of integers where each element corresponds to one element in a list kids containing partynode objects, see partynode for details. The vector index may contain NAs; in that case, the corresponding values of the splitting variable are treated as missings (for example factor levels that are not present in the learning sample).
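The mapping from intervals to kid nodes can be sketched as follows, assuming the partykit package is installed. The toy data frame d is hypothetical, and the behaviour shown for NA entries reflects how kidids_split() is understood to work here rather than a documented guarantee.

```r
library("partykit")

## toy data with a single numeric variable
d <- data.frame(x = c(1, 2, 3, 4))

## three intervals (-Inf, 2], (2, 3], (3, Inf): the two outer
## intervals are sent to kid 1, the middle interval to kid 2
s <- partysplit(1L, breaks = c(2, 3), index = c(1L, 2L, 1L))
k <- kidids_split(s, data = d)  # 1 1 2 1

## an NA in index marks the middle interval as missing
s_na <- partysplit(1L, breaks = c(2, 3), index = c(1L, NA, 2L))
k_na <- kidids_split(s_na, data = d)
```

In the second split, the observation with x = 3 falls into the middle interval and is expected to receive NA instead of a kid node number.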
Either breaks or index must be given. When breaks is NULL, it is assumed that the partitioning variable itself has storage mode integer (e.g., is a factor).
prob defines a probability distribution over all kid nodes which is used for random splitting when a deterministic split isn't possible (due to missing values, for example).
info takes arbitrary user-specified information.
Value
The constructor partysplit() returns an object of class partysplit:
varid 
an integer specifying the variable to split in, i.e.,
a column number in data,
breaks 
a numeric vector of split points, 
index 
an integer vector containing a contiguous sequence from one to the number of kid nodes, 
right 
a logical, indicating if the intervals defined by
breaks should be closed on the right (and open on the left) or vice versa,
prob 
a numeric vector representing a probability distribution over kid nodes, 
info 
additional information. 
kidids_split() returns an integer vector describing the partition of the observations into kid nodes. character_split() gives a character representation of the split and the remaining functions return the corresponding slots of partysplit objects.
References
Hothorn T, Zeileis A (2015). partykit: A Modular Toolkit for Recursive Partytioning in R. Journal of Machine Learning Research, 16, 3905–3909.
See Also
cut
Examples
data("iris", package = "datasets")
## binary split in numeric variable `Sepal.Length'
sl5 <- partysplit(which(names(iris) == "Sepal.Length"),
    breaks = 5)
character_split(sl5, data = iris)
table(kidids_split(sl5, data = iris), iris$Sepal.Length <= 5)
## multiway split in numeric variable `Sepal.Width',
## higher values go to the first kid, smallest values
## to the last kid
sw23 <- partysplit(which(names(iris) == "Sepal.Width"),
    breaks = c(3, 3.5), index = 3:1)
character_split(sw23, data = iris)
table(kidids_split(sw23, data = iris),
    cut(iris$Sepal.Width, breaks = c(-Inf, 2, 3, Inf)))
## binary split in factor `Species'
sp <- partysplit(which(names(iris) == "Species"),
    index = c(1L, 1L, 2L))
character_split(sp, data = iris)
table(kidids_split(sp, data = iris), iris$Species)
## multiway split in factor `Species'
sp <- partysplit(which(names(iris) == "Species"), index = 1:3)
character_split(sp, data = iris)
table(kidids_split(sp, data = iris), iris$Species)
## multiway split in numeric variable `Sepal.Width'
sp <- partysplit(which(names(iris) == "Sepal.Width"),
    breaks = quantile(iris$Sepal.Width))
character_split(sp, data = iris)
