PRIM_paste: Pasting-Function


Description

This function implements the pasting algorithm proposed by Friedman and Fisher (1999). In each iteration, the current box is enlarged by pasting the fraction alpha (argument paste_alpha) onto one of its edges.
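To make the idea of a pasting step concrete, the following is a minimal base R sketch for numeric covariates only. The helper paste_step() is hypothetical and not part of the PRIM package; it assumes that the target is the mean of y in the box and that each candidate extension adds roughly alpha times the number of observations currently in the box. The package's own implementation may differ, for example in how it treats nominal variables and missing values.

```r
# Hypothetical helper: one pasting step on a numeric matrix X (not package code).
paste_step <- function(X, y, lower, upper, alpha = 0.01) {
  # TRUE for rows of X lying inside the box [lo, up] on every column
  in_box <- function(lo, up) {
    rowSums(sweep(X, 2, lo, ">=") & sweep(X, 2, up, "<=")) == ncol(X)
  }
  cur <- in_box(lower, upper)
  k <- max(1, ceiling(alpha * sum(cur)))  # observations to add per candidate
  best <- list(f = -Inf)
  for (j in seq_len(ncol(X))) {
    # observations inside the box on all dimensions except j
    lo_j <- lower; up_j <- upper
    lo_j[j] <- -Inf; up_j[j] <- Inf
    others <- in_box(lo_j, up_j)
    # candidate 1: move the upper edge of variable j outwards
    above <- sort(X[others & X[, j] > upper[j], j])
    if (length(above) > 0) {
      up_new <- upper
      up_new[j] <- above[min(k, length(above))]
      f_new <- mean(y[in_box(lower, up_new)])
      if (f_new > best$f) best <- list(f = f_new, lower = lower, upper = up_new)
    }
    # candidate 2: move the lower edge of variable j outwards
    below <- sort(X[others & X[, j] < lower[j], j], decreasing = TRUE)
    if (length(below) > 0) {
      lo_new <- lower
      lo_new[j] <- below[min(k, length(below))]
      f_new <- mean(y[in_box(lo_new, upper)])
      if (f_new > best$f) best <- list(f = f_new, lower = lo_new, upper = upper)
    }
  }
  best  # the single edge extension with the highest target value
}

# Toy usage with made-up data
set.seed(1)
X <- cbind(x1 = runif(200, -1, 1), x2 = runif(200, -1, 1))
y <- as.numeric(sqrt(X[, 1]^2 + X[, 2]^2) < 0.5)
step <- paste_step(X, y, lower = c(-0.2, -0.2), upper = c(0.2, 0.2), alpha = 0.1)
```

Repeating such a step, each time starting from the box returned by the previous one, yields a sequence of growing boxes similar in spirit to what PRIM_paste records.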

Usage

PRIM_paste(fixbox, paste_alpha = 0.01, max_steps = 50, stop_by_dec = TRUE)

Arguments

fixbox

an object of class "fixbox", defined from the result of a peeling function (see define_fixbox in the Examples), which serves as the starting box for pasting.

paste_alpha

alpha-fraction that is pasted to the box at each iteration.

max_steps

maximum number of pasting steps the function should make.

stop_by_dec

logical. If TRUE, pasting stops as soon as the target at one step is lower than the target at the previous step.
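The stopping rule behind stop_by_dec can be illustrated on a plain vector of target values. The helper below is hypothetical and only sketches the idea; in particular, whether the package keeps or discards the first decreasing step is an assumption here (the sketch discards it).

```r
# Hypothetical helper: number of steps kept before the target first decreases.
steps_until_decrease <- function(f) {
  drop <- which(diff(f) < 0)  # positions where the next target is lower
  if (length(drop) == 0) length(f) else drop[1]
}

f_path <- c(0.80, 0.82, 0.83, 0.81, 0.85)  # made-up target values
steps_until_decrease(f_path)  # 3: the fourth value is lower than the third
```

With stop_by_dec = FALSE, pasting would instead continue up to max_steps, so the later value 0.85 in this toy path would still be reached.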

Details

The result of this function is also a "peel" object, because it has essentially the same structure as the result of the peeling functions. The only difference is that pasting proceeds from small supports to larger ones, whereas peeling proceeds the other way round.

Value

PRIM_paste returns an object of class "peel", which is a list containing at least the following components:

f

vector of the target functions evaluated on the box at each pasting step.

beta

vector of the supports beta of the boxes at each pasting step.

box

a data.frame defining the borders of the boxes. Each row corresponds to one pasting step. The columns prefixed with "min." and "max." give the lower and upper boundaries for the covariates that are at least ordinal. The value reported for a boundary is the last one that is not included in the current box.

For nominal variables there is one column for every category they can take. If a category is removed from the box, its column takes the value FALSE. These columns are named <variable name>.<category>

For each variable with missing values (only if use_NAs = TRUE) there is an additional column that takes the value FALSE if the NAs of this variable are removed from the current box. These columns are named <variable name>.NA

box_metric, box_nom, box_na

easier-to-handle definitions of the boxes, intended for use by other functions.

subsets

list of logical vectors indicating the subsets at each pasting step (i.e. the observations that lie in the box)

data_orig

original dataset that is used (extracted from fixbox).
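The components f, beta and subsets are closely related. The toy object below is a made-up stand-in for a "peel" object, built by hand rather than by PRIM_paste, and it assumes that beta is the share of observations inside the box and that the target is the mean of the response inside the box. Under those assumptions, both vectors can be recomputed from the logical subsets.

```r
# Made-up response and subsets for three pasting steps (illustration only).
y <- c(TRUE, TRUE, FALSE, TRUE, FALSE, FALSE)
res <- list(subsets = list(
  c(TRUE, TRUE, FALSE, FALSE, FALSE, FALSE),
  c(TRUE, TRUE, TRUE,  FALSE, FALSE, FALSE),
  c(TRUE, TRUE, TRUE,  TRUE,  FALSE, FALSE)
))

# support: share of observations in the box at each step
res$beta <- sapply(res$subsets, mean)
# target: mean of the response among the observations in the box
res$f <- sapply(res$subsets, function(s) mean(y[s]))

res$beta  # grows from step to step, as expected for pasting
res$f
```

A real result may use a different target function, so the second computation should be read as a sketch rather than as the package's definition of f.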

References

Friedman, J. H. and Fisher, N. I., 'Bump hunting in high-dimensional data', Statistics and Computing 9 (2) (1999), 123-143

Ott, A. and Hapfelmeier, A., 'Nonparametric Subgroup Identification by PRIM and CART: A Simulation and Application Study', Computational and Mathematical Methods in Medicine, vol. 2017 (2017), 17 pages, Article ID 5271091

Examples

# generating random data:
set.seed(123)
n <- 500
x1 <- runif(n = n, min = -1)
x2 <- runif(n = n, min = -1)
x3 <- runif(n = n, min = -1)
cat <- as.factor(sample(c("a","b","c", "d"), size = n, replace = TRUE))
wsk <- (1-sqrt(x1^2+x2^2)/sqrt(2))
y <- as.logical(rbinom(n = n, prob = wsk, size = 1))
dat <- cbind.data.frame(y, x1, x2, x3, cat)
#plot(dat$x1, dat$x2, col=dat$y+1, pch=16)
remove(x1, x2, x3, y, wsk, cat, n)

# apply the PRIM_peel function:
prim <- PRIM_peel(y ~ ., data = dat, beta_min = .01, peel_alpha = .1)
plot(prim)
abline(h = prim$f[17], v = prim$beta[17]) # box chosen as the starting point for pasting
fix <- define_fixbox(prim, 17) # define the fixbox from peeling step 17

# apply the PRIM_paste function
# (result stored as `pasted` so the name does not mask base::paste):
pasted <- PRIM_paste(fix, stop_by_dec = FALSE)
head(cbind(pasted$box, pasted$f, pasted$beta))

ao90/PRIM documentation built on May 5, 2019, 8:01 p.m.