PRIM_peel_bs: Multiple Peeling-Function


Description

This function implements the multiple peeling algorithm as suggested by Friedman and Fisher (1999). The single peeling function PRIM_peel is applied repeatedly, once for each value of alpha and to bootstrap samples drawn from the original data.
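To illustrate the idea of repeating a single peeling run over several alphas and bootstrap samples, here is a minimal base-R sketch. It is not the PRIM package's implementation: the helpers peel_1d() and multi_peel() are hypothetical, handle only one numeric covariate, use the mean as target with the standard criterion, and assume that the original data is always peeled in addition to the B bootstrap samples.

```r
# Hypothetical one-covariate peeling (mean target, standard criterion).
# Returns the trajectory of target values f and supports beta.
peel_1d <- function(x, y, alpha, beta_min) {
  n <- length(y)
  inside <- rep(TRUE, n)
  f <- mean(y)
  beta <- 1
  repeat {
    lo <- quantile(x[inside], alpha, names = FALSE)
    hi <- quantile(x[inside], 1 - alpha, names = FALSE)
    # candidate boxes: peel off the lower or the upper alpha-fraction
    cands <- list(inside & x > lo, inside & x < hi)
    # valid candidates remove something and keep at least beta_min support
    ok <- vapply(cands, function(k)
      sum(k) < sum(inside) && sum(k) >= max(1, beta_min * n), logical(1))
    if (!any(ok)) break  # stopping criterion reached
    cands <- cands[ok]
    vals <- vapply(cands, function(k) mean(y[k]), numeric(1))
    inside <- cands[[which.max(vals)]]  # keep the box with the largest target
    f <- c(f, max(vals))
    beta <- c(beta, mean(inside))
  }
  list(f = f, beta = beta)
}

# Hypothetical wrapper: repeat the peeling for every alpha, on the original
# data (sample = 0) and on B bootstrap samples (sample = 1, ..., B).
multi_peel <- function(x, y, peel_alpha, B = 0, beta_min = 0.01, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  out <- list()
  for (a in peel_alpha) {
    out[[length(out) + 1]] <- c(peel_1d(x, y, a, beta_min), alpha = a, sample = 0)
    for (b in seq_len(B)) {
      idx <- sample.int(length(y), replace = TRUE)  # bootstrap sample
      out[[length(out) + 1]] <-
        c(peel_1d(x[idx], y[idx], a, beta_min), alpha = a, sample = b)
    }
  }
  out
}

set.seed(1)
x <- runif(200)
y <- rbinom(200, 1, x)
res <- multi_peel(x, y, peel_alpha = c(0.05, 0.1), B = 2)
length(res)  # 2 alphas * (original + 2 bootstraps) = 6 trajectories
```

Collecting every trajectory in one list mirrors why the output of the multiple peeling can grow large.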

Usage

PRIM_peel_bs(formula, data, peel_alpha = seq(0.01, 0.4, 0.03), B = 0,
  beta_min = 0.01, target = mean, alter_crit = TRUE, use_NAs = TRUE,
  seed, print_position = TRUE)

Arguments

formula

an object of class "formula" with a response but no interaction terms. It specifies the response for which the target function is maximized and the covariates that are used to define the boxes.

data

an object of class data.frame containing the variables named in the formula.

peel_alpha

numeric vector of peeling fractions alpha; the peeling is carried out for each of these values.

B

number of bootstrap samples to which the peeling is applied for each alpha. For B = 0, no bootstrap samples are drawn.

beta_min

minimum support that a box must have (stopping criterion).

target

target function to be maximized. In most cases the mean is a useful target, although other functions, e.g. the median, are also possible.

alter_crit

logical. If TRUE, the alternative criterion is used for peeling, i.e. "target/beta" is maximized during peeling instead of "target", so that peeling off large subboxes is not favoured. This is especially important for nominal covariates.

use_NAs

logical. If TRUE observations with missing values are included in the analysis.

seed

seed to be set before the first iteration. Only useful for B > 0.

print_position

logical. If TRUE the current position of the algorithm is printed out.
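The default grid peel_alpha = seq(0.01, 0.4, 0.03) contains 14 fractions (0.01, 0.04, ..., 0.40). How one alpha and a target function enter a single peeling step can be sketched as follows. This is a hypothetical illustration, not the package's code: peel_step() handles numeric covariates only, uses the standard criterion (no alter_crit), and simply returns the candidate box with the largest target value.

```r
# Hypothetical single peeling step over several numeric covariates.
# X: data.frame of covariates, y: response, inside: current box membership.
peel_step <- function(X, y, inside, alpha, target = mean) {
  best <- NULL
  for (j in names(X)) {
    xj <- X[[j]]
    lo <- quantile(xj[inside], alpha, names = FALSE)
    hi <- quantile(xj[inside], 1 - alpha, names = FALSE)
    # two candidates per covariate: peel the lower or the upper alpha-fraction
    for (k in list(inside & xj > lo, inside & xj < hi)) {
      if (sum(k) == 0 || sum(k) == sum(inside)) next  # skip degenerate cuts
      val <- target(y[k])
      if (is.null(best) || val > best$value) {
        best <- list(value = val, inside = k, var = j)
      }
    }
  }
  best
}

alphas <- seq(0.01, 0.4, 0.03)
length(alphas)  # 14

set.seed(2)
X <- data.frame(x1 = runif(300), x2 = runif(300))
y <- rbinom(300, 1, X$x1)  # the response depends on x1 only
first <- peel_step(X, y, rep(TRUE, 300), alpha = 0.1)
first$var           # covariate chosen for the first peel
mean(first$inside)  # support after one step, roughly 1 - alpha
```

Passing target = median instead of mean only changes the function applied to the candidate boxes.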

Details

The response in the formula can be numeric, logical, or a survival object (see Surv). If it is a survival object, the target is set to the number of events per unit of observation time.
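For a right-censored survival response, "events per amount of time" can be read as an event rate: the number of events divided by the total observed time in the box. The sketch below assumes exactly that reading; event_rate() is a hypothetical helper, and the columns time and status correspond to the two columns that a right-censored Surv(time, status) object stores.

```r
# Hypothetical target for survival data (assumption: number of events
# divided by the total follow-up time of the observations in the box).
# time: follow-up times, status: 1 = event, 0 = censored.
event_rate <- function(time, status) sum(status) / sum(time)

time   <- c(2, 5, 3, 10)
status <- c(1, 0, 1, 1)
event_rate(time, status)  # 3 events / 20 time units = 0.15
```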

The output of this function can become very large, because all outputs of the single peeling function PRIM_peel are combined into one object. It is therefore useful to remove all dominated boxes (see remove_dominated).
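The notion of a dominated box can be made concrete with a small Pareto filter over the pairs (f, beta). This is a sketch under an assumed definition, not the code of remove_dominated: a box counts as dominated if another box has at least the same target and at least the same support, and is strictly better in one of the two. non_dominated() is a hypothetical helper.

```r
# Hypothetical Pareto filter: TRUE for boxes that no other box dominates.
non_dominated <- function(f, beta) {
  vapply(seq_along(f), function(i) {
    !any(f >= f[i] & beta >= beta[i] & (f > f[i] | beta > beta[i]))
  }, logical(1))
}

f    <- c(0.50, 0.60, 0.55, 0.70)
beta <- c(1.00, 0.80, 0.70, 0.40)
non_dominated(f, beta)  # TRUE TRUE FALSE TRUE: box 3 is dominated by box 2
```

Only boxes on this frontier offer a trade-off between a higher target and a larger support, which is why the remaining boxes can be discarded.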

Value

PRIM_peel_bs returns an object of class "peel", which is a list containing at least the following components:

f

vector of the target function values evaluated on the box at each peeling step.

beta

vector of the supports beta of the boxes at each peeling step.

box

a data.frame defining the borders of the boxes. Each row corresponds to one peeling step. The columns prefixed with "min." and "max." give the lower and upper boundaries of the covariates that are at least ordinal. Note that each boundary is the last value that is not included in the current box, i.e. the boundary values themselves lie outside the box.

For nominal variables there is one column for every category they can take. If a category has been removed from the box, its column takes the value FALSE. The names of these columns have the form <variable name>.<category>

For each variable with missing values (only if use_NAs = TRUE) there is also a column that takes the value FALSE if the NAs of this variable have been removed from the current box. The names of these columns have the form <variable name>.NA

box_metric, box_nom, box_na

alternative definitions of the boxes in a form that is easier for other functions to handle.

subsets

list of logical vectors indicating the subsets at each peeling step (i.e. the observations that lie in the box)

data_orig

original dataset that is used for the peeling.
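How the box columns described above translate into the logical membership vectors in subsets can be sketched for a single row of box. The values are made up for illustration: bounds, cat_in and na_in_x1 are hypothetical stand-ins for the columns min.x1/max.x1, cat.<category> and x1.NA, and the strict inequalities follow from the boundaries being the last excluded values.

```r
# Toy data with one ordinal covariate (with an NA) and one nominal covariate.
dat <- data.frame(x1  = c(-0.9, -0.2, 0.3, 0.8, NA),
                  cat = factor(c("a", "b", "c", "a", "d")))

# Hypothetical stand-ins for one row of the box definition:
bounds   <- list(min.x1 = -0.5, max.x1 = 0.9)           # excluded boundary values
cat_in   <- c(a = TRUE, b = FALSE, c = TRUE, d = TRUE)  # like cat.a, cat.b, ...
na_in_x1 <- FALSE                                       # like x1.NA

# Strict inequalities, because the boundaries lie outside the box;
# NAs in x1 are in the box only if na_in_x1 is TRUE.
in_x1  <- ifelse(is.na(dat$x1), na_in_x1,
                 dat$x1 > bounds$min.x1 & dat$x1 < bounds$max.x1)
in_cat <- unname(cat_in[as.character(dat$cat)])
in_box <- in_x1 & in_cat
in_box  # FALSE FALSE TRUE TRUE FALSE
```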

References

Friedman, J. H. and Fisher, N. I., 'Bump hunting in high-dimensional data', Statistics and Computing 9 (2) (1999), 123-143

Ott, A. and Hapfelmeier, A., 'Nonparametric Subgroup Identification by PRIM and CART: A Simulation and Application Study', Computational and Mathematical Methods in Medicine, vol. 2017 (2017), 17 pages, Article ID 5271091

See Also

remove_dominated, PRIM_peel, PRIM_paste, PRIM

Examples

# generating random data:
set.seed(123)
n <- 500
x1 <- runif(n = n, min = -1)
x2 <- runif(n = n, min = -1)
x3 <- runif(n = n, min = -1)
cat <- as.factor(sample(c("a","b","c", "d"), size = n, replace = TRUE))
# probability of y = TRUE decreases with the distance of (x1, x2) from the origin
wsk <- (1-sqrt(x1^2+x2^2)/sqrt(2))
y <- as.logical(rbinom(n = n, prob = wsk, size = 1))
dat <- cbind.data.frame(y, x1, x2, x3, cat)
#plot(dat$x1, dat$x2, col=dat$y+1, pch=16)
remove(x1, x2, x3, y, wsk, cat, n)

# apply the PRIM_peel_bs function:
prim <- PRIM_peel_bs(formula=y ~ ., data=dat, beta_min = .01)
plot(prim) # multiple trajectory
head(prim$box) # box definitions

ao90/PRIM documentation built on May 5, 2019, 8:01 p.m.