PRIM_peel_bs: Multiple Peeling-Function
In ao90/PRIM: Patient Rule Induction Method (PRIM)


This function implements the multiple peeling algorithm suggested by Friedman and Fisher (1999). The single peeling function PRIM_peel is run repeatedly, once for each alpha value and for each bootstrap sample drawn from the original data.

PRIM_peel_bs(formula, data, peel_alpha = seq(0.01, 0.4, 0.03), B = 0,
  beta_min = 0.01, target = mean, alter_crit = TRUE, use_NAs = TRUE,
  seed, print_position = TRUE)

`formula`	an object of class "`formula`" with a response but no interaction terms. It indicates the response over which the target function should be maximized and the covariates that are used for the later box definitions.
`data`	an object of class `data.frame` containing the variables named in the formula.
`peel_alpha`	numeric vector of the alpha fractions to be used for the peelings.
`B`	number of bootstrap samples to which the peeling is applied for each alpha. For `B = 0` no bootstrap samples are created.
`beta_min`	minimum support that a box must have (stopping criterion).
`target`	target function to be maximized. In most cases the mean is a useful target, although other functions such as the median are also possible.
`alter_crit`	logical. If `TRUE` the alternative criterion is used for peeling, i.e. "target/beta" is maximized during peeling instead of "target", so that large subboxes are not preferred for peeling off. This is especially important for nominal covariates.
`use_NAs`	logical. If `TRUE` observations with missing values are included in the analysis.
`seed`	seed to be set before the first iteration. Only useful for `B > 0`.
`print_position`	logical. If `TRUE` the current position of the algorithm is printed out.
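
To illustrate how `peel_alpha` and `B` together determine the set of peeling runs, here is a minimal base-R sketch. `toy_peel` is a hypothetical, much simplified stand-in for PRIM_peel (numeric covariates only, plain target criterion, no missing values) and is not the package's implementation. It also assumes that the original data are peeled once per alpha in addition to the `B` bootstrap samples, which the text above does not state explicitly.

```r
# Hypothetical simplified peeling trajectory, NOT the package's PRIM_peel:
# numeric covariates only, maximizes the plain target.
toy_peel <- function(y, X, alpha, beta_min = 0.01, target = mean) {
  inbox <- rep(TRUE, length(y))
  f <- target(y)   # target on the full data (first trajectory point)
  beta <- 1        # support of the full data
  repeat {
    best <- NULL
    for (v in names(X)) {
      cuts <- quantile(X[[v]][inbox], c(alpha, 1 - alpha), names = FALSE)
      # candidates: peel off the lower or the upper alpha-fraction of v
      cand <- list(inbox & X[[v]] > cuts[1], inbox & X[[v]] < cuts[2])
      for (keep in cand) {
        if (sum(keep) == 0 || sum(keep) == sum(inbox)) next
        val <- target(y[keep])
        if (is.null(best) || val > best$val) best <- list(val = val, keep = keep)
      }
    }
    # stop when nothing can be peeled or the support would fall below beta_min
    if (is.null(best) || mean(best$keep) < beta_min) break
    inbox <- best$keep
    f <- c(f, best$val)
    beta <- c(beta, mean(inbox))
  }
  data.frame(alpha = alpha, f = f, beta = beta)
}

set.seed(1)
n <- 200
X <- data.frame(x1 = runif(n, -1, 1), x2 = runif(n, -1, 1))
y <- as.numeric(abs(X$x1) < 0.3 & abs(X$x2) < 0.3)

peel_alpha <- c(0.05, 0.1, 0.2)
B <- 2
runs <- list()
for (a in peel_alpha) {
  for (b in 0:B) {
    # b = 0: original data (assumption); b > 0: bootstrap sample
    idx <- if (b == 0) seq_len(n) else sample(n, replace = TRUE)
    res <- toy_peel(y[idx], X[idx, , drop = FALSE], alpha = a)
    res$b <- b
    runs[[length(runs) + 1]] <- res
  }
}
trajectories <- do.call(rbind, runs)  # all runs combined in one object
```

Under these assumptions the number of runs is `length(peel_alpha) * (B + 1)`, which gives a sense of how quickly the combined output grows.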

The response in the formula can be numeric, logical, or a survival object (see Surv). If it is a survival object, the target is set to the number of events per amount of time.
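
As a rough illustration of the survival target, the sketch below computes an event rate in base R. It assumes that "number of events per amount of time" means the total number of events divided by the total follow-up time in the box; the data and the helper `event_rate` are hypothetical and not part of the package.

```r
# Hypothetical follow-up times and event indicators (1 = event, 0 = censored)
time   <- c(2.0, 5.0, 1.5, 4.0, 3.5)
status <- c(1, 0, 1, 1, 0)

# Assumed form of the target: events per unit of observed time
event_rate <- function(time, status) sum(status) / sum(time)

# Rate on all observations and on a hypothetical box
rate_all <- event_rate(time, status)
in_box   <- c(TRUE, FALSE, TRUE, TRUE, FALSE)
rate_box <- event_rate(time[in_box], status[in_box])
```

A box with a higher event rate than the full data would be a candidate "bump" under this target.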

The output of this function can become very large because the outputs of all runs of the single peeling function PRIM_peel are combined into one object. It is therefore useful to remove all dominated boxes (see remove_dominated).
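
The idea of dominance can be sketched in base R as follows. The sketch assumes that a box is dominated if some other box has both a target value and a support at least as large, with at least one of the two strictly larger; `is_dominated` is a hypothetical helper, and remove_dominated may use a different rule or interface.

```r
# Hypothetical target values f and supports beta of five boxes
boxes <- data.frame(
  f    = c(0.50, 0.62, 0.60, 0.80, 0.75),
  beta = c(1.00, 0.70, 0.50, 0.20, 0.10)
)

# Assumed dominance rule: another box is at least as good in both
# f and beta, and strictly better in at least one of them
is_dominated <- function(f, beta) {
  vapply(seq_along(f), function(i) {
    any(f >= f[i] & beta >= beta[i] & (f > f[i] | beta > beta[i]))
  }, logical(1))
}

dom  <- is_dominated(boxes$f, boxes$beta)
kept <- boxes[!dom, ]  # non-dominated boxes only
```

Here the third and fifth boxes are dropped, because the second and fourth boxes respectively offer both a higher target and a larger support.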

PRIM_peel_bs returns an object of class "peel", which is a list containing at least the following components:

`f`	vector of the target functions evaluated on the box at each peeling step.
`beta`	vector of the supports beta of the boxes at each peeling step.
`box`	a `data.frame` defining the borders of the boxes. Each row belongs to one peeling step. The columns starting with "`min.`" and "`max.`" give the lower and upper boundaries for the covariates that are at least ordinal. The value stored is the last one that is not included in the current box, i.e. the border itself lies outside the box. For each nominal variable there is one column per category it can take. If a category is removed from the box, its column takes the value `FALSE`. The names of these columns are structured like: `<variable name>.<category>` For each variable with missing values (only if `use_NAs = TRUE`) there is also a column that takes the value `FALSE` if the `NA`s of this variable are removed from the current box. The names of these columns are structured like: `<variable name>.NA`
`box_metric, box_nom, box_na`	easier-to-handle definitions of the boxes, intended for use by other functions.
`subsets`	`list` of logical vectors indicating the subsets at each peeling step (i.e. the observations that lie in the box)
`data_orig`	original dataset that is used for the peeling.
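
The relationship between `subsets`, `beta` and `f` can be sketched in base R. The example assumes that the support is the fraction of observations lying in the box and that the target is the mean; the response and the subsets are hypothetical and are not output of the package.

```r
# Hypothetical response and logical subset vectors for three peeling steps
y <- c(1, 0, 1, 1, 0, 0, 1, 0)
subsets <- list(
  rep(TRUE, 8),
  c(TRUE, FALSE, TRUE, TRUE, FALSE, TRUE,  TRUE, FALSE),
  c(TRUE, FALSE, TRUE, TRUE, FALSE, FALSE, TRUE, FALSE)
)

# Support: fraction of observations in the box (assumption)
beta <- vapply(subsets, mean, numeric(1))

# Target: mean of the response over the observations in the box
f <- vapply(subsets, function(s) mean(y[s]), numeric(1))
```

In this toy trajectory the support shrinks from 1 to 0.5 while the mean of the response inside the box rises from 0.5 to 1.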

Friedman, J. H. and Fisher, N. I., 'Bump hunting in high-dimensional data', Statistics and Computing 9 (2) (1999), 123-143

Ott, A. and Hapfelmeier, A., 'Nonparametric Subgroup Identification by PRIM and CART: A Simulation and Application Study', Computational and Mathematical Methods in Medicine, vol. 2017 (2017), 17 pages, Article ID 5271091

remove_dominated, PRIM_peel, PRIM_paste, PRIM

# generating random data:
set.seed(123)
n <- 500
x1 <- runif(n = n, min = -1)
x2 <- runif(n = n, min = -1)
x3 <- runif(n = n, min = -1)
cat <- as.factor(sample(c("a","b","c", "d"), size = n, replace = TRUE))
wsk <- (1-sqrt(x1^2+x2^2)/sqrt(2))
y <- as.logical(rbinom(n = n, prob = wsk, size = 1))
dat <- cbind.data.frame(y, x1, x2, x3, cat)
#plot(dat$x1, dat$x2, col=dat$y+1, pch=16)
remove(x1, x2, x3, y, wsk, cat, n)

# apply the PRIM_peel_bs function:
prim <- PRIM_peel_bs(formula=y ~ ., data=dat, beta_min = .01)
plot(prim) # multiple trajectory
head(prim$box) # box definitions