Iteratively peels a dataset for bump hunting.
y | Numeric vector of response values.
x | Numeric or categorical data.frame of input values.
alpha | The peeling fraction of the algorithm. A value between 0 and 1 giving the proportion of observations peeled at each step.
beta.stop | The stopping support of the algorithm. A value between 0 and 1 giving the proportion of remaining data below which the algorithm stops.
obj.fun | The function of
peeling.side | A numeric vector of side constraints on the peeling of each input variable. -1 indicates peeling only the 'left' of the box (i.e. increasing the lower limit only), 1 indicates peeling only the 'right' (i.e. decreasing the upper limit only) and 0 indicates no constraint.
The function peeling carries out the top-down peeling, which is the first step of the PRIM algorithm. At each iteration, it peels a proportion alpha of data from one side of the domain in order to increase the value of the function obj.fun applied to the response y. The algorithm iterates the peeling until the support of the box (i.e. the proportion of remaining observations) is below the value beta.stop.
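The iteration described above can be illustrated with a minimal base-R sketch of a single peeling step on numeric inputs. This is not the package's implementation: the helper peel_once and its arguments are hypothetical names, and the use of quantile to set the new limits is an assumption made for illustration.

```r
# One peeling step on numeric inputs: a base-R sketch, not the package's code.
# `peel_once` is a hypothetical helper introduced for illustration only.
peel_once <- function(y, x, inbox, alpha = 0.05, obj.fun = mean) {
  best <- list(value = -Inf, inbox = inbox)
  for (j in seq_len(ncol(x))) {
    xj <- x[inbox, j]
    # Candidate 1: raise the lower limit (peel the 'left' side)
    left <- inbox & x[, j] >= quantile(xj, alpha)
    # Candidate 2: lower the upper limit (peel the 'right' side)
    right <- inbox & x[, j] <= quantile(xj, 1 - alpha)
    for (cand in list(left, right)) {
      value <- obj.fun(y[cand])
      # Keep the candidate box with the largest objective value
      if (value > best$value) best <- list(value = value, inbox = cand)
    }
  }
  best
}

set.seed(1)
x <- matrix(runif(2000), ncol = 2)
y <- 10 * (x[, 1] >= 0.8 & x[, 2] >= 0.5) + rnorm(1000)

inbox <- rep(TRUE, nrow(x))   # the starting box contains every observation
step <- peel_once(y, x, inbox)
mean(step$inbox)              # support after one peel, close to 1 - alpha
```

Repeating this step on the returned inbox vector until its mean falls below beta.stop would give a peeling trajectory in the sense described above.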
Many functions can be used in obj.fun, including user-defined functions. A user-defined function should take three arguments: y and x, representing the corresponding variables, and inbox, which is a boolean vector indicating the observations inside the current box. Note that a classical function such as mean, var or median can also be passed to obj.fun. In this case, the function is wrapped internally to fit the above structure. For functions more complicated than these basic ones, it is recommended that users define their own function as described above.
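As a sketch of the structure described above, the following base-R code defines an objective with the (y, x, inbox) signature and shows one possible way a classical function could be adapted to it. The names trimmed_mean_obj and wrap_classical are hypothetical, and the wrapper is only an assumption about what such an adaptation might look like; the package's internal code may differ.

```r
# A user-defined objective with the (y, x, inbox) signature.
# `trimmed_mean_obj` and `wrap_classical` are hypothetical names.
trimmed_mean_obj <- function(y, x, inbox) {
  # Trimmed mean of the response inside the current box
  mean(y[inbox], trim = 0.1)
}

# Assumption: a classical function of the response alone could be adapted
# to the same signature like this; the package's own wrapper may differ.
wrap_classical <- function(f) {
  function(y, x, inbox) f(y[inbox])
}

set.seed(42)
x <- data.frame(x1 = runif(200), x2 = runif(200))
y <- 3 * (x$x1 > 0.7) + rnorm(200)
inbox <- x$x1 > 0.7            # observations inside an example box

mean_obj <- wrap_classical(mean)
mean_obj(y, x, inbox)          # same value as mean(y[inbox])
trimmed_mean_obj(y, x, inbox)
```

Writing the objective against all three arguments keeps the door open for criteria that depend on the inputs as well as the response, such as a regression slope inside the box.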
The function also allows directed peeling, i.e. constraining the peeling to occur on a single side of some input variables. Thus, when peeling.side = -1, only the lower part of the variable is peeled (the "left" of the domain) and when peeling.side = 1, only the upper part of the variable is peeled. Note that a vector can be passed, thus applying different constraints to the input variables.
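To make the coding of the side constraints concrete, the sketch below lists which peels remain allowed for a given peeling.side vector. The helper allowed_peels is hypothetical and only restates the -1 / 0 / 1 convention in base R; it is not part of the package.

```r
# Hypothetical helper: list the peels allowed under side constraints.
allowed_peels <- function(peeling.side, var.names) {
  do.call(rbind, lapply(seq_along(peeling.side), function(j) {
    sides <- switch(as.character(peeling.side[j]),
      "-1" = "left",                 # only the lower limit may increase
      "1"  = "right",                # only the upper limit may decrease
      "0"  = c("left", "right"))     # no constraint
    data.frame(variable = var.names[j], side = sides)
  }))
}

# x1 restricted to the left, x2 unconstrained, x3 restricted to the right
res <- allowed_peels(c(-1, 0, 1), c("x1", "x2", "x3"))
res
```

In this example x2 contributes two candidate peels per iteration, while x1 and x3 contribute one each.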
A prim object, which is a list with the following elements:
npeel | The number of peeling iterations performed.
support | A vector of length
yfun | A vector of length
limits | A list of length
x,y | The input and response data used in the algorithm.
numeric.vars | A logical vector indicating, for each input variable, if it was considered as a numeric variable.
alpha, peeling.side, obj.fun | The value of the arguments used for peeling. Useful for prim methods.
npaste | Number of pasting iterations performed. Should be 0 here, but useful for
Note that the first box in a prim object is the starting box containing the whole dataset. This is why the limits, yfun and support elements have length npeel + 1.
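The relation between npeel and the length of these elements can be sketched with an idealized support trajectory in base R. This is an approximation under stated assumptions: each step is taken to keep exactly a fraction 1 - alpha of the box, and boxes whose support would fall below beta.stop are assumed to be discarded. The actual stopping rule and rounding in the package may differ.

```r
alpha <- 0.05
beta.stop <- 0.05

# The first entry is the starting box, which contains the whole dataset
support <- 1
# Assumption: stop before the support would drop below beta.stop
while (tail(support, 1) * (1 - alpha) >= beta.stop) {
  support <- c(support, tail(support, 1) * (1 - alpha))
}

npeel <- length(support) - 1   # number of peeling iterations in this sketch
length(support) == npeel + 1   # one entry per peel, plus the starting box
```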
Friedman, J.H., Fisher, N.I., 1999. Bump hunting in high-dimensional data. Statistics and Computing 9, 123-143. https://doi.org/10.1023/A:1008894516817
extract.box to extract information about a particular box in a prim object.
plot_trajectory and plot_box to explore the peeling trajectory.
jump.prim to automatically choose the best box.
predict.prim to predict whether new data fall into particular boxes.
pasting to carry out the pasting step, which refines the edges of the chosen box.
# A simple bump
set.seed(12345)
x <- matrix(runif(2000), ncol = 2, dimnames = list(NULL, c("x1", "x2")))
y <- 2 * x[,1] + 5 * x[,2] + 10 * (x[,1] >= .8 & x[,2] >= .5) +
  rnorm(1000)
# Peeling with alpha = 0.05 and beta.stop = 0.05
peel_res <- peeling(y, x, beta.stop = 0.05)
# Automatically choose the best box
chosen <- jump.prim(peel_res)
# Plot the resulting box
plot_box(peel_res, pch = 16, ypalette = hcl.colors(10),
  support = chosen$final.box$support, box.args = list(lwd = 2))

# Examples of directed peeling
set.seed(12345)
x <- matrix(runif(2000), ncol = 2, dimnames = list(NULL, c("x1", "x2")))
y <- 10 * (x[,1] <= .2 & x[,2] <= .2) + 10 * (x[,1] >= .8 & x[,2] >= .8) +
  rnorm(1000)
# Left peeling
peel_left <- peeling(y, x, peeling.side = -1)
chosen <- jump.prim(peel_left)
plot_box(peel_left, pch = 16, ypalette = hcl.colors(10),
  support = chosen$final.box$support, box.args = list(lwd = 2),
  main = "Left peeling")
# Right peeling
peel_right <- peeling(y, x, peeling.side = 1)
chosen <- jump.prim(peel_right)
plot_box(peel_right, pch = 16, ypalette = hcl.colors(10),
  support = chosen$final.box$support, box.args = list(lwd = 2),
  main = "Right peeling")

# User-defined objective function to minimize the mean
set.seed(3333)
x <- matrix(runif(2000), ncol = 2, dimnames = list(NULL, c("x1", "x2")))
y <- - 10 * (x[,1] <= .2 & x[,2] <= .2) + 10 * (x[,1] >= .8 & x[,2] >= .8) +
  rnorm(1000)
peel_res <- peeling(y, x, obj.fun = function(x) -mean(x))
chosen <- jump.prim(peel_res)
plot_box(peel_res, pch = 16, ypalette = hcl.colors(10),
  support = chosen$final.box$support, box.args = list(lwd = 2))

# User-defined function maximizing the slope of a linear regression
set.seed(5555)
x <- runif(500)
ym <- 0.5 * x + 5 * (x - 0.7) * (x >= 0.7)
y <- ym + rnorm(500, sd = 0.1)
peel_res <- peeling(y, x, beta.stop = 0.1,
  obj.fun = function(y, x, inbox){
    # Fit the regression on the observations inside the current box
    dat <- data.frame(y, x)
    coef(lm(y ~ x, data = dat[inbox,]))[2]
  })
par(mfrow = c(1,2))
plot_trajectory(peel_res, type = "b", pch = 16, col = "cornflowerblue",
  support = 0.3, abline.pars = list(lwd = 2, col = "indianred"))
plot_box(peel_res, pch = 16, ypalette = hcl.colors(10),
  support = 0.3, box.args = list(lwd = 2))
# Overlay the true mean response
lines(sort(x), ym[order(x)], col = "red", lwd = 2)