compBagplot: Computations for drawing a bagplot

compBagplot    R Documentation

Computations for drawing a bagplot

Description

Computes all elements of the bagplot, a generalisation of the univariate boxplot to bivariate data. The bagplot can be computed based on halfspace depth, projection depth, skewness-adjusted projection depth and directional projection depth. To draw the actual plot, the function bagplot needs to be called on the result of compBagplot.

Usage

compBagplot(x, type = "hdepth", sizesubset = 500,
            extra.directions = FALSE, options = NULL)

Arguments

x

An n by 2 data matrix.

type

Determines the depth function used to construct the bagplot: "hdepth" for halfspace depth, "projdepth" for projection depth, "sprojdepth" for skewness-adjusted projection depth and "dprojdepth" for directional projection depth.
Defaults to "hdepth".

sizesubset

When computing the bagplot based on halfspace depth, the size of the subset used to perform the main computations. See Details for more information.
Defaults to 500.

extra.directions

Logical indicating whether additional directions should be considered in the computation of the fence for the bagplot based on projection depth or skewness-adjusted projection depth. If TRUE, 250 additional equispaced directions are added to the directions defined by the points in x and the center. If FALSE, only the directions determined by the points in x are considered.
Defaults to FALSE.

options

A list of options to pass to the projdepth, sprojdepth or dprojdepth function. In addition the following option may be specified:

  • max.iter
    The maximum number of iterations in the bisection algorithm used to compute the depth contour corresponding to the cutoff. See depthContour for more information.
    Defaults to 100.

Details

The bagplot was proposed by Rousseeuw et al. (1999) as a generalisation of the boxplot to bivariate data. It is constructed based on halfspace depth. In the original format the deepest point is indicated by a "+" and is contained in the bag, which is defined as the depth region containing the 50% of the observations with largest depth. The fence is obtained by inflating the bag (relative to the deepest point) by a factor of three. The loop is the convex hull of the observations of x inside the fence. Observations outside the fence are flagged as outliers and plotted with a red star. This function only computes the components of the bagplot. The bagplot itself can be drawn using the bagplot function.
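The inflation step that produces the fence can be sketched in base R. The helper name inflate_polygon and the toy bag below are hypothetical and not part of mrfDepth; the sketch only illustrates scaling a polygon by a factor of three about a center point.

```r
# Hypothetical helper: inflate a polygon about a center point.
inflate_polygon <- function(vertices, center, factor = 3) {
  # Shift so the center is the origin, scale, then shift back.
  shifted <- sweep(vertices, 2, center, "-")
  sweep(shifted * factor, 2, center, "+")
}

# Toy bag: a unit square centered at (1, 1).
bag <- rbind(c(0.5, 0.5), c(1.5, 0.5), c(1.5, 1.5), c(0.5, 1.5))
center <- c(1, 1)

# Each fence vertex lies three times as far from the center
# as the corresponding bag vertex.
fence <- inflate_polygon(bag, center)
print(fence)
```

Points of x falling outside such an inflated polygon would be the ones flagged as outliers.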

The bagplot may also be defined using other depth functions. When using projection depth, skewness-adjusted projection depth or directional projection depth, the bagplot is built as follows. The center corresponds to the observation with largest depth. The bag is constructed as the convex hull of the 50% of the points with largest depth. Outliers are identified as points with a depth smaller than a cutoff value; see projdepth, sprojdepth and dprojdepth for the precise definition. The loop is computed as the convex hull of the non-outlying points. The fence is approximated by the convex hull of those points that lie on rays from the center through the vertices of the bag and have a depth that equals the cutoff depth. For a better approximation the user can set the input parameter extra.directions to TRUE such that an additional 250 equally spaced directions on the circle are considered.
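As a rough illustration of the center, bag, outlier and loop steps, the following base R sketch uses a stand-in depth based on the Mahalanobis distance. The stand-in depth, the cutoff value and all variable names are assumptions made for illustration and do not reproduce the projdepth, sprojdepth or dprojdepth computations. The fence step is not shown.

```r
set.seed(1)
x <- cbind(rnorm(100), rnorm(100))

# Stand-in depth (an assumption, not projection depth):
# depth = 1 / (1 + outlyingness), with the Mahalanobis distance
# playing the role of the outlyingness.
outl <- sqrt(mahalanobis(x, colMeans(x), cov(x)))
depth <- 1 / (1 + outl)

# Center: the observation with the largest depth.
center <- x[which.max(depth), ]

# Bag: convex hull of the 50% of the points with the largest depth.
deepest <- x[depth >= median(depth), , drop = FALSE]
bag <- deepest[chull(deepest), , drop = FALSE]

# Outliers: points whose depth is below a cutoff
# (illustrative value, assumed for this sketch).
cutoff <- 1 / (1 + sqrt(qchisq(0.99, df = 2)))
is_outlier <- depth < cutoff

# Loop: convex hull of the non-outlying points.
regular <- x[!is_outlier, , drop = FALSE]
loop <- regular[chull(regular), , drop = FALSE]
```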

The computation of the bagplot based on halfspace depth can be time consuming. Therefore it is possible to limit the bulk of the computations to a random subset of the data. Computations of the halfspace median and the bag are then based on this random subset. The number of points in this subset can be controlled by the optional argument sizesubset.
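The idea of restricting the main computations to a random subset can be sketched as follows. How compBagplot draws its subset internally is not documented here, so the sampling scheme and variable names below are assumptions.

```r
set.seed(42)
x <- cbind(rnorm(2000), rnorm(2000))
sizesubset <- 500

# Draw a random subset of rows without replacement when the data
# set is larger than sizesubset; otherwise keep all rows.
idx <- if (nrow(x) > sizesubset) {
  sample(nrow(x), sizesubset)
} else {
  seq_len(nrow(x))
}
x_sub <- x[idx, , drop = FALSE]
dim(x_sub)
```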

The function first checks whether the data lie on a line. If so, it issues a warning and returns the dimension of the subspace (which is 1) together with the normal vector to that line.
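One way to detect such degenerate data is a singular value decomposition of the centered data matrix. This is a sketch of the idea in base R under an assumed tolerance; it is not necessarily the check that compBagplot performs.

```r
# Points lying exactly on the line y = 2x + 1.
x <- cbind(1:10, 2 * (1:10) + 1)

# Center the data and compute its singular value decomposition.
centered <- scale(x, center = TRUE, scale = FALSE)
s <- svd(centered)

# Count the singular values that are not negligible relative to
# the largest one (the tolerance is an assumption).
tol <- 1e-8
dimension <- sum(s$d > tol * max(s$d))

# The right singular vector of the negligible singular value is
# orthogonal to the line on which the data lie.
normal <- s$v[, 2]
```

Here dimension counts the directions in which the data actually vary, and normal is orthogonal to the direction (1, 2) of the line.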

Value

A list with components:

center

Center of the data.
When type = "hdepth", this corresponds to the Tukey median. In other cases this is the observation with maximal depth.

chull

When type = "hdepth", these are the vertices of the region with maximal halfspace depth. In other cases this is a null vector.

bag

The coordinates of the vertices of the bag.

fence

The coordinates of the vertices of the fence.

datatype

An n by 3 matrix. The first two columns correspond to x. The third column indicates the position of each observation of x in the bagplot: 1 for observations in the bag, 2 for observations in the fence and 3 for outliers.
Note that points may not be in the same order as in x.

flag

A vector of length n which is 0 for outliers and 1 for regular observations of x.

depth

The depth of the observations of x.

dimension

If the data are lying in a lower dimensional subspace, the dimension of this subspace.

hyperplane

If the data are lying in a lower dimensional subspace, a direction orthogonal to this subspace.

type

Same as the input parameter type.

Author(s)

P. Segaert based on Fortran code by P.J. Rousseeuw, I. Ruts and A. Struyf.

References

Rousseeuw P.J., Ruts I., Tukey J.W. (1999). The bagplot: A bivariate boxplot. The American Statistician, 53, 382–387.

Hubert M., Van der Veeken S. (2008). Outlier detection for skewed data. Journal of Chemometrics, 22, 235–246.

Hubert M., Rousseeuw P.J., Segaert, P. (2015). Rejoinder to 'Multivariate functional outlier detection'. Statistical Methods & Applications, 24, 269–277.

See Also

bagplot, hdepth, projdepth, sprojdepth, dprojdepth.

Examples

data(bloodfat)
# Result <- compBagplot(bloodfat)
# bagplot(Result)

# The sizesubset argument may be used to control the
# computation time when computing the bagplot based on
# halfspace depth. However results may be unreliable when
# choosing a small subset for the main computations.
# system.time(Result1 <- compBagplot(bloodfat))
# system.time(Result2 <- compBagplot(bloodfat, sizesubset = 100))
# bagplot(Result1)
# bagplot(Result2)

# When using any of the projection depth functions,
# a list of options may be passed down to the corresponding
# outlyingness routines.
options <- list(type = "Rotation",
                ndir = 50,
                stand = "unimcd",
                h = floor(nrow(bloodfat)*3/4))
Result <- compBagplot(bloodfat,
                      type = "projdepth", options = options)
bagplot(Result)

# The fence is computed using the depthContour function.
# To get a smoother fence, one may opt to consider extra
# directions.
options <- list(ndir = 500,
                seed = 36)
Result <- compBagplot(bloodfat,
                      type = "dprojdepth", options = options)
bagplot(Result, plot.fence = TRUE)

options <- list(ndir = 500,
                seed = 36)
Result <- compBagplot(bloodfat,
                      type = "dprojdepth", options = options,
                      extra.directions = TRUE)
bagplot(Result, plot.fence = TRUE)

mrfDepth documentation built on May 29, 2024, 5:04 a.m.