In simPop: Simulation of Complex Synthetic Data Information

getBreaks

R Documentation

Compute break points for categorizing (semi-)continuous variables

Description

Compute break points for categorizing continuous or semi-continuous variables using (weighted) quantiles. This utility function is useful for writing custom wrapper functions such as `simEUSILC`.

Usage

getBreaks(
  x,
  weights = NULL,
  zeros = TRUE,
  lower = NULL,
  upper = NULL,
  equidist = TRUE,
  probs = NULL,
  strata = NULL
)

Arguments

`x`	a numeric vector to be categorized.
`weights`	an optional numeric vector containing sample weights.
`zeros`	a logical indicating whether `x` is semi-continuous, i.e., contains a considerable number of zeros. See “Details” on how this affects the behavior of the function.
`lower`, `upper`	optional numeric values specifying lower and upper bounds other than minimum and maximum of `x`, respectively.
`equidist`	a logical indicating whether the (positive) break points should be equidistant or whether there should be refinements in the lower and upper tail (see “Details”).
`probs`	a numeric vector of probabilities with values in `[0, 1]` giving quantiles to be used as (positive) break points. If supplied, this is preferred over `equidist`.
`strata`	an optional vector specifying a stratification variable (e.g., household IDs). If specified, the mean of `x` (and also of `weights`, if specified) is computed within each stratum before the breaks are calculated.
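The effect of `strata` can be sketched in base R as follows. This is only an illustration of the within-group averaging described above, on made-up data; the variable names (`hid`, `x_bar`, `w_bar`) are hypothetical, and the package may implement the step differently.

```r
# Toy data: three households (hypothetical ids) with 2, 1 and 3 members
x       <- c(100, 300, 0, 50, 150, 250)
weights <- c(2, 2, 5, 1, 1, 1)
hid     <- c(1, 1, 2, 3, 3, 3)

# One value per household: the within-household mean of x and of the weights
x_bar <- as.vector(tapply(x, hid, mean))        # 200 0 150
w_bar <- as.vector(tapply(weights, hid, mean))  # 2 5 1

# The break points would then be computed from x_bar (and w_bar)
# instead of from the person-level vectors x and weights.
```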

Details

If `equidist` is `TRUE`, the behavior is as follows. If `zeros` is `TRUE` as well, the 0%, 10%, ..., 90% quantiles of the negative values and the 10%, 20%, ..., 100% quantiles of the positive values are computed. These quantiles are then used as break points together with 0. If `zeros` is not `TRUE`, on the other hand, the 0%, 10%, ..., 100% quantiles of all values are used.
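The equidistant case can be sketched in base R as follows. This is an unweighted illustration on made-up data using `stats::quantile`, not the package's implementation (which computes weighted quantiles); all variable names are hypothetical.

```r
# Toy semi-continuous data: some negatives, several zeros, some positives
x <- c(-40, -10, 0, 0, 0, 5, 20, 50, 120, 300)

# Semi-continuous case: quantiles are taken separately on each side of 0
neg <- x[x < 0]
pos <- x[x > 0]
b_neg <- quantile(neg, probs = seq(0, 0.9, by = 0.1), names = FALSE)  # 0%, ..., 90%
b_pos <- quantile(pos, probs = seq(0.1, 1, by = 0.1), names = FALSE)  # 10%, ..., 100%
breaks_zeros <- unique(c(b_neg, 0, b_pos))

# Continuous case: a single 0%, 10%, ..., 100% grid over all values
breaks_all <- unique(quantile(x, probs = seq(0, 1, by = 0.1), names = FALSE))
```

In the semi-continuous sketch, 0 always appears as a break point, so the zeros are separated from the negative and positive parts of the distribution.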

If `equidist` is not `TRUE`, the behavior is as follows. If `zeros` is not `TRUE`, the 1%, 5%, 10%, 20%, 40%, 60%, 80%, 90%, 95% and 99% quantiles of all values are used for the inner part of the data (instead of the equidistant 10%, ..., 90% quantiles). If `zeros` is `TRUE`, these quantiles are used only for the positive values, while the quantiles of the negative values remain equidistant.
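The difference between the two probability grids for the inner break points can be illustrated with a small base-R sketch. The data are made up, the quantiles are unweighted, and the variable names are hypothetical.

```r
# Inner probabilities: equidistant grid vs. grid refined in both tails
probs_equi  <- seq(0.1, 0.9, by = 0.1)
probs_tails <- c(0.01, 0.05, 0.1, 0.2, 0.4, 0.6, 0.8, 0.9, 0.95, 0.99)

# Deterministic, right-skewed toy data
x <- (1:1000)^2 / 1000

inner_equi  <- quantile(x, probs = probs_equi,  names = FALSE)
inner_tails <- quantile(x, probs = probs_tails, names = FALSE)

# The refined grid reaches further into both tails than the equidistant one
range(inner_equi)
range(inner_tails)
```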

Note that duplicated values among the quantiles are discarded and that the minimum and maximum are replaced with `lower` and `upper`, respectively, if these are specified.
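This post-processing step can be sketched as follows. `finish_breaks` is a hypothetical helper written for illustration and is not part of simPop; in particular, the order of the two operations (deduplicate first, then replace the end points) is an assumption.

```r
# Hypothetical helper: drop duplicated break points, then optionally
# overwrite the smallest and largest break with user-supplied bounds
finish_breaks <- function(breaks, lower = NULL, upper = NULL) {
  breaks <- unique(breaks)
  if (!is.null(lower)) breaks[1] <- lower
  if (!is.null(upper)) breaks[length(breaks)] <- upper
  breaks
}

raw <- c(0, 0, 0, 12, 35, 35, 80)            # quantiles with ties
finish_breaks(raw)                           # 0 12 35 80
finish_breaks(raw, lower = -1, upper = 100)  # -1 12 35 100
```

Ties among quantiles are typical when many observations share the same value, so the returned vector can be shorter than the number of requested probabilities.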

The (weighted) quantiles are computed with the function `quantileWt`.

Value

A numeric vector of break points.
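One way such a vector might be used is as the `breaks` argument of base R's `cut`. The sketch below uses a hand-written stand-in for the returned break points rather than an actual `getBreaks` result, and the data are made up.

```r
# Stand-in for a vector of break points (not an actual getBreaks() result)
breaks <- c(-50, 0, 10, 100, 500)
x <- c(-50, -5, 0, 3, 10, 250, 500)

# include.lowest = TRUE keeps the minimum inside the first category
cat_x <- cut(x, breaks = breaks, include.lowest = TRUE)
table(cat_x)
```

With the default `right = TRUE`, each category is closed on the right, so a value equal to a break point falls into the category below it.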

Author(s)

Andreas Alfons and Bernhard Meindl

Examples


data(eusilcS)

# semi-continuous variable, positive break points equidistant
getBreaks(eusilcS$netIncome, weights=eusilcS$rb050)

# semi-continuous variable, positive break points not equidistant
getBreaks(eusilcS$netIncome, weights=eusilcS$rb050,
    equidist = FALSE)

simPop documentation built on May 29, 2024, 5:20 a.m.