lorenz: Lorenz Curve Based Thresholds and Partitions
In psolymos/opticut: Likelihood Based Optimal Partitioning and Indicator Species Analysis

Description

Lorenz curve based thresholds and partitions.

Usage

lorenz(x, n = rep(1, length(x)), na.last = TRUE)

## S3 method for class 'lorenz'
quantile(x, probs = seq(0, 1, 0.25),
    type = c("L", "p"), ...)
iquantile(x, ...)
## S3 method for class 'lorenz'
iquantile(x, values,
    type = c("L", "p"),...)

## S3 method for class 'lorenz'
plot(x, type = c("L", "x"),
    tangent = NA, h = NA, v = NA, ...)

## S3 method for class 'summary.lorenz'
print(x, digits, ...)
## S3 method for class 'lorenz'
summary(object, ...)

Arguments

`x`	a vector of nonnegative numbers for `lorenz`, or an object to be plotted or summarized.
`n`	a vector of frequencies, must be same length as `x`.
`na.last`	logical, for controlling the treatment of `NA`s. If `TRUE`, missing values in the data are put last; if `FALSE`, they are put first; if `NA`, they are removed (see `order`).
`probs`	numeric vector of probabilities with values in [0,1], as in `quantile`.
`values`	numeric vector of values for which the corresponding population quantiles are to be returned.
`type`	character. For the `plot` method it indicates whether to plot the cumulative distribution quantiles (`"L"`) or ordered but not-cumulated values (`"x"`). For the `quantile` and `iquantile` methods it indicates which of the quantiles (`"L"` or `"p"`) to use.
`tangent`	color value for the Lorenz-curve tangent when plotted. The default `NA` value omits the tangent from the plot.
`h`	color value for the horizontal line for the Lorenz-curve tangent when plotted. The default `NA` value omits the horizontal line from the plot.
`v`	color value for the vertical line for the Lorenz-curve tangent when plotted. The default `NA` value omits the vertical line from the plot.
`digits`	numeric, number of significant digits in output.
`object`	object to summarize.
`...`	other arguments passed to the underlying functions.

Details

The Lorenz curve is a continuous piecewise linear function representing the distribution of abundance (or income, or wealth). It plots the cumulative portion of the population, p_i = i / m (i = 1, ..., m), against the cumulative portion of abundance, L_i = \sum_{j=1}^{i} x_j * n_j / \sum_{j=1}^{m} x_j * n_j, where the x_i are indexed in non-decreasing order (x_i <= x_{i+1}) and m = length(x). By convention, p_0 = L_0 = 0. The argument n can represent unequal frequencies.
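As a rough illustration of the definition above, the curve coordinates can be computed by hand in base R. This is a minimal sketch, not the package's implementation: `lorenz_sketch` is a hypothetical helper, and it assumes that with unequal frequencies the population share accumulates `n`, which reduces to p_i = i / m when all frequencies are 1.

```r
# Hypothetical helper (not part of opticut): Lorenz curve coordinates by hand.
lorenz_sketch <- function(x, n = rep(1, length(x))) {
  stopifnot(length(x) == length(n), all(x >= 0))
  o <- order(x)                      # index values in non-decreasing order
  x <- x[o]
  n <- n[o]
  # Assumption: population share accumulates the frequencies n;
  # this equals i / m when every n is 1.
  p <- cumsum(n) / sum(n)
  # Cumulative share of abundance, as in the formula for L_i.
  L <- cumsum(x * n) / sum(x * n)
  # Prepend the conventional origin p_0 = L_0 = 0.
  cbind(p = c(0, p), L = c(0, L))
}

out <- lorenz_sketch(c(3, 1, 10, 2))
out
```

For this input the total abundance is 16, so the `L` column is 0, 1/16, 3/16, 6/16 and 1, while `p` increases in steps of 0.25.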

The following characteristics of the Lorenz curve are calculated: "t": index where the tangent (slope 1) touches the curve; "x[t]", "p[t]", and "L[t]" are the values corresponding to index t, where x_t is the unmodified input. "S": Lorenz asymmetry coefficient (S = p_t + L_t); S = 1 indicates symmetry. "G": Gini coefficient; 0 is perfect equality, and values close to 1 indicate high inequality. "J": Youden index, the largest vertical distance between the diagonal (the line of equality, L = p) and the curve; the distance is largest at the tangent point (J = max(p - L) = p_t - L_t).
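The characteristics listed above can be sketched in base R for a small vector with unit frequencies. This is an illustration under stated assumptions rather than the code behind `summary`: the variable names are hypothetical, the tangent index is taken as the position where p - L is largest, and G is computed as one minus twice the trapezoidal area under the curve, which may differ in detail from what the package reports.

```r
# Small worked example with unit frequencies (all n = 1).
x <- sort(c(3, 1, 10, 2))
m <- length(x)
p <- (0:m) / m                      # p_0 = 0, p_i = i / m
L <- c(0, cumsum(x)) / sum(x)       # L_0 = 0, cumulative abundance share

d <- p - L                          # vertical distance from the diagonal
t_idx <- which.max(d) - 1           # minus 1 because element 1 is the origin
J <- max(d)                         # Youden index, J = p_t - L_t
S <- p[t_idx + 1] + L[t_idx + 1]    # asymmetry coefficient, S = p_t + L_t

# Assumption: Gini as 1 - 2 * (area under the piecewise linear curve).
area <- sum(diff(p) * (head(L, -1) + tail(L, -1)) / 2)
G <- 1 - 2 * area

res <- c(t = t_idx, "x[t]" = x[t_idx], "p[t]" = p[t_idx + 1],
         "L[t]" = L[t_idx + 1], S = S, G = G, J = J)
res
```

Here the tangent point falls at t = 3 (x_t = 3, p_t = 0.75, L_t = 0.375), giving J = 0.375, S = 1.125 and G = 0.4375.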

Value

lorenz returns an object of class lorenz. It is a matrix with m+1 rows (m = length(x)) and 3 columns (p, L, x).

The quantile method finds values of x_i corresponding to quantiles L_i or p_i (depending on the type argument). The iquantile (inverse quantile) method finds quantiles of L_i or p_i corresponding to values of x_i.

The plot method draws a Lorenz curve. Because the object is a matrix, lines and points will work for adding multiple lines.
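The remark about `lines` and `points` relies on base graphics taking the first two columns of a matrix as x and y coordinates. The sketch below illustrates this with plain matrices built by a hypothetical helper, `lorenz_matrix`, which only mimics the (p, L, x) column layout described above and is not the package's constructor.

```r
# Hypothetical helper: a plain matrix with the (p, L, x) column layout.
lorenz_matrix <- function(x) {
  x <- sort(x)
  m <- length(x)
  cbind(p = (0:m) / m, L = c(0, cumsum(x)) / sum(x), x = c(0, x))
}

set.seed(1)
a <- lorenz_matrix(rexp(50, 1))
b <- lorenz_matrix(runif(50))

# Draw the first curve, then overlay the second one.
plot(a[, "p"], a[, "L"], type = "l", xlab = "p", ylab = "L")
lines(b, col = 2)            # matrix input: columns 1 and 2 are used as x and y
points(b, col = 2, cex = 0.5)
abline(0, 1, lty = 2)        # diagonal, i.e. the line of equality
```

With objects returned by `lorenz`, the first curve would normally be drawn with the `plot` method instead of the explicit column indexing used here.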

The summary method returns characteristics of the Lorenz curve.

Author(s)

Peter Solymos <psolymos@gmail.com>

References

Damgaard, C., & Weiner, J. (2000): Describing inequality in plant size or fecundity. Ecology 81:1139–1142. <doi:10.2307/177185>

Schisterman, E. F., Perkins, N. J., Liu, A., & Bondell, H. (2005): Optimal cut-point and its corresponding Youden index to discriminate individuals using pooled blood samples. Epidemiology 16:73–81. <doi:10.1097/01.ede.0000147512.81966.ba>

Youden, W. J. (1950): Index for rating diagnostic tests. Cancer 3:32–5. <doi:10.1002/1097-0142(1950)3:1<32::AID-CNCR2820030106>3.0.CO;2-3>

Examples

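library(opticut)

## simulate a mixture of two exponential distributions
set.seed(1)
x <- c(rexp(100, 10), rexp(200, 1))

## compute the Lorenz curve and inspect the result
l <- lorenz(x)
head(l)
tail(l)
summary(l)
## summary of the underlying matrix, for comparison
summary(unclass(l))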

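## quantiles based on the population share (type "p") and their inverse
(q <- c(0.05, 0.5, 0.95))
(p_i <- quantile(l, probs=q, type="p"))
iquantile(l, values=p_i, type="p")
## quantiles based on the cumulative abundance (type "L") and their inverse
(p_i <- quantile(l, probs=q, type="L"))
iquantile(l, values=p_i, type="L")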

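## top: Lorenz curve with tangent, diagonal, and anti-diagonal
## bottom: ordered, non-cumulated values
op <- par(mfrow=c(2,1))
plot(l, lwd=2, tangent=2, h=3, v=4)
abline(0, 1, lty=2, col="grey")
abline(1, -1, lty=2, col="grey")
plot(l, type="x", lwd=2, h=3, v=4)
par(op)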

## Lorenz-tangent approach to binarize a multi-level problem
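n <- 100
## four groups of unequal size
g <- as.factor(sort(sample(LETTERS[1:4], n, replace=TRUE, prob=4:1)))
## counts whose mean increases with the group level
x <- rpois(n, exp(as.integer(g)))
## group means, with group sizes used as frequencies
mu <- aggregate(x, list(g), mean)
(l <- lorenz(mu$x, table(g)))
(s <- summary(l))

plot(l)
abline(0, 1, lty=2)
## vertical segment at p[t] between the diagonal and the curve
lines(rep(s["p[t]"], 2), c(s["p[t]"], s["L[t]"]), col=2)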

psolymos/opticut documentation built on May 25, 2024, 6:59 a.m.