Pareto tail modeling for income distributions

Description

Fit a Pareto distribution to the upper tail of income data. Since a theoretical distribution is used for the upper tail, this is a semiparametric approach.

Usage

paretoTail(x, k = NULL, x0 = NULL, method = "thetaPDC", groups = NULL,
  w = NULL, alpha = 0.01, ...)

Arguments

x

a numeric vector.

k

the number of observations in the upper tail to which the Pareto distribution is fitted.

x0

the threshold (scale parameter) above which the Pareto distribution is fitted.

method

either a function or a character string specifying the function to be used to estimate the shape parameter of the Pareto distribution, such as thetaPDC (the default). See “Details” for requirements for such a function and “See also” for available functions.

groups

an optional vector or factor specifying groups of elements of x (e.g., households). If supplied, each group of observations is expected to have the same value in x (e.g., household income). Only the value of the first member of each group to appear is used for fitting the Pareto distribution.

w

an optional numeric vector giving sample weights.

alpha

numeric; values above the theoretical 1 - alpha quantile of the fitted Pareto distribution will be flagged as outliers for further treatment with reweightOut or replaceOut.

...

additional arguments to be passed to the specified method.
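
As a rough illustration of the groups argument, the following base R sketch reduces grouped data to one value per group by keeping the first member of each group to appear. The toy vectors income and hid are made up for illustration, and the use of duplicated() is an assumption about how such a reduction can be done, not necessarily how paretoTail implements it.

```r
# Toy data (hypothetical): household income repeated for every household member
income <- c(1200, 1200, 3400, 560, 560, 560, 9800)
hid    <- c("a",  "a",  "b",  "c", "c", "c", "d")   # household identifiers

# Keep only the first member of each household to appear
first  <- !duplicated(hid)
values <- income[first]
values
```

With this reduction, a household contributes a single observation to the fit regardless of its size.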

Details

The arguments k and x0 determine each other. If k is supplied, the threshold x0 is estimated by the (n - k)-th smallest value in x, where n is the number of observations, so that k observations lie above the threshold. Conversely, if the threshold x0 is supplied, k is given by the number of observations in x larger than x0. Therefore, either k or x0 needs to be supplied. If both are supplied, only k is used.
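
The correspondence between k and x0 can be sketched in base R as follows. The data are made up, and reading the threshold as the (n - k)-th smallest value is an interpretation of the description above rather than a copy of the package code.

```r
# Toy data (hypothetical)
x <- c(12, 5, 40, 8, 150, 23, 9, 31, 70, 18)
n <- length(x)

# Case 1: k is given, derive the threshold x0 as the (n - k)-th smallest value
k  <- 3
x0 <- sort(x)[n - k]

# Case 2: x0 is given, derive k as the number of observations larger than x0
k_from_x0 <- sum(x > x0)

c(x0 = x0, k = k_from_x0)
```

Without ties at the threshold, both routes describe the same tail, which is why only one of the two arguments is needed.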

The function supplied to method should take a numeric vector (the observations) as its first argument. If k is supplied, it will be passed on (in this case, the function is required to have an argument called k). Similarly, if the threshold x0 is supplied, it will be passed on (in this case, the function is required to have an argument called x0). As above, only k is passed on if both are supplied. If the function specified by method can handle sample weights, the corresponding argument should be called w. Additional arguments are passed via the ... argument.
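
A function meeting these requirements might look like the following sketch. The name hillSketch is hypothetical, and the body is a plain unweighted Hill-type estimator written in base R for illustration; it is not the package's thetaHill and has no argument w, so it does not handle sample weights.

```r
# Hypothetical estimator: first argument is the data, plus arguments k and x0
hillSketch <- function(x, k = NULL, x0 = NULL, ...) {
    x <- sort(x)
    n <- length(x)
    if (!is.null(k)) {
        # k takes precedence if both k and x0 are supplied
        x0   <- x[n - k]
        tail <- x[(n - k + 1):n]
    } else if (!is.null(x0)) {
        tail <- x[x > x0]
        k    <- length(tail)
    } else {
        stop("either 'k' or 'x0' must be supplied")
    }
    # Hill-type estimate of the shape parameter
    k / sum(log(tail / x0))
}

# Toy data (hypothetical): both calls describe the same tail
x <- c(1, 2, 2 * exp(1), 2 * exp(2))
hillSketch(x, k = 2)
hillSketch(x, x0 = 2)
```

Such a function could presumably be passed as method = hillSketch; that call has not been checked against the package here.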

Value

An object of class "paretoTail" with the following components:

x

the supplied numeric vector.

k

the number of observations in the upper tail to which the Pareto distribution has been fitted.

groups

if supplied, the vector or factor specifying groups of elements.

w

if supplied, the numeric vector of sample weights.

method

the function used to estimate the shape parameter, or the name of the function.

x0

the scale parameter.

theta

the estimated shape parameter.

tail

if groups is not NULL, this gives the groups with values larger than the threshold (scale parameter), otherwise the indices of observations in the upper tail.

alpha

the tuning parameter alpha used for flagging outliers.

out

if groups is not NULL, this gives the groups that are flagged as outliers, otherwise the indices of the flagged observations.
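
To make the role of alpha concrete, the sketch below computes the theoretical 1 - alpha quantile of a Pareto distribution with scale x0 and shape theta, and flags values above it. All numbers are invented, and the closed-form quantile x0 * alpha^(-1/theta) follows from the Pareto distribution function 1 - (x/x0)^(-theta); whether paretoTail computes the cutoff in exactly this way is an assumption.

```r
# Hypothetical fitted parameters and tuning parameter
x0    <- 30000   # threshold (scale parameter)
theta <- 3       # shape parameter
alpha <- 0.01

# Theoretical 1 - alpha quantile of the Pareto distribution
cutoff <- x0 * alpha^(-1 / theta)

# Hypothetical tail values; those above the cutoff would be flagged
tail_values <- c(32000, 45000, 90000, 150000, 400000)
flagged <- which(tail_values > cutoff)
flagged
```

A smaller alpha moves the cutoff further into the tail, so fewer observations are flagged.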

Author(s)

Andreas Alfons

References

A. Alfons and M. Templ (2013) Estimation of Social Exclusion Indicators from Complex Surveys: The R Package laeken. Journal of Statistical Software, 54(15), 1–25. URL http://www.jstatsoft.org/v54/i15/

A. Alfons, M. Templ and P. Filzmoser (2013) Robust estimation of economic indicators from survey samples based on Pareto tail modeling. Journal of the Royal Statistical Society, Series C, 62(2), 271–286.

See Also

reweightOut, shrinkOut, replaceOut, replaceTail, fitPareto

thetaPDC, thetaWML, thetaHill, thetaISE, thetaLS, thetaMoment, thetaQQ, thetaTM

Examples

library("laeken")  # provides paretoTail(), gini() and the eusilc data
data(eusilc)


## gini coefficient without Pareto tail modeling
gini("eqIncome", weights = "rb050", data = eusilc)


## gini coefficient with Pareto tail modeling

# estimate threshold
ts <- paretoScale(eusilc$eqIncome, w = eusilc$db090,
    groups = eusilc$db030)

# estimate shape parameter
fit <- paretoTail(eusilc$eqIncome, k = ts$k,
    w = eusilc$db090, groups = eusilc$db030)

# calibration of outliers
w <- reweightOut(fit, calibVars(eusilc$db040))
gini(eusilc$eqIncome, w)

# winsorization of outliers
eqIncome <- shrinkOut(fit)
gini(eqIncome, weights = eusilc$rb050)

# replacement of outliers
eqIncome <- replaceOut(fit)
gini(eqIncome, weights = eusilc$rb050)

# replacement of whole tail
eqIncome <- replaceTail(fit)
gini(eqIncome, weights = eusilc$rb050)
