# paretoTail: Pareto tail modeling for income distributions

In laeken: Estimation of Indicators on Social Exclusion and Poverty

 paretoTail R Documentation

## Pareto tail modeling for income distributions

### Description

Fit a Pareto distribution to the upper tail of income data. Since a theoretical distribution is used for the upper tail, this is a semiparametric approach.

### Usage

```r
paretoTail(
  x,
  k = NULL,
  x0 = NULL,
  method = "thetaPDC",
  groups = NULL,
  w = NULL,
  alpha = 0.01,
  ...
)
```

### Arguments

| Argument | Description |
| --- | --- |
| `x` | a numeric vector. |
| `k` | the number of observations in the upper tail to which the Pareto distribution is fitted. |
| `x0` | the threshold (scale parameter) above which the Pareto distribution is fitted. |
| `method` | either a function or a character string specifying the function to be used to estimate the shape parameter of the Pareto distribution, such as `thetaPDC` (the default). See “Details” for the requirements for such a function and “See also” for available functions. |
| `groups` | an optional vector or factor specifying groups of elements of `x` (e.g., households). If supplied, each group of observations is expected to have the same value in `x` (e.g., household income). Only the value of the first member of each group to appear is used for fitting the Pareto distribution. |
| `w` | an optional numeric vector giving sample weights. |
| `alpha` | numeric; values above the theoretical `1 - alpha` quantile of the fitted Pareto distribution will be flagged as outliers for further treatment with `reweightOut` or `replaceOut`. |
| `...` | additional arguments to be passed to the specified method. |

### Details

The arguments `k` and `x0` determine each other. If `k` is supplied, the threshold `x0` is estimated by the `n - k` largest value in `x`, where `n` is the number of observations. Conversely, if the threshold `x0` is supplied, `k` is given by the number of observations in `x` larger than `x0`. Therefore, either `k` or `x0` needs to be supplied. If both are supplied, only `k` is used.
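The correspondence between `k` and `x0` can be sketched in base R. This is only an illustration on made-up values, and it assumes that the "`n - k` largest value" refers to element `n - k` of the sorted data; under that reading, counting the observations above the threshold recovers `k` as long as there are no ties.

```r
# Toy income vector (hypothetical values, no ties)
x <- c(12, 35, 18, 50, 27, 90, 41, 63, 22, 75)
n <- length(x)

# Case 1: k is given, derive the threshold x0
k <- 3
x0 <- sort(x)[n - k]        # assumed reading of "n - k largest value"

# Case 2: x0 is given, derive k as the number of observations above it
k_from_x0 <- sum(x > x0)

x0          # threshold implied by k
k_from_x0   # tail size implied by x0
```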

The function supplied to `method` should take a numeric vector (the observations) as its first argument. If `k` is supplied, it will be passed on (in this case, the function is required to have an argument called `k`). Similarly, if the threshold `x0` is supplied, it will be passed on (in this case, the function is required to have an argument called `x0`). As above, only `k` is passed on if both are supplied. If the function specified by `method` can handle sample weights, the corresponding argument should be called `w`. Additional arguments are passed via the `...` argument.
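As a rough illustration of these requirements, the following sketch defines a hypothetical estimator, `thetaSketch`, which is not part of laeken. It takes the observations as its first argument, has arguments named `k`, `x0` and `w`, and accepts further arguments via `...`. The estimate itself is an unweighted Hill-type estimate; the weights are accepted but ignored, which is a simplification made here for brevity.

```r
# Hypothetical estimator (not part of laeken): unweighted Hill-type
# estimate of the Pareto shape parameter.
thetaSketch <- function(x, k = NULL, x0 = NULL, w = NULL, ...) {
  x <- sort(x)
  n <- length(x)
  if (is.null(k)) {
    if (is.null(x0)) stop("either 'k' or 'x0' must be supplied")
    k <- sum(x > x0)              # tail size implied by the threshold
  } else {
    x0 <- x[n - k]                # threshold implied by the tail size
  }
  if (k < 1) stop("no observations in the upper tail")
  tail_values <- x[(n - k + 1):n] # the k largest observations
  # sample weights 'w' are accepted but ignored in this sketch
  k / sum(log(tail_values / x0))
}

# Simulated Pareto data with scale 1000 and shape 2 (inverse transform)
set.seed(1)
x <- 1000 * runif(500)^(-1 / 2)

theta_k  <- thetaSketch(x, k = 100)
theta_x0 <- thetaSketch(x, x0 = sort(x)[400])
```

Both calls describe the same tail, so they should give the same estimate. A function of this shape could presumably be supplied as `method = thetaSketch` in a call to `paretoTail`, although that has not been verified here.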

### Value

An object of class `"paretoTail"` with the following components:

| Component | Description |
| --- | --- |
| `x` | the supplied numeric vector. |
| `k` | the number of observations in the upper tail to which the Pareto distribution has been fitted. |
| `groups` | if supplied, the vector or factor specifying groups of elements. |
| `w` | if supplied, the numeric vector of sample weights. |
| `method` | the function used to estimate the shape parameter, or the name of the function. |
| `x0` | the scale parameter. |
| `theta` | the estimated shape parameter. |
| `tail` | if `groups` is not `NULL`, this gives the groups with values larger than the threshold (scale parameter), otherwise the indices of observations in the upper tail. |
| `alpha` | the tuning parameter `alpha` used for flagging outliers. |
| `out` | if `groups` is not `NULL`, this gives the groups that are flagged as outliers, otherwise the indices of the flagged observations. |
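The components `x0`, `theta` and `alpha` together determine the cutoff above which values are flagged. The sketch below uses the standard quantile function of a Pareto distribution with scale `x0` and shape `theta`. The numbers and the helper `qparetoSketch` are hypothetical, and the package's internal computation may differ in detail.

```r
# Hypothetical fitted values, for illustration only
x0    <- 20000   # scale parameter (threshold)
theta <- 3       # shape parameter
alpha <- 0.01    # tuning parameter for flagging outliers

# Quantile function of a Pareto distribution with scale x0 and shape theta
qparetoSketch <- function(p, x0, theta) x0 * (1 - p)^(-1 / theta)

# Theoretical (1 - alpha) quantile used as the outlier cutoff
cutoff <- qparetoSketch(1 - alpha, x0, theta)

# Made-up tail observations; TRUE marks values above the cutoff
tail_values <- c(21000, 35000, 60000, 150000)
flagged <- tail_values > cutoff
```

A smaller `alpha` pushes the cutoff further into the tail, so fewer observations are flagged.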

### Author(s)

Andreas Alfons

### References

A. Alfons and M. Templ (2013) Estimation of Social Exclusion Indicators from Complex Surveys: The R Package laeken. Journal of Statistical Software, 54(15), 1–25. doi:10.18637/jss.v054.i15

A. Alfons, M. Templ, P. Filzmoser (2013) Robust estimation of economic indicators from survey samples based on Pareto tail modeling. Journal of the Royal Statistical Society, Series C, 62(2), 271–286.

### See Also

`reweightOut`, `shrinkOut`, `replaceOut`, `replaceTail`, `fitPareto`

`thetaPDC`, `thetaWML`, `thetaHill`, `thetaISE`, `thetaLS`, `thetaMoment`, `thetaQQ`, `thetaTM`

### Examples

```r
library("laeken")
data(eusilc)

## gini coefficient without Pareto tail modeling
gini("eqIncome", weights = "rb050", data = eusilc)

## gini coefficient with Pareto tail modeling

# estimate threshold
ts <- paretoScale(eusilc$eqIncome, w = eusilc$db090,
                  groups = eusilc$db030)

# estimate shape parameter
fit <- paretoTail(eusilc$eqIncome, k = ts$k,
                  w = eusilc$db090, groups = eusilc$db030)

# calibration of outliers
w <- reweightOut(fit, calibVars(eusilc$db040))
gini(eusilc$eqIncome, w)

# winsorization of outliers
eqIncome <- shrinkOut(fit)
gini(eqIncome, weights = eusilc$rb050)

# replacement of outliers
eqIncome <- replaceOut(fit)
gini(eqIncome, weights = eusilc$rb050)

# replacement of whole tail
eqIncome <- replaceTail(fit)
gini(eqIncome, weights = eusilc$rb050)
```

laeken documentation built on May 29, 2024, 4:42 a.m.