README.md
In freakonometrics/TopIncome: Estimation of Top Incomes Using Pareto Models

Pareto Models for Top Incomes

Arthur Charpentier & Emmanuel Flachaire

Install the `TopIncomes` library

The TopIncomes library can be installed from github,

library(devtools)
devtools::install_github("freakonometrics/TopIncomes")
library(TopIncomes)

Fitting Pareto Models

n <- 1000
set.seed(123)
x <- repd(n,.5,1,-1)
w <- rgamma(n,10,10)

The Pareto type 1 distribution is bounded from below by a threshold u>0: the cumulative distribution function is $\displaystyle{F(x)=1-\left(\frac{x}{u}\right)^{-\alpha}}$ for $x\geq u$ . The function returns the tail index parameter $\alpha$ and also $\xi=1/\alpha$ .

estim <- MLE.pareto1(data=x, weights=w, threshold=1)
estim

## $alpha
## [1] 3.300653
## 
## $xi
## [1] 0.3029704
## 
## $k
## [1] 1000

The Generalized Pareto distribution is bounded from below by a threshold u>0: the cumulative distribution function is $\displaystyle{F(x)=1-\left[1+\left(\frac{x-u}{\sigma}\right)\right]^{-\alpha}}$ for $x\geq u$ . The function returns $\xi=1/\alpha$ and $\beta=\sigma/\alpha$ .

estim <- MLE.gpd(data=x, weights=w, threshold=1)
estim

## $xi
## [1] 0.4892361
## 
## $mu
## [1] 1
## 
## $beta
## [1] 0.2488107
## 
## $k
## [1] 1000

The Extended Pareto distribution is bounded from below by a threshold u>0 : the cumulative distribution function is $\displaystyle{F(x)=1-\left[\frac{x}{u}\left(1+\delta-\delta\left(\frac{x}{u}\right)^\tau\right)\right]^{-\alpha} }$ for $x\geq u$ . The function returns $\kappa=\delta$ , $\tau$ and $\gamma=1/\alpha$

estim <- EPD(data=x, weights=w)
estim

## $k
## [1] 999
## 
## $gamma
## [1] 0.3737252
## 
## $kappa
## [1] 0.1628108
## 
## $tau
## [1] -3.342535

Application to Income

Let us consider a sample of simulated data, obtained from a Singh-Maddala distribution, and some weights:

qsinmad <- function(u,b,a,q) b*((1-u)^(-1/q)-1)^(1/a)   
rsinmad <- function(n,b,a,q) qsinmad(runif(n), b, a, q)
y=rsinmad(10000,1.14,2.07,1.75)
w=rnorm(10000,1,.2)
df <- data.frame(y,w)
Pareto_diagram(data=df$y, weights=df$w)

The Top_Share function can be used to estimate both the tail index $\alpha$ and the top (100p)%-share, for different thresholds to model a Pareto distribution.

Top_Share(data=df$y, weights=df$w, p=.01, q=c(.1,.05,.01), method="epd")

## $index
## [1] 0.05423371 0.05587803 0.05611615
## 
## $alpha
## [1] 3.542328 3.192067 2.977480
## 
## $tau
## [1] -2.993838 -3.312784 -3.286053
## 
## $kappa
## [1] -0.09802974  0.02217434  0.05944143
## 
## $gamma
## [1] 0.2823002 0.3132766 0.3358545
## 
## $share.index
## [1] 0.01
## 
## $share.pareto
## [1] 0.10 0.05 0.01
## 
## $threshold
## [1] 1.805970 2.320460 3.787607
## 
## $k
## [1] 1009  506   99
## 
## $method
## [1] "epd"

For a few different thresholds, we can show the results with Tables.

Let us consider a Pareto distribution fitted on the 10%, 5% and 1% highest observations:

thresholds=c(.1,.05,.01)
res1=Top_Share(df$y,df$w,p=.01,q=thresholds, method="pareto1")
res2=Top_Share(df$y,df$w,p=.01,q=thresholds, method="gpd")
res3=Top_Share(df$y,df$w,p=.01,q=thresholds, method="epd")

For the tail index, we would have the following Table:

res=rbind((res1$share.pareto),res1$alpha,res2$alpha,res3$alpha)
rownames(res) <- c("q","Pareto 1", "GPD", "EPD")
print("Table of top share indices")

## [1] "Table of top share indices"

res

##              [,1]     [,2]     [,3]
## q        0.100000 0.050000 0.010000
## Pareto 1 2.990486 3.307817 3.277460
## GPD      4.239389 3.095112 2.852529
## EPD      3.542328 3.192067 2.977480

For the top 1% share, we would have the following Table:

res=rbind(res1$share.pareto,res1$index,res2$index,res3$index)
rownames(res) <- c("q","Pareto 1", "GPD", "EPD")
print("Table of top share indices")

## [1] "Table of top share indices"

res

##                [,1]       [,2]       [,3]
## q        0.10000000 0.05000000 0.01000000
## Pareto 1 0.05954651 0.05516742 0.05555417
## GPD      0.05440980 0.05583339 0.05590763
## EPD      0.05423371 0.05587803 0.05611615

For a many different thresholds, we can show a Figure, similar to Hill plot for the Hill estimator of the tail index (Pareto 1 case).

Let us consider a Pareto distribution fitted on the 20% to 1% highest observations:

thresholds=seq(.2,.01,-.01)
res1=Top_Share(df$y,df$w,p=.01,q=thresholds, method="pareto1")
res2=Top_Share(df$y,df$w,p=.01,q=thresholds, method="gpd")
res3=Top_Share(df$y,df$w,p=.01,q=thresholds, method="epd")

For the tail index, we would have the following Figure:

plot(res1$k,res1$alpha, col="blue", main="Pareto tail indices", ylim=c(2,5), type="l")
lines(res2$k,res2$alpha, col="green")
lines(res3$k,res3$alpha, col="red")

For the top 1% share, we would have the following Table:

plot(res1$k,res1$index, col="blue", main="Top 1% share indices", ylim=c(0.02,.08), type="l")
lines(res2$k,res2$index, col="green")
lines(res3$k,res3$index, col="red")