knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
View a vector positive vectors as population, calculate discrete GINI inequality measure. This file works out how the ff_dist_gini_vector_pos function works from Fan's REconTools Package. See also the fs_gini_disc from R4Econ. See also the ff_dist_gini_random_var function for GINI for discrete random variables.
There is an vector values (all positive). This could be height information for N individuals. It could also be income information for N individuals. Calculate the GINI coefficient treating the given vector as population. This is not an estimation exercise where we want to estimate population gini based on a sample. The given array is the population. The population is discrete, and only has these N individuals in the length n vector.
See the formula below, note that when the sample size is small, there is a limit to inequality using the formula defined below given each $N$. So for small $N$, can not really compare inequality across arrays with different $N$, can only compare arrays with the same $N$.
Given monotonimcally increasing array $X$, with $x_1,...,x_N$:
The GINI formula used here is: $$ GINI = 1 - \frac{2}{N+1} \cdot \left(\sum_{i=1}^N \sum_{j=1}^{i} x_j\right) \cdot \left( \sum_{i=1}^N x_i \right)^{-1} $$
Derive the formula in the steps below.
$$ \Gamma = \sum_{i=1}^N \frac{1}{N} \cdot \left( \sum_{j=1}^{i} \left( \frac{x_j}{\sum_{\widehat{j}=1}^N x_{\widehat{j}} } \right) \right) $$
With perfect equality $x_i=a$ for all $i$, so need to divide by that.
$$ \Gamma^{\text{equal}} = \sum_{i=1}^N \frac{1}{N} \cdot \left( \sum_{j=1}^{i} \left( \frac{a}{\sum_{\widehat{j}=1}^N a } \right) \right) = \frac{N+1}{N}\cdot\frac{1}{2} $$
As the number of elements of the vecotr increases: $$ \lim_{N \rightarrow \infty}\Gamma^{\text{equal}} = \lim_{N \rightarrow \infty} \frac{N+1}{N}\cdot\frac{1}{2} = \frac{1}{2} $$
Given what we have from above, we obtain the gini formula, divide by total area below 45 degree line.
$$ GINI = 1 - \left(\sum_{i=1}^N \sum_{j=1}^{i} x_j\right) \cdot \left( N \cdot \sum_{i=1}^N x_i \right)^{-1} \cdot \left( \frac{N+1}{N}\cdot\frac{1}{2} \right)^{-1} = 1 - \frac{2}{N+1} \cdot \left(\sum_{i=1}^N \sum_{j=1}^{i} x_j\right) \cdot \left( \sum_{i=1}^N x_i \right)^{-1} $$
Suppose $x_i=0$ for all $i<N$, then:
$$ GINI^{x_i = 0 \text{ except } i=N} = 1 - \frac{2}{N+1} \cdot X_N \cdot \left( X_N \right)^{-1} = 1 - \frac{2}{N+1} $$
$$ \lim_{N \rightarrow \infty} GINI^{x_i = 0 \text{ except } i=N} = 1 - \lim_{N \rightarrow \infty} \frac{2}{N+1} = 1 $$
Note that for small N, for example if $N=10$, even when one person holds all income, all others have 0 income, the formula will not produce gini is zero, but that gini is equal to $\frac{2}{11}\approx 0.1818$. If $N=2$, inequality is at most, $\frac{2}{3}\approx 0.667$.
$$ MostUnequalGINI\left(N\right) = 1 - \frac{2}{N+1} = \frac{N-1}{N+1} $$
The GINI formula just derived is trivial to compute.
There are no package dependencies. This is the ff_dist_gini_vector_pos function. Define the formula here:
# Load Library rm(list = ls()) # Formula, directly implement the GINI formula Following Step 4 above fv_dist_gini_vector_pos_test <- function(ar_pos) { # Check length and given warning it_n <- length(ar_pos) if (it_n <= 100) warning('Data vector has n=',it_n,', max-inequality/max-gini=',(it_n-1)/(it_n + 1)) # Sort ar_pos <- sort(ar_pos) # formula implement fl_gini <- 1 - ((2/(it_n+1)) * sum(cumsum(ar_pos))*(sum(ar_pos))^(-1)) return(fl_gini) }
Generate a number of examples Arrays for testing
# Example Arrays of data ar_equal_n1 = c(1) ar_ineql_n1 = c(100) ar_equal_n2 = c(1,1) ar_ineql_alittle_n2 = c(1,2) ar_ineql_somewht_n2 = c(1,2^3) ar_ineql_alotine_n2 = c(1,2^5) ar_ineql_veryvry_n2 = c(1,2^8) ar_ineql_mostmst_n2 = c(1,2^13) ar_equal_n10 = c(2,2,2,2,2,2, 2, 2, 2, 2) ar_ineql_some_n10 = c(1,2,3,5,8,13,21,34,55,89) ar_ineql_very_n10 = c(1,2^2,3^2,5^2,8^2,13^2,21^2,34^2,55^2,89^2) ar_ineql_extr_n10 = c(1,2^2,3^3,5^4,8^5,13^6,21^7,34^8,55^9,89^10) # Uniform draw testing ar_unif_n1000 = runif(1000, min=0, max=1) # Normal draw testing ar_norm_lowsd_n1000 = rnorm(1000, mean=100, sd =1) ar_norm_lowsd_n1000[ar_norm_lowsd_n1000<0] = 0 ar_norm_highsd_n1000 = rnorm(1000, mean=100, sd =20) ar_norm_highsd_n1000[ar_norm_highsd_n1000<0] = 0 # Beta draw testing ar_beta_mostrich_n1000 = rbeta(1000, 5, 1) ar_beta_mostpoor_n1000 = rbeta(1000, 1, 5) ar_beta_manyrichmanypoor_nomiddle_n1000 = rbeta(1000, 0.5, 0.5)
Now test the example arrays above using the function based no our formula:
# Hard-Code Small N Dist Tests cat('\nSmall N=1 Hard-Code\n') cat('ar_equal_n1:', fv_dist_gini_vector_pos_test(ar_equal_n1), '\n') cat('ar_ineql_n1:', fv_dist_gini_vector_pos_test(ar_ineql_n1), '\n') cat('\nSmall N=2 Hard-Code, converge to 1/3, see formula above\n') cat('ar_ineql_alittle_n2:', fv_dist_gini_vector_pos_test(ar_ineql_alittle_n2), '\n') cat('ar_ineql_somewht_n2:', fv_dist_gini_vector_pos_test(ar_ineql_somewht_n2), '\n') cat('ar_ineql_alotine_n2:', fv_dist_gini_vector_pos_test(ar_ineql_alotine_n2), '\n') cat('ar_ineql_veryvry_n2:', fv_dist_gini_vector_pos_test(ar_ineql_veryvry_n2), '\n') cat('\nSmall N=10 Hard-Code, convege to 9/11=0.8181, see formula above\n') cat('ar_equal_n10:', fv_dist_gini_vector_pos_test(ar_equal_n10), '\n') cat('ar_ineql_some_n10:', fv_dist_gini_vector_pos_test(ar_ineql_some_n10), '\n') cat('ar_ineql_very_n10:', fv_dist_gini_vector_pos_test(ar_ineql_very_n10), '\n') cat('ar_ineql_extr_n10:', fv_dist_gini_vector_pos_test(ar_ineql_extr_n10), '\n') # Uniform Dist Tests cat('\nUNIFORM Distribution\n') cat('ar_unif_n1000:', fv_dist_gini_vector_pos_test(ar_unif_n1000), '\n') # Normal Dist Tests cat('\nNORMAL Distribution\n') cat('ar_norm_lowsd_n1000:', fv_dist_gini_vector_pos_test(ar_norm_lowsd_n1000), '\n') cat('ar_norm_highsd_n1000:', fv_dist_gini_vector_pos_test(ar_norm_highsd_n1000), '\n') # Beta Dist Tests cat('\nBETA Distribution\n') cat('ar_beta_mostpoor_n1000:', fv_dist_gini_vector_pos_test(ar_beta_mostpoor_n1000), '\n') cat('ar_beta_manyrichmanypoor_nomiddle_n1000:', fv_dist_gini_vector_pos_test(ar_beta_manyrichmanypoor_nomiddle_n1000), '\n') cat('ar_beta_mostrich_n1000:', fv_dist_gini_vector_pos_test(ar_beta_mostrich_n1000), '\n') # Example when it does not work ar_unif_n1000_NEGATIVE = runif(1000, min=-1, max=1) cat('\n\nSHOULD/DOES NOT WORK TEST\n ar_unif_n1000_NEGATIVE:', fv_dist_gini_vector_pos_test(ar_unif_n1000_NEGATIVE), '\n')
ff_dist_gini_random_var provides the GINI implementation for a discrete random variable. The procedure is the same as prior, except now each element of the "x" array has element specific weights associated with it. The function can handle unsorted array with non-unique values.
Test and compare ff_dist_gini_random_var provides the GINI implementation for a discrete random variable and ff_dist_gini_vector_pos.
There is a vector of values from 1 to 100, in ascending order. What is the equal-weighted gini, the gini result when smaller numbers have higher weights, and when larger numbers have higher weights?
First, generate the relevant values.
# array ar_x <- seq(1, 100, length.out = 30) # prob array ar_prob_x_unif <- rep.int(1, length(ar_x))/sum(rep.int(1, length(ar_x))) # prob higher at lower values ar_prob_x_lowval_highwgt <- rev(cumsum(ar_prob_x_unif))/sum(cumsum(ar_prob_x_unif)) # prob higher at lower values ar_prob_x_highval_highwgt <- (cumsum(ar_prob_x_unif))/sum(cumsum(ar_prob_x_unif)) # show print(cbind(ar_x, ar_prob_x_unif, ar_prob_x_lowval_highwgt, ar_prob_x_highval_highwgt))
Second, generate GINI values. What should happen?
library(REconTools) ff_dist_gini_vector_pos(ar_x) ff_dist_gini_random_var(ar_x, ar_prob_x_unif) ff_dist_gini_random_var(ar_x, ar_prob_x_lowval_highwgt) ff_dist_gini_random_var(ar_x, ar_prob_x_highval_highwgt)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.