R/fitgroup_da.R
In GB2group: Estimation of the Generalised Beta Distribution of the Second Kind from Grouped Data

Documented in fitgroup.da

#' Estimation of the Dagum distribution from group data
#'
#' The function \code{fitgroup.da} implements the estimation of the Dagum distribution from group
#' data in form of income shares using the equally weighted minimum distance (EWMD) and the 
#' optimally weighted minimum distance (OMD) estimators.
#'
#' @inheritParams fitgroup.gb2
#' @return the function \code{fitgroup.da} returns the following objects:
#'   \itemize{
#'     \item \code{ewmd.estimation} Matrix containing the parameters of the Dagum distribution estimated
#'        by EWMD and, if \code{se.ewmd = TRUE}, their standard errors.
#'     \item \code{ewmd.rss} Residual sum of squares of the EWMD estimation.
#'     \item \code{omd.estimation} Matrix containing the parameters of the Dagum distribution estimated
#'        by OMD and, if \code{se.omd = TRUE}, their standard errors.
#'     \item \code{omd.rss} Weighted residual sum of squares of the OMD estimation.
#'     \item \code{gini.estimation} Vector with the survey Gini index and the estimated Gini
#'      indices using EWMD and OMD estimates whenever possible.
#'   }
#' @details The Generalised Beta of the Second Kind (GB2) is a general class of distributions that is
#' acknowledged to provide an accurate fit to income data (McDonald 1984; McDonald and Mantrala,1995).
#' The Dagum distribution is a particular case of this model with \eqn{q = 1}, defined in terms of
#' the cumulative distribution function as follows:
#' \deqn{F(x; a, b, p) = \bigg(1+\bigg(\frac{x}{b}\bigg)^{-a}\bigg)^{-p} }
#'
#' where \eqn{b} is the scale parameter and \eqn{a, p} are the shape parameters that define the
#' heaviness of the tail and the skewness of the distribution.
#'
#' The function \code{fitgroup.da} estimates the parameters of the Dagum distribution using grouped data in form of
#' income shares. These data must have been generated by setting the proportion of observations in each
#' group before sampling, so that the population proportions are fixed, whereas income shares are random
#' variables. Examples of this type of data can be found in the largest datasets of grouped data,
#' includingThe World Income Inequality Database (UNU-WIDER, 2017), PovcalNet (World Bank, 2018) or the World Wealth
#' and Income Database (Alvaredo et al., 2018).
#'
#' For EWMD, numerical optimisation is achieved using the Levenberg-Marquardt Algorithm via \code{\link[minpack.lm]{nlsLM}}
#' Conventionally, moment estimates of a restricted model are taken as initial values. A potential
#' limitation of this method is that, as the dimensionality of the parameter
#' space increases, it is more difficult to achieve global convergence. Although it seems
#' quite intuitive that the moment estimates of the restricted model might be a good starting
#' point, the optimization could converge to
#' a local minimum, which might lead to inaccurate estimates of the parameters.
#'
#' To provide different non-arbitrary combinations of starting values, we propose to
#' define a sequence of numbers (provided by \code{grid}). For each value in this
#' sequence, the moment estimate of one of the parameters is obtained using the survey Gini index,
#' assuming that the other one is equal to the grid value. Using this procedure,
#' we end up with as many combinations of initial values as values in the grid,
#' which are used to obtain different sets of estimates, keeping the one with the smallest
#' residual sum of squares. Although we cannot ensure that our estimates belong to
#' the global minimum, this procedure covers a larger proportion of the parameter
#' space than just using the moment estimates of a
#' particular sub-model. See Jorda et al. (2018) for details.
#'
#' This method, however, does not provide
#' an estimate for the scale parameter because the Lorenz curve is independent to scale. The scale
#' parameter is estimated by equating the sample mean, specified by \code{pc.inc}, to the population
#' mean of the Dagum distribution. Because EWMD does not use the optimal
#' covariance matrix of the moment conditions, the standard errors of the parameters
#' are obtained by Monte Carlo simulation. Please be aware that the estimation of the standard errors
#' might take a long time, especially if the sample size is large.
#'
#' \code{fitgroup.da} also implements a two-stage OMD estimator. In the first stage, EWMD estimates
#' are obtained as described above, which are used to compute a first stage estimator
#' of the weighting matrix. The weighting matrix is used in the second stage to obtain optimally
#' weighted estimates of the parameters. The numerical optimisation is performed using
#' \code{\link{optim}} with the BFGS method. If \code{optim} reports an error, the L-BFGS method
#' is used. EWMD estimates are used as initial values for the optimisation algorithm. The OMD estimation
#'  incorporates the optimal weight matrix, thus making possible to derive the asymptotic standard
#'  errors of the parameters using results from Beach and Davison(1983) and Hajargasht and
#'  Griffiths (2016). As in the EWMD estimation, the scale parameter is obtained by matching the
#'  population mean of the Dagum distribution to the sample mean. Hence, the standard error of the scale
#'  parameter is estimated by Monte Carlo simulation.
#'
#' The Gini index of the Dagum distribution is computed using the function
#' \code{simgini.da} which makes use of \code{gini.d}.
#' If this function reports NaN, the Gini index is estimated by Monte
#' Carlo simulation of 10^6 samples of size N = 10^6.
#'
#'
#' @references
#'  Alvaredo, F., A. Atkinson, T. Piketty, E. Saez, and G. Zucman. The World Wealth and Income Database.
#'
#' Beach, C.M. and R. Davidson (1983): Distribution-free statistical inference with
#' Lorenz curves and income shares, \emph{The Review of Economic Studies}, 50, 723 - 735.
#'
#'  Hajargasht, G. and W.E. Griffiths (2016): Inference for Lorenz Curves, Tech. Rep.,
#'  The University of Melbourne.
#'
#'  Jorda, V., Sarabia, J.M., & Jäntti, M. (2018). Estimation of income inequality from grouped data.
#'  arXiv preprint arXiv:1808.09831.
#'
#'  McDonald, J.B. (1984): Some Generalized Functions for the Size Distribution of Income,
#'  \emph{Econometrica}, 52, 647 - 665.
#'
#'  McDonald, J.B. and A. Mantrala (1995): The distribution of personal income: revisited,
#'  \emph{Journal of Applied Econometrics}, 10, 201 - 204.
#'
#'  UNU-WIDER (2018). World Income Inequality Database (WIID3.4).
#'
#'  World Bank (2018). PovcalNet Data Base. Washington, DC: World Bank. \url{http://iresearch.worldbank.org/PovcalNet/home.aspx}.
#'
#' @export
#' @examples
#' fitgroup.da(y = c(9, 13, 17, 22, 39), gini.e = 0.29)
#'
#' @importFrom minpack.lm nlsLM nls.lm.control
#' @importFrom stats resid coef uniroot optim
#' @export
#'
fitgroup.da <- function(y, x = rep(1 / length(y), length(y)), gini.e, pc.inc = NULL, se.omd = FALSE, se.ewmd = FALSE, se.scale = FALSE, N = NULL, nrep = 10^3, grid = 1:20, rescale = 1000, gini = FALSE) {
  if(length(y) != length(x)) {
    stop("x and y must be of the same length")
  }
  if(length(y) < 4) {
    stop("At least four points of the Lorenz curve are required to perform the estimation")
  }
  if(!is.numeric(gini.e)) {
    stop("Gini index is not numeric")
  }
  if(gini.e < 0 | gini.e > 1) {
    stop("Gini index must be between 0 and 1")
  }
  if(!is.numeric(pc.inc) & !is.null(pc.inc)) {
    stop("Per capita income is not numeric")
  }
  if(is.numeric(pc.inc)) {
    if(pc.inc <= 0){
      stop("Per capita GDP should be positive")
    }
  }
  if(!is.numeric(rescale)) {
    stop("Rescale is not numeric")
  }
  if(rescale <= 0) {
    stop("Rescale must be positive")
  }
  if(sum(grid <= 0) != 0) {
    stop("Grid must be a secuence of positive numbers")
  }
  share <- as.vector(y[!is.na(y)])
  share <- as.numeric(share)/sum(share)
  share <- cumsum(share)[-length(share)]
  cprob <- as.vector(x[!is.na(x)])
  cprob <- as.numeric(cprob)/sum(cprob)
  cprob <- cumsum(cprob)[-length(cprob)]

  par.a  <- rep(0, length(grid))
  par.p  <- rep(0, length(grid))
  rss <- rep(1000, length(grid))

  for (j in (1:length(grid))) {
    ginit <- function(a) {
      gini.d(c(a, grid[j])) - gini.e
    }
    free.p <- try(uniroot(ginit, c(0.5, 1000)), silent = TRUE)
    if('try-error'%in%class(free.p)) {
      ginit <- function(a) {
        gini.d(c(grid[j], a)) - gini.e
      }
      free.p <- try(uniroot(ginit, c(0.5, 1000)), silent = TRUE)
      if(!'try-error'%in%class(free.p)) {
        regress <- try(suppressWarnings(nlsLM(share ~ (lc.gb2(c(A, 1, P, 1), cprob)), algorithm ="port",
          start = list(A = grid[j], P = free.p$root),lower=c(0, 0), control = nls.lm.control(maxiter = 1000))),
          silent = TRUE)
        if(!'try-error'%in%class(regress)) {
          par.a[j] <- coef(regress)[1]
          par.p[j] <- coef(regress)[2]
          rss[j] <- sum(resid(regress)^2)
        }
      }
    }
    else{
      regress <- try(suppressWarnings(nlsLM(share ~ (lc.gb2(c(A, 1, P, 1), cprob)), algorithm ="port",
        start = list(A = free.p$root, P = grid[j]), lower = c(0, 0), control = nls.lm.control(maxiter = 1000))),
        silent = TRUE)
      if(!'try-error'%in%class(regress)){
        par.a[j] <- coef(regress)[1]
        par.p[j] <- coef(regress)[2]
        rss[j] <- sum(resid(regress)^2)
      }
    }
  }
  rg <- which.min(rss)
  nls.rss <- min(rss)
  nls.a <- par.a[rg]
  nls.p <- par.p[rg]
  if(1 <= 1 / nls.a) {
    temp.b <- NA
    print("Unable to compute the scale parameter and the OMD estimation. Condition for the existence of the first moment violated: 1 <= 1 / a")
  }
  if(is.null(pc.inc)) {
    temp.b <- NA
    print("Unable to compute the scale parameter and the OMD estimation. Per capita GDP not provided")
  }
  if(!is.null(pc.inc) & (1 > 1 / nls.a)) {
    incpc <- pc.inc / rescale
    temp.b <- scale.gb2(c(nls.a, nls.p, 1), incpc)
  }
  if(1 <= 2 / nls.a){
    print("Unable to compute OMD estimates of the parameters. Condition for the existence of the second moment violated: 1 <= 2 / a")
  }
  if(1 <= 2 / nls.a | is.na(temp.b)) {
    gmm.coef <- matrix(NA, 1, 3)
    gmm.se <- matrix(NA, 1, 3)
    gmm.rss <- NA
  }
  else{
    regress <- try(opt.gmm.da(cprob, share, init.est = c(nls.a, temp.b, nls.p), cons.est = c(nls.a, temp.b, nls.p)))
    if('try-error'%in%class(regress$opt1)) {
      print("Unable to compute OMD estimates of the parameters. The weight martrix cannot be inverted. Try changing the value of rescale")
      gmm.coef <- matrix(NA, 1, 3)
      gmm.se <- matrix(NA, 1, 3)
      gmm.rss <- NA
    }
    else {
      gmm.rss <- regress$opt1$value
      gmm.coef <- regress$opt1$par
      gmm.coef[2] <- scale.gb2(c(gmm.coef[1], gmm.coef[3], 1), pc.inc)
      gmm.se <- matrix(NA, 1, 3)
    }
  }
  nls.estimation <- matrix(NA, 2, 3)
  nls.coef <- matrix(c(nls.a, temp.b * rescale, nls.p), 1, 3)
  nls.se <- matrix(NA, 1, 3)

  if (se.ewmd == TRUE) {
    if (is.null(N)) {
      print("Unable to compute the standard errors. Please provide the sample size")
    }
    else {
      se.calc <- simsd.da(x = x, theta = nls.coef, N = N, nrep = nrep, se.scale = se.scale, factor.r = rescale)
      nls.se <- se.calc$nls.se
      if(!is.na(temp.b)) {
        if(se.omd == TRUE){
          gmm.se <- gmmse.da(gmm.coef, cprob, N)
        }
        if(se.scale == TRUE){
          gmm.se[2] <- se.calc$sd.scale
        }
      }
    }
  }
  if (se.ewmd == FALSE & se.omd == TRUE) {
    if (is.null(N)) {
      print("Unable to compute the standard errors. Please provide the sample size")
    }
    else {
      if(!is.na(temp.b)) {
        gmm.se <- gmmse.da(gmm.coef, cprob, N)
        if(se.scale == TRUE){
          gmm.se[2] <- simsd.da(x = x, theta = nls.coef, N = N, nrep = nrep, se.scale = se.scale, factor.r = rescale)$sd.scale
        }
      }
    }
  }
  nls.estimation[1, ] <- nls.coef
  nls.estimation[2, ] <- nls.se
  colnames(nls.estimation) <- c("a", "b", "p")
  row.names(nls.estimation) <- c("Coef.", "se")

  gmm.estimation <- matrix(NA, 2, 3)
  gmm.estimation[1, ] <- gmm.coef
  gmm.estimation[2, ] <- gmm.se
  colnames(gmm.estimation) <- c("a", "b", "p")
  row.names(gmm.estimation) <- c("Coef.", "se")

  grouped.data <- rbind(share, cprob)
  row.names(grouped.data) <- c("Income", "Population")

  if (gini == TRUE) {
    if (!is.na(gmm.rss)) {
      gmm.gini <- simgini.da(gmm.coef)
    }
    else {gmm.gini <- NA}
    nls.gini <- simgini.da(nls.coef)
    gini.estimation <- matrix(NA, 1, 3)
    colnames(gini.estimation) <- c("Survey", "EWMD estimate", "OMD estimate")
    gini.estimation[1] <- gini.e
    gini.estimation[2] <- nls.gini
    gini.estimation[3] <- gmm.gini
    out2 <- list(grouped.data = grouped.data, distribution = "Dagum", ewmd.estimation = nls.estimation, ewmd.rss = nls.rss, omd.estimation = gmm.estimation, omd.rss = gmm.rss,
      gini.estimation = gini.estimation)
  }
  else {
    out2 <- list(grouped.data = grouped.data, distribution = "Dagum", ewmd.estimation = nls.estimation, ewmd.rss = nls.rss, omd.estimation = gmm.estimation, omd.rss = gmm.rss)
  }
  return(out2)
}

Any scripts or data that you put into this service are public.

GB2group documentation built on Jan. 26, 2021, 5:06 p.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

GB2group
Estimation of the Generalised Beta Distribution of the Second Kind from Grouped Data

R/fitgroup_da.R
In GB2group: Estimation of the Generalised Beta Distribution of the Second Kind from Grouped Data

Defines functions fitgroup.da

Documented in fitgroup.da

Try the GB2group package in your browser

R Package Documentation

Browse R Packages

We want your feedback!

GB2group Estimation of the Generalised Beta Distribution of the Second Kind from Grouped Data

R/fitgroup_da.R In GB2group: Estimation of the Generalised Beta Distribution of the Second Kind from Grouped Data

Defines functions fitgroup.da

Documented in fitgroup.da

Try the GB2group package in your browser

R Package Documentation

Browse R Packages

We want your feedback!

GB2group
Estimation of the Generalised Beta Distribution of the Second Kind from Grouped Data

R/fitgroup_da.R
In GB2group: Estimation of the Generalised Beta Distribution of the Second Kind from Grouped Data