Introduction

As briefly described in the README.md file, the decr package provides functions to decompose observed differences in distributional statistics of a numeric variable (y) between two groups. A typical example is the decomposition of the observed difference between the average wages of men and women.

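The decomposition proposed here is of the Blinder (1973) and Oaxaca (1973) type, which aims to separate the observed difference between the average wages of two groups into two components: one attributable to differences in the characteristics of the two groups, and one attributable to differences in the wage structure, that is, in how those characteristics are rewarded.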

The decompositions in decr are performed nonparametrically with the reweighting approach of DiNardo, Fortin and Lemieux (DFL, 1996). Within the common support, it is possible to decompose the difference of any distributional statistic (not only the mean). For the difference between the averages of y of two groups, a decomposition into four components is performed, as in Nopo (2008). The reweighting factors, in the common support, are estimated directly from the ratio of the relative frequencies observed in the strata for the two groups. In this sense, the estimation approach is very similar to coarsened exact matching (Iacus, King and Porro 2011), except that here sampling weights, if present, can be taken into account. The references are given in the README.md file.

In the following section we present a short example with the invented_wages dataset, included in the package.

Example

First of all, we load the decr package and the invented_wages dataset.

library(decr)
data(invented_wages)
str(invented_wages)

Each row of the dataset represents an invented individual worker. For every worker the dataset records gender, the economic sector of employment, the level of education and the wage. There is also a column with the sampling weights.

In order to perform a decomposition of the observed difference in a distributional statistic of a numeric variable (y) between men and women, we proceed with the following steps:

The steps described above can be done with the reweight_strata_all2 function:

# Establishment of common support between two groups, based on
# the distributions of their characteristics (variables);
# computes counterfactual weights (w_BA and w_AB), that can be used
# to balance the joint distribution of characteristics of one group
# to that of the other group
r01 <- reweight_strata_all2(
  data = invented_wages,
  treatment = "gender", 
  variables = c("sector", "education"),
  y = "wage", 
  weights = "sample_weights")

str(r01)

The result of reweight_strata_all2 is a data frame with the same number of observations as the starting data. The reweighting factors $\widehat{\Psi}_{AB}(X_i)$ are stored in the column rw_AB. Note that these reweighting factors are estimated for group A (men) in the common support; for all other observations, rw_AB is set equal to 1. The reweighting factors are then multiplied by the sampling weights and the result is stored in the column w_AB. If we estimate the joint distribution of the characteristics of men, weighted by w_AB, we obtain the same frequencies as those observed for women.
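The construction of the reweighting factors can be illustrated with a minimal base R sketch. It does not call decr and is not the package's implementation: the toy data frame, its column names (group, stratum, w) and all object names are hypothetical, chosen only to mirror the description above. Every stratum appears in both groups, so all toy observations lie in the common support.

```r
# Hypothetical toy data: two groups, two strata, with sampling weights
toy <- data.frame(
  group   = c("A", "A", "A", "A", "B", "B", "B", "B"),
  stratum = c("s1", "s1", "s1", "s2", "s1", "s2", "s2", "s2"),
  w       = c(1, 1, 2, 1, 1, 1, 1, 2),
  stringsAsFactors = FALSE
)

# Weighted relative frequency of each stratum within each group
freq     <- tapply(toy$w, list(toy$stratum, toy$group), sum)
rel_freq <- prop.table(freq, margin = 2)

# Reweighting factor for group A: share of the stratum in B
# divided by its share in A
psi_AB <- rel_freq[, "B"] / rel_freq[, "A"]

# Factor is applied to group A only; it stays equal to 1 for group B
is_A <- toy$group == "A"
toy$rw_AB <- 1
toy$rw_AB[is_A] <- psi_AB[toy$stratum[is_A]]
toy$w_AB <- toy$w * toy$rw_AB

# Balance check: group A weighted by w_AB reproduces the strata shares of B
share_A_rw <- prop.table(tapply(toy$w_AB[is_A], toy$stratum[is_A], sum))
share_B    <- rel_freq[, "B"]
all.equal(as.numeric(share_A_rw), as.numeric(share_B))
```

In this sketch the strata play the role of the combinations of sector and education in the example above.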

At this point, we have all the elements to estimate a so-called counterfactual distribution of wages, that is, the distribution of wages of men as if they had the same characteristics as women. This distribution combines the characteristics of women with the wage structure of men and can also be interpreted as the wage distribution of women as if they were paid according to the wage structure of men.
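As a rough illustration of what a counterfactual statistic is, the following base R sketch computes a counterfactual mean on hypothetical toy data. It does not use decr: the wages, the weights and the object names are invented for the example, and the counterfactual weights are written out by hand rather than estimated.

```r
# Hypothetical toy wages. In group A three workers are in stratum s1 and one
# in s2; in group B one worker is in s1 and three are in s2.
y_A <- c(30, 30, 30, 20)
y_B <- c(25, 15, 15, 15)
w_A <- rep(1, 4)   # sampling weights, all equal to 1 here
w_B <- rep(1, 4)

# Counterfactual weights for group A (assumed given): each stratum is
# rescaled by (share in B) / (share in A), i.e. 1/3 for s1 and 3 for s2
w_AB <- w_A * c(1/3, 1/3, 1/3, 3)

ybar_A  <- weighted.mean(y_A, w_A)    # observed mean of A
ybar_B  <- weighted.mean(y_B, w_B)    # observed mean of B
ybar_AB <- weighted.mean(y_A, w_AB)   # wages of A, characteristics of B

# Two-part split of the observed difference
delta_tot   <- ybar_A - ybar_B
composition <- ybar_A - ybar_AB   # due to different characteristics
structure   <- ybar_AB - ybar_B   # due to different wage structure
c(delta_tot = delta_tot, composition = composition, structure = structure)
```

The counterfactual mean lies between the two observed means here only because of how the toy data were chosen; in general it need not.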

The counterfactual distribution allows the estimation of a counterfactual distributional statistic, which is the key quantity needed to decompose the observed difference.
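In generic notation, the role of the counterfactual statistic can be written as follows. The symbols are introduced here for illustration only and are not taken from the package documentation: $\nu$ is the distributional statistic, $F_A$ and $F_B$ are the observed distributions of y in groups A and B, and $F_{AB}$ is the counterfactual distribution of group A reweighted to the characteristics of group B. Restricting attention to the common support, adding and subtracting $\nu(F_{AB})$ gives

$$
\nu(F_A) - \nu(F_B) =
\underbrace{\left[\nu(F_A) - \nu(F_{AB})\right]}_{\text{difference in characteristics}}
+ \underbrace{\left[\nu(F_{AB}) - \nu(F_B)\right]}_{\text{difference in wage structure}}
$$

The first term compares two distributions with the same wage structure (that of group A) but different characteristics, while the second compares two distributions with the same characteristics (those of group B) but different wage structures. Reading these two terms as the delta_X and delta_S components reported by the package is an assumption made here, not something stated in this document.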

As a first example, we decompose the difference between the observed mean wages of men and women with the functions nopodec_mean and nopodec:

# nopodec_mean: estimates all the elements to perform a decomposition
# of the difference of the average of an outcome variable (y)
# between the two groups in four components (see later...)
s01 <- nopodec_mean(r01)
s01


# nopodec: decomposition of the observed difference between averages 
# of y of group A and B in 4 components, as in Nopo (2008)
n01_AB <- nopodec(s01, counterfactual = "AB")
n01_AB

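delta_tot is the observed difference between the average wages of men and women (the observed average wage of men is greater than that of women). This observed difference is decomposed into four components: delta_X and delta_S, which refer to the common support, and delta_A and delta_B, which arise from combinations of characteristics that are not comparable between the two groups.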

Note that the observed difference is always computed as group A minus group B average wage. This can be checked with margin_mean, which estimates the observed average wages for the two groups:

m01 <- margin_mean(r01)
m01

The column ybar reports the observed average wages of men and women, whose difference is the same as that reported in the output of nopodec as delta_tot.

As a second example, we perform the decomposition of the difference of quantiles of wages between men and women. This can be done with the functions dec_quantile followed by dec_:

d01 <- dec_quantile(r01, probs = 0.5)
d01


# dec_: decomposition of the quantile difference (of wages) between two groups
# (in 2 components) in the common support. 
d01_p50_AB <- dec_(d01, counterfactual = "AB")
d01_p50_AB

Note that in the quantile case, unlike in the mean case, the decomposition is done only in the common support. The example above decomposes the difference between the median wages of men and women in the common support into two components (delta_X and delta_S). The total difference between the median wages is delta_tot, while the difference observed only in the common support is delta_tot_CS (which in this case coincides with the total observed difference, delta_tot). The residual part of the difference, which arises because the two groups have some combinations of characteristics that are not comparable, is reported in delta_AB, which is zero in this case. Unlike in the mean case, this component cannot be further decomposed into delta_A and delta_B.

The decomposition can be done for any other quantile level (between 0 and 1) through the probs argument of dec_quantile. For example, for the 25th percentile:

d02 <- dec_quantile(r01, probs = 0.25)
d02

d02_p25_AB <- dec_(d02, counterfactual = "AB")
d02_p25_AB
# Check that there are no missings (NA) in the common support
with(r01, table(common_support, useNA = "always"))
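The quantile decompositions above rest on quantiles computed under different sets of weights. The base R sketch below shows one common way to define a weighted quantile (the inverse of the weighted empirical distribution function) and applies it to hypothetical toy data. The helper weighted_quantile is written for this example and is not a decr function; the package may use a different quantile definition.

```r
# Hypothetical helper: smallest y whose cumulative weight share reaches prob
weighted_quantile <- function(y, w, prob) {
  ord <- order(y)
  cum_share <- cumsum(w[ord]) / sum(w)
  y[ord][which(cum_share >= prob)[1]]
}

# Toy wages: in group A the first three workers are in stratum s1 and the
# last one in s2; in group B the first worker is in s1 and the others in s2
y_A <- c(28, 30, 32, 20)
y_B <- c(25, 14, 15, 16)
w_A <- rep(1, 4)
w_B <- rep(1, 4)

# Counterfactual weights for group A (assumed given): 1/3 for s1, 3 for s2
w_AB <- w_A * c(1/3, 1/3, 1/3, 3)

q_A  <- weighted_quantile(y_A, w_A,  prob = 0.5)   # observed median of A
q_B  <- weighted_quantile(y_B, w_B,  prob = 0.5)   # observed median of B
q_AB <- weighted_quantile(y_A, w_AB, prob = 0.5)   # counterfactual median

# Two-part split of the difference between medians
c(delta_tot   = q_A - q_B,
  composition = q_A - q_AB,
  structure   = q_AB - q_B)
```

Changing prob to another value between 0 and 1 gives the same kind of split for any other quantile level.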


gibonet/decr documentation built on Jan. 5, 2024, 7:26 a.m.