discreteQ: Uniform inference on quantile and quantile effect functions...


View source: R/discreteQ.R

Description

The function discreteQ provides uniform confidence bands for the unconditional quantile function, the quantile treatment effect function or the decomposition of the observed difference between the quantile function of an outcome for two groups. This function implements the algorithms suggested in Chernozhukov et al. (2019). See also the vignette available with vignette("discreteQ", package="discreteQ").

Usage

discreteQ(
  y,
  d = NULL,
  x = NULL,
  w = NULL,
  decomposition = FALSE,
  q.range = c(0.05, 0.95),
  method = NULL,
  bsrep = 200,
  alpha = 0.05,
  ys = NULL,
  cl = NULL,
  cluster = NULL,
  old.res = NULL,
  return.boot = FALSE,
  list_of_seeds = NULL,
  return.seeds = FALSE,
  estim.glm = fastglm::fastglm,
  par.estim = NULL
)

Arguments

y

outcome (vector of length n).

d

treatment or group variable (binary vector of length n).

x

matrix of regressors (n x p matrix). x must include a constant if appropriate (i.e. the constant is NOT added automatically). The function model.matrix can be used to create the matrix x if factor variables or interaction terms are present.

w

sampling weights (vector of length n).

decomposition

logical, indicating if the decomposition of the observed difference between the control and treated quantile functions should be performed. By default, inference about the quantile treatment effect function is performed.

q.range

vector of length 2 that provides the lowest and highest quantile indexes. The uniform bands will cover the whole quantile function (QF) and quantile effect (QE) function in this range. Default is c(0.05, 0.95).

method

link function for the distribution regression model. Possible values: "logit" (the default), "probit", "cloglog", "lpm" (linear probability model, i.e. OLS estimation of binary regressions), "cauchit", "drp" (incomplete gamma link function suggested in Chernozhukov, Fernández-Val, Melly and Wüthrich (2019)). The maximum likelihood estimator of the fully parametric Poisson regression model is used if method = "poisson". This argument is relevant only if there are regressors in x.

bsrep

number of bootstrap replications. Default: 200.

alpha

significance level: the uniform bands have coverage probability 1 - alpha. Default: 0.05.

ys

specifies the thresholds at which the cumulative distribution function will be estimated. This argument can be specified either as a scalar, which is interpreted as the number of thresholds, or as a vector containing the values of the thresholds. By default, the cdf is estimated at all distinct observed values of the outcome in the sample if there are fewer than 100 unique values and at the 99 empirical percentiles of the outcome if there are more than 100 distinct values.

cl

a cluster object as returned by the function makeCluster. Parallel computing is not used if this argument is not specified.

cluster

vector that specifies to which group each observation belongs. The cluster weighted bootstrap is used if this argument is specified. Otherwise, simple random sampling is assumed.

old.res

a discreteQ object (obtained with the argument return.boot set to TRUE). This argument makes it possible, for instance, to change the level alpha or the quantile range q.range without recomputing the estimates.

return.boot

logical scalar. The results of the bootstrap are returned in the matrix F.b when this argument is set to TRUE.

list_of_seeds

list of seeds for L'Ecuyer RNG. The length of this list must be the same as the value of the argument bsrep.

return.seeds

logical scalar. The list of seeds is returned by the function if this argument is set to TRUE.

estim.glm

function used to estimate the binary regressions if method is "logit", "probit", "cloglog" or "cauchit". The default is the function fastglm() from the package fastglm. Tested alternatives: glm.fit, glm2::glm.fit2, speedglm::speedglm.wfit.

par.estim

arguments to be passed to the function selected by estim.glm. For instance, the arguments method, tol and maxit of fastglm can be set.
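As a minimal sketch of how two of the inputs described above might be prepared, the following base-R code builds a regressor matrix with model.matrix (which adds the constant column) and a list of L'Ecuyer-CMRG seeds with parallel::nextRNGStream. The data frame dat and its columns are made up for illustration. That list_of_seeds accepts seeds in the .Random.seed format is an assumption, not something this page confirms.

```r
# Hypothetical data: a count outcome, a binary group and two regressors.
set.seed(1)
n <- 50
dat <- data.frame(
  y    = rpois(n, 2),
  d    = rep(0:1, length.out = n),
  educ = factor(sample(c("low", "mid", "high"), n, replace = TRUE),
                levels = c("low", "mid", "high")),
  age  = sample(20:60, n, replace = TRUE)
)

# model.matrix() expands the factor into dummies and adds the constant,
# which discreteQ does NOT add on its own.
x <- model.matrix(~ educ + age, data = dat)

# One L'Ecuyer-CMRG seed per bootstrap replication (assumed format:
# integer vectors as stored in .Random.seed).
bsrep <- 20
old_kind <- RNGkind("L'Ecuyer-CMRG")
set.seed(42)
seed <- get(".Random.seed", envir = globalenv())
list_of_seeds <- vector("list", bsrep)
for (b in seq_len(bsrep)) {
  list_of_seeds[[b]] <- seed
  seed <- parallel::nextRNGStream(seed)
}
RNGkind(old_kind[1])  # restore the previous generator

# With the package installed, these objects could then be passed on, e.g.:
# res <- discreteQ(dat$y, dat$d, x, bsrep = bsrep, list_of_seeds = list_of_seeds)
```

The length of list_of_seeds matches bsrep, as the argument description requires.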

Details

The function discreteQ can be used in three different ways:

  1. First, if no treatment variable is specified in the argument d, then the command will provide uniform bands for the unconditional quantile and distribution functions of y.

  2. Second, if a treatment variable is specified in the argument d but decomposition=FALSE, then the command will provide uniform bands that cover the quantile and distribution functions of the treated and control outcomes and the quantile treatment effect function (the difference between both quantile functions).

  3. Third, if a treatment variable is specified in the argument d and decomposition=TRUE, then the command provides uniform bands that cover both distribution functions, both quantile functions as well as the decomposition of their difference into an explained (by the regressors x) and unexplained component.
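Using the component names from the Value section (Q_0 and Q_1 for the observed quantile functions of the two groups, Q_c for the counterfactual quantile function), the decomposition in the third case amounts to adding and subtracting Q_c at each quantile index u in q.range. The identity below only restates the definitions of the components observed, composition and unexplained:

```latex
\[
\underbrace{Q_1(u) - Q_0(u)}_{\text{observed}}
  = \underbrace{Q_1(u) - Q_c(u)}_{\text{composition}}
  + \underbrace{Q_c(u) - Q_0(u)}_{\text{unexplained}}
\]
```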

The output of the function is a long list of step functions (see below for the details). We recommend using the plot.discreteQ and summary.discreteQ functions to analyze the results. For further details see the vignette available with vignette("discreteQ", package="discreteQ").
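Objects of class stepfun are ordinary R functions, so they can be evaluated at arbitrary points. The sketch below builds an empirical quantile function by hand from toy data, using only base R. It stands in for a component such as Q and is not produced by discreteQ itself.

```r
# Toy sample; the hand-built function stands in for a component such as Q.
y <- c(0, 1, 1, 2, 3, 3, 3, 5)

Fhat  <- ecdf(y)       # empirical distribution function (also a stepfun)
ys    <- knots(Fhat)   # distinct observed values
probs <- Fhat(ys)      # cdf evaluated at each distinct value

# Left-continuous inverse: Q(u) is the smallest value with Fhat(value) >= u.
Q <- stepfun(probs[-length(probs)], ys, right = TRUE)

Q(c(0.2, 0.5, 0.9))    # evaluate like any other R function
knots(Q)               # jump points of the step function
```

The same pattern (calling the function on a vector of quantile indexes, or inspecting knots) should apply to any stepfun component of the returned object.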

Value

discreteQ returns an object of class "discreteQ". There are methods available for plotting ("plot", see plot.discreteQ) and summarizing ("summary", see summary.discreteQ) "discreteQ" objects. We recommend using them to analyze the results.

The components contained by an object of class "discreteQ" depend on the way the function has been called. We can distinguish the same three cases as above:

  1. If no treatment variable is specified in the argument d, then discreteQ contains the following components:

    Q

    The empirical quantile function of the outcome. This is a function of class stepfun.

    lb.Q

    The lower bound of the uniform confidence band for the unconditional quantile function of the outcome. This is a function of class stepfun.

    ub.Q

    The upper bound of the uniform confidence band for the unconditional quantile function of the outcome. This is a function of class stepfun.

    F

    The empirical distribution function of the outcome. This is a function of class stepfun.

    lb.F

    The lower bound of the uniform confidence band for the unconditional distribution function of the outcome. This is a function of class stepfun.

    ub.F

    The upper bound of the uniform confidence band for the unconditional distribution function of the outcome. This is a function of class stepfun.

    q.range

    Vector of length 2 that contains the lowest and highest quantile indexes. The uniform bands for the quantile function cover the true quantile function in this quantile range. The uniform bands for the distribution function cover the true function in the range of values of the outcome that are between the quantiles corresponding to these indexes.

    ys

    Vector containing the thresholds at which the cumulative distribution has been estimated.

    bsrep

    Scalar containing the number of performed bootstrap replications.

    model

    String scalar that takes the value "univariate" in this case.

    method

    String scalar that takes the value "empirical" in this case.

    F.b

    Matrix with length(ys) rows and bsrep columns. Each column contains the estimated distribution function for the corresponding bootstrap replication. This object, which can be voluminous, is returned only if return.boot = TRUE.

    seeds

    List of length bsrep containing the seeds used for L'Ecuyer's RNG in the bootstrap replications. This object is returned only if return.seeds = TRUE.
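To make the shape of ys and F.b concrete, here is a rough base-R sketch for the univariate case. It applies the default threshold rule described for the argument ys and fills a matrix with a plain resampling bootstrap, which need not match the bootstrap scheme used inside discreteQ. The choice of quantile type is also an assumption.

```r
set.seed(1234)
y <- rpois(100, 3)
bsrep <- 200

# Default thresholds as described for the argument ys: all distinct values
# when there are fewer than 100 of them, otherwise 99 empirical percentiles.
# (Quantile type 1 is an assumption; the page does not say which is used.)
uniq <- sort(unique(y))
ys <- if (length(uniq) < 100) uniq else
  unique(quantile(y, probs = (1:99) / 100, type = 1, names = FALSE))

# Plain nonparametric bootstrap of the empirical cdf, one column per draw.
F.b <- replicate(bsrep, ecdf(sample(y, replace = TRUE))(ys))

dim(F.b)                    # length(ys) rows, bsrep columns
se.F <- apply(F.b, 1, sd)   # pointwise bootstrap standard errors
cbind(ys, F = ecdf(y)(ys), se = se.F)
```

Pointwise standard errors like se.F are narrower than what uniform bands require, so they are shown here only to illustrate how the rows and columns of such a matrix are used.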

  2. If a treatment variable is specified in the argument d but decomposition=FALSE, then discreteQ contains the components below. Note that the uniform bands jointly cover the true Q0, Q1, F0, F1 and QTE functions with probability 1-alpha.

    Q0

    The estimated unconditional quantile function of the control outcome. This is the quantile function of the outcome that we would observe if all observations had d = 0. If x = NULL, this is simply the empirical quantile function of the outcome for the subsample with d = 0. If regressors have been provided in the argument x, then the estimated conditional distribution of the outcome in the sample with d = 0 is integrated over the distribution of the covariates in the whole sample. This is a function of class stepfun.

    lb.Q0

    The lower bound of the uniform confidence band for the unconditional quantile function of the control outcome. This is a function of class stepfun.

    ub.Q0

    The upper bound of the uniform confidence band for the unconditional quantile function of the control outcome. This is a function of class stepfun.

    Q1

    The estimated unconditional quantile function of the treated outcome. This is the quantile function of the outcome that we would observe if all observations had d = 1. If x = NULL, this is simply the empirical quantile function of the outcome for the subsample with d = 1. If regressors have been provided in the argument x, then the estimated conditional distribution of the outcome in the sample with d = 1 is integrated over the distribution of the covariates in the whole sample. This is a function of class stepfun.

    lb.Q1

    The lower bound of the uniform confidence band for the unconditional quantile function of the treated outcome. This is a function of class stepfun.

    ub.Q1

    The upper bound of the uniform confidence band for the unconditional quantile function of the treated outcome. This is a function of class stepfun.

    QTE

    The estimated quantile treatment effect function: QTE = Q1 - Q0. This is a function of class stepfun.

    lb.QTE

    The lower bound of the uniform confidence band for the quantile treatment effect. This is a function of class stepfun.

    ub.QTE

    The upper bound of the uniform confidence band for the quantile treatment effect. This is a function of class stepfun.

    F0

    The estimated unconditional distribution function of the control outcome. This is the distribution function of the outcome that we would observe if all observations had d = 0. If x = NULL, this is simply the empirical distribution function of the outcome for the subsample with d = 0. If regressors have been provided in the argument x, then the estimated conditional distribution of the outcome in the sample with d = 0 is integrated over the distribution of the covariates in the whole sample. This is a function of class stepfun.

    lb.F0

    The lower bound of the uniform confidence band for the unconditional distribution function of the control outcome. This is a function of class stepfun.

    ub.F0

    The upper bound of the uniform confidence band for the unconditional distribution function of the control outcome. This is a function of class stepfun.

    F1

    The estimated unconditional distribution function of the treated outcome. This is the distribution function of the outcome that we would observe if all observations had d = 1. If x = NULL, this is simply the empirical distribution function of the outcome for the subsample with d = 1. If regressors have been provided in the argument x, then the estimated conditional distribution of the outcome in the sample with d = 1 is integrated over the distribution of the covariates in the whole sample. This is a function of class stepfun.

    lb.F1

    The lower bound of the uniform confidence band for the unconditional distribution function of the treated outcome. This is a function of class stepfun.

    ub.F1

    The upper bound of the uniform confidence band for the unconditional distribution function of the treated outcome. This is a function of class stepfun.

    q.range

    Vector of length 2 that contains the lowest and highest quantile indexes. The uniform bands for the quantile functions cover the true quantile function in this quantile range. The uniform bands for the distribution functions cover the true function in the range of values of the outcome that are between the quantiles corresponding to these indexes.

    ys0

    Vector containing the thresholds at which the cumulative distribution of the control outcome has been estimated.

    ys1

    Vector containing the thresholds at which the cumulative distribution of the treated outcome has been estimated.

    bsrep

    Scalar containing the number of performed bootstrap replications.

    model

    String scalar that takes the value "qte" in this case.

    method

    String scalar. Name of the method used to estimate the conditional distribution functions.

    F.b

    Matrix with length(ys0) + length(ys1) rows and bsrep columns. Each column contains the estimated distribution functions for the corresponding bootstrap replication. The first length(ys0) rows contain the estimated distribution function for the control outcome F0. The remaining length(ys1) rows contain the estimated distribution function for the treated outcome F1. This object, which can be voluminous, is returned only if return.boot = TRUE.

    seeds

    List of length bsrep containing the seeds used for L'Ecuyer's RNG in the bootstrap replications. This object is returned only if return.seeds = TRUE.
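The entries for Q0 and F0 describe how, with regressors, the conditional distribution estimated in the control sample is integrated over the covariates of the whole sample. The following base-R sketch illustrates that idea for the point estimate only, using a logit distribution regression fitted with glm.fit. It is an illustration under these assumptions, not the package's implementation, and it computes no confidence bands.

```r
set.seed(1234)
d   <- c(rep(0, 100), rep(1, 100))
reg <- rbinom(200, 1, 0.4 + d * 0.2)
y   <- rpois(200, lambda = 1 + reg)
x   <- cbind(1, reg)               # constant included by hand
ys0 <- sort(unique(y[d == 0]))     # thresholds for the control outcome

# For each threshold: logit regression of 1(y <= threshold) on x in the
# control sample, then average the fitted probabilities over ALL observations.
F0 <- sapply(ys0, function(thr) {
  fit <- suppressWarnings(
    glm.fit(x[d == 0, , drop = FALSE], as.numeric(y[d == 0] <= thr),
            family = binomial())
  )
  mean(plogis(drop(x %*% fit$coefficients)))
})

# Invert the estimated cdf: smallest threshold whose cdf value reaches u.
Q0 <- function(u) sapply(u, function(p) ys0[which(F0 >= p)[1]])
Q0(c(0.25, 0.5, 0.75))
```

Replacing d == 0 with d == 1 in the fitting step gives the analogous sketch for F1 and Q1.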

  3. If a treatment variable is specified in the argument d and decomposition=TRUE, then discreteQ contains the components below. Note that the uniform bands jointly cover the true functions Q0, Q1, Qc, F0, F1, and Fc as well as the difference between any two of these functions with probability 1-alpha.

    Q0

    The empirical quantile function of the outcome for the group with d = 0. This is a function of class stepfun.

    lb.Q0

    The lower bound of the uniform confidence band for the unconditional quantile function of the outcome in the group with d = 0. This is a function of class stepfun.

    ub.Q0

    The upper bound of the uniform confidence band for the unconditional quantile function of the outcome in the group with d = 0. This is a function of class stepfun.

    Q1

    The empirical quantile function of the outcome for the group with d = 1. This is a function of class stepfun.

    lb.Q1

    The lower bound of the uniform confidence band for the unconditional quantile function of the outcome in the group with d = 1. This is a function of class stepfun.

    ub.Q1

    The upper bound of the uniform confidence band for the unconditional quantile function of the outcome in the group with d = 1. This is a function of class stepfun.

    Qc

    The estimated counterfactual quantile function of the outcome that we would observe if the distribution of the covariates was the same as that of the group with d = 0 and the conditional distribution of the outcome given the covariates was the same as that of the group with d = 1. This is a function of class stepfun.

    lb.Qc

    The lower bound of the uniform confidence band for the counterfactual quantile function. This is a function of class stepfun.

    ub.Qc

    The upper bound of the uniform confidence band for the counterfactual quantile function. This is a function of class stepfun.

    F0

    The empirical distribution function of the outcome for the group with d = 0. This is a function of class stepfun.

    lb.F0

    The lower bound of the uniform confidence band for the unconditional distribution function of the outcome in the group with d = 0. This is a function of class stepfun.

    ub.F0

    The upper bound of the uniform confidence band for the unconditional distribution function of the outcome in the group with d = 0. This is a function of class stepfun.

    F1

    The empirical distribution function of the outcome for the group with d = 1. This is a function of class stepfun.

    lb.F1

    The lower bound of the uniform confidence band for the unconditional distribution function of the outcome in the group with d = 1. This is a function of class stepfun.

    ub.F1

    The upper bound of the uniform confidence band for the unconditional distribution function of the outcome in the group with d = 1. This is a function of class stepfun.

    Fc

    The estimated counterfactual distribution function of the outcome that we would observe if the distribution of the covariates was the same as that of the group with d = 0 and the conditional distribution of the outcome given the covariates was the same as that of the group with d = 1. This is a function of class stepfun.

    lb.Fc

    The lower bound of the uniform confidence band for the counterfactual distribution function. This is a function of class stepfun.

    ub.Fc

    The upper bound of the uniform confidence band for the counterfactual distribution function. This is a function of class stepfun.

    observed

    The difference between the observed quantile function for the group with d = 1 and the observed quantile function for the group with d = 0: Q1 - Q0. This is a function of class stepfun.

    lb.observed

    The lower bound of the uniform confidence band for Q1 - Q0. This is a function of class stepfun.

    ub.observed

    The upper bound of the uniform confidence band for Q1 - Q0. This is a function of class stepfun.

    composition

    The difference between the observed quantile function for the group with d = 1 and the counterfactual quantile function: Q1 - Qc. This is a function of class stepfun.

    lb.composition

    The lower bound of the uniform confidence band for Q1 - Qc. This is a function of class stepfun.

    ub.composition

    The upper bound of the uniform confidence band for Q1 - Qc. This is a function of class stepfun.

    unexplained

    The difference between the counterfactual quantile function and the quantile function for the group with d = 0: Qc - Q0. This is a function of class stepfun.

    lb.unexplained

    The lower bound of the uniform confidence band for Qc - Q0. This is a function of class stepfun.

    ub.unexplained

    The upper bound of the uniform confidence band for Qc - Q0. This is a function of class stepfun.

    q.range

    Vector of length 2 that contains the lowest and highest quantile indexes. The uniform bands for the quantile functions cover the true quantile function in this quantile range. The uniform bands for the distribution functions cover the true function in the range of values of the outcome that are between the quantiles corresponding to these indexes.

    ys0

    Vector containing the thresholds at which the cumulative distribution of the outcome for the group with d = 0 has been estimated.

    ys1

    Vector containing the thresholds at which the cumulative distribution of the outcome for the group with d = 1 and the counterfactual distribution function have been estimated.

    bsrep

    Scalar containing the number of performed bootstrap replications.

    model

    String scalar that takes the value "distribution" in this case.

    method

    String scalar. Name of the method used to estimate the conditional distribution functions.

    F.b

    Matrix with length(ys0) + 2 * length(ys1) rows and bsrep columns. Each column contains the estimated distribution functions for the corresponding bootstrap replication. The first length(ys0) rows contain the estimated distribution function for group 0, F0. The next length(ys1) rows contain the estimated distribution function for group 1, F1. The remaining length(ys1) rows contain the estimated counterfactual distribution, Fc. This object, which can be voluminous, is returned only if return.boot = TRUE.

    seeds

    List of length bsrep containing the seeds used for L'Ecuyer's RNG in the bootstrap replications. This object is returned only if return.seeds = TRUE.
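The row layout of F.b in the decomposition case can be unpacked as sketched below. The matrix here is a synthetic stand-in with made-up sizes. With a real result, the inputs would presumably be the F.b, ys0 and ys1 components of the returned object.

```r
# Hypothetical sizes and a synthetic stand-in for the F.b component.
ys0 <- 0:4
ys1 <- 0:6
bsrep <- 5
n0 <- length(ys0)
n1 <- length(ys1)
F.b <- matrix(runif((n0 + 2 * n1) * bsrep), nrow = n0 + 2 * n1, ncol = bsrep)

# Row blocks in the documented order: F0, then F1, then Fc.
F0.b <- F.b[seq_len(n0), , drop = FALSE]
F1.b <- F.b[n0 + seq_len(n1), , drop = FALSE]
Fc.b <- F.b[n0 + n1 + seq_len(n1), , drop = FALSE]

# Each block has one row per threshold and one column per bootstrap draw.
rbind(F0 = dim(F0.b), F1 = dim(F1.b), Fc = dim(Fc.b))
```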

References

Chernozhukov, Victor, Iván Fernández-Val, Blaise Melly, and Kaspar Wüthrich. 2019. “Generic Inference on Quantile and Quantile Effect Functions for Discrete Outcomes.” arXiv Preprint, https://arxiv.org/abs/1608.05142.

See Also

For continuous outcomes, see also counterfactual from the package Counterfactual.

Examples

##Example 1: univariate quantile function
#Generate the data
set.seed(1234)
outcome <- rpois(100, 3)
#Estimate the functions and the confidence bands
results1 <- discreteQ(outcome)
#Table containing the estimated quantile function with its confidence band
summary(results1)
#Plot the estimated quantile function with its confidence band
plot(results1)

##Example 2: quantile treatment effect function (QTE)
#Generate the data
set.seed(1234)
treatment <- c(rep(0,100), rep(1,100))
reg <- rbinom(200, 1, 0.4 + treatment*0.2)
outcome <- rpois(200, lambda = 1+reg)
#Estimate the functions and the confidence bands (takes about 1 minute)
results2 <- discreteQ(outcome, treatment, cbind(1, reg))
#Table containing the estimated QTE function with its confidence band
summary(results2)
#Plot the QTE with its confidence band
plot(results2)
#Plot the quantile function of the control outcome
plot(results2, which="Q0")
#Add the quantile function of the treated outcome
plot(results2, which="Q1", add=TRUE, shift=0.2, col.l="dark green", col.b="light green")

##Example 3: decomposition
#Generate the data
set.seed(1234)
group <- c(rep(0,100), rep(1,100))
reg <- rbinom(200, 1, 0.2 + group*0.6)
outcome <- rpois(200, lambda = exp(-3+4*reg))
#Estimate the functions and the confidence bands (takes about 30 seconds)
results3 <- discreteQ(outcome, group, cbind(1, reg), decomposition=TRUE)
#Table containing the unexplained component with its confidence band
summary(results3)
#Table containing the difference between the observed quantile functions
summary(results3, which="observed")
#Plot the observed quantile functions and their decomposition
plot(results3)
#Plot only the composition component
plot(results3, which="composition")

bmelly/discreteQ documentation built on May 22, 2021, 7:55 a.m.