rdcqr: Local composite quantile estimation in regression...


View source: R/rdcqr.R

Description

This function computes the local composite quantile regression (LCQR) estimator of treatment effect for both sharp and fuzzy regression discontinuity (RD) designs. It also computes the bias-corrected estimator and adjusts its standard error by incorporating the variability due to bias-correction.

Usage

rdcqr(y, x, fuzzy = NULL, t0 = 0, cutoff = 0, q = 5,
  bandwidth = "rot", kernel.type = "triangular", maxit = 100,
  tol = 1.0e-4, parallel = TRUE, numThreads = "default", grainsize = 1,
  llr.residuals = TRUE, ls.derivative = TRUE, fixed.n = FALSE)

Arguments

y

A vector of treatment outcomes.

x

A vector of covariates.

fuzzy

A vector of treatment assignments in a fuzzy RD: 1 for receiving the treatment and 0 otherwise. Defaults to NULL, which corresponds to a sharp RD.

t0

Treatment effect under the null. Defaults to 0.

cutoff

Cutoff for treatment assignment. Defaults to 0.

q

Number of quantiles to be used in estimation. Defaults to 5. It needs to be an odd number.
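
As a rough illustration of why an odd q is convenient, the sketch below assumes the equally spaced quantile levels k / (q + 1) that are common in composite quantile regression. This spacing is an assumption made here, not something this page states, and cqr_levels is a hypothetical helper that is not part of rdcqr. Under that assumption, an odd q places the middle level exactly at the median.

```r
# Assumed convention: equally spaced quantile levels tau_k = k / (q + 1).
# `cqr_levels` is a hypothetical helper, not part of rdcqr.
cqr_levels <- function(q = 5) {
  stopifnot(q >= 1, q %% 2 == 1)  # mirror the odd-q requirement
  seq_len(q) / (q + 1)
}

taus <- cqr_levels(5)
print(taus)  # five levels from 1/6 to 5/6; the third one is 0.5
```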

bandwidth

In a sharp RD, if the supplied bandwidth is a numeric vector of length two, the first element is the bandwidth for data below the cutoff and the second element is the bandwidth for data above the cutoff. In a fuzzy RD, the supplied bandwidth vector needs to have four elements in it: the first two bandwidths are for treatment outcomes below and above the cutoff and the last two bandwidths are for treatment assignments below and above the cutoff. If it is a string, the following types of bandwidth selector are provided:

  1. rot: Rule-of-thumb bandwidth selector. Two bandwidths are used on each side of the cutoff, each of which is a transform of the rule-of-thumb bandwidth for the local linear regression.

  2. adj.mseone: One bandwidth based on the adjusted MSE function.

  3. adj.msetwo: Two bandwidths based on the adjusted MSE function.

  4. msetwo: Two bandwidths based on the MSE function, each of which is the MSE-optimal bandwidth.
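
The bandwidth rules above can be summarized in a small validation sketch. check_bandwidth is a hypothetical helper written for illustration only; it is not part of rdcqr, and the positivity check is an added assumption. It accepts one of the four selector names, or a numeric vector of length two (sharp RD) or four (fuzzy RD).

```r
# `check_bandwidth` is a hypothetical helper, not part of rdcqr.
check_bandwidth <- function(bandwidth, fuzzy = NULL) {
  selectors <- c("rot", "adj.mseone", "adj.msetwo", "msetwo")
  if (is.character(bandwidth)) {
    # A string must name one of the documented selectors.
    return(match.arg(bandwidth, selectors))
  }
  # Sharp RD: (below, above). Fuzzy RD: outcome (below, above),
  # then treatment assignment (below, above).
  expected <- if (is.null(fuzzy)) 2L else 4L
  stopifnot(is.numeric(bandwidth), length(bandwidth) == expected,
            all(bandwidth > 0))  # positivity is an assumption made here
  bandwidth
}

check_bandwidth("rot")
check_bandwidth(c(10, 3))
check_bandwidth(c(10, 3, 8, 4), fuzzy = c(0, 1, 1, 0))
```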

kernel.type

Kernel type that includes

  1. triangular: triangular kernel. kernID = 0.

  2. biweight: biweight kernel. kernID = 1.

  3. epanechnikov: Epanechnikov kernel. kernID = 2.

  4. gaussian: Gaussian kernel. kernID = 3.

  5. tricube: tricube kernel. kernID = 4.

  6. triweight: triweight kernel. kernID = 5.

  7. uniform: uniform kernel. kernID = 6.
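
For reference, the seven kernel names can be mapped to their textbook formulas as in the sketch below. kernel_fun is a hypothetical helper, not a function exported by rdcqr, and the package's internal scaling may differ; each kernel here is normalized to integrate to one.

```r
# Textbook kernel formulas, each normalized to integrate to 1.
# `kernel_fun` is a hypothetical helper; the package's internal
# scaling may differ.
kernel_fun <- function(u, type = "triangular") {
  inside <- abs(u) <= 1  # indicator of the compact support [-1, 1]
  switch(type,
    triangular   = (1 - abs(u)) * inside,
    biweight     = 15 / 16 * (1 - u^2)^2 * inside,
    epanechnikov = 3 / 4 * (1 - u^2) * inside,
    gaussian     = dnorm(u),
    tricube      = 70 / 81 * (1 - abs(u)^3)^3 * inside,
    triweight    = 35 / 32 * (1 - u^2)^3 * inside,
    uniform      = 0.5 * inside,
    stop("unknown kernel type: ", type)
  )
}

# Kernel weights for points around a cutoff of 0 with bandwidth h = 1.
x <- c(-1.5, -0.5, 0, 0.5, 1.5)
w <- kernel_fun((x - 0) / 1, type = "triangular")
print(w)  # 0.0 0.5 1.0 0.5 0.0
```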

maxit

Maximum iteration number in the MM algorithm for quantile estimation. Defaults to 100.

tol

Convergence criterion in the MM algorithm. Defaults to 1.0e-4.

parallel

A logical value specifying whether to use parallel computing. Defaults to TRUE.

numThreads

Number of threads used in parallel computation. The option auto uses all threads and the option default uses the number of cores minus 1. Defaults to numThreads = "default".

grainsize

Minimum chunk size for parallelization. Defaults to 1.

llr.residuals

Whether to use residuals from the local linear regression as the input to compute the LCQR standard errors and the corresponding bandwidths. Defaults to TRUE. If this option is set to TRUE, the treatment effect estimation and the bias correction are still done in LCQR. The local linear regression uses the same kernel function as LCQR to obtain the residuals, which are then used to compute the unadjusted and adjusted asymptotic standard errors and the bandwidths. This option improves speed and gives a quick estimate of the standard errors when the sample size is large. To use residuals from the LCQR method, set llr.residuals = FALSE.

ls.derivative

Whether to use a global quartic and quintic polynomial to estimate the second and third derivatives of the conditional mean function. Defaults to TRUE.

fixed.n

Whether to compute the fixed-n results instead of asymptotic results. If this option is turned on, all bias-correction and standard error calculation are based on the fixed-n (nonasymptotic) approach. Defaults to FALSE.

Details

This is the main function of the package; it estimates the treatment effect for both sharp and fuzzy RD designs. The LCQR estimate is obtained from an iterative algorithm, so estimation is slow compared with local linear regression. Most computation time is spent on calculating the standard errors. If residuals from LCQR are used, i.e., llr.residuals = FALSE, the code that computes the standard errors and bandwidths is parallelized, and the argument numThreads is set to the number of physical cores minus one by default. The options parallel, numThreads, and grainsize are relevant when llr.residuals = FALSE.

To further speed up computation, use the option llr.residuals = TRUE. This is particularly suitable when the sample size is large.

The two arguments maxit and tol have an impact on the computation speed. For example, using maxit = 500 and tol = 1e-6 will take much longer to complete compared to the default setting, though the results are more precise. Our limited experience with some of the popular RD data suggests that the treatment effect can usually be estimated precisely with low computation cost while the standard errors may have non-negligible change when one changes maxit and tol. This certainly depends on the data. One should experiment with different settings during estimation.
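
To make the roles of maxit and tol concrete, here is a minimal sketch of an MM iteration for a single sample quantile, using the quadratic majorization commonly associated with MM algorithms for quantile regression. It is a simplified stand-in, not the LCQR routine used by rdcqr: mm_quantile, the perturbation eps, and the starting value are all assumptions made for illustration.

```r
# MM iteration for one sample quantile via quadratic majorization.
# `mm_quantile` is a hypothetical helper; it is not the rdcqr internals.
mm_quantile <- function(y, tau = 0.5, maxit = 100, tol = 1e-4, eps = 1e-6) {
  stopifnot(maxit >= 1)
  theta <- mean(y)  # starting value (an arbitrary choice)
  for (it in seq_len(maxit)) {
    w <- 1 / (eps + abs(y - theta))  # weights from the current residuals
    # Closed-form minimizer of the quadratic surrogate.
    theta_new <- (sum(w * y) + (2 * tau - 1) * length(y)) / sum(w)
    converged <- abs(theta_new - theta) < tol  # stopping rule set by tol
    theta <- theta_new
    if (converged) break
  }
  list(estimate = theta, iterations = it, converged = converged)
}

fit <- mm_quantile(c(1, 2, 3, 4, 100), tau = 0.5)
print(fit$estimate)  # close to the sample median, 3

# Too few iterations: the loop stops at maxit before meeting tol.
short <- mm_quantile(c(1, 2, 3, 4, 100), tau = 0.5, maxit = 2)
print(short$converged)
```

A tighter tol or a larger maxit lets the loop run longer before stopping, which is the trade-off between speed and precision described above.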

In estimating the bandwidths adj.mseone, adj.msetwo, and msetwo, we need estimates of the second and third derivatives of the conditional mean function. By default, the second derivative is estimated by a global quartic polynomial and the third derivative is estimated by a global quintic polynomial. The option ls.derivative, when set to FALSE, uses the LCQR method for derivative estimation. Sometimes it can be difficult for a nonparametric method such as LCQR to estimate derivatives of higher order, which is also true for local linear regression.
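
The default polynomial approach can be sketched with ordinary least squares as follows. poly_derivatives is a hypothetical helper, not the package's internal code; in particular, whether the package fits the polynomials separately on each side of the cutoff is not stated here, so the sketch simply works on whatever sample it is given.

```r
# Global polynomial (least-squares) derivative estimates at a cutoff.
# `poly_derivatives` is a hypothetical helper, not the rdcqr internals.
poly_derivatives <- function(y, x, cutoff = 0) {
  xc <- x - cutoff  # center so derivatives at the cutoff are simple
  b4 <- coef(lm(y ~ poly(xc, 4, raw = TRUE)))  # quartic fit
  b5 <- coef(lm(y ~ poly(xc, 5, raw = TRUE)))  # quintic fit
  c(second = unname(2 * b4[3]),  # 2 * coefficient on xc^2
    third  = unname(6 * b5[4]))  # 6 * coefficient on xc^3
}

# Noise-free check on a known cubic: m''(0) = 6 and m'''(0) = 3.
x <- seq(-1, 1, length.out = 50)
y <- 1 + 2 * x + 3 * x^2 + 0.5 * x^3
d <- poly_derivatives(y, x)
print(d)
```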

Value

rdcqr returns a list with the following components:

estimate

Treatment effect estimate and the bias-corrected treatment effect estimate.

se

Asymptotic standard error and the adjusted asymptotic standard error.

bws

Bandwidths used in estimation. There are two bandwidths for a sharp RD and four for a fuzzy RD. In a fuzzy RD, the first two bandwidths are associated with the treatment outcome variable below and above the cutoff. The last two bandwidths are associated with the treatment assignment variable below and above the cutoff.

nr_t

The null-restricted t statistic to mitigate weak identification in a fuzzy RD. The second element is the bias-corrected and s.e.-adjusted version of this test.
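
As one possible way to use the estimate and se components, the sketch below builds normal-approximation confidence intervals. rd_confint is a hypothetical helper, and it assumes that estimate and se are numeric vectors ordered as (conventional, bias-corrected and adjusted); that layout is inferred from the descriptions above, not confirmed. The input numbers are placeholders, not real estimates.

```r
# `rd_confint` is a hypothetical helper. It assumes `estimate` and `se`
# are numeric vectors ordered as (conventional, bias-corrected/adjusted).
rd_confint <- function(estimate, se, level = 0.95) {
  z <- qnorm(1 - (1 - level) / 2)  # normal critical value
  cbind(lower = estimate - z * se, upper = estimate + z * se)
}

# Placeholder numbers for illustration only, not real estimates.
ci <- rd_confint(estimate = c(1.0, 1.2), se = c(0.5, 0.6))
print(ci)
```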

Examples

## Not run: 
# An example of using the Head Start data.
library(rdcqr)
data(headstart)
y = headstart$mortality
x = headstart$poverty

# Use the default rule-of-thumb bandwidth in estimation.
# Also use the residuals from a local linear regression to estimate the
# standard errors of LCQR.
rdcqr(y, x, bandwidth = "rot", llr.residuals = TRUE)

# Supply bandwidths to data below and above the cutoff 0.
# The poverty (x) variable is preprocessed to have a cutoff equal to 0.
rdcqr(y, x, bandwidth = c(10, 3), llr.residuals = TRUE)

# Try the MSE-optimal bandwidths for data below and above the cutoff.
rdcqr(y, x, bandwidth = "msetwo", llr.residuals = TRUE)

# Use residuals from an LCQR estimation when computing the standard errors.
# It is slow for large data sets but can be more accurate. By default, the option
# parallel = TRUE is used.
rdcqr(y, x, llr.residuals = FALSE)

## End(Not run)

xhuang20/rdcqr documentation built on July 1, 2021, 5:22 a.m.