npiv: Nonparametric Instrumental Variable Estimation and Inference

View source: R/npiv.R

npiv R Documentation

Description

npiv performs nonparametric estimation of a structural function h0 and its derivatives using a B-spline sieve. It also constructs uniform confidence bands for h0 and its derivatives.

Sieve dimensions are determined in a data-dependent way if not provided by the user, via the methods described in Chen, Christensen, and Kankanala (2024). This data-driven choice of sieve dimension ensures estimators of h0 and its derivatives converge at the optimal sup-norm rate. The resulting uniform confidence bands for h0 and its derivatives also converge at the minimax rate up to log factors; see Chen, Christensen, and Kankanala (2024).

If sieve dimensions are provided by the user, npiv implements the bootstrap-based procedure of Chen and Christensen (2018) to construct uniform confidence bands based on undersmoothing for h0 and its derivatives.

The methods in npiv apply to estimation and inference on a nonparametric regression function as a special case.

Usage

npiv(...)

## S3 method for class 'formula'
npiv(formula,
     data=NULL,
     newdata=NULL,
     subset=NULL,
     na.action="na.omit",
     call,
     ...)

## Default S3 method:
npiv(Y,
     X,
     W,
     X.eval=NULL,
     X.grid=NULL,
     alpha=0.05,
     basis=c("tensor","additive","glp"),
     boot.num=99,
     check.is.fullrank=FALSE,
     deriv.index=1,
     deriv.order=1,
     grid.num=50,
     J.x.degree=3,
     J.x.segments=NULL,
     K.w.degree=4,
     K.w.segments=NULL,
     K.w.smooth=2,
     knots=c("uniform","quantiles"),
     progress=TRUE,
     ucb.h=TRUE,
     ucb.deriv=TRUE,
     W.max=NULL,
     W.min=NULL,
     X.min=NULL,
     X.max=NULL,
     ...)

Arguments

formula

a symbolic description of the model to be fit.

data

an optional data frame containing the variables in the model.

newdata

an optional data frame in which to look for variables with which to predict (i.e., predictors in X passed in X.eval which must contain identically named variables).

subset

an optional vector specifying a subset of observations to be used in the fitting process (see additional details about how this argument interacts with data-dependent bases in the ‘Details’ section of the model.frame documentation).

na.action

a function which indicates what should happen when the data contain NAs. The default is set by the na.action setting of options, and is na.fail if that is unset. The ‘factory-fresh’ default is na.omit. Another possible value is NULL, no action. Value na.exclude can be useful.

call

the original function call (this is passed internally by npiv). It is not recommended that the user set this.

Y

dependent variable vector.

X

matrix of endogenous regressors.

W

matrix of instrumental variables. Set W=X for nonparametric regression.

X.eval

optional matrix of evaluation data for the endogenous regressors.

X.grid

optional vector of grid points for X when determining model complexity. Default (X.grid=NULL) uses 50 equally spaced points (can be changed in grid.num) over the support of each X variable.

alpha

nominal size of the uniform confidence bands. Default is 0.05 for 95% uniform confidence bands.

basis

basis type (if X or W are multivariate), a character string. Options are:

tensor tensor product basis. Default option.

additive additive basis for additively separable models.

glp generalized B-spline polynomial basis.

boot.num

number of bootstrap replications.

check.is.fullrank

check that X and W have full rank. Default is FALSE.

deriv.index

integer indicating the column of X for which to compute the derivative.

deriv.order

integer indicating the order of derivative to be computed.

grid.num

number of grid points for each X variable if X.grid is not provided.

J.x.degree

B-spline degree (integer or vector of integers of length ncol(X)) for approximating the structural function. Default is degree=3 (cubic B-spline).

J.x.segments

B-spline number of segments (integer or vector of integers of length ncol(X)) for approximating the structural function. Default is NULL. If either J.x.segments=NULL or K.w.segments=NULL, these are both chosen automatically using npiv_choose_J.

K.w.degree

B-spline degree (integer or vector of integers of length ncol(W)) for estimating the nonparametric first stage. Default is degree=4 (quartic B-spline).

K.w.segments

B-spline number of segments (integer or vector of integers of length ncol(W)) for estimating the nonparametric first stage. Default is NULL. If either J.x.segments=NULL or K.w.segments=NULL, these are both chosen automatically using npiv_choose_J.

K.w.smooth

non-negative integer. The basis for the nonparametric first stage uses 2^{K.w.smooth} times as many B-spline segments for each instrument as the basis approximating the structural function. Default is 2. Setting K.w.smooth=0 uses the same number of segments for X and W.

knots

knots type, a character string. Options are:

quantiles interior knots are placed at equally spaced quantiles (equal number of observations lie in each segment).

uniform interior knots are placed at equally spaced intervals over the support of the variable. Default option.

progress

whether to display progress bar or not. Default is TRUE.

ucb.h

whether to compute a uniform confidence band for the structural function. Default is TRUE.

ucb.deriv

whether to compute a uniform confidence band for the derivative of the structural function. Default is TRUE.

W.min

lower bound on the support of each W variable. Default is min(W).

W.max

upper bound on the support of each W variable. Default is max(W).

X.min

lower bound on the support of each X variable. Default is min(X).

X.max

upper bound on the support of each X variable. Default is max(X).

...

optional arguments
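The two knots options described above can be illustrated with a minimal base-R sketch. The helper interior_knots is hypothetical (it is not part of npiv and is not the package's internal code); it only shows how equally spaced and quantile-based interior knots differ for a given number of segments.

```r
## Hypothetical helper (not part of npiv): interior knots that split the
## observed range of x into `segments` pieces.
interior_knots <- function(x, segments, knots = c("uniform", "quantiles")) {
  knots <- match.arg(knots)
  if (segments < 2) return(numeric(0))  # one segment has no interior knots
  probs <- seq(0, 1, length.out = segments + 1)
  breaks <- if (knots == "uniform") {
    ## equally spaced over the support of x
    min(x) + probs * (max(x) - min(x))
  } else {
    ## equally spaced quantiles: roughly equal counts per segment
    as.numeric(quantile(x, probs = probs))
  }
  breaks[-c(1, length(breaks))]  # drop the two boundary points
}

set.seed(1)
x <- rexp(500)  # skewed data, so the two placements differ visibly
k_unif <- interior_knots(x, segments = 4, knots = "uniform")
k_quan <- interior_knots(x, segments = 4, knots = "quantiles")

## Number of observations falling in each quantile-based segment
counts <- table(cut(x, c(min(x), k_quan, max(x)), include.lowest = TRUE))
```

With skewed data such as this, uniform knots leave few observations in the upper segments, whereas quantile knots balance the counts across segments.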

Details

npiv estimates and constructs uniform confidence bands for a nonparametric structural function h_0 and its derivatives in the model

    Y = h_0(X) + U, \quad E[U \mid W] = 0 \quad \text{(almost surely)}.

Estimation is performed using nonparametric two-stage least-squares with a B-spline sieve. The key tuning parameter is the dimension J of the sieve used to approximate h_0. The dimension is tuned via modifying the number and placement of interior knots in the B-spline basis (equivalently, the number of segments of the basis). Sieve dimensions can be user-provided or data-determined using the procedure of Chen, Christensen, and Kankanala (2024).
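As a rough illustration of two-stage least squares with a B-spline sieve, the sketch below uses only base R and the splines package. It is not npiv's implementation: the helper bspline_basis, the simulated design, and the function h0 are all assumptions made for the example. The number of instrument segments is taken to be 2^{K.w.smooth} times the number of segments for X, which is one reading of the K.w.smooth argument.

```r
library(splines)

## Hypothetical helper (not npiv's internal code): B-spline basis with a given
## degree and number of equal-width segments over the observed range of x.
bspline_basis <- function(x, degree, segments) {
  breaks <- seq(min(x), max(x), length.out = segments + 1)
  interior <- breaks[-c(1, segments + 1)]
  basis <- bs(x, knots = interior, degree = degree, intercept = TRUE,
              Boundary.knots = range(x))
  matrix(as.numeric(basis), nrow = length(x))  # plain numeric matrix
}

## Simulated design (assumed for illustration): X is endogenous because the
## error U shares the component v with X, while W is independent of U.
set.seed(42)
n <- 2000
w <- runif(n)
v <- rnorm(n)
x <- pnorm(0.9 * qnorm(w) + sqrt(0.19) * v)
h0 <- function(x) sin(pi * x)
y <- h0(x) + 0.3 * v + 0.1 * rnorm(n)

J.segments <- 2
K.smooth <- 2
Psi <- bspline_basis(x, degree = 3, segments = J.segments)               # basis for h_0
B   <- bspline_basis(w, degree = 4, segments = 2^K.smooth * J.segments)  # instrument basis

## Stage 1: project the columns of Psi onto the span of B.
Psi_hat <- qr.fitted(qr(B), Psi)
## Stage 2: beta solves (Psi_hat' Psi) beta = Psi_hat' y.
beta <- solve(crossprod(Psi_hat, Psi), crossprod(Psi_hat, y))
h_hat <- drop(Psi %*% beta)

c(J = ncol(Psi), K = ncol(B))  # sieve dimensions: degree + segments
```

Under this construction each basis has dimension degree + segments, so 2 cubic segments give J = 5 and 8 quartic segments give K = 12.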

Typical usages mirror ivreg (see the argument list above and the examples at the end of this document):

    foo <- npiv(y~x|w)
    foo <- npiv(y~x1+x2|w1+w2)
    foo <- npiv(Y=y,X=x,W=w)
  

npiv can be used in two ways:

1. Data-driven sieve dimension is invoked if either K.w.segments or J.x.segments is unspecified or NULL (the default). Sieve dimensions are chosen automatically using npiv_choose_J. Uniform confidence bands for h_0 and its derivatives are constructed using the data-driven method of Chen, Christensen, and Kankanala (2024).

2. The user may specify the sieve dimensions of both bases by specifying values for K.w.segments and J.x.segments. Uniform confidence bands for h_0 and its derivatives are constructed using the method of Chen and Christensen (2018).

npiv can also be used for estimation and inference on a nonparametric regression function by setting W=X.
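The usages described above can be sketched on simulated data as follows. The data-generating design is an assumption made for illustration, and the calls use only arguments listed in the Usage section. The npiv calls are guarded so that the snippet still runs when the package is not installed.

```r
## Simulated data (hypothetical design, for illustration only)
set.seed(123)
n <- 500
w <- runif(n)
v <- rnorm(n)
x <- pnorm(0.9 * qnorm(w) + sqrt(0.19) * v)   # endogenous regressor
y <- sin(pi * x) + 0.3 * v + 0.1 * rnorm(n)   # error correlated with x via v

if (requireNamespace("npiv", quietly = TRUE)) {
  library(npiv)

  ## 1. Data-driven sieve dimension (segments left at their NULL defaults)
  fit_auto <- npiv(Y = y, X = x, W = w, progress = FALSE)

  ## 2. User-specified sieve dimensions for both bases
  fit_fixed <- npiv(Y = y, X = x, W = w,
                    J.x.segments = 2, K.w.segments = 5,
                    progress = FALSE)

  ## Nonparametric regression as a special case: instruments equal regressors
  fit_reg <- npiv(Y = y, X = x, W = x, progress = FALSE)

  summary(fit_auto)
}
```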

Value

npiv returns an npiv object. The generic function fitted extracts the estimated values for the sample (or evaluation data, if provided), while the generic function residuals extracts the sample residuals. The generic function summary provides a simple model summary. The generic function plot plots the estimated function and derivative, together with uniform confidence bands.

The function npiv returns a list with the following components:

h

estimated structural function evaluated at the sample data (or evaluation data, if provided).

residuals

residuals for the sample data.

deriv

estimated derivative of the structural function evaluated at the sample data (or evaluation data, if provided).

asy.se

pre-asymptotic standard errors for the estimator of the structural function evaluated at the sample data (or evaluation data, if provided).

deriv.asy.se

pre-asymptotic standard errors for the estimator of the derivative of the structural function evaluated at the sample data (or evaluation data, if provided).

deriv.index

index for the estimated derivative.

deriv.order

order of the estimated derivative.

K.w.degree

value of K.w.degree used.

K.w.segments

value of K.w.segments used (will be data-determined if not provided).

J.x.degree

value of J.x.degree used.

J.x.segments

value of J.x.segments used (will be data-determined if not provided).

beta

vector of estimated spline coefficients.
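As a small sketch of how the h and asy.se components might be combined, the hypothetical helper below (not part of npiv) forms pointwise intervals at a chosen level using the normal quantile rather than a hard-coded 1.96. The mock list stands in for a fitted object and its values are invented for illustration. Pointwise intervals of this kind are narrower than uniform confidence bands and do not provide uniform coverage.

```r
## Hypothetical helper (not part of npiv): pointwise normal-approximation
## intervals from an estimate and its standard errors.
pointwise_band <- function(est, se, alpha = 0.05) {
  z <- qnorm(1 - alpha / 2)
  data.frame(lower = est - z * se, upper = est + z * se)
}

## Mock values standing in for fit$h and fit$asy.se (illustrative only)
mock <- list(h = c(0.20, 0.18, 0.15),
             asy.se = c(0.010, 0.008, 0.012))

band <- pointwise_band(mock$h, mock$asy.se, alpha = 0.05)
band
```

The same helper applies to the derivative by passing the deriv and deriv.asy.se components instead.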

Author(s)

Jeffrey S. Racine <racinej@mcmaster.ca>, Timothy Christensen <timothy.christensen@yale.edu>

References

Chen, X. and T. Christensen (2018). “Optimal Sup-norm Rates and Uniform Inference on Nonlinear Functionals of Nonparametric IV Regression.” Quantitative Economics, 9(1), 39-85. doi:10.3982/QE722

Chen, X., T. Christensen and S. Kankanala (2024). “Adaptive Estimation and Uniform Confidence Bands for Nonparametric Structural Functions and Elasticities.” Review of Economic Studies, forthcoming. doi:10.1093/restud/rdae025

See Also

npiv_choose_J

Examples

## load data
data("Engel95", package = "npiv")

## sort on logexp (the regressor) for plotting purposes
Engel95 <- Engel95[order(Engel95$logexp),] 
attach(Engel95)

## Estimate the Engel curve for food using logwages as an instrument
fm1 <- npiv(food ~ logexp | logwages)

## Plot the estimated Engel curve and data-driven uniform confidence bands
plot(logexp,food,
     ylab="Food Budget Share",
     xlab="log(Total Household Expenditure)",
     xlim=c(4.75, 6.25),
     ylim=c(0, 0.4),
     main="",
     type="p",
     cex=.5,
     col="lightgrey")
lines(logexp,fm1$h,col="blue",lwd=2,lty=1)
lines(logexp,fm1$h.upper,col="blue",lwd=2,lty=2)
lines(logexp,fm1$h.lower,col="blue",lwd=2,lty=2)

## Estimate the Engel curve using pre-specified sieve dimension 
## (dimension 5 for logexp, dimension 9 for logwages)
fm2 <- npiv(food ~ logexp | logwages,
            J.x.segments = 2,
            K.w.segments = 5)

## Plot uniform confidence bands based on undersmoothing
lines(logexp,fm2$h.upper,col="red",lwd=2,lty=2)
lines(logexp,fm2$h.lower,col="red",lwd=2,lty=2)

## Plot pointwise confidence bands based on pre-asymptotic standard errors
lines(logexp,fm2$h+1.96*fm2$asy.se,col="red",lwd=2,lty=3)
lines(logexp,fm2$h-1.96*fm2$asy.se,col="red",lwd=2,lty=3)

legend("topright",
       legend=c("Data-driven Estimate",
                "Data-driven UCBs",
                "Undersmoothed UCBs",
                "Pointwise CBs"),
       col=c("blue","blue","red","red"),
       lty=c(1,2,2,3),
       lwd=c(2,2,2,2))

## Plot the data-driven estimate of the derivative of the Engel curve
plot(logexp,fm1$deriv,col="blue",lwd=2,lty=1,type="l",
     ylab="Derivative of Food Budget Share",
     xlab="log(Total Household Expenditure)",
     xlim=c(4.75, 6.25),
     ylim=c(-1,1))

## Plot data-driven uniform confidence bands for the derivative
lines(logexp,fm1$h.upper.deriv,col="blue",lwd=2,lty=2)
lines(logexp,fm1$h.lower.deriv,col="blue",lwd=2,lty=2)

## Plot uniform confidence bands based on undersmoothing
lines(logexp,fm2$h.upper.deriv,col="red",lwd=2,lty=2)
lines(logexp,fm2$h.lower.deriv,col="red",lwd=2,lty=2)

## Plot pointwise confidence bands based on pre-asymptotic standard errors
lines(logexp,fm2$deriv+1.96*fm2$deriv.asy.se,col="red",lwd=2,lty=3)
lines(logexp,fm2$deriv-1.96*fm2$deriv.asy.se,col="red",lwd=2,lty=3)

legend("topright",
       legend=c("Data-driven Estimate",
                "Data-driven UCBs",
                "Undersmoothed UCBs",
                "Pointwise CBs"),
       col=c("blue","blue","red","red"),
       lty=c(1,2,2,3),
       lwd=c(2,2,2,2))

JeffreyRacine/npiv documentation built on Jan. 17, 2025, 8:29 p.m.