```r
options(prompt = "R> ", continue = "+ ")
knitr::opts_chunk$set(cache = TRUE, dev = "tikz")
library(npiv)
```
Nonparametric instrumental variables (NPIV) methods are well-established and have been adopted in a range of empirical fields. While there is a long tradition of their use in demand estimation in economics [@BCK; @BHP; @BerryHaile2014], recent applications include causal inference in biostatistics [@WangZhiTT2018], reinforcement learning [@chen2022well; @grettonRL2021], and causal machine learning [@causalML]. In this paper, we introduce the \proglang{R} [@R] package \pkg{npiv} [@NPIV]. This package implements several estimation and inference methods for NPIV models, including methods with data-driven choices of tuning parameters, thereby making them newly accessible to \proglang{R} users. The package \pkg{npiv} is available from the Comprehensive R Archive Network (CRAN) at \url{https://CRAN.R-project.org/package=npiv}.
We are interested in estimating a nonparametric structural function, denoted $h_0$, linking a scalar outcome variable $Y$ and a vector of regressors $X$ through the relation \begin{equation}\label{eq:npiv} Y = h_0(X) + u, \end{equation} where $u$ is a disturbance term. The function $h_0$ is not necessarily the conditional mean of $Y$ given $X$ (as in nonparametric regression) but rather has a particular structural or causal interpretation. Equivalently, $X$ is endogenous in the sense that $\E[u|X]$ is not necessarily zero.^[To simplify exposition, we shall omit the "almost surely" qualifier when specifying conditional moment restrictions.] NPIV methods allow for the estimation of $h_0$ using a vector of instrumental variables $W$ for which $\E[u|W] = 0$. For instance, in a demand estimation context the variable $Y$ denotes the quantity demanded, $X$ denotes the price, and $W$ is a "cost-shifter" that causes variation in the marginal cost of production. Derivatives of $h_0$ may also be of interest as they have important "structural" interpretations as elasticities or other marginal effects.
This article discusses the \proglang{R} package \pkg{npiv}, which implements novel, fully data-driven NPIV estimation and inference methods proposed in @CCK. The methods are developed in the context of sieve NPIV estimators, which are implemented in a very simple manner using two-stage least-squares (TSLS) [@AC; @NP].
The main methods implemented in \pkg{npiv} are (i) a data-driven choice of sieve dimension for sieve NPIV estimators and (ii) data-driven uniform confidence bands (UCBs) for the structural function $h_0$ and its derivatives. As shown in @CCK, the single data-driven choice of sieve dimension is minimax sup-norm rate adaptive for estimating both $h_0$ and derivatives of $h_0$. Moreover, the data-driven uniform confidence bands contain the entire true structural function and its derivatives with desired asymptotic coverage probability and contract at the optimal rate (up to log terms).
Additional functionality of the \pkg{npiv} package allows for constructing (non-data driven) UCBs based on the methods proposed in @CCQE as well as pointwise confidence intervals for $h_0(x)$ or its derivatives at a point $x$ based on the method of @ChenPouzo. All methods in \pkg{npiv} subsume nonparametric regression as a special case.
The package is itself a novel contribution: all of the above methods proposed in @CCK, @CCQE, and @ChenPouzo are implemented for the first time in the \proglang{R} package \pkg{npiv}. There do not exist implementations with a similar scope in any other software languages, either. The \proglang{R} packages \pkg{np} [@NPPkg] and \pkg{crs} [@CRS] have functionality for estimation of $h_0$ and its derivative in NPIV models. However, these packages implement different estimators than \pkg{npiv}, namely kernel and regression spline methods, respectively, using Tikhonov and Landweber--Fridman regularization. Further, \pkg{npiv} is the first to implement methods for constructing uniform confidence bands for NPIV models (both data-driven and based on undersmoothing).
The remainder of this article is organized as follows. Section 2 reviews sieve NPIV estimators and describes the data-driven methods from @CCK and the UCB constructions from @CCQE. Section 3 gives a description of the main functionality of the package \pkg{npiv} and describes its implementation. Section 4 then illustrates the methods in the context of Engel curve estimation based on household consumption data.
The exposition in this section closely follows @CCK. We begin by briefly describing sieve NPIV estimators. Consider approximating an unknown structural function $h_0$ using a linear combination of $J$ basis functions, denoted $\psi_{J1},\ldots,\psi_{JJ}$: \begin{equation}\label{eq:h0_approx} h_0(x) \approx \sum_{j=1}^J c_{Jj} \psi_{Jj}(x) = (\psi^J(x))^\top c_J\,, \end{equation} where $\psi^J(x) = (\psi_{J1}(x),\ldots,\psi_{JJ}(x))^\top$ is a vector of basis functions and $c_J = (c_{J1},\ldots,c_{JJ})^\top$ is a vector of coefficients. Combining (\ref{eq:npiv}) and (\ref{eq:h0_approx}), we obtain
\[
Y = (\psi^J(X))^\top c_J + \mathrm{bias}_J + u\,,
\]
where $\mathrm{bias}_J = h_0(X) - (\psi^J(X))^\top c_J$ and $u = Y - h_0(X)$. Provided the bias term is "small" relative to $u$, we have an approximate linear IV model in which $\psi^J(X)$ is a $J\times 1$ vector of "endogenous variables" and $c_J$ is a vector of unknown "parameters". One can then estimate $c_J$ by TSLS using a vector of $K \geq J$ transformations $b^K(W)= (b_{K1}(W),\ldots,b_{KK}(W))^\top$ of $W$ as instruments.
Given data $(X_i,Y_i,W_i)_{i=1}^n$, the TSLS estimator of $c_J$ is simply
\[
\hat c_J = \left(\mathbf \Psi_J^\top \mathbf P_K^{\phantom \top} \mathbf \Psi_J^{\phantom \top} \right)^{-} \mathbf \Psi_J^\top \mathbf P_K^{\phantom \top} \mathbf Y \,,
\]
where $\mathbf \Psi_J = (\psi^J({X_1}),\ldots,\psi^J({X_n}))^\top$ and $\mathbf B_K = (b^K({W_1}),\ldots,b^K({W_n}))^\top$ are $n \times J$ and $n \times K$ matrices, $\mathbf P_K = \mathbf B_K^{\phantom \top} (\mathbf B_K^\top \mathbf B_K^{\phantom \top})^{-} \mathbf B_K^\top$ is an $n \times n$ matrix,^[We let $A^-$ denote the generalized (or Moore--Penrose) inverse of a matrix $A$.] and $\mathbf Y = (Y_1,\ldots,Y_n)^\top$ is an $n \times 1$ vector. Let
\[
\partial^a h(x) = \frac{\partial^{|a|} h(x)}{\partial^{a_1} x_1 \cdots \partial^{a_d} x_d} \,,
\]
for $a = (a_1,\ldots,a_d) \in (\mathbb{N}_0)^d$ with $|a| = a_1 + \ldots + a_d$. Estimators of $h_0$ and its derivative $\partial^a h_0$ are given by
\[
\hat h_J(x) = (\psi^J(x))^\top \hat c_J \,,~~~~\partial^a \hat h_J(x) =(\partial^a \psi^J(x))^\top \hat c_J \,,
\]
where $\partial^a \psi^J(x) = (\partial^a \psi_{J1}(x),\ldots,\partial^a \psi_{JJ}(x))^\top$. The estimator $\hat h_J$ collapses to a nonparametric series regression estimator
\[
\hat h_J(x) = (\psi^J(x))^\top(\mathbf \Psi_J^\top\mathbf \Psi_J^{\phantom \top})^- \mathbf \Psi_J^\top \mathbf Y
\]
in the special case in which $W = X$, $K = J$, and $(b_{K1},\ldots,b_{KK}) = (\psi_{J1},\ldots,\psi_{JJ})$. The data-driven methods for choosing $J$ and for constructing UCBs for $h_0$ and its derivatives therefore carry over to nonparametric regression as a special case.
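To make the construction concrete, the following is a minimal sketch of the sieve NPIV (TSLS) estimator for scalar $X$ and $W$. It uses `splines::bs()` and `MASS::ginv()` purely for illustration; the \pkg{npiv} package constructs its B-spline bases internally and its actual implementation differs.

```r
## Minimal sketch of the sieve NPIV (TSLS) estimator for scalar X and W.
## splines::bs() and MASS::ginv() are used for illustration only; this is
## not the internal implementation of the npiv package.
library(splines)
library(MASS)

npiv_tsls_sketch <- function(Y, X, W, J = 5, K = 9) {
  bx   <- bs(X, df = J - 1)                       # (J - 1) B-spline columns for X
  bw   <- bs(W, df = K - 1)                       # (K - 1) B-spline columns for W
  Psi  <- cbind(1, bx)                            # n x J matrix Psi_J
  B    <- cbind(1, bw)                            # n x K instrument matrix B_K
  P    <- B %*% ginv(crossprod(B)) %*% t(B)       # n x n projection matrix P_K
  chat <- ginv(t(Psi) %*% P %*% Psi) %*% (t(Psi) %*% (P %*% Y))     # hat c_J
  hhat <- function(x) as.vector(cbind(1, predict(bx, x)) %*% chat)  # hat h_J(x)
  list(coef = chat, hhat = hhat)
}
```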
In practice, a researcher must choose bases $\psi_{J1},\ldots,\psi_{JJ}$ and $b_{K1},\ldots,b_{KK}$ and the tuning parameters $J$ and $K$. @CCQE showed that B-spline and Cohen--Daubechies--Vial wavelet bases lead to estimators of $h_0$ and its derivatives that converge at the minimax sup-norm rate under an appropriate choice of $J$ and $K$. These bases also lead to estimators that converge at the minimax rate in the special case of nonparametric regression [@BCCK; @CC15Reg]. We therefore use B-spline bases in what follows as these are easier to compute than wavelet bases. Bases for multivariate $X$ or $W$ are generally constructed by taking tensor products of univariate bases.^[The exception is when $X$ is multivariate and we wish to impose an additively separable structure on $h_0$.]
@CCK introduce a data-driven choice of sieve dimension that leads to estimators of both $h_0$ and its derivatives that converge at the optimal sup-norm rate. The choice of $K$ is pinned down by $J$ in their procedure, so we write $K(J)$, $b^{K(J)}(W)$, $\mathbf B_{K(J)}$ and $\mathbf P_{K(J)}$ in what follows.
We first introduce some notation. Let $\mathcal T \subset \mathbb N_0$ denote the set of candidate values of $J$. This set depends on the choice of basis. If $X$ is $d$-dimensional, then $\mathcal T = \{(2^l + r)^d : l \in \mathbb N_0 \}$ for a tensor-product basis of B-splines of degree $r$. For example, with univariate cubic B-splines ($r = 3$) we have $\mathcal T = \{4, 5, 7, 11, \ldots\}$. Let $J^+ = \min\{j \in \mathcal T : j > J\}$ be the smallest sieve dimension in $\mathcal T$ exceeding $J$. Let $\mathbf M_J = (\mathbf \Psi_J^\top \mathbf P_{K(J)}^{\phantom \top} \mathbf \Psi_J^{\phantom \top} )^{-} \mathbf \Psi_J^\top \mathbf P_{K(J)}^{\phantom \top}$. For $J,J_2 \in \mathcal T$ with $J_2 > J$, define the contrast
\[
D_{J}(x)-D_{J_2}(x) = (\psi^J(x))^\top \mathbf M_J^{\phantom \top} \mathbf Y - (\psi^{J_2}(x))^\top \mathbf M_{J_2}^{\phantom \top} \mathbf Y = \hat h_J(x) - \hat h_{J_2}(x) \,.
\]
Its variance is estimated by
\[
\hat \sigma_{J,J_2}^2(x) := \hat \sigma_{J}^2(x) + \hat \sigma_{J_2}^2(x) - 2 \tilde \sigma_{J,J_2}(x),
\]
where
\[
\hat \sigma_{J}^2(x) = (\psi^J(x))^\top \mathbf M_J^{\phantom \top} \widehat{\mathbf U}_{J,J}^{\phantom \top} \mathbf M_J^\top \psi^J(x)
\]
and
\[
\tilde \sigma_{J,J_2}(x) = (\psi^J(x))^\top \mathbf M_J^{\phantom \top} \widehat{\mathbf U}_{J,J_2}^{\phantom \top} \mathbf M_{J_2}^\top \psi^{J_2}(x),
\]
with $\widehat{\mathbf U}_{J,J_2}$ an $n\times n$ diagonal matrix whose $i$th diagonal entry is $\hat u_{i,J} \hat u_{i,J_2}$ for $\hat u_{i,J} = Y_i - \hat h_J(X_i)$. A bootstrap version of the contrast is given by
\[
D_{J}^*(x)-D_{J_2}^*(x) = (\psi^J(x))^\top\mathbf M_J^{\phantom \top} \hat{\mathbf u}_J^* - (\psi^{J_2}(x))^\top \mathbf M_{J_2}^{\phantom \top} \hat{\mathbf u}_{J_2}^*,
\]
with $\hat{\mathbf u}_J^* = (\hat u_{1,J}\varpi_1,\ldots,\hat u_{n,J}\varpi_n)^\top$ denoting a multiplier bootstrap version of the residual vector $\hat{\mathbf u}_J = (\hat u_{1,J},\ldots,\hat u_{n,J})^\top$, where $(\varpi_i)_{i=1}^n$ are IID $N(0,1)$ draws independent of the data. Finally, let $\hat s_J$ be the smallest singular value of $(\mathbf B_{K(J)}^\top\mathbf B_{K(J)}^{\phantom \top})^{-1/2} (\mathbf B_{K(J)}^\top \mathbf \Psi_J^{\phantom \top}) (\mathbf \Psi_J^\top \mathbf \Psi_J^{\phantom \top})^{-1/2}$.
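For readers who wish to see how $\hat s_J$ is computed, the following is a small sketch (assuming the two Gram matrices are nonsingular; the function and argument names are illustrative, not the package's internals):

```r
## Sketch: the scaled singular value s_hat_J from the basis matrices
## Psi_J (n x J) and B_K (n x K). Assumes both Gram matrices are nonsingular.
shat_J <- function(Psi_J, B_K) {
  inv_sqrt <- function(A) {                      # symmetric inverse square root
    e <- eigen(A, symmetric = TRUE)
    e$vectors %*% diag(1 / sqrt(e$values)) %*% t(e$vectors)
  }
  S <- inv_sqrt(crossprod(B_K)) %*% crossprod(B_K, Psi_J) %*% inv_sqrt(crossprod(Psi_J))
  min(svd(S)$d)                                  # smallest singular value
}
```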
The data-driven procedure from @CCK for choosing $J$ in NPIV models is summarized in the following three steps:
\begin{enumerate}
\item Compute
\begin{align}
\hat{J}_{\max} & = \min \bigg\{ J \in \mathcal T : J \sqrt{\log J} \, \hat{s}_J^{-1} \leq 10 \sqrt{n} < J^{+} \sqrt{\log J^{+}} \, \hat{s}_{J^{+}}^{-1} \bigg\} \,, \label{eq:J_hat_max} \\
\hat{\mathcal J} & = \left\{ J \in \mathcal T : 0.1 ( \log \hat J_{\max})^2 \leq J \leq \hat J_{\max}\right\} \,. \label{eq:index_set}
\end{align}
\item Let $\hat \alpha = \min\{ 0.5 , (\log(\hat{J}_{\max})/\hat{J}_{\max})^{1/2}\}$. For each independent draw of $(\varpi_i)_{i=1}^n$, compute
\begin{equation}\label{eq:sup-stat}
\sup_{\{ (x,J,J_2) \in \mathcal{X} \times \hat{\mathcal J} \times \hat{\mathcal J} : J_2 > J \}} \left| \frac{D_{J}^*(x)-D_{J_2}^*(x)}{\hat \sigma_{J,J_2}(x)} \right|.
\end{equation}
Let $\theta^*_{1-\hat \alpha}$ denote the $(1- \hat \alpha )$ quantile of the sup statistic (\ref{eq:sup-stat}) across a large number of independent draws of $(\varpi_i)_{i=1}^n$ (conditional on the data).
\item Let $\hat J_n = \max\{J \in \hat{\mathcal J} : J < \hat J_{\max}\}$ and
\begin{equation}\label{eq:J_lepski}
\hat{J} = \min \left\{ J \in \hat{\mathcal J} : \sup_{(x, J_2) \in \mathcal{X} \times \hat{\mathcal{J}} : J_2 > J } \left| \frac{D_{J}(x)-D_{J_2}(x)}{\hat \sigma_{J,J_2}(x)} \right| \leq 1.1 \, \theta^*_{1 - \hat \alpha} \right\} \,.
\end{equation}
The data-driven choice of sieve dimension is
\begin{equation} \label{eq:J-choice}
\tilde{J} = \min\{\hat{J},\hat J_n\}\,.
\end{equation}
\end{enumerate}
@CCK show that the single data-driven choice $\tilde J$ leads to estimators $\hat h_{\tilde J}$ and $\partial^a \hat h_{\tilde J}$ of $h_0$ and $\partial^a h_0$ that converge at the minimax sup-norm rate for NPIV estimation. They also show that these adaptivity guarantees carry over to nonparametric regression models with $W = X$, $K = J$, and $(b_{K1},\ldots,b_{KK}) = (\psi_{J1},\ldots,\psi_{JJ})$. In that case, the procedure is implemented largely as above, but with the following slight modifications:
\begin{enumerate}
\item[$1.^{\prime}$] Compute
\begin{equation} \label{eq:J_hat_max_regression}
\hat{J}_{\max} = \min \bigg\{ J \in \mathcal T : J \sqrt{\log J} \, \upsilon_n \leq 10 \sqrt n < J^{+} \sqrt{\log J^{+}} \, \upsilon_n \bigg\}
\end{equation}
with $\upsilon_n = \max\{1, (0.1 \log n)^4\}$, then compute $\hat{\mathcal J}$ as in (\ref{eq:index_set}) with this choice of $\hat J_{\max}$.
\item[$2.^{\prime}$] As above.
\item[$3.^{\prime}$] Take $\tilde J = \hat J$ for $\hat J$ defined in (\ref{eq:J_lepski}).
\end{enumerate}
We now describe the method for constructing UCBs for $h_0$ and its derivatives proposed in @CCK. The approach builds on the data-driven choice of $J$ introduced in the previous subsection. Recall $\hat{\mathcal J}$ and $\tilde J$ from above. Let $\hat A = \log \log \tilde J$ and
\[
\hat{\mathcal J}_{-} = \begin{cases} \{J \in \hat{\mathcal J} : J < \hat J_n\} & \mbox{if $\tilde J = \hat J$}, \\ \hat{\mathcal J} & \mbox{if $\tilde J = \hat J_n$}. \end{cases}
\]
UCBs for the structural function $h_0$ in NPIV models are constructed in the following steps:
\begin{enumerate}
\item[4.] For each independent draw of $(\varpi_i)_{i=1}^n$, compute
\begin{equation} \label{eq:z_star-UCB}
\sup_{(x,J) \in \mathcal{X} \times \hat{\mathcal J}_{-}} \left| \frac{D_J^*(x)}{\hat \sigma_J(x)} \right|.
\end{equation}
Let $z_{1-\alpha}^*$ denote the $(1-\alpha )$ quantile of the sup statistic (\ref{eq:z_star-UCB}) across a large number of independent draws of $(\varpi_i)_{i=1}^n$ (conditional on the data).
\item[5.] Construct the 100$(1-\alpha)$\% UCB
\begin{equation} \label{band}
C_n(x) = \bigg[ \hat{h}_{\tilde{J}}(x) - \left( z_{1-\alpha}^* + \hat A \, \theta^*_{1-\hat \alpha} \right ) \hat \sigma_{\tilde J}(x) , ~ \hat{h}_{\tilde{J}}(x) + \left( z_{1-\alpha}^* + \hat A \, \theta^*_{1-\hat \alpha} \right) \hat \sigma_{\tilde J}(x) \bigg] \,.
\end{equation}
\end{enumerate}
Let
\[
D_J^{*a}(x) = (\partial^a \psi^J(x))^\top \mathbf M_J^{\phantom \top} \hat{\mathbf u}_J^*\,,
\]
and
\[
\hat \sigma_{J}^{a2}(x) =(\partial^a \psi^J(x))^\top \mathbf M_J^{\phantom \top} \widehat{\mathbf U}_{J,J}^{\phantom \top} \mathbf M_J^\top (\partial^a \psi^J(x))\,.
\]
UCBs for $\partial^a h_0$ in NPIV models are constructed analogously:
\begin{enumerate}
\item[$4.^{\prime}$] For each independent draw of $(\varpi_i)_{i=1}^n$, compute
\begin{equation} \label{eq:z_star-UCB-derivative}
\sup_{(x,J) \in \mathcal{X} \times \hat{\mathcal J}_{-}} \left| \frac{D_J^{*a}(x)}{\hat \sigma_J^a(x)} \right|.
\end{equation}
Let $z_{1-\alpha}^{*a}$ denote the $(1-\alpha )$ quantile of the sup statistic (\ref{eq:z_star-UCB-derivative}) across a large number of independent draws of $(\varpi_i)_{i=1}^n$ (conditional on the data).
\item[$5.^{\prime}$] Construct the 100$(1-\alpha)$\% UCB
\begin{equation} \label{band-derivative}
C_n^a(x) = \bigg[ \partial^a \hat{h}_{\tilde{J}}(x) - \left( z_{1-\alpha}^{*a} + \hat A \, \theta^*_{1-\hat \alpha} \right) \hat \sigma_{\tilde J}^a(x) ,~ \partial^a \hat{h}_{\tilde{J}}(x) + \left( z_{1-\alpha}^{*a} + \hat A \, \theta^*_{1-\hat \alpha} \right ) \hat \sigma_{\tilde J}^a(x) \bigg] .
\end{equation}
\end{enumerate}
The above UCB constructions apply equally to nonparametric regression models in the special case in which $W = X$, $K = J$, and $(b_{K1},\ldots,b_{KK}) = (\psi_{J1},\ldots,\psi_{JJ})$.
@CCK establish uniform asymptotic coverage guarantees for these UCBs for both NPIV and nonparametric regression. They also show that the UCBs contract at the optimal sup-norm rate (up to logarithmic factors).
An alternative UCB construction based on a deterministic (i.e., non-data-driven) choice of sieve dimension is proposed by @CCQE for NPIV models. The construction also applies to nonparametric regression in the special case in which $W = X$, $K = J$, and $(b_{K1},\ldots,b_{KK}) = (\psi_{J1},\ldots,\psi_{JJ})$. The theoretical justification for this approach is based on undersmoothing. That is, the sieve dimension $J$ should be larger than what would be optimal for estimation, so that bias is of smaller order than sampling uncertainty. Note that this approach requires a user-supplied choice of sieve dimension $J$.
We now briefly summarize the @CCQE procedure using the notation introduced above. UCBs for $h_0$ and $\partial^a h_0$ are constructed as follows:
\begin{enumerate}
\item For each independent draw of $(\varpi_i)_{i=1}^n$, compute
\begin{equation} \label{eq:z_star-J-UCB}
\sup_{x\in \mathcal{X}} \left| \frac{D_J^*(x)}{\hat \sigma_J(x)} \right|~\mbox{for $h_0$, or}~~~~\sup_{x \in \mathcal{X}} \left| \frac{D_J^{*a}(x)}{\hat \sigma_J^a(x)} \right|~\mbox{for $\partial^a h_0$}.
\end{equation}
Let $z_{1-\alpha,J}^*$ and $z_{1-\alpha,J}^{*a}$ denote the $(1-\alpha )$ quantiles of these sup statistics across a large number of independent draws of $(\varpi_i)_{i=1}^n$ (conditional on the data).
\item Construct the $100(1-\alpha)\%$ UCBs
\[
C_{n,J}(x) = \bigg[ \hat{h}_{J}(x) - z_{1-\alpha,J}^* \hat \sigma_{J}(x) , ~ \hat{h}_{J}(x) + z_{1-\alpha,J}^* \hat \sigma_{J}(x) \bigg] ~\mbox{for $h_0$},
\]
or
\[
C_{n,J}^a(x) = \bigg[ \partial^a \hat{h}_{J}(x) - z_{1-\alpha,J}^{*a} \hat \sigma_{J}^a(x) ,~ \partial^a \hat{h}_{J}(x) + z_{1-\alpha,J}^{*a} \hat \sigma_{J}^a(x) \bigg] ~\mbox{for $\partial^a h_0$}.
\]
\end{enumerate}
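As an illustration of how the multiplier bootstrap enters these constructions, the following is a schematic implementation of the critical value $z^*_{1-\alpha,J}$ for the band on $h_0$ at a fixed $J$. It is not the package's internal code; `Psi`, `B`, and `Psi.grid` denote the basis matrices at the sample points and at a grid of evaluation points, and the helper name is illustrative.

```r
## Schematic multiplier-bootstrap critical value for the undersmoothed UCB on h_0
## at a fixed sieve dimension J (illustrative code, not the package internals).
ucb_crit_sketch <- function(Psi, B, Y, Psi.grid, nboot = 1000, alpha = 0.05) {
  n   <- length(Y)
  P   <- B %*% MASS::ginv(crossprod(B)) %*% t(B)            # projection P_K
  M   <- MASS::ginv(t(Psi) %*% P %*% Psi) %*% t(Psi) %*% P  # matrix M_J
  ch  <- M %*% Y                                             # hat c_J
  uh  <- as.vector(Y - Psi %*% ch)                           # residuals hat u_{i,J}
  A   <- Psi.grid %*% M                                      # rows: psi^J(x)' M_J
  sig <- sqrt(as.vector((A^2) %*% (uh^2)))                   # hat sigma_J(x) on the grid
  sup.stat <- replicate(nboot, {
    varpi <- rnorm(n)                                        # multiplier draws
    max(abs(A %*% (uh * varpi)) / sig)                       # sup_x |D_J^*(x)| / sigma_J(x)
  })
  unname(quantile(sup.stat, 1 - alpha))                      # z*_{1 - alpha, J}
}
```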
Finally, we note that it may sometimes be of interest to construct confidence intervals (CIs) for $h_0(x)$ or $\partial^a h_0(x)$ at a certain point $x$. To this end, one may use the following constructions for $100(1-\alpha)\%$ CIs:
\[
\bigg[ \hat{h}_{J}(x) - z_{1-\alpha/2} \hat \sigma_{J}(x) , ~ \hat{h}_{J}(x) + z_{1-\alpha/2} \hat \sigma_{J}(x) \bigg] ~\mbox{for $h_0(x)$},
\]
or
\[
\bigg[ \partial^a \hat{h}_{J}(x) - z_{1-\alpha/2} \hat \sigma_{J}^a(x) ,~ \partial^a \hat{h}_{J}(x) + z_{1-\alpha/2} \hat \sigma_{J}^a(x) \bigg] ~\mbox{for $\partial^a h_0(x)$},
\]
where $z_{1-\alpha/2}$ is the $1-\alpha/2$ quantile of the $N(0,1)$ distribution; see @ChenPouzo and Appendix D of @CCQE for a formal justification for these constructions. These CIs can be (and sometimes are) plotted alongside estimates of $h_0$ or $\partial^a h_0$, but it should be understood that they are only valid pointwise. That is, they will generally be too narrow to contain the entire function $h_0$ or $\partial^a h_0$ with the desired asymptotic coverage probability. By contrast, the above UCB constructions will contain the true structural function or its derivatives with the desired asymptotic coverage probability.
The contributed \proglang{R} package \pkg{npiv} is used to estimate and construct UCBs for a nonparametric structural function $h_0$ and its derivatives $\partial^a h_0$ in both NPIV and nonparametric regression models. In this section, we describe the main functionality of this package. The package \pkg{npiv} is available from CRAN at \url{https://CRAN.R-project.org/package=npiv}.
The main function is `npiv()`, which can be called with the following syntax:

```r
npiv(formula, data, newdata = NULL, basis = c("tensor", "additive", "glp"),
     J.x.degree = 3, J.x.segments = NULL, K.w.degree = 4, K.w.segments = NULL,
     K.w.smooth = 2, ...)
```
Its main arguments are as follows (a minimal call sketch appears after the list):

- `formula`: a symbolic formula specifying the model to be estimated. The syntax is the same as in the package \pkg{ivreg} [@IVREG]. For example, `y ~ x1 + x2 | x1 + z1 + z2`, where `y` is the dependent variable, `x1` is an exogenous regressor, `x2` is an endogenous regressor, and `z1` and `z2` are instrumental variables.
- `data`: an optional data frame containing the variables in the model.
- `newdata`: an optional data frame collecting the values of the `x` variables at which to evaluate the estimator (e.g., for plotting). The estimator is evaluated at the sample data points if `NULL`.
- `basis`: type of basis to use if the `x` variables are multivariate. The default is `"tensor"` for a tensor-product basis. Alternatives are `"additive"` for an additively separable structural function and `"glp"` for generalized B-spline polynomial bases (for further details, see the documentation for the package \pkg{crs} [@CRS]).
- `J.x.degree`: degree of the B-spline used to approximate the structural function. The default is 3 (cubic spline).
- `J.x.segments`: number of segments (interior knots plus 1) for the B-spline used to approximate the structural function. Integers for both `J.x.segments` and `K.w.segments` must be passed to estimate the model with a deterministic $J$ (and $K$) and return undersmoothed UCBs. The default is `NULL`, in which case both `J.x.segments` and `K.w.segments` are data-determined using the procedure described above and data-driven UCBs are returned.
- `K.w.degree`: degree of the B-spline used to approximate the nonparametric first stage. The default is 4 (quartic spline).
- `K.w.segments`: number of segments (interior knots plus 1) for the B-spline used to approximate the nonparametric first stage. Integers for both `J.x.segments` and `K.w.segments` must be passed to estimate the model with a deterministic $J$ (and $K$) and return undersmoothed UCBs. The default is `NULL`, in which case both `J.x.segments` and `K.w.segments` are data-determined using the procedure described above and data-driven UCBs are returned.
- `K.w.smooth`: a non-negative integer. The basis for the nonparametric first stage uses `2^K.w.smooth` more B-spline segments for each instrument than the basis approximating the structural function. The default is 2. Setting `K.w.smooth = 0` uses the same number of segments for `x` and `w`.
- `knots`: method for placing knots over the support of the covariates and instruments. The default is `"uniform"`. Alternatively, `"quantiles"` places knots at the quantiles of the data, so that an equal number of observations lie in each segment.
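As a quick orientation before describing the output, a minimal call sketch is given below. The data frame `mydata` and the variable names are hypothetical; `y` is the outcome, `x1` is exogenous, `x2` is endogenous, and `z1`, `z2` are instruments.

```r
## Illustrative calls (mydata and the variable names are hypothetical)
fit.dd  <- npiv(y ~ x1 + x2 | x1 + z1 + z2, data = mydata)  # data-driven J and K
fit.det <- npiv(y ~ x1 + x2 | x1 + z1 + z2, data = mydata,
                J.x.segments = 2, K.w.segments = 4)          # deterministic J and K
```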
The output of the function `npiv()` is an object of class `npiv`. The `summary()` method prints a brief summary of the estimated model. An `npiv` object includes the following components:
- `h`: the estimated structural function evaluated at the sample data (or evaluation data, if provided).
- `h.upper`: the upper UCB for the structural function evaluated at the sample data (or evaluation data, if provided).
- `h.lower`: the lower UCB for the structural function evaluated at the sample data (or evaluation data, if provided).
- `deriv`: the estimated derivative of the structural function evaluated at the sample data (or evaluation data, if provided).
- `h.upper.deriv`: the upper UCB for the derivative of the structural function evaluated at the sample data (or evaluation data, if provided).
- `h.lower.deriv`: the lower UCB for the derivative of the structural function evaluated at the sample data (or evaluation data, if provided).
- `asy.se`: pre-asymptotic standard errors for the estimator of the structural function evaluated at the sample data (or evaluation data, if provided). These may be used to construct (pointwise) confidence intervals for $h_0(x)$ based on undersmoothing when a deterministic sieve dimension is used.
- `deriv.asy.se`: pre-asymptotic standard errors for the estimator of the derivative of the structural function evaluated at the sample data (or evaluation data, if provided). These may be used to construct (pointwise) confidence intervals for $\partial^a h_0(x)$ based on undersmoothing when a deterministic sieve dimension is used.
- `K.w.segments`: the value of `K.w.segments` used (data-determined if not provided).
- `J.x.segments`: the value of `J.x.segments` used (data-determined if not provided).
- `beta`: the vector of estimated spline coefficients $\hat c_J$.
In addition, the `plot()` method produces a plot of the estimated function and 95\% UCBs. The optional argument `showdata = TRUE` will overlay the original data points. A plot of the estimated derivative of the structural function and 95\% UCBs can be produced by calling `plot()` with the optional argument `type = "deriv"`.
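A short sketch of inspecting the returned object, using the hypothetical object `fit.dd` from the call sketch above:

```r
## Sketch: inspecting and plotting a fitted npiv object (fit.dd is hypothetical)
summary(fit.dd)                   # brief summary of the estimated model
plot(fit.dd, showdata = TRUE)     # estimate of h_0 with 95% UCBs, data overlaid
plot(fit.dd, type = "deriv")      # estimated derivative with 95% UCBs
head(fit.dd$h)                    # fitted structural function values
fit.dd$J.x.segments               # segments used (data-determined here)
fit.dd$beta                       # estimated spline coefficients
```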
The command `npiv()` may also be used for data-driven choice of sieve dimension and UCB construction in nonparametric regression models. In this case, `formula` should pass the same explanatory variables as instrumental variables (and in the same order). Calling `npiv(y ~ x1 + x2 | x1 + x2)` will perform nonparametric regression of `y` on `x1` and `x2`, with a data-driven choice of sieve dimension. Note that this choice of sieve dimension will be rate-optimal for estimation of $h_0$ and its derivatives under sup-norm loss. It may differ from the sieve dimension selected by other methods that target mean-square error (i.e., $L^2$-norm loss), such as cross-validation, because the optimal choices of sieve dimension for estimation under sup-norm and $L^2$-norm losses diverge at different rates. As before, data-driven UCBs for the conditional mean function and its derivative will also be computed by default using the procedure of @CCK.
Estimators based on a deterministic choice of sieve dimension and undersmoothed UCBs can be computed by passing `J.x.segments`. In this case, both `K.w.segments` and `K.w.smooth` are redundant and do not need to be specified. Undersmoothed UCBs are constructed using the procedure of @CCQE; see also @BCCK. The standard errors `asy.se` and `deriv.asy.se` may also be used to construct (pointwise) confidence intervals for $h_0(x)$ and $\partial^a h_0(x)$ based on undersmoothing, using the method described in Section 2.5.
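For instance, given a fit with a deterministic sieve dimension (such as the hypothetical `fit.det` from the call sketch above), 95\% pointwise confidence intervals may be formed as in the following sketch:

```r
## Sketch: 95% pointwise confidence intervals from an undersmoothed fit
## (fit.det is a hypothetical npiv object estimated with J.x.segments supplied)
ci.h.lower     <- fit.det$h     - qnorm(0.975) * fit.det$asy.se
ci.h.upper     <- fit.det$h     + qnorm(0.975) * fit.det$asy.se
ci.deriv.lower <- fit.det$deriv - qnorm(0.975) * fit.det$deriv.asy.se
ci.deriv.upper <- fit.det$deriv + qnorm(0.975) * fit.det$deriv.asy.se
```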
By default, `npiv` constructs B-spline bases with knots placed uniformly over the support of the covariates. This can cause the algorithm to choose too high a sieve dimension for nonparametric regression when the distribution of the covariates is far from uniform and, consequently, the number of observations within each segment is far from uniform. Good performance can be restored by including `knots = "quantiles"`.
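A small simulated illustration of this option is sketched below; the data-generating process is hypothetical and chosen only to produce a strongly non-uniform covariate distribution.

```r
## Hypothetical illustration: quantile-based knots with a skewed regressor
set.seed(42)
x <- rexp(500)                                  # far-from-uniform covariate
y <- sin(x) + rnorm(500, sd = 0.2)
dat <- data.frame(y = y, x = x)
fit.q <- npiv(y ~ x | x, data = dat, knots = "quantiles")  # knots at data quantiles
```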
By default, `npiv` will perform estimation of the structural function $h_0$ and construct 95\% UCBs for $h_0$ and its first derivative, namely $d h_0/d x$ if $x$ is scalar or
\[
\frac{\partial h_0(x_1,\ldots,x_d)}{\partial x_1}
\]
if $x$ is multivariate. The following optional arguments can be used to modify the UCB constructions:
- `alpha`: nominal size of the uniform confidence bands. The default is 0.05 for 95\% uniform confidence bands.
- `deriv.index`: integer indicating the regressor with respect to which the derivative is computed. The default is `1` for differentiation with respect to $x_1$.
- `deriv.order`: integer indicating the order of the derivative to be computed. The default is `1`.
- `ucb.h`: whether to compute a uniform confidence band for the structural function. The default is `TRUE`.
- `ucb.deriv`: whether to compute a uniform confidence band for the derivative of the structural function. The default is `TRUE`.
Run time can be reduced by passing `ucb.h = FALSE` or `ucb.deriv = FALSE` when a UCB for $h_0$ or its derivative is not required.
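A sketch of customizing these options, with the same hypothetical data frame and variable names as in the earlier call sketch:

```r
## Sketch: customizing the UCB options (mydata and variable names are hypothetical)
fit.custom <- npiv(y ~ x1 + x2 | x1 + z1 + z2, data = mydata,
                   alpha = 0.10,        # 90% uniform confidence bands
                   deriv.index = 2,     # differentiate with respect to x2
                   ucb.deriv = FALSE)   # skip the derivative band to reduce run time
```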
We now illustrate the usage of `npiv` with an empirical application to nonparametric Engel curve estimation [@BCK]. Engel curves explain how consumers' expenditure shares on different categories of goods and services vary as a function of total household expenditure. Understanding consumers' consumption and substitution patterns across expenditure categories is important for conducting policy analysis and for modeling demand. Engel curve estimation has therefore been the subject of a large body of empirical work in economics. There is also a long tradition of estimating Engel curves via "flexible" (i.e., nonparametric or semi-parametric) methods, which correctly recognize that consumers' expenditure shares vary nonlinearly with total expenditure.^[Interestingly, the original empirical exercise performed by Engel over 150 years ago used precursors of nonparametric statistical methods [@CM].]
We wish to estimate the Engel curve $h_0$ for a household's expenditure share on a particular category, say food. The model is
\[
Y_i = h_0(X_i) + u_i,
\]
where $Y_i$ is the expenditure share of household $i$ on food and $X_i$ is log total household expenditure. The "structural" interpretation of the model is that the household would spend the fraction $h_0(x)$ of its total expenditure on food if log total expenditure were exogenously set to $x$. As is well known (see, e.g., @BBL), total household expenditure is endogenous:
\[
\E[Y_i | X_i] \neq h_0(X_i),
\]
which follows from the fact that a household's consumption preferences determine its expenditure across categories and therefore its total expenditure. To deal with this endogeneity, we shall follow @BBL, @BCK, and several other works and use log gross earnings of the household head ($W_i$, hereafter log wages) as an instrument for log total expenditure.
We use data from the 1995 British Family Expenditure Survey as in @BCK. The data consist of 1655 married or cohabitating couples with the head of household employed and aged between 20 and 55, and with zero, one, or two children. The data are available as part of \pkg{npiv} in the data frame `Engel95`.
data("Engel95", package = "npiv") head(Engel95)
The first seven columns of `Engel95` list household budget shares across different expenditure categories. The variables `logexp` and `logwages` are log total household expenditure and log wages, respectively. The final variable `nkids` indicates whether the household has zero children (`nkids = 0`; 628 households in total) or one to two children (`nkids = 1`; 1027 households in total).
We first estimate the Engel curve for food. To try to ensure a relatively homogeneous set of households, we focus on the subset of households with 1-2 children.
```r
Engel.kids <- subset(Engel95, nkids == 1)
attach(Engel.kids)
```
To use a data-driven choice of sieve dimension and construct data-driven UCBs for $h_0$ and its derivative, we first call `npiv()` without specifying the sieve dimensions. We also pass a grid of points spanning $[4.75, 6.25]$ for plotting the estimator and confidence bands:
```r
newdata <- data.frame(logexp = seq(4.75, 6.25, length.out = 1000))
fm.food.dd <- npiv(food ~ logexp | logwages, newdata = newdata)
summary(fm.food.dd)
```
As $X$ and $W$ are both scalar, and we use the default cubic B-spline basis for $X$ and quartic B-spline basis for $W$, the data-driven choices of sieve dimension are therefore $J = 4$ and $K = 8$ (the sum of segments and degree for each basis). The run time represents the total time to perform the data-driven choice of sieve dimension and to construct the data-driven UCBs for $h_0$ and its derivative.
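This relationship can be verified directly from the returned components (with the default degrees of 3 for the $X$ basis and 4 for the $W$ basis):

```r
## Sieve dimensions implied by the data-determined segments (J = segments + degree)
fm.food.dd$J.x.segments + 3   # J = 4
fm.food.dd$K.w.segments + 4   # K = 8
```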
To compare data-driven and undersmoothed UCBs, we estimate $h_0$ and its derivative again, this time using a deterministic choice of $J$ and $K$ by passing both `J.x.segments` and `K.w.segments`. As the justification for the UCBs is undersmoothing, we must choose $J$ and $K$ larger than their optimal values so that the bias is of smaller order than the sampling uncertainty. We estimate with `J.x.segments = 2` and `K.w.segments = 5`, which corresponds to $J = 5$ and $K = 9$.
```r
fm.food.det <- npiv(food ~ logexp | logwages, newdata = newdata,
                    J.x.segments = 2, K.w.segments = 5)
```
```r
## Plot the estimated Engel curve and data-driven UCBs
plot(logexp, food, ylab = "Food Budget Share",
     xlab = "log(Total Household Expenditure)",
     xlim = c(4.75, 6.25), ylim = c(0, 0.4),
     main = "", type = "p", cex = .5, col = "lightgrey")
lines(newdata$logexp, fm.food.dd$h, col = "blue", lwd = 2, lty = 1)
lines(newdata$logexp, fm.food.dd$h.upper, col = "blue", lwd = 2, lty = 2)
lines(newdata$logexp, fm.food.dd$h.lower, col = "blue", lwd = 2, lty = 2)
legend("topright", legend = c("Estimate", "95\\% UCBs"),
       col = c("blue", "blue"), lty = c(1, 2), lwd = c(2, 2), bty = "n")
```
```r
## Plot the estimated Engel curve and undersmoothed UCBs
plot(logexp, food, ylab = "Food Budget Share",
     xlab = "log(Total Household Expenditure)",
     xlim = c(4.75, 6.25), ylim = c(0, 0.4),
     main = "", type = "p", cex = .5, col = "lightgrey")
lines(newdata$logexp, fm.food.det$h, col = "red", lwd = 2, lty = 1)
lines(newdata$logexp, fm.food.det$h.upper, col = "red", lwd = 2, lty = 2)
lines(newdata$logexp, fm.food.det$h.lower, col = "red", lwd = 2, lty = 2)
lines(newdata$logexp, fm.food.det$h + 1.96 * fm.food.det$asy.se,
      col = "red", lwd = 2, lty = 3)
lines(newdata$logexp, fm.food.det$h - 1.96 * fm.food.det$asy.se,
      col = "red", lwd = 2, lty = 3)
legend("topright", legend = c("Estimate", "95\\% UCBs", "95\\% PCIs"),
       col = c("red", "red", "red"), lty = c(1, 2, 3), lwd = c(2, 2, 2), bty = "n")
```
The estimated Engel curves for food are plotted below, together with their UCBs (using `h.upper` and `h.lower`). Both the data-driven and undersmoothed estimates are downward-sloping. This seems reasonable from an economic standpoint: food is a necessary good, and so it should decrease as a proportion of total household expenditure as total household expenditure increases. Note, however, that the data-driven estimate looks close to linear whereas the undersmoothed estimate is more wiggly. The undersmoothed UCBs are also wider over much of the support of total expenditure because they use a sub-optimally large choice of sieve dimension and are therefore less "efficient" than the data-driven bands. We also plot 95\% pointwise confidence intervals (PCIs) based on undersmoothing (using `asy.se` and a $N(0,1)$ critical value). As expected, these are narrower than the UCBs based on undersmoothing, as they are only required to contain the true function at each point rather than over the entirety of its support. So while PCIs can be used for drawing inference about the value of the function at a given point, they cannot be used for inferring the true shape of the function.
```r
## Plot the estimated Engel curve derivative and data-driven UCBs
plot(newdata$logexp, fm.food.dd$deriv, ylab = "Change in Food Budget Share",
     xlab = "log(Total Household Expenditure)",
     xlim = c(4.75, 6.25), ylim = c(-0.5, 0.5),
     col = "blue", type = "l", lwd = 2, lty = 1)
lines(newdata$logexp, fm.food.dd$h.upper.deriv, col = "blue", lwd = 2, lty = 2)
lines(newdata$logexp, fm.food.dd$h.lower.deriv, col = "blue", lwd = 2, lty = 2)
abline(0, 0, col = "black", lty = 1, lwd = 2)
legend("topright", legend = c("Estimate", "95\\% UCBs"),
       col = c("blue", "blue"), lty = c(1, 2), lwd = c(2, 2), bty = "n")
```
```r
## Plot the estimated Engel curve derivative and undersmoothed UCBs
plot(newdata$logexp, fm.food.det$deriv, ylab = "Change in Food Budget Share",
     xlab = "log(Total Household Expenditure)",
     xlim = c(4.75, 6.25), ylim = c(-0.5, 0.5),
     col = "red", type = "l", lwd = 2, lty = 1)
lines(newdata$logexp, fm.food.det$h.upper.deriv, col = "red", lwd = 2, lty = 2)
lines(newdata$logexp, fm.food.det$h.lower.deriv, col = "red", lwd = 2, lty = 2)
abline(0, 0, col = "black", lty = 1, lwd = 2)
lines(newdata$logexp, fm.food.det$deriv + 1.96 * fm.food.det$deriv.asy.se,
      col = "red", lwd = 2, lty = 3)
lines(newdata$logexp, fm.food.det$deriv - 1.96 * fm.food.det$deriv.asy.se,
      col = "red", lwd = 2, lty = 3)
legend("topright", legend = c("Estimate", "95\\% UCBs", "95\\% PCIs"),
       col = c("red", "red", "red"), lty = c(1, 2, 3), lwd = c(2, 2, 2), bty = "n")
```
These differences between the data-driven and undersmoothed methods are also apparent when we compare the estimated derivatives of the Engel curves and their corresponding UCBs (using `h.lower.deriv` and `h.upper.deriv`). The zero function lies above the data-driven UCBs over a sub-interval of total expenditure, indicating that the true Engel curve is significantly downward-sloping over this region. One may therefore conclude from the data-driven UCBs that the Engel curve for food differs significantly from a constant. Conversely, the UCBs based on undersmoothing are wider and appear to contain the zero function over the full range of total expenditure. Hence, one cannot reject a flat Engel curve using the UCBs based on undersmoothing. We also report PCIs for the derivative based on undersmoothing (using `deriv.asy.se` and $N(0,1)$ critical values). These are useful for testing hypotheses about the derivative at a given point, but that is a different problem from testing whether the derivative differs significantly from zero at any point over its support.
We repeat the above exercise, this time estimating an Engel curve for household budget share on fuel:
```r
fm.fuel.dd <- npiv(fuel ~ logexp | logwages, newdata = newdata)
summary(fm.fuel.dd)
```
```r
## Plot the estimated Engel curve and data-driven UCBs
plot(logexp, fuel, ylab = "Fuel Budget Share",
     xlab = "log(Total Household Expenditure)",
     xlim = c(4.75, 6.25), ylim = c(0, 0.25),
     main = "", type = "p", cex = .5, col = "lightgrey")
lines(newdata$logexp, fm.fuel.dd$h, col = "blue", lwd = 2, lty = 1)
lines(newdata$logexp, fm.fuel.dd$h.upper, col = "blue", lwd = 2, lty = 2)
lines(newdata$logexp, fm.fuel.dd$h.lower, col = "blue", lwd = 2, lty = 2)
legend("topright", legend = c("Estimate", "95\\% UCBs"),
       col = c("blue", "blue"), lty = c(1, 2), lwd = c(2, 2), bty = "n")
```
```r
## Plot the estimated Engel curve derivative and data-driven UCBs
plot(newdata$logexp, fm.fuel.dd$deriv, ylab = "Change in Fuel Budget Share",
     xlab = "log(Total Household Expenditure)",
     xlim = c(4.75, 6.25), ylim = c(-0.5, 0.5),
     col = "blue", type = "l", lwd = 2, lty = 1)
lines(newdata$logexp, fm.fuel.dd$h.upper.deriv, col = "blue", lwd = 2, lty = 2)
lines(newdata$logexp, fm.fuel.dd$h.lower.deriv, col = "blue", lwd = 2, lty = 2)
abline(0, 0, col = "black", lty = 1, lwd = 2)
legend("topright", legend = c("Estimate", "95\\% UCBs"),
       col = c("blue", "blue"), lty = c(1, 2), lwd = c(2, 2), bty = "n")
```
We see from the estimated Engel curve and its derivative that budget shares on fuel decrease significantly at low total expenditure levels and then flatten out completely at moderate to high total expenditure levels. As there is much less variation in households' fuel expenditure than their food expenditure, these curves are estimated with greater precision and the UCBs are correspondingly much narrower.
\newpage
The package \pkg{npiv} makes available a set of novel, fully data-driven estimation and inference methods for NPIV models. The methods include (i) a data-driven choice of sieve dimension that leads to estimators of the structural function and its derivative that converge at the minimax sup-norm rate, and (ii) data-driven uniform confidence bands for the structural function and its derivative that contract at the minimax rate (up to log terms). Additional functionality allows for the construction of uniform confidence bands and pointwise confidence intervals based on undersmoothing. The design is user-friendly, with syntax similar to the widely used package \pkg{ivreg} [@IVREG] for linear IV models.
Overall, the methods should be useful to researchers engaged in demand estimation, causal inference, and reinforcement learning, among other areas. We welcome suggestions from the community for additional features to be implemented.