The goal of rbounds is to estimate bounds on parameters in models where the value of the parameter can only be partially identified.
The package was developed to estimate the conditional mean of a binary outcome with non-random missing outcomes.
At the moment the package contains one function, which estimates bounds on a conditional mean using non-parametric kernel regression estimators from the np package. In the future I hope to extend the package to cover treatment effect estimation, and to work with some additional assumptions such as monotone regression and monotone treatment response.
Imagine that we are interested in estimating $E(y | x)$ where $y \in \{0,1\}$. Unfortunately, we are missing some observations of $y$. Whether an observation is missing is denoted by a dummy variable $z$: when $z = 1$, $y$ is observed, and when $z = 0$, $y$ is not observed. We have some covariates $x$ whose values we observe for all observations.
By the law of total probability it follows that
$$E(y | x) = E(y | x, z = 1)P(z = 1 | x) + E(y | x, z = 0)P(z = 0 | x).$$
Each of the pieces of this equation can be estimated from the available data with the exception of $E(y|x, z=0)$, which is the conditional expectation of the outcome for the cases in which the outcome value is unobserved.
However, since $0 \leq y \leq 1$, the unobserved conditional mean $E(y | x, z = 0)$ can be at most $1$ (if all the unobserved outcomes were 1) and at least $0$ (if all the unobserved outcomes were 0). Thus the true conditional mean must fall within the bounds:
$$E(y | x, z = 1)P(z = 1 | x) \leq E(y | x) \leq E(y | x, z = 1)P(z = 1 | x) + (1 - P(z = 1 | x)).$$
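To make the bounds concrete, here is a worked calculation with purely illustrative numbers (they are assumptions, not estimates from any data). Suppose that at some value of $x$ we have $E(y | x, z = 1) = 0.6$ and $P(z = 1 | x) = 0.75$. Then

$$
\begin{aligned}
\text{lower bound} &= 0.6 \times 0.75 = 0.45, \\
\text{upper bound} &= 0.6 \times 0.75 + (1 - 0.75) = 0.70.
\end{aligned}
$$

The width of the interval, $0.70 - 0.45 = 0.25$, equals $P(z = 0 | x)$: the larger the share of missing outcomes at a given $x$, the wider the bounds.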
More generally, if we denote the minimum value the outcome can take on by $\gamma_0$ and the maximum by $\gamma_1$, then the conditional mean can be bounded by
$$E(y | x, z = 1)P(z = 1 | x) + \gamma_0 P(z = 0 | x) \leq E(y | x) \leq E(y | x, z = 1)P(z = 1 | x) + \gamma_1 P(z = 0 | x).$$
The conditional expectations in these expressions can be estimated parametrically or non-parametrically, and confidence intervals can be derived for the locations of the endpoints.
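As a rough illustration of the parametric route, the sketch below computes the worst-case bounds by hand in base R. It is not how rbounds works internally: it swaps in logistic regressions (`glm`) for the kernel estimators from np, and the simulated data and all variable names (`fit_z`, `fit_y`, `grid`, `bounds`) are assumptions made up for this example.

```r
# Hypothetical simulated data: binary outcome y, observed only when z == 1
set.seed(1)
n <- 500
x <- rnorm(n)
z <- rbinom(n, 1, pnorm(x))   # z = 1 means y is observed
y <- rbinom(n, 1, pnorm(x))
y[z == 0] <- NA               # outcomes are missing when z = 0

# P(z = 1 | x): logistic regression on all observations
fit_z <- glm(z ~ x, family = binomial())
# E(y | x, z = 1): logistic regression on the observed cases only
fit_y <- glm(y ~ x, family = binomial(), subset = z == 1)

# Evaluate both pieces on a grid of covariate values
grid  <- data.frame(x = seq(-2, 2, by = 0.5))
p_obs <- predict(fit_z, newdata = grid, type = "response")
m_obs <- predict(fit_y, newdata = grid, type = "response")

# Worst-case bounds: the unobserved mean lies between gamma0 and gamma1
gamma0 <- 0
gamma1 <- 1
bounds <- data.frame(
  x     = grid$x,
  lower = m_obs * p_obs + gamma0 * (1 - p_obs),
  upper = m_obs * p_obs + gamma1 * (1 - p_obs)
)
print(bounds)
```

With $\gamma_0 = 0$ and $\gamma_1 = 1$, the gap between `upper` and `lower` at each grid point is the estimated $P(z = 0 | x)$. This sketch gives point estimates of the endpoints only and does not compute confidence intervals.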
Using simulated data, we present a simple example of the `pidoutcomes()` function, which we use to estimate partially identified conditional means.
```r
## We generate some fake data
library(rbounds)
set.seed(42)
N <- 100
x <- rnorm(N)
z <- rbinom(N, 1, pnorm(x))
y <- rbinom(N, 1, pnorm(x))
df <- data.frame(y, x, z)
res <- pidoutcomes(y~x, z, df)
res
# We can plot it with a generic plotting method
plot(res)
```