knitr::opts_chunk$set( collapse = TRUE, comment = "#>", eval = FALSE )
Often, in an applied setting, it is interesting to compute estimates (and quantify its associated uncertainty) of socioeconomic and demographic variables for different administrative regions within the same city. Computing estimates is trivial when applying the Areal Interpolation [@goodchild1980areal] approach. However, a common criticism regarding this method is that it does not provide uncertainty measures associated with the estimates [@bradley2016bayesian]. Our package provides the simple areal interpolation along with an approximation and an upper bound for the variance of the computed estimates when the variance associated with the observed data is known, which is a common fact when working with survey data. To illustrate how does it work, we are going to show how to estimate a variable observed at census tracts in the Public Use Microdata Areas (PUMAs) in NYC.
To load the data and the packages needed for this tutorial run the chunk of code below.
library(sf) library(smile) library(ggplot2) data(nyc_surv) data(nyc_comd)
Since the nyc_surv
data proves a "margin of error" instead of a variance, we
are going to use the piece of code below to approximate the variance. We are
assuming the margin of error was calculated using a 95% confidence interval
based on the Normal distribution.
nyc_surv <- transform(nyc_surv, my_var = moe / qnorm(p = .975)) nyc_surv <- transform(nyc_surv, my_var = my_var * my_var)
Now, to estimate a variable observed at the nyc_surv
dataset into the
nyc_comd
data we can run the following command:
estimate_comd <- ai(source = nyc_surv, target = nyc_comd, vars = "estimate")
where source
is the "source" dataset, i.e., the one on which we observe the
variable we wish to estimate at the target
dataset. The argument vars
is a
character
(scalar or vector) containing the name of the variable (or
variables) at the source
that are going to be estimated at the target
regions. This function does not quantify the uncertainty associated with the
estimates, though. In order to get a variability measure associated with the
estimates, we need to use the function ai_var
(see the chunk of code
below). The function is quite similar to the before mentioned ai
. The
differences are: 1) vars
must be a scalar; 2) the argument my_var
is another
length 1 character
indicating the name of the variable that contains the
variance of the observed variable in the source
dataset; and 3) the argument
var_method
indicates whether we are going to use the Moran's I (var_method =
"MI"
) to approximate the autocorrelation between areas and, therefore,
approximate the variance of the estimates or the Cauchy--Schwartz (var_method =
"CS"
) inequality to calculate an upper bound for such variances. The output of
the ai_var
function is the target
(sf data.frame
) with two additional
columns, est
and se_est
, representing the estimated variable and its
corresponding (approximate) standard error.
estimate_comd <- ai_var(source = nyc_surv, target = nyc_comd, vars = "estimate", vars_var = "my_var", var_method = "MI")
The next piece of code uses ggplot2
to plot the observed and estimated
variable at NYC census tracts and community districts, respectively.
viz_dt <- rbind( transform(nyc_surv, source = "Observed", est = estimate)[c("source", "est")], transform(estimate_comd, source = "Predicted")[c("source", "est")] ) ggplot(data = viz_dt, aes(fill = est / 1000)) + geom_sf(color = 1, lwd = .1) + scale_fill_viridis_c(option = "H", name = "median \n income", label = \(x) paste0(x, "K")) + theme_bw() + facet_wrap( ~ source) + theme(axis.text = element_blank(), axis.ticks = element_blank())
viz_dt <- rbind( transform(nyc_surv, source = "Observed", est = estimate)[c("source", "est")], transform(nyc_comd, source = "Predicted")[c("source", "est")] ) ggplot(data = viz_dt, aes(fill = est / 1000)) + geom_sf(color = 1, lwd = .1) + scale_fill_viridis_c(option = "H", name = "median \n income", label = \(x) paste0(x, "K")) + theme_bw() + facet_wrap( ~ source) + theme(axis.text = element_blank(), axis.ticks = element_blank())
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.