View source: R/analyze_fusionACS.R
analyze_fusionACS | R Documentation |
For fusionACS internal use only. Calculation of point estimates and associated uncertainty (margin of error) for analyses using ACS and/or fused donor survey variables.
Efficiently computes means, medians, sums, proportions, and counts, optionally across population subgroups.
The use of native ACS weights or ORNL UrbanPop synthetic population weights is automatically determined given the requested geographic resolution.
Requires a local /fusionData directory in the working directory path with assumed file structure and conventions.
analyze_fusionACS(
  analyses,
  year,
  respondent = "household",
  by = NULL,
  area = NULL,
  fun = NULL,
  M = Inf,
  R = Inf,
  cores = 1,
  version_up = 2,
  force_up = FALSE
)
analyses |
List. Specifies the desired analyses. Each analysis is a formula. See Details and Examples. |
year |
Integer. One or more years for which microdata are pooled to compute |
respondent |
Character. Should the |
by |
Character. Optional variable(s) that collectively define the set of population subgroups for which each analysis is computed. Can be a mix of geographic (e.g. census tract) and/or socio-demographic microdata variables (e.g. poverty status); the latter may be existing variables on disk or custom variables created on-the-fly via |
area |
Call. Optional unquoted call specifying a geographic area within which to compute the |
fun |
Function. Optional function for creating custom microdata variables that cannot be accommodated in |
M |
Integer. The first |
R |
Integer. The first |
cores |
Integer. Number of cores used for multithreading in |
version_up |
Integer. Use |
force_up |
Logical. If |
Allowable geographic units of analysis specified in by are currently limited to: region, division, state, cbsa10, puma10, county10, cousubfp10 (county subdivision), zcta10 (zip code), tract10 (census tract), and bg10 (block group).
The final point estimates are the mean estimates across implicates. The final margin of error is derived from the pooled standard error across implicates, calculated using Rubin's (1987) pooling rules. The within-implicate standard errors are calculated using the replicate weights.
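The pooling step described above can be sketched in base R. This is an illustrative sketch only, not the package's internal code: pool_rubin() is a hypothetical helper, the per-implicate estimates and standard errors are made-up numbers, and the use of a t quantile with Rubin's degrees of freedom for the 90% margin of error is an assumption.

```r
# Hypothetical per-implicate results (illustrative numbers, not real output)
est_m <- c(0.212, 0.205, 0.219, 0.208, 0.215)  # point estimate from each implicate
se_m  <- c(0.011, 0.012, 0.010, 0.011, 0.012)  # within-implicate SE (e.g. from replicate weights)

# Hypothetical helper applying Rubin's (1987) pooling rules; requires M > 1
pool_rubin <- function(est_m, se_m, conf = 0.90) {
  M    <- length(est_m)
  est  <- mean(est_m)                  # pooled point estimate
  W    <- mean(se_m^2)                 # mean within-implicate variance
  B    <- var(est_m)                   # between-implicate variance
  Tvar <- W + (1 + 1 / M) * B          # total variance
  se   <- sqrt(Tvar)                   # pooled standard error
  df   <- (M - 1) * (1 + W / ((1 + 1 / M) * B))^2  # Rubin's degrees of freedom
  moe  <- qt(1 - (1 - conf) / 2, df) * se          # assumed: t-based margin of error
  c(est = est, se = se, df = df, moe = moe)
}

pooled <- pool_rubin(est_m, se_m)
print(pooled)
```

The pooled standard error is never smaller than the root mean within-implicate variance, because the between-implicate term only adds uncertainty.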
Each entry in the analyses list is a formula of the format Z ~ F(E), where Z is an optional, user-friendly name for the analysis, F is an allowable “outer function”, and E is an “inner expression” containing one or more microdata variables. For example:
mysum ~ mean(Var1 + Var2)
In this case, the outer function is mean(). Allowable outer functions are: mean(), sum(), median(), sd(), and var(). When the inner expression contains more than one variable, it is first evaluated and then F() is applied to the result. In this case, an internal variable X = Var1 + Var2 is generated across all observations, and then mean(X) is computed.
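To make the Z ~ F(E) structure concrete, the following base R sketch pulls apart an analysis formula and evaluates it by hand. It does not reproduce how analyze_fusionACS() parses formulas internally; the data frame d, its weight column w, and the use of weighted.mean() are assumptions made for illustration.

```r
f <- mysum ~ mean(Var1 + Var2)

lhs       <- as.character(f[[2]])       # analysis name Z: "mysum"
outer_fun <- as.character(f[[3]][[1]])  # outer function F: "mean"
inner     <- f[[3]][[2]]                # inner expression E: Var1 + Var2
vars      <- all.vars(inner)            # microdata variables needed: "Var1" "Var2"

# Toy microdata with observation weights (hypothetical values)
d <- data.frame(Var1 = c(1, 2, 3), Var2 = c(10, 20, 30), w = c(1, 1, 2))

X      <- eval(inner, d)           # internal variable X = Var1 + Var2
result <- weighted.mean(X, d$w)    # weighted analogue of mean(X)
print(result)
```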
If no inner expression is desired, the analyses list can use the following convenient syntax to apply a single outer function to multiple variables:
mean = c("Var1", "Var2")
The inner expression can also utilize any function that takes variable names as arguments and returns a vector with the same length as the inputs. This is useful for defining complex operations in a separate function (e.g. microsimulation). For example:
myfun = function(Var1, Var2) {Var1 + Var2}
mysum ~ mean(myfun(Var1, Var2))
The use of sum() or mean() with an inner expression that returns a categorical vector automatically results in category-wise weighted counts and proportions, respectively. For example, the following analysis would fail if evaluated literally, since mean() expects numeric input but the inner expression returns character. Instead, it is interpreted as a request to return weighted proportions for each categorical outcome.
myprop ~ mean(ifelse(Var1 > 10, 'Yes', 'No'))
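The category-wise interpretation can be illustrated with a small base R sketch. The data frame d and its weight column w are hypothetical, and tapply() stands in for whatever grouped routine the package actually uses.

```r
# Toy microdata with observation weights (hypothetical values)
d <- data.frame(Var1 = c(5, 12, 20, 8), w = c(10, 30, 40, 20))

# Inner expression returns a character (categorical) vector
outcome <- ifelse(d$Var1 > 10, "Yes", "No")

# Category-wise weighted counts: the interpretation given to sum()
counts <- tapply(d$w, outcome, sum)

# Category-wise weighted proportions: the interpretation given to mean()
props <- counts / sum(d$w)

print(counts)
print(props)
```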
analyze_fusionACS() uses "fast" versions of the allowable outer functions, as provided by fast-statistical-functions in the collapse package. These functions are highly optimized for weighted, grouped calculations. In addition, outer functions mean(), sum(), and median() support platform-independent multithreading across columns when cores > 1. Analyses with numerical inner expressions are processed using a series of calls to collap with unique observation weights. Analyses with categorical inner expressions utilize a series of calls to fsum.
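As a rough illustration of the weighted, grouped calculations mentioned above, the sketch below calls fmean() and fsum() from the collapse package directly on toy data. The data frame d and its columns are hypothetical, and this is not the sequence of calls made inside analyze_fusionACS(); it only shows the grouping (g) and weight (w) arguments that these functions accept.

```r
library(collapse)

# Toy data: a grouping variable, a numeric variable, and observation weights
d <- data.frame(g = c("a", "a", "b", "b"),
                x = c(1, 3, 10, 20),
                w = c(1, 3, 1, 1))

wmean <- fmean(d$x, g = d$g, w = d$w)  # weighted mean of x within each group
wsum  <- fsum(d$x, g = d$g, w = d$w)   # weighted sum of x within each group

print(wmean)
print(wsum)
```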
A tibble reporting analysis results, possibly across subgroups defined in by. The returned quantities include:
Optional analysis name; the "left hand side" of the analysis formula.
The "right hand side" of the analysis formula.
Type of analysis: sum, mean, median, prop(ortion) or count.
Factor levels for categorical analyses; NA otherwise.
Mean number of valid microdata observations across all implicates and replicates; i.e. the sample size used to construct the estimate.
Point estimate; mean estimate across all implicates and replicates.
Margin of error associated with the 90% confidence interval.
Standard error of the estimate.
Degrees of freedom used to calculate the margin of error.
Coefficient of variation; conventional scale-independent measure of estimate reliability. Calculated as: 100 * moe / 1.645 / est
Rubin, D.B. (1987). Multiple imputation for nonresponse in surveys. Hoboken, NJ: Wiley.
# Analysis using ACS native weights for year 2017, by PUMA, in South Atlantic Census Division
# Uses all available implicates and replicate weights
test <- analyze_fusionACS(analyses = list(high_burden ~ mean(dollarel / hincp > 0.05)),
                          year = 2017,
                          by = "puma10",
                          area = division == "South Atlantic")
# Analysis using UrbanPop 2015-2019 weights, by tract, in Utah (actually Salt Lake City metro given current UrbanPop data)
# Uses 5 (of possible 20) fusion implicates for RECS "dollarel" variable
# Uses 5 (of possible 10) UrbanPop replicate weights
test <- analyze_fusionACS(analyses = list(median_burden ~ median(dollarel / hincp)),
                          year = 2015:2019,
                          by = "tract10",
                          area = state_name == "Utah",
                          M = 5,
                          R = 5)
# User function to create custom variables from microdata
# Variables explicitly referenced in my_fun() are automatically loaded into 'data' within analyze_fusionACS()
# Variables returned by my_fun() may be used in 'by' or inner expressions of 'analyses'
my_fun <- function(data) {
  require(tidyverse, quietly = TRUE)
  data %>%
    mutate(elderly = agep >= 65,
           energy_expend = dollarel + dollarfo + dollarlp + dollarng,
           energy_burden = energy_expend / hincp,
           energy_burden = ifelse(hincp < 5000, NA, energy_burden)) %>%
    select(elderly, energy_burden, energy_expend)
}
# Analysis using UrbanPop 2015-2019 weights, by zip code and elderly head of household, in Atlanta CBSA
test <- analyze_fusionACS(analyses = list(energy_burden ~ mean(energy_burden),
                                          at_risk ~ mean(energy_burden > 0.075 | acequipm_pub == "No air conditioning")),
                          year = 2015:2019,
                          by = c("zcta10", "elderly"),
                          area = cbsa10 == "12060",
                          fun = my_fun,
                          M = 5,
                          R = 5)