validate | R Documentation |
Performs internal validation analyses on fused microdata to estimate how well the simulated variables reflect patterns in the dataset used to train the underlying fusion model (i.e. observed/donor data). This provides a standard approach to validating fusion output and associated models. See Examples for recommended usage.
validate(
observed,
implicates,
subset_vars,
weight = NULL,
min_size = 30,
plot = TRUE,
cores = 1
)
observed | Data frame. Observed data against which to validate the simulated variables. |
implicates | Data frame. Implicates of synthetic (fused) variables, typically generated by fuse. The implicates should be row-stacked and identified by the integer column "M". |
subset_vars | Character. Vector of columns in observed used to define the population subsets across which the fused variables are validated. |
weight | Character. Name of the observation weights column in observed. |
min_size | Integer. Subsets with fewer than min_size observations are excluded from the validation. |
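plot | Logical. If TRUE (default), plot_valid is called internally and plots of the validation results are returned along with the complete results. |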
cores | Integer. Number of cores used. Only applicable on Unix systems. |
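To make the expected input layout concrete, the following hypothetical base-R sketch builds toy observed and implicates data frames and applies a min_size-style filter to the subsets. The toy data, the single subset variable, and the rule that each level of a subset variable defines one subset are assumptions made for illustration; validate() may define subsets differently. It also assumes that each implicate's rows are in the same order as the rows of observed.

```r
# Hypothetical toy data illustrating the input layout described above
set.seed(1)
n <- 200  # observations in the observed (donor) data
M <- 5    # number of implicates

# Observed data: a fusion variable, a subset variable, and a weight column
observed <- data.frame(
  electricity = rnorm(n, mean = 10000, sd = 2000),
  urban_rural = sample(c("Urban", "Rural"), n, replace = TRUE, prob = c(0.8, 0.2)),
  weight = runif(n, 50, 150)
)

# Implicates: M simulated copies of the fusion variable, row-stacked and
# identified by the integer column "M" (assumed: same row order as observed)
implicates <- do.call(rbind, lapply(seq_len(M), function(m) {
  data.frame(M = m, electricity = rnorm(n, mean = 10000, sd = 2000))
}))

# Assumption: each level of the subset variable defines one subset.
# Keep only subsets with at least min_size observations.
min_size <- 30
subset_sizes <- table(observed$urban_rural)
kept_subsets <- names(subset_sizes)[subset_sizes >= min_size]
```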
The objective of validate() is to confirm that the fusion output is sensible and to help establish the utility of the synthetic data across myriad analyses. Utility is assessed by comparing point estimates and confidence intervals derived from the multiple-implicate synthetic data with those derived from the original donor data.
The analyses tested are estimates of variable levels (means and proportions) across population subsets of varying size. This shows how each synthetic variable performs in analyses with real-world relevance, at varying levels of complexity. In effect, validate() performs a large number of analyses of the kind that the analyze function is designed to do on a one-by-one basis.
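The comparison described above can be sketched for a single mean in a single subset. This is a minimal, hypothetical illustration in base R, not the code used inside validate(): the toy data, the helper functions wmean_se() and pool_rubin(), the effective-sample-size standard error, and the use of Rubin's combining rules to pool the implicates are all assumptions made here.

```r
# Hypothetical sketch, NOT the fusionModel implementation.
# Weighted mean with an approximate standard error based on the
# Kish effective sample size (assumption, for illustration only)
wmean_se <- function(y, w) {
  w <- w / sum(w)                  # normalize weights to sum to 1
  est <- sum(w * y)                # weighted mean
  n_eff <- 1 / sum(w^2)            # Kish effective sample size
  v <- sum(w * (y - est)^2)        # weighted variance (no small-sample correction)
  c(est = est, se = sqrt(v / n_eff))
}

# Pool per-implicate estimates with Rubin's rules (assumed pooling method)
pool_rubin <- function(est, se) {
  M <- length(est)
  qbar <- mean(est)                # pooled point estimate
  ubar <- mean(se^2)               # average within-implicate variance
  b <- var(est)                    # between-implicate variance
  total <- ubar + (1 + 1 / M) * b  # total variance
  c(est = qbar, se = sqrt(total))
}

# Hypothetical toy data: observed donor data and M row-stacked implicates
set.seed(42)
n <- 500
M <- 10
observed <- data.frame(
  electricity = rnorm(n, 10000, 2000),
  urban_rural = sample(c("Urban", "Rural"), n, replace = TRUE),
  weight = runif(n, 50, 150)
)
implicates <- do.call(rbind, lapply(seq_len(M), function(m)
  data.frame(M = m, electricity = rnorm(n, 10000, 2000))))

# One population subset (implicate rows assumed aligned with observed)
in_subset <- observed$urban_rural == "Rural"
obs <- wmean_se(observed$electricity[in_subset], observed$weight[in_subset])

# Estimate the subset mean in each implicate, then pool across implicates
per_imp <- sapply(seq_len(M), function(m) {
  y <- implicates$electricity[implicates$M == m][in_subset]
  wmean_se(y, observed$weight[in_subset])
})
sim <- pool_rubin(per_imp["est", ], per_imp["se", ])

# Compare point estimates and normal-approximation 95% confidence intervals
ci <- function(x) x[["est"]] + c(-1, 1) * qnorm(0.975) * x[["se"]]
comparison <- data.frame(
  source = c("observed", "synthetic"),
  estimate = c(obs[["est"]], sim[["est"]]),
  lower = c(ci(obs)[1], ci(sim)[1]),
  upper = c(ci(obs)[2], ci(sim)[2])
)
comparison
```

Repeating this comparison for every fused variable and every sufficiently large subset yields the kind of many-analyses summary that validate() is described as producing.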
Most users will want to keep the default setting plot = TRUE, which also returns visualizations (plots) of the validation results. Plot creation is detailed in plot_valid.
If plot = FALSE, a data frame containing the complete validation results. If plot = TRUE, a list containing the full results as well as additional plot objects, as described in plot_valid.
library(fusionModel)

# Build a fusion model using RECS microdata
# Note that "fusion_model.fsn" will be written to the working directory
fusion.vars <- c("electricity", "natural_gas", "aircon")
predictor.vars <- names(recs)[2:12]
fsn.path <- train(data = recs,
                  y = fusion.vars,
                  x = predictor.vars,
                  weight = "weight")

# Fuse back onto the donor data (multiple implicates)
sim <- fuse(data = recs,
            fsn = fsn.path,
            M = 20)

# Calculate validation results
valid <- validate(observed = recs,
                  implicates = sim,
                  subset_vars = c("income", "education", "race", "urban_rural"))