fuse | R Documentation |
Fuse variables to a recipient dataset using a .fsn model produced by train. Output can be passed to analyze and validate.
fuse(
data,
fsn,
fsd = NULL,
M = 1,
retain = NULL,
kblock = 10,
margin = 2,
cores = 1
)
data |
Data frame. Recipient dataset. All categorical variables should be factors and ordered whenever possible. Data types and levels are strictly validated against predictor variables defined in |
fsn |
Character. Path to fusion model file (.fsn) generated by |
fsd |
Character. Optional fusion output file to be created ending in |
M |
Integer. Number of implicates to simulate. |
retain |
Character. Names of columns in |
kblock |
Integer. Fixed number of nearest neighbors to use when fusing variables in a block. Must be >= 5 and <= 30. Not applicable for variables fused on their own (i.e. no block). |
margin |
Numeric. Safety margin used when estimating how many implicates can be processed in memory at once. Set higher if |
cores |
Integer. Number of cores used. LightGBM prediction is parallel-enabled on all systems if OpenMP is available. |
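The requirement on data (categorical variables stored as factors, ordered where possible) can be sketched in base R. This is a minimal illustration only: the data frame and its column names (income, tenure, education) are hypothetical and not part of the package. In practice the types and levels would need to match those of the predictor variables used to train the model.

```r
# Hypothetical recipient data; column names are illustrative only
recipient <- data.frame(
  income = c(42000, 58000, 31000),
  tenure = c("own", "rent", "own"),
  education = c("HS", "BA", "Grad"),
  stringsAsFactors = FALSE
)

# Categorical variable with no natural ordering: plain factor
recipient$tenure <- factor(recipient$tenure, levels = c("own", "rent"))

# Categorical variable with a natural ordering: ordered factor
recipient$education <- factor(
  recipient$education,
  levels = c("HS", "BA", "Grad"),
  ordered = TRUE
)

# Inspect the resulting column classes before passing the data on
sapply(recipient, function(x) class(x)[1])
```

Setting the levels explicitly, rather than relying on the default alphabetical order, makes it easier to keep the recipient data consistent with the training data.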
TO UPDATE.
If fsd = NULL, a data.table with number of rows equal to M * nrow(data). Integer column "M" indicates implicate assignment of each observation. Note that the ordering of recipient observations is consistent within implicates, so do not change the row order if using with analyze.
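The stacked layout described above can be illustrated with a mock result built by hand in base R. This sketch does not call fuse(): the electricity values are random placeholders, a plain data.frame stands in for the data.table, and it assumes rows are stacked one implicate after another.

```r
# Mock stand-in for fuse() output: 3 implicates of 4 recipient rows.
# Values are random placeholders, not real fusion results.
set.seed(1)
n <- 4L  # number of recipient observations, i.e. nrow(data)
M <- 3L  # number of implicates
sim <- data.frame(
  M = rep(seq_len(M), each = n),  # implicate indicator (assumed stacking)
  electricity = runif(n * M)
)

# One row per recipient observation per implicate
stopifnot(nrow(sim) == M * n)
table(sim$M)

# Split by implicate without reordering rows, so that row i of each
# piece refers to the same recipient observation
by_imp <- split(sim, sim$M)

# Per-observation mean of the simulated variable across implicates
obs_mean <- rowMeans(sapply(by_imp, function(d) d$electricity))
length(obs_mean)
```

Because observations are matched across implicates by position, sorting or filtering the rows of one implicate independently would break that correspondence.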
If fsd is specified, the path to the .fsd file where results were written. Metadata for column classes and factor levels are stored in the column names. read_fsd should be used to load files saved via the fsd argument.
# Build a fusion model using RECS microdata
# Note that "fusion_model.fsn" will be written to working directory
?recs
fusion.vars <- c("electricity", "natural_gas", "aircon")
predictor.vars <- names(recs)[2:12]
fsn.path <- train(data = recs, y = fusion.vars, x = predictor.vars)
# Generate single implicate of synthetic 'fusion.vars',
# using original RECS data as the recipient
recipient <- recs[predictor.vars]
sim <- fuse(data = recipient, fsn = fsn.path)
head(sim)
# Calling fuse() again produces different results
sim <- fuse(data = recipient, fsn = fsn.path)
head(sim)
# Generate multiple implicates
sim <- fuse(data = recipient, fsn = fsn.path, M = 5)
head(sim)
table(sim$M)
# Optionally, write results directly to disk
# Note that "results.fsd" will be written to working directory
sim <- fuse(data = recipient, fsn = fsn.path, M = 5, fsd = "results.fsd")
sim <- read_fsd(sim)
head(sim)