read_fsd: Read fusion output from disk
In ummel/fusionModel: Data fusion and analysis of synthetic data in R

Description

Efficiently read fusion output that was written to disk, optionally returning only a subset of rows and/or columns. Because a .fsd file is simply a .fst file under the hood, this function also works on any .fst file.

Usage

read_fsd(
  path,
  columns = NULL,
  M = 1,
  df = NULL,
  cores = max(1, parallel::detectCores(logical = FALSE) - 1)
)

Arguments

`path`	Character. Path to a `.fsd` (or `.fst`) file, typically produced by `fuse`.
`columns`	Character. Column names to read. The default is to return all columns.
`M`	Integer. The first `M` implicates are returned. Set `M = Inf` to return all implicates. Ignored if the data has no `M` column.
`df`	Data frame. Data frame used to identify a subset of rows to return. Default is to return all rows.
`cores`	Integer. Number of cores used by `fst`.

Details

If df is provided and the file on disk is smaller than 100 MB, the full file is read and then inner joined with df. For larger files, only the required rows are read, with fmatch used to identify the matching rows.
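
The two strategies described above can be illustrated with a minimal base R sketch. This is not the package's implementation: the toy table `on_disk`, the key column `id`, and the use of `merge()` and `match()` (standing in for the inner join and for fmatch) are assumptions made purely for illustration.

```r
# Toy stand-in for data stored on disk (hypothetical; not a real .fsd file)
on_disk <- data.frame(id = 1:6, value = c(10, 20, 30, 40, 50, 60))

# Hypothetical `df` identifying the rows to keep
df <- data.frame(id = c(2, 5))

# Strategy 1 (small files): read everything, then inner join on shared columns
full <- on_disk                       # stands in for a full read from disk
joined <- merge(full, df, by = "id")  # inner join keeps only matching rows

# Strategy 2 (large files): locate matching rows first, then read only those
keys <- on_disk$id                    # stands in for reading just the key column
rows <- which(!is.na(match(keys, df$id)))
subset_read <- on_disk[rows, , drop = FALSE]  # stands in for a targeted read

print(joined)
print(subset_read)
```

Both approaches return the same rows; the second avoids holding the full table in memory, which is presumably why it is preferred for larger files.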

Value

A data.table. Keys are preserved if present in the on-disk data. When path points to a .fsd file, the result includes an integer column "M" giving the implicate to which each observation belongs (unless "M" is excluded via the columns argument).
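
As a rough illustration of this layout, the following base R sketch builds a toy table of stacked implicates and mimics the documented effect of the `M` and `columns` arguments. It assumes implicates are numbered from 1 and that "the first `M` implicates" means rows whose "M" value is at most `M`; `toy` and `first_implicates()` are hypothetical names, not part of fusionModel.

```r
# Toy fusion output: 3 implicates of 2 observations each (hypothetical data)
toy <- data.frame(
  M = rep(1:3, each = 2),
  electricity = c(5.1, 7.3, 4.9, 7.8, 5.4, 7.0)
)

# Hypothetical helper mimicking the documented behavior of `M` and `columns`
first_implicates <- function(d, M = 1, columns = NULL) {
  # Keep the first M implicates; skip the filter if there is no "M" column
  if ("M" %in% names(d)) d <- d[d$M <= M, , drop = FALSE]
  # Keep only the requested columns; NULL means all columns
  if (!is.null(columns)) d <- d[, columns, drop = FALSE]
  d
}

one <- first_implicates(toy)               # default: first implicate only
all_imp <- first_implicates(toy, M = Inf)  # every implicate
no_m <- first_implicates(toy, M = 2, columns = "electricity")  # "M" dropped

print(nrow(one))
print(nrow(all_imp))
print(names(no_m))
```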

Examples

# Build a fusion model using RECS microdata
# Note that "fusion_model.fsn" will be written to working directory
?recs
fusion.vars <- c("electricity", "natural_gas", "aircon")
predictor.vars <- names(recs)[2:12]
fsn.path <- train(data = recs, y = fusion.vars, x = predictor.vars)

# Write fusion output directly to disk
# Note that "results.fsd" will be written to working directory
recipient <- recs[predictor.vars]
sim <- fuse(data = recipient, fsn = fsn.path, M = 5, fsd = "results.fsd")

# Read the fusion output saved to disk
# Note that `sim` is passed as the `path` argument, so it must hold the file path
sim <- read_fsd(sim)
head(sim)

ummel/fusionModel documentation built on June 1, 2025, 11 p.m.