write_immundata: Save ImmunData to disk

View source: R/io_immundata_write.R

write_immundata    R Documentation

Save ImmunData to disk

Description

Serializes the essential components of an ImmunData object to disk for efficient storage and later retrieval. It saves the core annotation data (idata$annotations) as a compressed Parquet file and accompanying metadata (including receptor/repertoire schemas and package version) as a JSON file within a specified directory.

Usage

write_immundata(
  idata,
  output_folder = NULL,
  tag = NULL,
  rehome = FALSE,
  compression = "zstd",
  compression_level = 9
)

Arguments

idata

The ImmunData object to save. Must be an R6 object of class ImmunData containing at least the $annotations table and schema information ($schema_receptor, optionally $schema_repertoire).

output_folder

Character(1) or NULL. Path to the directory where the output files will be written. If NULL, a snapshot directory is created as home_path/snapshots/<tag>/vNNN, where home_path is read from internal ImmunData provenance.

tag

Character(1) or NULL. Snapshot tag used only when output_folder = NULL (for example, "baseline"). If NULL, defaults to "default" for auto-snapshots.

rehome

Logical(1). If TRUE and output_folder is explicitly provided, that folder becomes the new snapshot home for future auto-snapshots. Default: FALSE.

compression

Character(1) or NULL. Parquet compression codec passed through to DuckDB (via duckplyr::compute_parquet(options = ...)). Defaults to "zstd". Set NULL to let DuckDB choose.

compression_level

Numeric(1) or NULL. Compression level passed through to DuckDB for codecs that support levels (for example, Zstandard). Defaults to 9. Set NULL to let DuckDB choose.
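The auto-snapshot layout described for output_folder and tag can be sketched in base R. This is only an illustration under stated assumptions: next_snapshot_dir() is a hypothetical helper, not an immundata function, and the sketch assumes that vNNN is a zero-padded, three-digit counter starting at 1. The package's actual resolution logic may differ.

```r
# Hypothetical helper: not part of immundata; it only illustrates the layout
# home_path/snapshots/<tag>/vNNN described above.
next_snapshot_dir <- function(home_path, tag = NULL) {
  if (is.null(tag)) tag <- "default"  # documented fallback tag
  tag_dir <- file.path(home_path, "snapshots", tag)

  # Collect existing version folders named like v001, v002, ...
  existing <- if (dir.exists(tag_dir)) {
    list.files(tag_dir, pattern = "^v[0-9]+$")
  } else {
    character(0)
  }
  versions <- as.integer(sub("^v", "", existing))
  next_version <- if (length(versions) == 0) 1L else max(versions) + 1L

  # Assumption: NNN is a zero-padded, three-digit counter starting at 1.
  file.path(tag_dir, sprintf("v%03d", next_version))
}

home <- tempfile("immundata_home_")
first <- next_snapshot_dir(home, tag = "baseline")
dir.create(first, recursive = TRUE)
second <- next_snapshot_dir(home, tag = "baseline")
basename(first)   # "v001"
basename(second)  # "v002"
unlink(home, recursive = TRUE)
```

Under these assumptions, repeated auto-snapshots with the same tag never overwrite each other, because each call resolves to a fresh version folder.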

Details

The function performs the following actions:

  1. Validates the input idata object and write options.

  2. Resolves the destination folder:

    • uses output_folder when explicitly provided, or

    • creates an auto-snapshot folder under home_path/snapshots/<tag>/vNNN when output_folder = NULL.

  3. Constructs metadata including schemas, snapshot_id, lineage, and provenance paths.

  4. Writes metadata to metadata.json within the resolved output folder.

  5. Writes the idata$annotations table (a duckplyr_df or similar) to annotations.parquet within the resolved output folder.

    • By default, uses compression = "zstd" and compression_level = 9.

    • A common choice is compression = "snappy" for faster reads/writes with larger files.

    • Another common choice is compression = "zstd" for smaller files, often with higher CPU cost.

    • compression_level usually trades speed for size (higher levels: smaller output but slower processing).

    • Compatibility note: for duckplyr version 1.2.0, compute_parquet() does not accept extra options due to a known issue. In that version, compression-related arguments are ignored and DuckDB defaults are used.

  6. Uses internal helper imd_files() to determine the standard filenames (metadata.json, annotations.parquet).
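Given the compatibility note in step 5, it can be useful to check the installed duckplyr version before relying on a particular codec or level. The sketch below is a suggestion, not part of immundata: compression_options_honored() is a hypothetical helper that encodes only what the note states (version 1.2.0 ignores the options).

```r
# Sketch: decide whether compression arguments would take effect,
# based only on the compatibility note for duckplyr 1.2.0.
compression_options_honored <- function() {
  if (!requireNamespace("duckplyr", quietly = TRUE)) {
    return(NA)  # duckplyr is not installed, so nothing can be said
  }
  utils::packageVersion("duckplyr") != "1.2.0"
}

honored <- compression_options_honored()
if (isFALSE(honored)) {
  message("duckplyr 1.2.0 detected: compression settings will be ignored.")
}
```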

The receptor data itself (if stored separately in future versions) is not saved by this function; only the annotations linking to receptors are saved, along with the schema needed to reconstruct/interpret them.
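Because a saved snapshot consists of exactly the two files named above, a quick sanity check on an output folder can be written in base R. The helper is_immundata_folder() is hypothetical and checks only for the presence of the files, not for their contents or validity.

```r
# Hypothetical check, not an immundata function: does a folder contain
# the two standard files that a saved ImmunData snapshot is made of?
is_immundata_folder <- function(path) {
  expected <- c("annotations.parquet", "metadata.json")
  all(file.exists(file.path(path, expected)))
}

# Demonstration with empty placeholder files (not real snapshot data)
demo_dir <- tempfile("immundata_demo_")
dir.create(demo_dir)
before <- is_immundata_folder(demo_dir)  # FALSE: folder is empty
created <- file.create(file.path(demo_dir,
                                 c("annotations.parquet", "metadata.json")))
after <- is_immundata_folder(demo_dir)   # TRUE: both filenames are present
unlink(demo_dir, recursive = TRUE)
```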

Value

Invisibly returns the input idata object after it has been saved to disk. In other words, the function lets you create snapshots of the data in the output folder. Note that saving the object executes all stored computations, so this operation can take longer than expected. Read more about snapshots in the "Concept" section of our website.
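The effect of an invisible return value can be illustrated with a toy function. Here save_and_return() is a hypothetical stand-in for write_immundata(), and the built-in mtcars data frame stands in for an ImmunData object; the sketch shows only the return-value behaviour, not the actual serialization.

```r
# Toy stand-in for write_immundata(): saves an object and returns it invisibly.
# save_and_return() is hypothetical and exists only for this illustration.
save_and_return <- function(x, path) {
  saveRDS(x, path)
  invisible(x)
}

path <- tempfile(fileext = ".rds")
res <- withVisible(save_and_return(mtcars, path))
res$visible                    # FALSE: nothing is printed at top level
identical(res$value, mtcars)   # TRUE: the input comes back unchanged

# The returned object can feed directly into the next call
n_rows <- nrow(save_and_return(mtcars, path))
unlink(path)
```

The same pattern is what allows a save step to sit in the middle of a longer chain of calls without interrupting it.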

See Also

read_immundata() for loading the saved data; read_repertoires(), which uses this function internally; the ImmunData class definition.

Examples

## Not run: 
# Assume 'my_idata' is an ImmunData object created previously
# my_idata <- read_repertoires(...)

# Define an output directory
save_dir <- tempfile("saved_immundata_")

# Save the ImmunData object
write_immundata(my_idata, save_dir)

# Auto-snapshot under <home>/snapshots/baseline/vNNN
write_immundata(my_idata, tag = "baseline")

# Optional: request a specific parquet compression setup
write_immundata(my_idata, save_dir, compression = "zstd", compression_level = 9)

# Optional: let DuckDB choose both settings
write_immundata(my_idata, save_dir, compression = NULL, compression_level = NULL)

# Check the created files
list.files(save_dir) # Should show "annotations.parquet" and "metadata.json"

# Clean up
unlink(save_dir, recursive = TRUE)

## End(Not run)

immundata documentation built on April 4, 2026, 9:09 a.m.