View source: R/io_immundata_write.R
| write_immundata | R Documentation |
Serializes the essential components of an ImmunData object to disk for
efficient storage and later retrieval. It saves the core annotation data
(idata$annotations) as a compressed Parquet file and accompanying metadata
(including receptor/repertoire schemas and package version) as a JSON file
within a specified directory.
write_immundata(
idata,
output_folder = NULL,
tag = NULL,
rehome = FALSE,
compression = "zstd",
compression_level = 9
)
idata |
The |
output_folder |
Character(1) or |
tag |
Character(1) or |
rehome |
Logical(1). If |
compression |
Character(1) or |
compression_level |
Numeric(1) or |
The function performs the following actions:
1. Validates the input idata object and write options.
2. Resolves the destination folder:
   - uses output_folder when explicitly provided, or
   - creates an auto-snapshot folder under
     home_path/snapshots/<tag>/vNNN when output_folder = NULL.
3. Constructs metadata including schemas, snapshot_id, lineage, and
   provenance paths.
4. Writes metadata to metadata.json within the resolved output folder.
5. Writes the idata$annotations table (a duckplyr_df or similar) to
   annotations.parquet within the same resolved output folder.
   - By default, uses compression = "zstd" and compression_level = 9.
   - A common choice is compression = "snappy" for faster reads/writes
     with larger files.
   - Another common choice is compression = "zstd" for smaller files, often
     with higher CPU cost.
   - compression_level usually trades speed for size (higher levels: smaller
     output but slower processing).
   - Compatibility note: for duckplyr version 1.2.0, compute_parquet()
     does not accept extra options due to a known issue. In that version,
     compression-related arguments are ignored and DuckDB defaults are used.
6. Uses internal helper imd_files() to determine the standard filenames
   (metadata.json, annotations.parquet).
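The auto-snapshot folder resolution described above can be pictured with a minimal base R sketch. The helper name next_snapshot_dir() is hypothetical and is not part of the package. The three-digit zero padding and the "highest existing version plus one" rule are assumptions read from the vNNN pattern, not confirmed behaviour of write_immundata().

```r
# Hypothetical helper, NOT part of the package API: picks the next
# home_path/snapshots/<tag>/vNNN folder (assumed three-digit versions).
next_snapshot_dir <- function(home_path, tag) {
  tag_dir <- file.path(home_path, "snapshots", tag)
  existing <- if (dir.exists(tag_dir)) list.files(tag_dir) else character(0)
  # Keep only entries named like v001, v002, ...
  versions <- grep("^v[0-9]{3}$", existing, value = TRUE)
  next_n <- if (length(versions) == 0) {
    1L
  } else {
    max(as.integer(sub("^v", "", versions))) + 1L
  }
  file.path(tag_dir, sprintf("v%03d", next_n))
}

# Usage: the first call yields v001; once that folder exists, the next yields v002.
home <- tempfile("immundata_home_")
first <- next_snapshot_dir(home, "baseline")
dir.create(first, recursive = TRUE)
second <- next_snapshot_dir(home, "baseline")
basename(first)   # "v001"
basename(second)  # "v002"
unlink(home, recursive = TRUE)
```

The sketch only computes a path. The actual function also validates its inputs and records snapshot_id, lineage, and provenance paths in the metadata.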
The receptor data itself (if stored separately in future versions) is not saved by this function; only the annotations linking to receptors are saved, along with the schema needed to reconstruct/interpret them.
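The duckplyr 1.2.0 compatibility note above can be turned into a small check before relying on compression settings. This is a sketch only: compression_args_ignored() is a hypothetical helper, and it encodes just the single version named in the note.

```r
# Hypothetical helper, NOT part of immundata or duckplyr: returns TRUE when
# the given duckplyr version is the one where compression arguments are ignored.
compression_args_ignored <- function(duckplyr_version) {
  package_version(duckplyr_version) == package_version("1.2.0")
}

# Example version strings, used purely for illustration
ignored_120 <- compression_args_ignored("1.2.0")
ignored_other <- compression_args_ignored("1.1.3")

# With duckplyr installed, the installed version could be checked like this:
# compression_args_ignored(as.character(utils::packageVersion("duckplyr")))
```

When the check returns TRUE, files are written with DuckDB defaults regardless of the compression and compression_level values passed in.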
Invisibly returns the input idata object, saved to disk.
In other words, this allows you to create snapshots of the data in the
output_folder. Note that saving the object executes all the stored
computations, so this operation can take longer than expected.
Read more about snapshots on our website in the "Concept" section.
read_immundata() for loading the saved data, read_repertoires()
which uses this function internally, ImmunData class definition.
## Not run:
# Assume 'my_idata' is an ImmunData object created previously
# my_idata <- read_repertoires(...)
# Define an output directory
save_dir <- tempfile("saved_immundata_")
# Save the ImmunData object
write_immundata(my_idata, save_dir)
# Auto-snapshot under <home>/snapshots/baseline/vNNN
write_immundata(my_idata, tag = "baseline")
# Optional: request a specific parquet compression setup
write_immundata(my_idata, save_dir, compression = "zstd", compression_level = 9)
# Optional: let DuckDB choose both settings
write_immundata(my_idata, save_dir, compression = NULL, compression_level = NULL)
# Check the created files
list.files(save_dir) # Should show "annotations.parquet" and "metadata.json"
# Clean up
unlink(save_dir, recursive = TRUE)
## End(Not run)
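As a complement to the list.files() check in the examples, the presence of both output files can be tested programmatically. The helper snapshot_is_complete() is hypothetical. It only checks that the two documented filenames exist and does not validate their contents.

```r
# Hypothetical helper, NOT part of the package: checks that a folder holds
# both files that the documentation says write_immundata() produces.
snapshot_is_complete <- function(dir) {
  expected <- c("metadata.json", "annotations.parquet")
  all(file.exists(file.path(dir, expected)))
}

# Demonstration with empty placeholder files instead of a real snapshot
demo_dir <- tempfile("snapshot_check_")
dir.create(demo_dir)
complete_before <- snapshot_is_complete(demo_dir)  # nothing written yet
file.create(file.path(demo_dir, c("metadata.json", "annotations.parquet")))
complete_after <- snapshot_is_complete(demo_dir)   # both names now present
unlink(demo_dir, recursive = TRUE)
```

In real use, the same call would be made on the folder passed as output_folder, or on the resolved auto-snapshot folder.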