View source: R/data_provenance.R
| track_data | R Documentation |
Records comprehensive provenance information for data files including checksums, sources, timestamps, and metadata. Supports fast hashing for large files.
track_data(
  data_path,
  source = c("downloaded", "generated", "manual", "reference", "other"),
  source_url = NULL,
  description = NULL,
  metadata = NULL,
  fast_hash = TRUE,
  size_threshold_gb = 1,
  registry_file
)
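The vector default for source follows the usual R convention for arguments resolved with match.arg(). The sketch below assumes track_data() handles source this way, which this page does not confirm; resolve_source() is a hypothetical stand-in, not a function from the package.

```r
# Hypothetical stand-in for the argument handling; not the package's own code.
# Assumption: track_data() resolves `source` with match.arg().
resolve_source <- function(source = c("downloaded", "generated", "manual",
                                      "reference", "other")) {
  match.arg(source)
}

resolve_source()             # "downloaded": the first choice is used when omitted
resolve_source("generated")  # "generated"
resolve_source("gen")        # "generated": match.arg() allows partial matching
try(resolve_source("api"))   # error: value is not one of the listed choices
```

If the assumption holds, passing a value outside the five listed choices would fail early instead of being recorded in the registry.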
| data_path | Character. Path to data file or directory. |
| source | Character. Source of the data: one of "downloaded", "generated", "manual", "reference", or "other". |
| source_url | Character. URL the data was downloaded from. Optional. |
| description | Character. Description of the data. Optional. |
| metadata | List. Additional metadata. Optional. |
| fast_hash | Logical. If TRUE, use the faster xxHash algorithm for files larger than size_threshold_gb (1 GB by default). Default TRUE. |
| size_threshold_gb | Numeric. File size in GB above which the fast hash is used. Default 1. |
| registry_file | Character. Path to the provenance registry file. Required; there is no default. |
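The interaction between fast_hash and size_threshold_gb can be sketched in base R. This is a minimal illustration, not the package's implementation: choose_hash_algo() is a hypothetical helper, the fallback algorithm name "sha256" is an assumption (the page does not say which hash is used for small files), and treating 1 GB as 1024^3 bytes is also an assumption.

```r
# Hypothetical helper, not part of the documented package. It only shows how
# fast_hash and size_threshold_gb could jointly select a hash algorithm.
choose_hash_algo <- function(path, fast_hash = TRUE, size_threshold_gb = 1) {
  stopifnot(file.exists(path))
  size_gb <- file.size(path) / 1024^3  # assumption: 1 GB = 1024^3 bytes
  if (fast_hash && size_gb > size_threshold_gb) {
    "xxhash64"  # fast, non-cryptographic hash for large files
  } else {
    "sha256"    # assumption: the slower hash used below the threshold
  }
}

# A tiny temporary file stays below the default 1 GB threshold.
tmp <- tempfile(fileext = ".csv")
writeLines(c("a,b", "1,2"), tmp)

small_default <- choose_hash_algo(tmp)                          # "sha256"
zero_threshold <- choose_hash_algo(tmp, size_threshold_gb = 0)  # "xxhash64"
fast_disabled <- choose_hash_algo(tmp, fast_hash = FALSE,
                                  size_threshold_gb = 0)        # "sha256"
```

Both algorithm names are values accepted by the algo argument of digest::digest(), so the returned string could be passed to that function with file = TRUE if the digest package is installed.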
A list containing the provenance information recorded for the data.
## Not run:
# Track a downloaded dataset
track_data("data/mydata.csv",
           source = "downloaded",
           source_url = "https://example.com/data.csv",
           description = "Customer data from API",
           registry_file = tempfile(fileext = ".json")
)

# Track generated data
track_data("results/simulation.rds",
           source = "generated",
           description = "Monte Carlo simulation results",
           registry_file = tempfile(fileext = ".json")
)

# Track large file with fast hashing
track_data("data/large_file.bam",
           source = "generated",
           fast_hash = TRUE,
           registry_file = tempfile(fileext = ".json")
)
## End(Not run)