fixity_checksum: Calculate a fixity checksum for an object

View source: R/utils.R

fixity_checksum    R Documentation

Calculate a fixity checksum for an object

Description

Applies a hash function (md5 by default) to an object and returns the resulting digest of the object as a character string.
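One way to picture the function is as a thin wrapper around digest::digest(). The sketch below assumes that this is how it works. The name fixity_checksum_sketch is hypothetical, and the actual code in R/utils.R may differ in its details. It hashes a small toy data frame instead of a downloaded dataset.

```r
# Assumes the 'digest' package is installed (install.packages("digest")).
library(digest)

# Hypothetical re-implementation for illustration only; the real
# eurostat::fixity_checksum() may differ in details.
fixity_checksum_sketch <- function(data_object, algorithm = "md5") {
  # digest() serializes the R object and hashes the serialized bytes,
  # returning the digest as a single character string.
  digest::digest(data_object, algo = algorithm)
}

# Toy data standing in for a downloaded Eurostat dataset.
toy <- data.frame(geo = c("FI", "SE"), time = c(2020, 2021), values = c(1.5, 2.5))

checksum <- fixity_checksum_sketch(toy)
print(checksum)  # a 32-character hexadecimal md5 string
```

Hashing the same object twice in the same R session gives the same string, which is what makes the value usable as fixity information.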

Usage

fixity_checksum(data_object, algorithm = "md5")

Arguments

data_object

A dataset downloaded with one of the eurostat package functions.

algorithm

Hash algorithm used to calculate the checksum. Defaults to 'md5', but any algorithm supported by the digest::digest() function can be used.

Details

“Fixity, in the preservation sense, means the assurance that a digital file has remained unchanged, i.e. fixed.” (Bailey, 2014). In practice, fixity is most easily established by calculating a checksum for the data object: a value that changes if anything in the object changes. By default, the checksum is calculated with the md5 hash algorithm. Other algorithms supported by the imported digest::digest() function can also be used; see its documentation.
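If the algorithm argument is passed straight through to digest::digest(), which is an assumption, then switching algorithms only changes the algo string. The sketch below hashes one toy object with three algorithms that digest supports. Each produces a hexadecimal string with its own fixed length.

```r
library(digest)

toy <- data.frame(geo = c("FI", "SE"), values = c(1.5, 2.5))

# Hash the same object with several algorithms supported by digest().
algorithms <- c("md5", "sha1", "sha256")
checksums <- vapply(algorithms, function(a) digest::digest(toy, algo = a), character(1))

# Digest lengths in hex characters: md5 = 32, sha1 = 40, sha256 = 64.
print(nchar(checksums))
```

Because a checksum computed with one algorithm cannot be compared with one computed by another, the algorithm name has to be stored alongside the value.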

For large objects with millions of rows, calculating a checksum can take longer and may require a considerable amount of free RAM. Another algorithm may be faster or more memory-efficient. Whichever algorithm you use, report it in your work to ensure transparency and replicability.
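The following timing comparison is a rough illustration on a synthetic one-million-row data frame, not a real Eurostat download. It uses xxhash64, a non-cryptographic hash supported by digest that is often faster than md5. Actual timings depend on the machine and the object.

```r
library(digest)

# A synthetic "big" object: one million rows (illustrative only).
n <- 1e6
big <- data.frame(id = seq_len(n), value = runif(n))

# Time two algorithms supported by digest(); the hash values are kept
# so they can be reported together with the algorithm name.
t_md5 <- system.time(h_md5 <- digest::digest(big, algo = "md5"))[["elapsed"]]
t_xx  <- system.time(h_xx  <- digest::digest(big, algo = "xxhash64"))[["elapsed"]]

cat(sprintf("md5: %.2f s, xxhash64: %.2f s\n", t_md5, t_xx))
```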

This function takes the whole data object as input, so everything in the object counts towards the fixity checksum. Whether the column names or the data values are labeled, whether stringsAsFactors is TRUE, whether flags are kept or removed, and any other edits to the data all change the calculated checksum. It is advisable to calculate the checksum immediately after downloading the data, before adding labels or performing other mutating operations. If you use non-default arguments when downloading data, report the exact arguments as well.
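The sketch below shows why the checksum should be taken before any mutation. It uses a toy data frame and calls digest::digest() directly in place of the package function. Renaming a column, storing strings as factors, or attaching a "label" attribute each produce a different md5 digest. The attribute is a simple, hypothetical stand-in for labeling.

```r
library(digest)

# Toy stand-in for freshly downloaded data (columns are illustrative).
raw <- data.frame(geo = c("FI", "SE"), values = c(1.5, 2.5),
                  stringsAsFactors = FALSE)
original <- digest::digest(raw, algo = "md5")

# 1. Renaming a column.
renamed <- raw
names(renamed)[2] <- "obs_value"

# 2. Storing strings as factors.
as_factor <- raw
as_factor$geo <- factor(as_factor$geo)

# 3. Attaching a label attribute to a column.
labelled <- raw
attr(labelled$geo, "label") <- "Geopolitical entity"

# TRUE means the checksum differs from the one taken right after "download".
variants <- list(renamed = renamed, as_factor = as_factor, labelled = labelled)
changed <- vapply(variants,
                  function(x) digest::digest(x, algo = "md5") != original,
                  logical(1))
print(changed)
```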

This implementation fulfills the level 1 requirement of the National Digital Stewardship Alliance (NDSA) Levels of Preservation by creating "fixity info if it wasn’t provided with the content". In the current version of the package, fixity information has to be created manually and is the responsibility of the user.
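Since fixity information must be recorded by hand, one option is to keep a small sidecar file with the checksum, the algorithm, and the download arguments. The helper record_fixity, the dataset identifier "example_dataset", and the CSV layout are hypothetical and serve only as illustration. The argument name time_format is likewise illustrative. The checksum is computed with digest::digest() directly.

```r
library(digest)

# Hypothetical helper: builds a one-row record of fixity information.
# 'args' is a named character vector of non-default download arguments.
record_fixity <- function(data_object, dataset_id, algorithm = "md5",
                          args = character()) {
  data.frame(
    dataset_id = dataset_id,
    algorithm = algorithm,
    checksum = digest::digest(data_object, algo = algorithm),
    # Flatten the arguments to "name=value; name=value" text.
    download_args = paste(names(args), args, sep = "=", collapse = "; "),
    created = format(Sys.time(), "%Y-%m-%d %H:%M:%S %Z"),
    stringsAsFactors = FALSE
  )
}

toy <- data.frame(geo = c("FI", "SE"), values = c(1.5, 2.5))
fixity <- record_fixity(toy, dataset_id = "example_dataset",
                        args = c(time_format = "num"))

# Write the record next to the analysis so it can be reported later.
log_file <- file.path(tempdir(), "fixity_log.csv")
write.csv(fixity, log_file, row.names = FALSE)
```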

Source

https://www.dpconline.org/handbook/technical-solutions-and-tools/fixity-and-checksums

See Also

digest::digest()


rOpenGov/eurostat documentation built on Jan. 19, 2024, 11:45 a.m.