unf | R Documentation |
UNF is a cryptographic hash or signature that can be used to uniquely identify (a version of) a dataset, or a subset thereof.
unf(x, version = 6, ...)

unf3(
  x,
  digits = 7L,
  characters = 128L,
  factor_as_character = TRUE,
  nonfinites_as_missing = FALSE,
  empty_character_as_missing = FALSE,
  dvn_zero = FALSE,
  ...
)

unf4(
  x,
  digits = 7L,
  characters = 128L,
  truncation = 128L,
  version = 4,
  factor_as_character = TRUE,
  nonfinites_as_missing = FALSE,
  empty_character_as_missing = FALSE,
  dvn_zero = FALSE,
  ...
)

unf5(
  x,
  digits = 7L,
  characters = 128L,
  truncation = 128L,
  raw_as_character = TRUE,
  factor_as_character = TRUE,
  nonfinites_as_missing = FALSE,
  empty_character_as_missing = FALSE,
  dvn_zero = FALSE,
  timezone = "",
  date_format = "%Y-%m-%d",
  decimal_seconds = 5,
  ...
)

unf6(
  x,
  digits = 7L,
  characters = 128L,
  truncation = 128L,
  raw_as_character = TRUE,
  factor_as_character = TRUE,
  complex_as_character = TRUE,
  nonfinites_as_missing = FALSE,
  timezone = "",
  date_format = "%Y-%m-%d",
  decimal_seconds = 5,
  ...
)
x |
For |
version |
Version of the UNF algorithm. Allowed values are 3, 4, 4.1, 5, and 6. Always use the same version of the algorithm to check a UNF. Default for |
digits |
The number of significant digits for rounding for numeric values. Default is 7L. Must be between 1 and 15. |
characters |
The number of characters for truncation. Default is 128L. Must be greater than 1. |
factor_as_character |
A logical indicating whether to treat factors as character. If |
nonfinites_as_missing |
A logical indicating whether to treat nonfinite values ( |
empty_character_as_missing |
A logical indicating whether to treat an empty character string as a missing value. This is supplied to create compatibility with a Dataverse UNFv5 implementation. |
dvn_zero |
A logical indicating whether to format a zero (0) numeric value as |
truncation |
The number of bits to truncate the UNF signature to. Default is 128L. Must be one of 128, 192, 196, or 256. |
raw_as_character |
A logical indicating whether to format raw vectors as character. |
timezone |
A character string containing a valid timezone. This is used for formatting “Date” and “POSIXt” class variables. Because of different implementations of datetime classes across computer applications, UNF signatures may vary due to the timezone in which they are calculated. This parameter allows for the comparison of UNFs calculated in different timezones. |
date_format |
A character string containing a formatting pattern for “Date” class variables. One of |
decimal_seconds |
A number indicating the number of decimal places to round fractional seconds to. The UNF specification (and default) is 5. |
complex_as_character |
A logical indicating whether to format complex vectors as character. If |
... |
Additional arguments passed to specific algorithm functions. Ignored. |
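Two of the arguments above, digits and timezone, are easier to see with a small example. The sketch below uses base R only and is an illustration of the ideas, not the package's internal implementation: signif() stands in for the rounding step, and format() stands in for datetime formatting. The variable names and the sample timestamp are made up for the example.

```r
# Base R only; an illustration of the ideas, not the UNF package's internals.

# digits: values are compared after rounding to a number of significant digits
x1 <- as.numeric(1:20)
x2 <- x1 + 0.00001
same_at_7 <- all(signif(x2, 7) == x1)  # FALSE: the difference survives at 7 digits
same_at_5 <- all(signif(x2, 5) == x1)  # TRUE: the difference rounds away at 5 digits
print(c(digits_7 = same_at_7, digits_5 = same_at_5))

# timezone: one instant is formatted differently depending on the timezone
instant <- as.POSIXct("2014-08-22 16:51:05", tz = "UTC")
in_utc <- format(instant, "%Y-%m-%dT%H:%M:%S", tz = "UTC")
in_new_york <- format(instant, "%Y-%m-%dT%H:%M:%S", tz = "America/New_York")
print(c(UTC = in_utc, New_York = in_new_york))
```

Because a fingerprint is computed from formatted values, two datasets should be fingerprinted with the same digits and timezone settings before their UNFs are compared.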
The Dataverse Network implements a potentially incorrect version of the UNF algorithm with regard to the handling of zero values and logical FALSE values in data (though the specification is unclear). Setting the dvn_zero argument to TRUE reproduces the Dataverse behavior, for comparison to files stored in that archive. In the usage shown above, dvn_zero defaults to FALSE.
The unf function returns a list of class UNF, containing:
unf: A character string containing the universal numeric fingerprint.
hash: A raw vector expressing the unencoded universal numeric fingerprint. This can be converted to a UNF using base64Encode.
unflong: For unf5, a character string containing the un-truncated universal numeric fingerprint.
formatted: A character string containing the formatted UNF, including version number and header attributes.
The object additionally contains several attributes:
version: A one-element numeric vector specifying which version of the UNF algorithm was used to generate the object.
digits: A one-element numeric vector specifying how many significant digits were used in rounding numeric values.
characters: A one-element numeric vector specifying how many characters were preserved during truncation of character values.
truncation: A one-element numeric vector specifying how many bits the UNF hash was truncated to.
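As a rough sketch of how the returned object might be inspected, the following reads the unf element and a few of the attributes described above. It assumes the UNF package is installed and is skipped otherwise. The element and attribute names are taken from this page, and no output is shown because it depends on the installed version.

```r
# Assumes the UNF package is installed; the block is skipped otherwise.
if (requireNamespace("UNF", quietly = TRUE)) {
  # Same data frame as in the examples on this page, with version 5 requested
  u <- UNF::unf(data.frame(1:3, 4:6, 7:9), version = 5)

  print(u$unf)                  # the fingerprint as a character string
  print(attr(u, "version"))     # algorithm version used
  print(attr(u, "digits"))      # significant digits used for numerics
  print(attr(u, "characters"))  # characters kept for strings
}
```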
The default print method displays the UNF along with these attributes. For example:
UNF:3:4,128:ZNQRI14053UZq389x0Bffg==
This representation identifies the signature as UNF, using version 3 of the algorithm, computed to 4 significant digits for numbers and 128 for characters. The segment following the final colon is the actual fingerprint in base64-encoded format.
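The printed form can be taken apart with base R string functions. This is a minimal sketch that assumes the four-part layout shown above (prefix, version, header, fingerprint). Printed UNFs without a header segment, such as those in the version 5 examples, would need different handling. The variable names are illustrative.

```r
printed <- "UNF:3:4,128:ZNQRI14053UZq389x0Bffg=="

# Split on the colon separators; base64 text never contains a colon
parts <- strsplit(printed, ":", fixed = TRUE)[[1]]

version <- as.numeric(parts[2])
# Header holds the digits and characters settings, separated by a comma
header <- as.integer(strsplit(parts[3], ",", fixed = TRUE)[[1]])
fingerprint <- parts[length(parts)]

print(version)      # 3
print(header)       # 4 128
print(fingerprint)  # "ZNQRI14053UZq389x0Bffg=="
```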
https://guides.dataverse.org/en/latest/developers/unf/index.html
Altman, M., J. Gill and M. P. McDonald. 2003. Numerical Issues in Statistical Computing for the Social Scientist. John Wiley & Sons. [Describes version 3 of the algorithm]
Altman, M., & G. King. 2007. A Proposed Standard for the Scholarly Citation of Quantitative Data. D-Lib 13(3/4). http://dlib.org/dlib/march07/altman/03altman.html [Describes a citation standard using UNFs]
Altman, M. 2008. A Fingerprint Method for Scientific Data Verification. In T. Sobh, editor, Advances in Computer and Information Sciences and Engineering, chapter 57, pages 311–316. Springer Netherlands, Netherlands, 2008. https://link.springer.com/chapter/10.1007/978-1-4020-8741-7_57 [Describes version 5 of the algorithm]
Data Citation Synthesis Group. 2013. Declaration of Data Citation Principles [DRAFT]. https://force11.org/info/joint-declaration-of-data-citation-principles-final/. [Describes general principles of data citation, of which UNF is likely to be a part]
%unf%
# Version 6 #
### FORTHCOMING ###

# Version 5 #
## vectors
### just numerics
unf5(1:20) # UNF:5:/FIOZM/29oC3TK/IE52m2A==
unf5(-3:3, dvn_zero = TRUE) # UNF:5:pwzm1tdPaqypPWRWDeW6Jw==
### characters and factors
unf5(c('test','1','2','3')) # UNF:5:fH4NJMYkaAJ16OWMEE+zpQ==
unf5(as.factor(c('test','1','2','3'))) # UNF:5:fH4NJMYkaAJ16OWMEE+zpQ==
### logicals
unf5(c(TRUE,TRUE,FALSE), dvn_zero=TRUE) # UNF:5:DedhGlU7W6o2CBelrIZ3iw==
### missing values
unf5(c(1:5,NA)) # UNF:5:Msnz4m7QVvqBUWxxrE7kNQ==
## variable order and object structure is irrelevant
unf(data.frame(1:3,4:6,7:9)) # UNF:5:ukDZSJXck7fn4SlPJMPFTQ==
unf(data.frame(7:9,1:3,4:6))
unf(list(1:3,4:6,7:9))

# Version 4 #
# version 4
data(longley)
unf(longley, ver=4, digits=3) # PjAV6/R6Kdg0urKrDVDzfMPWJrsBn5FfOdZVr9W8Ybg=
# version 4.1
unf(longley, ver=4.1, digits=3) # 8nzEDWbNacXlv5Zypp+3YCQgMao/eNusOv/u5GmBj9I=

# Version 3 #
x1 <- 1:20
x2 <- x1 + .00001
unf3(x1) # HRSmPi9QZzlIA+KwmDNP8w==
unf3(x2) # OhFpUw1lrpTE+csF30Ut4Q==
# UNFs are identical at specified level of rounding
identical(unf3(x1), unf3(x2))
identical(unf3(x1, digits=5),unf3(x2, digits=5))
# dataframes, matrices, and lists are all treated identically:
unf(cbind.data.frame(x1,x2),ver=3) # E8+DS5SG4CSoM7j8KAkC9A==
unf(list(x1,x2), ver=3)
unf(cbind(x1,x2), ver=3)