hashIDfun: Generate a function that calculates a hash from one or more...

View source: R/hashIDfun.R

hashIDfun    R Documentation

Generate a function that calculates a hash from one or more columns


Make functions that calculate hashed keys from the collapsed values of one or more columns in a data.frame or data.table


hashIDfun(nms, as_char = TRUE)



nms: A character vector of column names to hash. The output function checks that all nms are present in the data.frame or data.table.


as_char: Should the output be coerced to character (TRUE, the default) or kept as class hash (FALSE)?


This function is meant to condense information for each row into a compact, meaningful md5 hash, using the openssl::md5() function. The outputs can be useful keys to compare across tables where a primary key is not readily available.
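The behavior described above can be sketched as a small function factory. This is a minimal illustration, not the package's actual source: the name make_hash_fun, the sep argument, and the "|" separator are assumptions made for the example, and the real hashIDfun may collapse values differently. Only openssl::md5(), which is vectorized over character input and returns a vector of class hash, is taken from the text.

```r
# Hypothetical sketch of a hash-function factory; NOT the wzMisc implementation.
# `make_hash_fun` and `sep` are illustrative names, not part of the package.
make_hash_fun <- function(nms, as_char = TRUE, sep = "|") {
  force(nms)
  function(x) {
    # Check that every requested column exists in the input table
    missing_nms <- setdiff(nms, names(x))
    if (length(missing_nms) > 0L) {
      stop("Columns not found: ", paste(missing_nms, collapse = ", "))
    }
    # Coerce each column to character, then paste row-wise in the order of nms
    cols <- unname(lapply(nms, function(nm) as.character(x[[nm]])))
    collapsed <- do.call(paste, c(cols, sep = sep))
    # One MD5 hash per row
    out <- openssl::md5(collapsed)
    if (as_char) as.character(out) else out
  }
}

tbl <- data.frame(a = c("x", "y"), b = 1:2, stringsAsFactors = FALSE)
f <- make_hash_fun(c("a", "b"))
f(tbl)
```

Because the values are pasted in the order given by nms, the same columns supplied in a different order produce different hashes in this sketch.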

Hash collisions are possible, but should be exceedingly rare in practice.
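To put a rough number on "exceedingly rare", the standard birthday-bound approximation can be applied to a 128-bit hash such as MD5. The helper name collision_prob is hypothetical, and the estimate assumes hash values are uniformly distributed over non-adversarial inputs; it says nothing about deliberately constructed collisions, which are known to be feasible for MD5.

```r
# Birthday-bound approximation: probability of at least one collision
# among n distinct inputs hashed to a `bits`-bit digest.
# p ~= 1 - exp(-n * (n - 1) / (2 * 2^bits))
collision_prob <- function(n, bits = 128) {
  -expm1(-(n * (n - 1) / 2) / 2^bits)
}

collision_prob(1e6)  # roughly 1.5e-27 for one million rows
collision_prob(1e9)  # still vanishingly small for one billion rows
```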


A function that, when applied to a data.frame or data.table, returns a vector of hashed values based on the input nms. By default the vector is coerced to character; it is kept as class hash if as_char = FALSE.


# example table with one column per common type
df <- data.frame(
  chr = LETTERS[1:10],
  fctr = factor(LETTERS[1:10]),
  num = as.numeric(1:10),
  int = 1:10L,
  date = seq.Date(from = Sys.Date() - 9, to = Sys.Date(), by = "day"),
  stringsAsFactors = FALSE
)

f1 <- hashIDfun(names(df))
f2 <- hashIDfun(names(df), as_char = FALSE)

x1 <- f1(df)
x2 <- f2(df)

all.equal(x1, x2, check.attributes = FALSE) # class difference

# order of input matters, since we are collapsing
f1_rev <- hashIDfun(rev(names(df)))
x1_rev <- f1_rev(df)
identical(x1, x1_rev) # compare against the original column order

slin30/wzMisc documentation built on Jan. 27, 2023, 1 a.m.