
View source: R/hashIDfun.R

hashIDfun    R Documentation

Generate a function that calculates a hash from one or more columns

Description

Makes functions that calculate hashed keys from the collapsed values of one or more columns in a data.frame or data.table.

Usage

hashIDfun(nms, as_char = TRUE)

Arguments

nms

A character vector of column names to hash. The returned function checks that all nms are present in the data.frame or data.table it is given.

as_char

Logical: should the output be coerced to character (TRUE, the default) or kept as class hash (FALSE)?

Details

This function is meant to condense the information in each row into a compact MD5 hash, computed with openssl::md5(). The resulting hashes can serve as keys for comparing rows across tables when no primary key is readily available.
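The idea can be illustrated with a minimal sketch, assuming each row's values are converted to character and pasted together with a "|" separator before hashing. The name hash_id_sketch and the separator are hypothetical; this is not the wzMisc source, and hashIDfun may collapse values differently. The usage part shows the cross-table comparison described above: flagging rows of one table that already appear in another.

```r
library(openssl)

# Hypothetical re-creation for illustration only; not the wzMisc implementation.
hash_id_sketch <- function(nms, as_char = TRUE) {
  function(dat) {
    # Fail early if any requested column is missing from the table
    missing_cols <- setdiff(nms, names(dat))
    if (length(missing_cols) > 0) {
      stop("Columns not found: ", paste(missing_cols, collapse = ", "))
    }
    # Convert each column to character; [[ works for data.frame and data.table
    cols <- lapply(nms, function(n) as.character(dat[[n]]))
    # Collapse each row into one string; the "|" separator is an assumption
    collapsed <- do.call(paste, c(cols, sep = "|"))
    # openssl::md5() is vectorized: one hash per string, of class "hash"
    h <- md5(collapsed)
    if (as_char) as.character(unclass(h)) else h
  }
}

# Usage: flag rows of new_tbl that already exist in old_tbl, with no primary key
old_tbl <- data.frame(id = c("a", "b", "c"), val = c(1, 2, 3),
                      stringsAsFactors = FALSE)
new_tbl <- data.frame(id = c("b", "c", "d"), val = c(2, 30, 4),
                      stringsAsFactors = FALSE)

key <- hash_id_sketch(c("id", "val"))
new_tbl$seen_before <- key(new_tbl) %in% key(old_tbl)
new_tbl
```

Because the separator and the character conversion are assumptions, keys from this sketch should only be compared with keys produced by the same sketch, not with output from hashIDfun.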

Collisions (two different rows producing the same hash) are possible, but they should be exceedingly rare.
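To put "exceedingly rare" in numbers, the birthday approximation below estimates the chance of at least one accidental collision among n distinct row strings under a b-bit hash (b = 128 for MD5), assuming hash outputs behave like uniform random values. The helper name collision_prob is hypothetical. The estimate does not cover deliberately crafted collisions, which are practical for MD5, and rows whose collapsed strings are already identical will of course share a hash.

```r
# Approximate probability of at least one accidental collision among n
# distinct inputs under a `bits`-bit hash (birthday bound, uniform-hash assumption):
#   p = 1 - exp(-n * (n - 1) / 2^(bits + 1))
collision_prob <- function(n, bits = 128) {
  # expm1() keeps precision when the exponent is tiny
  -expm1(-n * (n - 1) / 2^(bits + 1))
}

collision_prob(1e6)  # about 1.5e-27
collision_prob(1e9)  # about 1.5e-21
```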

Value

A function. Applied to a data.frame or data.table, it returns one hashed value per row, computed from the columns in nms: a character vector by default, or an object of class hash if as_char = FALSE.

Examples

df <- data.frame(
chr = LETTERS[1:10],
fctr = factor(LETTERS[1:10]),
num = as.numeric(1:10),
int = 1:10L,
date = seq.Date(from = Sys.Date() - 9, to = Sys.Date(), by = "day"),
stringsAsFactors = FALSE
)

f1 <- hashIDfun(names(df))
f2 <- hashIDfun(names(df), as_char = FALSE)

x1 <- f1(df)
x2 <- f2(df)

all.equal(x1, x2, check.attributes = FALSE) # differ only in class: character vs. hash

# order of input matters, since values are collapsed in the order given
f1_rev <- hashIDfun(rev(names(df)))
x1_rev <- f1_rev(df)
identical(x1, x1_rev) # FALSE


slin30/wzMisc documentation built on Jan. 27, 2023, 1 a.m.