Create hash function digests for arbitrary R objects

Share:

Description

The digest function applies a cryptographical hash function to arbitrary R objects. By default, the objects are internally serialized, and either one of the currently implemented MD5 and SHA-1 hash functions algorithms can be used to compute a compact digest of the serialized object.

This version of the function accomplishes essentially the same result as the function of the same name in the digest package, but via somewhat different computations, as discussed in the book “Software for Data Analysis”.

Usage

1
2
digest(object, algo=c("md5", "sha1", "crc32"), 
       serialize=TRUE, file=FALSE, length=Inf, use.Call = FALSE)

Arguments

object

An arbitrary R object which will then be passed to the serialize function, unless the serialize argument is set to FALSE

algo

The algorithms to be used; currently available choices are md5, which is also the default, sha1 and crc32

serialize

A logical variable indicating whether the object should be serialized using serialize. Setting this to FALSE allows to compare the digest output of given character strings to known control output.

file

A logical variable indicating whether the object is a file name.

length

Number of characters to process. By default, when length is set to Inf, the whole string or file is processed.

use.Call

Should the C interface use the .Call() interface or the .C() interface. An internal question whose answer should not affect the result.

Details

See the documentation for the digest package version of the function, and the references below for the underlying algorithms.

The version in the present package has been modified for tutorial reasons, to illustrate some principles of the design of interfaces to C code.

Value

The digest function returns a character string of a fixed length containing the requested digest of the supplied R object. For MD5, a string of length 32 is returned; for SHA-1, a string of length 40 is returned; for CRC32 a string of length 8.

Author(s)

Dirk Eddelbuettel edd@debian.org for the original R interface; Antoine Lucas for the integration of crc32; Jarek Tuszynski for the file-based operationss; Christophe Devine for the hash function implementations for sha-1 and md5; Jean-loup Gailly and Mark Adler for crc32.

John Chambers for the modified C and R code in this package.

References

MD5: http://www.ietf.org/rfc/rfc1321.txt.

SHA-1: http://www.itl.nist.gov/fipspubs/fip180-1.htm.

CRC32: ftp://ftp.rocksoft.com/cliens/rocksoft/papers/crc_v3.txt.

http://www.cr0.net:8040/code/crypto for the underlying C functions used here for sha-1 and md5, and further references.

http://zlib.net for documentation on the zlib library which supplied the code for crc32.

See Also

serialize, md5sum

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
## Standard RFC 1321 test vectors
md5Input <-
  c("",
    "a",
    "abc",
    "message digest",
    "abcdefghijklmnopqrstuvwxyz",
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789",
    paste("12345678901234567890123456789012345678901234567890123456789012",
          "345678901234567890", sep=""))
md5Output <-
  c("d41d8cd98f00b204e9800998ecf8427e",
    "0cc175b9c0f1b6a831c399e269772661",
    "900150983cd24fb0d6963f7d28e17f72",
    "f96b697d7cb7938d525a2f31aaf161d0",
    "c3fcd3d76192e4007dfb496cca67e13b",
    "d174ab98d277d9f5a5611c2c9f419d9f",
    "57edf4a22be3c955ac49da2e2107b67a")

for (i in seq(along=md5Input)) {
  md5 <- digest(md5Input[i], serialize=FALSE)
  stopifnot(identical(md5, md5Output[i]))
}

sha1Input <-
  c("abc",
    "abcdbcdecdefdefgefghfghighijhijkijkljklmklmnlmnomnopnopq",
    NULL)
sha1Output <- 
  c("a9993e364706816aba3e25717850c26c9cd0d89d",
    "84983e441c3bd26ebaae4aa1f95129e5e54670f1",
    "34aa973cd4c4daa4f61eeb2bdbad27316534016f")

for (i in seq(along=sha1Input)) {
  sha1 <- digest(sha1Input[i], algo="sha1", serialize=FALSE)
  stopifnot(identical(sha1, sha1Output[i]))
}

crc32Input <-
  c("abc",
    "abcdbcdecdefdefgefghfghighijhijkijkljklmklmnlmnomnopnopq",
    NULL)
crc32Output <- 
  c("352441c2",
    "171a3f5f",
    "2ef80172")

for (i in seq(along=crc32Input)) {
  crc32 <- digest(crc32Input[i], algo="crc32", serialize=FALSE)
  stopifnot(identical(crc32, crc32Output[i]))
}

# one of the FIPS-
sha1 <- digest("abc", algo="sha1", serialize=FALSE)
stopifnot(identical(sha1, "a9993e364706816aba3e25717850c26c9cd0d89d"))

# example of a digest of a standard R list structure
digest(list(LETTERS, data.frame(a=letters[1:5], b=matrix(1:10,ncol=2))))

# test 'length' parameter and file input
fname = file.path(R.home(),"COPYING")
x = readChar(fname, file.info(fname)$size) # read file
for (alg in c("sha1", "md5", "crc32")) {
  # partial file
  h1 = digest(x    , length=18000, algo=alg, serialize=FALSE)
  h2 = digest(fname, length=18000, algo=alg, serialize=FALSE, file=TRUE)
  h3 = digest( substr(x,1,18000) , algo=alg, serialize=FALSE)
  stopifnot( identical(h1,h2), identical(h1,h3) )
  # whole file
  h1 = digest(x    , algo=alg, serialize=FALSE)
  h2 = digest(fname, algo=alg, serialize=FALSE, file=TRUE)
  stopifnot( identical(h1,h2) )
}

# compare md5 algorithm to other tools
library(tools)
fname = file.path(R.home(),"COPYING")
h1 = as.character(md5sum(fname))
h2 = digest(fname, algo="md5", file=TRUE)
stopifnot( identical(h1,h2) )

Want to suggest features or report bugs for rdrr.io? Use the GitHub issue tracker.