validateFile: Validate files


Description

Applies several checks to validate a file. Outputs a named character vector giving the results of the checks, where the names are the tests and the values are the check results. Empty strings are reported for checks that succeed and error messages for checks that fail. Missing values (NA_character_) are reported when a test is not performed. This means either that the test was not supposed to run or that a previous test failed. Generally, the first failing test causes the result vector to be returned with all following check results reported as missing.
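The stop-at-first-failure behavior described above can be sketched in a few lines of base R. This is only an illustration of the pattern, not the package's implementation: `runChecks` and the two check functions are hypothetical, and the test names merely mimic those shown in the Examples.

```r
# Hypothetical helper, not part of the package: run named check functions
# in order; each returns "" on success or an error message on failure.
runChecks <- function(checks, path) {
  # Start with every result missing (NA_character_ means "not performed")
  result <- rep(NA_character_, length(checks))
  names(result) <- names(checks)
  for (name in names(checks)) {
    result[name] <- checks[[name]](path)
    # Stop at the first failure; later entries stay NA
    if (result[name] != "") break
  }
  result
}

# Two illustrative checks (assumed for this sketch only)
checks <- list(
  checkIsFile = function(path) {
    if (file.exists(path) && !dir.exists(path)) "" else "No such path."
  },
  checkFileSizeMatches = function(path) {
    if (file.size(path) == 0) "" else "File size mismatch."
  }
)

missingPath <- tempfile("noSuchFile")   # never created
res <- runChecks(checks, missingPath)
res
```

Here the first check fails, so the second entry is left as NA_character_ rather than being evaluated.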

Usage

validateFile(path, checksum = NULL, checksumFunc = tools::md5sum,
  fileSize = NULL)

Arguments

path

The path to the file to check. Should exist and be a real file. If not, failure will be reported.

checksum

The expected checksum of the file. By default this is NULL, meaning no checksum is generated. If given, it will be checked against the value returned by checksumFunc.

checksumFunc

The function object (not the string name) to use when calculating the checksum of the file. By default this is tools::md5sum. The specified function will be called with one parameter, path. The value returned is then tested against the provided checksum. This returned value should be an (atomic) vector type but cannot be NULL or a missing value. When checksumFunc is called, path has already been verified and is known to exist on the file system as a real file (not a directory or link). The function object passed may not be NULL when checksum is provided, but is not used when checksum is NULL.
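As a small illustration of the one-argument contract, the sketch below shows what the default tools::md5sum returns for a path, alongside a wrapper that returns the bare digest string. The wrapper name `md5Bare` is hypothetical, and how validateFile itself compares the returned value with checksum is not stated here, so treat this only as a way to inspect a candidate checksum function before passing it in.

```r
emptyFile <- tempfile("emptyFile")
file.create(emptyFile)

# tools::md5sum() returns a character vector named by the file path
named <- tools::md5sum(emptyFile)

# Hypothetical wrapper: same digest, with the names dropped
md5Bare <- function(path) unname(tools::md5sum(path))

bare <- md5Bare(emptyFile)
bare == "d41d8cd98f00b204e9800998ecf8427e"   # md5 of an empty file

file.remove(emptyFile)
```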

fileSize

The expected size of the file, in bytes. By default this is NULL, meaning file size will not be checked. If a positive value is supplied it will be checked against the size of the source file on the file system. If it does not match then failure is reported (and checksum is not checked).
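An expected size in bytes can be obtained ahead of time with base R's file.size. The sketch below assumes nothing about how validateFile measures size internally; it only shows one way to produce a value suitable for fileSize.

```r
binFile <- tempfile("binFile", fileext = ".bin")
writeBin(as.raw(c(1, 2, 3)), binFile)   # write exactly three bytes

# Size in bytes; file.size() gives NA for a path that does not exist
size <- file.size(binFile)
size == 3

file.remove(binFile)
```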

Value

A named character vector of test results. The names are the tests performed and the values are the results of those tests as strings. Empty strings indicate success; non-empty strings are failure messages indicating what went wrong. The vector may contain NA values, which mean either that something went wrong earlier so this test was not run, or that the test was selected not to be run.
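One way to read such a result is to split it into passed, failed, and skipped tests. The sketch below uses a hand-built vector shaped like the documented return value (it is not produced by calling validateFile), and the variable names are illustrative.

```r
# Hand-built vector shaped like the documented return value
# (illustrative only; not the output of a real validateFile call)
check <- c(
  checkIsFile          = "",
  checkIsNotLink       = "",
  checkFileSizeMatches = "File size mismatch. Found 0 wanted 1.",
  checkChecksumMatches = NA_character_
)

passed  <- names(check)[!is.na(check) & check == ""]   # empty string
failed  <- check[!is.na(check) & check != ""]          # failure messages
skipped <- names(check)[is.na(check)]                  # not performed

# Treat the file as valid only when no test reported a failure message
isValid <- length(failed) == 0
```

Testing for NA before comparing against the empty string matters, because comparing NA_character_ with "" yields NA rather than FALSE.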

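Tests are, in the order they appear in the result vector: checkParam_path, checkParam_checksum, checkParam_checksumFunc, checkParam_fileSize, checkParam_checksum_checksumFunc, checkIsFile, checkIsNotLink, checkFileSizeMatches, checkChecksumMatches.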

Examples

# Setup for examples
noSuchFile  <- tempfile( 'noSuchFile'  )
emptyFile <- tempfile( 'emptyFile' )
md5EmptyFile <- 'd41d8cd98f00b204e9800998ecf8427e'
file.create( emptyFile )
binFile <- tempfile( 'binFile', fileext= '.bin' )
writeBin( as.raw(c(1,2,3)), binFile)

check <- validateFile( emptyFile, fileSize= 0, checksum= md5EmptyFile )
names(check)
#=> [1] "checkParam_path"                  "checkParam_checksum"
#=> [3] "checkParam_checksumFunc"          "checkParam_fileSize"
#=> [5] "checkParam_checksum_checksumFunc" "checkIsFile"
#=> [7] "checkIsNotLink"                   "checkFileSizeMatches"
#=> [9] "checkChecksumMatches"
check['checkFileSizeMatches']
#=> checkFileSizeMatches
#=>                   ""
# Remove names to make things simple
names(check) <- NULL
check
#=> [1] "" "" "" "" "" "" "" "" ""

# Not checking file contents: the size and checksum tests are skipped (NA)
check <- validateFile( emptyFile, fileSize= NULL, checksum= NULL )
names(check) <- NULL
check
#=> [1] "" "" "" "" "" "" "" NA NA

# Bad checksum
check <- validateFile( emptyFile, fileSize= 0, checksum= 'abc123' )
names(check) <- NULL
check
#=> [1] "" "" "" "" "" "" "" ""
#=> [9] "Checksum mismatch. Found d41d8cd98f00b204e9800998ecf8427e wanted abc123."

# Bad file size. No need to check the checksum if file size doesn't match
check <- validateFile( emptyFile, fileSize= 1, checksum= 'BAD' )
names(check) <- NULL
check
#=> [1] "" "" "" "" "" "" ""
#=> [8] "File size mismatch. Found 0 wanted 1."
#=> [9] NA

# Bad file path, no need to check anything else.
check <- validateFile( noSuchFile, fileSize= 0, checksum= md5EmptyFile )
names(check) <- NULL
check
#=> [1] ""  ""  ""  ""  ""  "No such path."
#=> [7] NA  NA  NA

# Using your own checksum function object.
# SumIt reads the whole file as raw bytes and returns their numeric sum.
SumIt <- function (path) { sum( as.numeric(
    readBin( path, what = 'raw', n= file.info(path)[['size']] + 1 )
))}
check <- validateFile( binFile, fileSize= 3, checksum= 6, checksumFunc= SumIt )
names(check) <- NULL
check
#=> [1] "" "" "" "" "" "" "" "" ""

# Cleanup
file.remove( emptyFile )
file.remove( binFile )

jefferys/DataRepo documentation built on May 19, 2019, 3:58 a.m.