Perform sanity checks on a single variable.
This function can be used after performing some data munging to check for mistakes.
var | the variable to check (or its name as a string if the data arg is supplied)
data | an optional dataframe containing the variable (otherwise 'var' is taken from the calling environment)
vartype | (optional) type of the variable (compared with typeof(var))
varclass | (optional) class of the variable (compared with class(var), matches if arg is in class(var))
varmode | (optional) mode of the variable (compared with mode(var))
min_len | (optional) minimum length of the variable (compared with length(var))
max_len | (optional) maximum length of the variable (compared with length(var))
min | (optional) minimum allowed value (compared with min(var))
max | (optional) maximum allowed value (compared with max(var))
vals | (optional) list of all unique non-missing values (compared with uniqueNotNA(var))
valstype | (optional) used in conjunction with 'vals'. If "all" (default) then 'vals' should contain all the same items as uniqueNotNA(var); if "subset"/"superset" then 'vals' should be a subset/superset of uniqueNotNA(var)
charmatch | (optional) regexp that should match each value (apart from missing values). Use only with character variables.
nocharmatch | (optional) regexp that should not match any value (apart from missing values). Use only with character variables.
min_uniq | (optional) minimum number of unique values (compared with length(unique(var)))
max_uniq | (optional) maximum number of unique values (compared with length(unique(var)))
max_na | (optional) maximum number of missing values (compared with sum(is.na(var)))
checksum | (optional) a checksum of the variable as returned by digest(VAR,algo="crc32")
pred | (optional) a function which takes a variable as input and returns TRUE/FALSE depending on whether the variable is valid or not
showbadvals | (optional) if a positive integer N then print the first N non-matching values (only for tests on individual values; default: N = 100)
silent | (optional) if TRUE then don't emit warning messages informing of error type (FALSE by default)
stoponfail | (optional) if TRUE then throw an error on the first check that fails (FALSE by default)
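The base R expressions named in the argument descriptions can be evaluated directly to see what each check is compared against. The following is only a sketch using the built-in ChickWeight data; it does not call checkVar, and the stand-in for uniqueNotNA() (dropping NAs, then taking unique values) is an assumption about that helper's behaviour.

```r
# Quantities the arguments above are documented as being compared with.
x <- ChickWeight$weight        # ChickWeight ships with R's datasets package

x_type   <- typeof(x)          # vartype
x_class  <- class(x)           # varclass (a match if the arg is in this vector)
x_mode   <- mode(x)            # varmode
x_len    <- length(x)          # min_len / max_len
x_min    <- min(x)             # min
x_max    <- max(x)             # max
x_n_uniq <- length(unique(x))  # min_uniq / max_uniq
x_n_na   <- sum(is.na(x))      # max_na

# Assumed base R stand-in for uniqueNotNA(): unique values with NAs dropped.
diet_vals <- as.character(ChickWeight$Diet)
observed  <- unique(diet_vals[!is.na(diet_vals)])

# valstype = "all": 'vals' holds exactly the observed values.
vals_all_ok <- setequal(c("1", "2", "3", "4"), observed)
# valstype = "subset": every item of 'vals' occurs among the observed values.
vals_subset_ok <- all(c("1", "2") %in% observed)
```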
You can check the data type, class, mode, length, maximum, minimum, unique values, missing values, and checksum of a variable. You can also supply your own function to check the variable. For the 'min_uniq', 'max_uniq' and 'max_na' arguments, you can supply either a whole number indicating the number of cases, or a number between 0 and 1 representing a proportion of cases.
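One way to picture the count-or-proportion convention is a small conversion step before the comparison. The helper below, limit_to_count, is hypothetical and not part of the package; in particular, treating only values strictly between 0 and 1 as proportions is an assumption, since the text does not say how the endpoints are handled.

```r
# Hypothetical helper: turn a limit given as a proportion into a count.
# Assumption: values strictly between 0 and 1 are proportions of the n cases;
# anything else is taken as a whole-number count.
limit_to_count <- function(limit, n) {
  if (limit > 0 && limit < 1) limit * n else limit
}

x <- c(1, NA, 3, NA, 5, 6, 7, 8, 9, 10)
n_na <- sum(is.na(x))                                      # 2 missing values

ok_prop  <- n_na <= limit_to_count(0.25, length(x))        # limit is 2.5 cases
ok_count <- n_na <= limit_to_count(1, length(x))           # limit is 1 case
```

Here ok_prop is TRUE (2 missing values against a limit of 2.5) and ok_count is FALSE (2 missing values against a limit of 1).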
Note: if you need to call the function repeatedly on the same dataframe, you can curry the data argument using the CurryL function in the functional library, e.g. checkalldata <- CurryL(checkVar,data=alldata) (it won't work with the non-lazy Curry function). To apply the function to all variables in a dataframe, use the apply or checkDF functions.
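The idea of fixing the data argument once and then checking several variables can be illustrated in base R with a closure. This sketch uses a hypothetical stand-in checker, check_type, which is far simpler than checkVar; whether a plain closure like this works with checkVar itself is not confirmed here, so CurryL remains the documented route.

```r
# Stand-in checker (hypothetical): TRUE if the column has the expected type.
check_type <- function(var, data, vartype) {
  identical(typeof(data[[var]]), vartype)
}

# Fix the data argument once, forwarding everything else.
check_chick <- function(var, ...) check_type(var, data = ChickWeight, ...)

weight_ok <- check_chick("weight", vartype = "double")

# Apply a check to every variable in the dataframe.
col_types <- vapply(names(ChickWeight),
                    function(v) typeof(ChickWeight[[v]]),
                    character(1))
```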
A list whose first element is TRUE if all checks passed, FALSE otherwise, and whose subsequent elements are vectors of indices of non-matching values for tests on individual values.
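To make the shape of such a result concrete, here is a hand-built list of the same form for a single regexp test. It is only an illustration: the list is constructed manually rather than returned by checkVar, and the element name charmatch is an assumption.

```r
x <- c("a1", "b2", NA, "zz", "c3")

# Indices of non-missing values that fail to match the pattern.
bad <- which(!is.na(x) & !grepl("^[a-z][0-9]$", x))

# First element: overall pass/fail; later elements: indices of bad values.
result <- list(length(bad) == 0, charmatch = bad)

passed   <- result[[1]]          # FALSE, because "zz" does not match
bad_vals <- x[result$charmatch]  # the offending values themselves
```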
Ben Veal
## create a function for checking variables in "ChickWeight" dataframe
checkalldata <- functional::CurryL(checkVar, data = ChickWeight)
## check one variable
checkalldata("weight", vartype = "double")