data_check_table: Check formatting rules for standardized data tables


View source: R/data-check.R

Description

Prints a warning if any of the specified formatting rules fail (silent otherwise). The table-specific versions (data_check_cust(), data_check_lic(), data_check_sale()) are convenience functions that call data_check_table() with appropriate defaults. The data_check() function is a wrapper that calls all three table-specific versions (cust, lic, sale) together.
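The layering described above (one generic checker, table-specific wrappers that only supply defaults, and an all-tables wrapper) can be sketched in base R. This is a minimal illustration, not salic's source: check_generic_sketch(), check_cust_sketch(), check_lic_sketch(), check_sale_sketch(), and check_all_sketch() are hypothetical names, and the generic here tests only required_vars so the focus stays on how defaults flow through the wrappers.

```r
# Hypothetical stand-in for data_check_table(): warns if any required
# variable is missing, and returns invisibly (no output) otherwise.
check_generic_sketch <- function(df, df_name, required_vars) {
  missing_vars <- setdiff(required_vars, names(df))
  if (length(missing_vars) > 0) {
    warning(df_name, ": missing required variable(s): ",
            paste(missing_vars, collapse = ", "), call. = FALSE)
  }
  invisible(missing_vars)
}

# Table-specific convenience wrappers: same generic, different defaults
# (defaults copied from the Usage section).
check_cust_sketch <- function(df, df_name = "cust",
                              required_vars = c("cust_id", "sex", "birth_year")) {
  check_generic_sketch(df, df_name, required_vars)
}
check_lic_sketch <- function(df, df_name = "lic",
                             required_vars = c("lic_id", "type", "duration")) {
  check_generic_sketch(df, df_name, required_vars)
}
check_sale_sketch <- function(df, df_name = "sale",
                              required_vars = c("cust_id", "lic_id", "year", "month", "res")) {
  check_generic_sketch(df, df_name, required_vars)
}

# Hypothetical all-tables wrapper in the spirit of data_check():
# runs each table-specific check in turn.
check_all_sketch <- function(cust, lic, sale) {
  check_cust_sketch(cust)
  check_lic_sketch(lic)
  check_sale_sketch(sale)
  invisible(NULL)
}

cust <- data.frame(cust_id = 1:2, sex = c(1, 2), birth_year = c(1980, 1990))
lic  <- data.frame(lic_id = 1:2, type = c("fish", "hunt"))  # no 'duration' column
sale <- data.frame(cust_id = 1:2, lic_id = 1:2, year = 2019, month = 1, res = 1)
check_all_sketch(cust, lic, sale)  # warns once, about lic
```

Keeping the rule logic in one generic function means a new table needs only a new wrapper with its own defaults.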

Usage

data_check_table(df, df_name, primary_key, required_vars, allowed_values)

data_check_cust(df, df_name = "cust", primary_key = "cust_id",
  required_vars = c("cust_id", "sex", "birth_year"),
  allowed_values = list(sex = c(1, 2, NA), birth_year =
  c(1900:substr(Sys.Date(), 1, 4), NA)))

data_check_lic(df, df_name = "lic", primary_key = "lic_id",
  required_vars = c("lic_id", "type", "duration"),
  allowed_values = list(type = c("fish", "hunt", "combo"), duration =
  1:99))

data_check_sale(df, df_name = "sale", primary_key = NULL,
  required_vars = c("cust_id", "lic_id", "year", "month", "res"),
  allowed_values = list(year = c(2000:substr(Sys.Date(), 1, 4)), month =
  1:12, res = c(1, 0, NA)))
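The default allowed_values for birth_year and year are built from the current date, so their upper bound advances each calendar year. The snippet below evaluates that idiom in base R: substr() converts the Date to a string such as "2019-11-05" and keeps the first four characters, and `:` coerces that string to a number.

```r
# Upper bound of the default ranges: the current year as a string, e.g. "2019".
this_year <- substr(Sys.Date(), 1, 4)

# `:` coerces the character bound to numeric, giving an integer sequence.
sale_years  <- c(2000:substr(Sys.Date(), 1, 4))      # default for sale$year
birth_years <- c(1900:substr(Sys.Date(), 1, 4), NA)  # default for cust$birth_year

range(sale_years)     # 2000 and the current year
tail(birth_years, 2)  # the current year, then NA
```

Because NA is appended for birth_year but not for year, a missing birth year passes the cust check while a missing sale year fails the sale check.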

Arguments

df

data frame: table to check

df_name

character: name of the relevant data table ("cust", "lic", or "sale")

primary_key

character: name of variable that acts as primary key, which should be unique and non-missing. NULL indicates no primary key in table.

required_vars

character: names of variables that must be present in df

allowed_values

list: named list of allowed values, where each name is a variable in df and each element is the vector of values permitted for that variable
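The primary_key and allowed_values rules map naturally onto base R primitives. A minimal sketch, assuming the checks behave as the argument descriptions say (this is not taken from salic's internals); pk_problems() and disallowed_values() are hypothetical helpers. Note that `%in%` matches NA only when NA is listed explicitly, which is why the cust defaults include NA for sex.

```r
# Hypothetical helper: a primary key must be unique and non-missing.
pk_problems <- function(df, primary_key) {
  out <- character(0)
  if (is.null(primary_key)) return(out)  # NULL: table has no primary key
  key <- df[[primary_key]]
  if (anyNA(key)) out <- c(out, "missing values")
  if (anyDuplicated(key[!is.na(key)]) > 0) out <- c(out, "duplicate values")
  out
}

# Hypothetical helper: for each variable named in allowed_values, return the
# distinct values present in df that are not in the allowed set.
disallowed_values <- function(df, allowed_values) {
  out <- lapply(names(allowed_values), function(var) {
    x <- df[[var]]
    unique(x[!(x %in% allowed_values[[var]])])
  })
  names(out) <- names(allowed_values)
  Filter(length, out)  # keep only variables that have problems
}

cust <- data.frame(cust_id = c(1, 2, 2), sex = c(1, NA, 3))
pk_problems(cust, "cust_id")                      # "duplicate values"
disallowed_values(cust, list(sex = c(1, 2, NA)))  # $sex: 3 (NA is allowed)
disallowed_values(cust, list(sex = c(1, 2)))      # $sex: NA 3
```

Variables named in allowed_values but absent from df produce no result here; in this sketch that case is left to the required_vars check.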

Details

Developer note: data_check_table() is itself a wrapper for several internal functions (see data_internal).
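One way such a wrapper over single-rule helpers might be composed is sketched below in base R. The helper names are invented for illustration and are not the functions documented in data_internal: each helper returns a character vector of problems (empty if the rule passes), and the wrapper turns any non-empty result into a single warning.

```r
# Hypothetical single-rule helpers; each returns a (possibly empty) character
# vector describing the problems it found.
rule_required_vars <- function(df, required_vars) {
  missing_vars <- setdiff(required_vars, names(df))
  if (length(missing_vars) > 0) paste("missing variable:", missing_vars) else character(0)
}
rule_primary_key <- function(df, primary_key) {
  if (is.null(primary_key) || !primary_key %in% names(df)) return(character(0))
  key <- df[[primary_key]]
  if (anyNA(key) || anyDuplicated(key) > 0) {
    paste(primary_key, "is not a unique, non-missing key")
  } else character(0)
}
rule_allowed_values <- function(df, allowed_values) {
  vars <- intersect(names(allowed_values), names(df))
  bad <- vars[vapply(vars, function(v) !all(df[[v]] %in% allowed_values[[v]]),
                     logical(1))]
  if (length(bad) > 0) paste(bad, "has values outside the allowed set") else character(0)
}

# Hypothetical wrapper: run every rule, then warn once listing all failures.
check_table_sketch <- function(df, df_name, primary_key, required_vars, allowed_values) {
  problems <- c(rule_required_vars(df, required_vars),
                rule_primary_key(df, primary_key),
                rule_allowed_values(df, allowed_values))
  if (length(problems) > 0) {
    warning(df_name, ": ", paste(problems, collapse = "; "), call. = FALSE)
  }
  invisible(problems)
}

lic <- data.frame(lic_id = c(1, 1), type = c("fish", "boat"))
check_table_sketch(lic, "lic", "lic_id", c("lic_id", "type", "duration"),
                   list(type = c("fish", "hunt", "combo"), duration = 1:99))
```

Collecting all failures before warning reports every broken rule at once instead of stopping at the first.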

See Also

Other functions to check data format: data_check, data_foreign_key, data_internal, variable_allowed_values

Examples

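library(salic)
library(dplyr)

# produce various format warnings
data(cust)
# duplicated rows: the primary key (cust_id) is no longer unique
bind_rows(cust, cust) %>% data_check_cust()

# birth_year outside the default allowed range (1900 through the current year)
cust$birth_year[1] <- 2100
data_check_cust(cust)

data(lic)
# required variable "duration" is missing
select(lic, -duration) %>% data_check_lic()
# duration = 0 is outside the default allowed values (1:99)
mutate(lic, duration = 0) %>% data_check_lic()

data(sale)
# NA is not among the default allowed values for year
sale$year[1] <- NA
data_check_sale(sale)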

southwick-associates/salic documentation built on Nov. 5, 2019, 9:13 a.m.