check_data_tables: Check tables against data model

Check tables against data model

Description

Read a set of files containing data tables and check them against a data model.

Usage

read_data_tables(files, table_names = names(files), quiet = TRUE)

check_table_names(tables, model)

check_column_names(tables, model)

check_column_types(tables, model)

check_column_min_max(tables, model)

check_missing_values(tables, model)

check_unique(tables, model)

check_bucket_paths(tables, model)

check_valid_entity_id(tables, model, report_missing_id = FALSE)

check_primary_keys(tables, model)

check_foreign_keys(tables, model)

parse_column_name_check(chk)

parse_column_type_check(chk)

Arguments

files

Vector of file paths, one per data table.

table_names

Vector of table names associated with files.

quiet

Logical; if TRUE, suppress the column-parsing messages printed by read_tsv.

tables

Named list of data tables, e.g. as returned by read_data_tables.

model

A dm object describing the data model, e.g. as returned by json_to_dm.

report_missing_id

A logical indicating whether the absence of an entity id is regarded as an error.

chk

Output of check_column_names or check_column_types.

Value

read_data_tables returns a named list of data frames.

check_table_names returns NULL if the names of tables match the tables in model, or otherwise a list:

  • missing_tables: Vector of tables in model but not in tables

  • extra_tables: Vector of tables in tables but not in model
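As a rough illustration of this contract, the base-R sketch below computes the two vectors with setdiff. The helper sketch_check_table_names and its arguments are hypothetical, not part of AnvilDataModels; it takes plain character vectors instead of a named list and a dm object.

```r
# Hypothetical sketch (not the package's implementation): compare the
# table names in the data with the table names defined by a model.
sketch_check_table_names <- function(table_names, model_table_names) {
  missing_tables <- setdiff(model_table_names, table_names)
  extra_tables <- setdiff(table_names, model_table_names)
  # Mirror the documented contract: NULL when everything matches
  if (length(missing_tables) == 0 && length(extra_tables) == 0) {
    return(NULL)
  }
  list(missing_tables = missing_tables, extra_tables = extra_tables)
}

res <- sketch_check_table_names(
  table_names = c("subject", "sample", "notes"),
  model_table_names = c("subject", "sample", "file")
)
res$missing_tables  # "file"
res$extra_tables    # "notes"
```

Order does not matter, so two vectors holding the same names in a different order give NULL.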

check_column_names returns a list with one element for each table in common between the data and the model. Each table element is NULL if the columns in tables match the model, or otherwise a list:

  • missing_required_columns: Vector of required columns in model but not in tables

  • missing_optional_columns: Vector of optional columns in model but not in tables

  • extra_columns: Vector of columns in tables but not in model
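For a single table, the three vectors can be sketched with set operations, assuming the model's required and optional columns are available as character vectors. The helper sketch_check_column_names is hypothetical; check_column_names itself returns one such result per shared table.

```r
# Hypothetical sketch: compare one table's columns with the required and
# optional columns that a model defines for it.
sketch_check_column_names <- function(table, required, optional) {
  cols <- names(table)
  missing_required_columns <- setdiff(required, cols)
  missing_optional_columns <- setdiff(optional, cols)
  extra_columns <- setdiff(cols, c(required, optional))
  # NULL when the table's columns match the model exactly
  if (length(missing_required_columns) + length(missing_optional_columns) +
      length(extra_columns) == 0) {
    return(NULL)
  }
  list(missing_required_columns = missing_required_columns,
       missing_optional_columns = missing_optional_columns,
       extra_columns = extra_columns)
}

subject <- data.frame(subject_id = c("s1", "s2"), age = c(40, 52),
                      site = c("A", "B"))
res <- sketch_check_column_names(subject,
                                 required = c("subject_id", "sex"),
                                 optional = c("age"))
res$missing_required_columns  # "sex"
res$extra_columns             # "site"
```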

check_column_types returns a list of all tables in common between data and model. Each table element is a list of all columns in common between table and model. Each column element is NULL if the values in the column are of a type compatible with the data model, or a string describing the mismatch.
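This page does not list the type names a model can declare. The sketch below assumes two hypothetical types, "integer" and "float", and treats a value as a mismatch when it is present but cannot be parsed as that type. It only illustrates the "NULL or a string" return convention; it is not how check_column_types is implemented.

```r
# Hypothetical sketch: check whether a column's values can be read as the
# type a model declares. The type names "integer" and "float" are assumptions.
sketch_check_column_type <- function(values, type) {
  parsed <- suppressWarnings(as.numeric(values))
  if (type == "integer") {
    # Non-whole numbers do not count as integers
    parsed[!is.na(parsed) & parsed != round(parsed)] <- NA
  } else if (type != "float") {
    stop("type not covered by this sketch: ", type)
  }
  # A value is a mismatch if it was present but could not be parsed
  bad <- !is.na(values) & is.na(parsed)
  if (!any(bad)) {
    return(NULL)
  }
  paste0("Values not compatible with type '", type, "': ",
         paste(unique(values[bad]), collapse = ", "))
}

sketch_check_column_type(c("1", "2", NA), "integer")     # NULL
sketch_check_column_type(c("1", "2.5", "x"), "integer")
# "Values not compatible with type 'integer': 2.5, x"
```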

check_column_min_max returns a list of all tables in common between data and model. Each table element is a list of all columns in common between table and model that have min and/or max values. Each column element is NULL if all values in the column are between min and max, or a string describing the mismatch.
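For a single numeric column, the range check behind check_column_min_max can be sketched as follows. The helper name, the lower/upper argument names, the infinite defaults for an unset bound, and the message text are all illustrative assumptions; missing values are ignored here.

```r
# Hypothetical sketch: report values outside [lower, upper]; an unset bound
# defaults to -Inf or Inf so it never triggers.
sketch_check_min_max <- function(values, lower = -Inf, upper = Inf) {
  x <- suppressWarnings(as.numeric(values))
  # NA values are skipped: FALSE & NA evaluates to FALSE
  out_of_range <- !is.na(x) & (x < lower | x > upper)
  if (!any(out_of_range)) {
    return(NULL)
  }
  paste0("Values outside [", lower, ", ", upper, "]: ",
         paste(values[out_of_range], collapse = ", "))
}

sketch_check_min_max(c(18, 45, 130, NA), lower = 0, upper = 120)
# "Values outside [0, 120]: 130"
```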

check_missing_values returns a list of all tables in common between data and model. Each table element is a list of all required columns in common between table and model. Each column element is NULL if the column has no missing values, or the number of missing values in the column. If a condition is set on a column, missing values are only checked for rows where the condition is met.
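The conditional rule can be sketched for one column. This assumes the condition has already been evaluated to a logical vector over the rows and that only NA counts as missing; how the model actually expresses conditions, and whether empty strings count as missing, is not stated here. The helper and the example columns are hypothetical.

```r
# Hypothetical sketch: count missing values, optionally only in rows where
# a condition (given as a logical vector) is TRUE.
sketch_check_missing <- function(values, condition = NULL) {
  if (!is.null(condition)) {
    # Rows where the condition is NA are treated as not meeting it
    values <- values[condition %in% TRUE]
  }
  n_missing <- sum(is.na(values))
  if (n_missing == 0) NULL else n_missing
}

sample_tbl <- data.frame(
  tissue = c("blood", "saliva", "blood", NA),
  blood_draw_date = c("2020-01-01", NA, NA, NA)
)
sketch_check_missing(sample_tbl$tissue)  # 1
# Only the two "blood" rows are checked, one of which is missing a date
sketch_check_missing(sample_tbl$blood_draw_date,
                     condition = sample_tbl$tissue == "blood")  # 1
```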

check_unique returns a list of all tables in common between data and model. Each table element is a list of all columns in common between table and model also defined as unique by the model. Each column element is NULL if the column is unique, or a string listing duplicated elements.
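A uniqueness check of this kind can be sketched with duplicated(). The helper name and message wording are hypothetical; the package's exact string format may differ.

```r
# Hypothetical sketch: NULL if all values are unique, otherwise a string
# listing each duplicated value once.
sketch_check_unique <- function(values) {
  dups <- unique(values[duplicated(values)])
  if (length(dups) == 0) {
    return(NULL)
  }
  paste("Duplicated values:", paste(dups, collapse = ", "))
}

sketch_check_unique(c("s1", "s2", "s2", "s3", "s3", "s3"))
# "Duplicated values: s2, s3"
```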

check_bucket_paths returns a list of all tables in common between data and model. Each table element is a list of all columns in common between table and model also defined as containing bucket paths by the model. Each column element is NULL if all paths exist, or a string listing paths that do not exist.

check_valid_entity_id returns a list of all tables in common between data and model. Each table element is NULL if the table has a valid AnVIL entity_id, or a string describing the error.

check_primary_keys returns a list with two elements:

  • found_keys: results of dm_examine_constraints after applying primary keys from model to tables

  • missing_keys: list of missing primary keys in each table
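dm_examine_constraints (from the dm package) reports, among other things, whether primary-key columns hold unique values. The base-R sketch below checks a common definition of a primary key (unique and non-missing values) for one column. The helper name is hypothetical, and the exact rules dm applies may differ.

```r
# Hypothetical sketch: a column can serve as a primary key if it has no
# missing values and no duplicated values.
sketch_is_valid_primary_key <- function(values) {
  !anyNA(values) && anyDuplicated(values) == 0
}

sketch_is_valid_primary_key(c("s1", "s2", "s3"))  # TRUE
sketch_is_valid_primary_key(c("s1", "s2", "s2"))  # FALSE
sketch_is_valid_primary_key(c("s1", NA))          # FALSE
```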

check_foreign_keys returns a list with two elements:

  • found_keys: results of dm_examine_constraints after applying foreign keys from model to tables

  • missing_keys: list of missing child or parent keys in each table
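To see what a foreign-key constraint checks, consider a child table whose key column must refer to rows in a parent table. The sketch below finds child values with no match in the parent, the kind of violation dm_examine_constraints reports in found_keys. The helper and the example tables are hypothetical.

```r
# Hypothetical sketch: child key values (ignoring NA) that have no
# matching value in the parent table's key column.
sketch_missing_parent_keys <- function(child_values, parent_values) {
  setdiff(child_values[!is.na(child_values)], parent_values)
}

subject <- data.frame(subject_id = c("s1", "s2"))
sample_tbl <- data.frame(sample_id = c("a", "b", "c"),
                         subject_id = c("s1", "s2", "s9"))
# "s9" is referenced by sample_tbl but absent from subject
sketch_missing_parent_keys(sample_tbl$subject_id, subject$subject_id)  # "s9"
```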

parse_column_name_check and parse_column_type_check each return a tibble of check results suitable for printing.
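The flattening step can be sketched by turning the nested list returned by check_column_names into one row per reported column. This sketch returns a base data.frame, not a tibble, and its column names (table, check, column) are hypothetical; the package's output layout may differ.

```r
# Hypothetical sketch: flatten a check_column_names-style result into a
# data frame with one row per (table, check, column).
sketch_parse_column_name_check <- function(chk) {
  rows <- lapply(names(chk), function(tbl) {
    res <- chk[[tbl]]
    if (is.null(res)) return(NULL)  # table matched the model
    do.call(rbind, lapply(names(res), function(kind) {
      if (length(res[[kind]]) == 0) return(NULL)
      data.frame(table = tbl, check = kind, column = res[[kind]])
    }))
  })
  do.call(rbind, rows)
}

chk <- list(
  subject = NULL,
  sample = list(missing_required_columns = "subject_id",
                missing_optional_columns = character(0),
                extra_columns = c("notes", "batch"))
)
out <- sketch_parse_column_name_check(chk)
out  # 3 rows, all for table "sample"
```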

Examples

# read data model
json <- system.file("extdata", "data_model.json", package="AnvilDataModels")
model <- json_to_dm(json)

# read tables to check
table_names <- c("subject", "phenotype", "sample", "sample_set", "file")
files <- system.file("extdata", paste0(table_names, ".tsv"), package="AnvilDataModels")
names(files) <- table_names
tables <- read_data_tables(files)

check_table_names(tables, model)
check_column_names(tables, model)
check_column_types(tables, model)
check_primary_keys(tables, model)
check_foreign_keys(tables, model)


UW-GAC/AnvilDataModels documentation built on Nov. 3, 2024, 7:33 p.m.