library("knitr") opts_chunk$set(echo = TRUE) opts_chunk$set(collapse = TRUE, comment = "#>", message = FALSE) opts_knit$set(root.dir = "../../")
MySQL allows users to write short documentation comments for database tables (up to 80 characters) and for database fields (up to 255 characters). These documentation comments can be queried and retrieved like other data, so they provide a handy way to store descriptive metadata.
This script checks for missing documentation in database.
# Connect to the scores database library("L2TDatabase") library("dplyr") cnf_file <- "inst/l2t_db.cnf" l2t <- l2t_connect(cnf_file, "backend")
Views are tables that are dynamically show the results of a query. These are prefixed in the database with "q_" (for query). These view tables, and the derived fields computer in them, cannot be documented. We have to ignore these tables in our check
The following tables are views and can only have have limited documentation.
describe_db(l2t) %>% filter(Description == "VIEW")
In the q_MinPair_Aggregate
view, the proportion correct for non-training
trials is a derived value. The field MinPair_ProportionCorrect
is created
and computed as the query is executed. Therefore, there is no documentation
available for it.
l2t %>% describe_tbl("q_MinPair_Aggregate") %>% filter(Description == "")
The following tables are missing documentation:
no_table_doc <- describe_db(l2t) %>% filter(Description == "") no_table_doc
The following fields are missing documentation.
# Omit strings containing a pattern str_reject <- function(string, pattern) { string[!stringr::str_detect(string, pattern)] } non_view_tbls <- src_tbls(l2t) %>% str_reject("q_") # Get description of each non-view table. Combine them. all_descriptions <- Map(function(x) describe_tbl(l2t, x), non_view_tbls) %>% bind_rows() %>% as_data_frame() # Keep only undocumented rows undocumented_fields <- all_descriptions %>% # Ignore the temporary tables used for mass data-entry mutate(IsEntryTable = stringr::str_detect(Table, "_Entry")) %>% filter(Description == "", !IsEntryTable) %>% select(Table, Field, Description) %>% print(n = Inf) undocumented_fields
We need to be careful about tables that share field names because they will
not join correctly. For example, trying to combine an Experiment table with a
Dialect
column to a ParticipantInfo table with Dialect
column will wrongly
combine experiments and children that share the same values in the Dialect
column. Which is not what we want, especially the same child is exposed to
different versions of an experiment with different dialects.
Ideally, only index fields should be allowed to appear in more than one table,
and we check that this is the case. We allow tables for item-level data to have
repeated names (like Trial or Stimulus1), so we ignore tables with _Responses
in the name.
# Get description of each non-view table. Combine them. repeated_names <- all_descriptions %>% # Ignore the temporary tables used for mass data-entry mutate(IsEntryTable = stringr::str_detect(Table, "_Entry"), IsItemTable = stringr::str_detect(Table, "_Responses")) %>% filter(Index == "", !IsEntryTable, !IsItemTable) %>% # Count the occurrence of each field group_by(Field) %>% mutate(FieldCount = length(Field)) %>% ungroup() %>% # Keep duplicated fields filter(FieldCount != 1) %>% select(Field, Table) %>% arrange(Field) %>% print(n = Inf) repeated_names
results <- bind_rows(repeated_names, undocumented_fields, no_table_doc) results <- data_frame( Check = "MySQL metadata", Date = format(Sys.Date()), Passing = nrow(results) == 0 ) readr::write_csv(results, "./inst/audit/results_metadata.csv")
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.