knitr::opts_chunk$set(collapse = TRUE, comment = "#>")
factcuratoR provides functions that help standardize and curate variety testing data sets for the WAVE project.
The goal of the WAVE program curation is to generate trial data and trial metadata that conform to the controlled vocabulary codebooks.
Trial data consists of multiple plot entries per trial. It includes all data recorded at the plot level, such as plot, variety, entry, and traits like height, stand, and moisture.
Metadata consists of one entry per trial. Metadata is trial-level data, such as location, year, program, and nursery.
First, load factcuratoR and point to the folder that contains the main codebook (which the validation functions currently require to be named codebooks_all_db.csv):
library(factcuratoR)
rlang::check_installed("here")
codebook_folder <- here::here("tests/testthat/test_controlled_vocab")
The main codebook contains the variable names and required status for the trial data and the trial metadata. For example, the trial data should contain the columns trial, variety, entry, and plot. The trial metadata should contain columns for trial, nursery, year, location, etc.
Note! Traits are collected for each trial, so conceptually it would make sense for the traits (such as test_weight or height) to be contained in the main codebook under trial_data. However, because the list of traits is expected to get rather long, the traits are stored in a separate file called traits.csv. To validate the columns in the trial data, the validation functions pull in the traits and treat each trait as a column in the trial data.
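The column check described in the note can be sketched in base R as follows. This is only an illustration under stated assumptions, not the package's implementation: the objects `main_codebook`, `traits`, and `trial_data` are hypothetical stand-ins, and the column name `trait` in the traits file is assumed.

```r
# Hypothetical stand-in for the trial_data rows of the main codebook
main_codebook <- data.frame(
  book = "trial_data",
  variable = c("trial", "variety", "entry", "plot"),
  stringsAsFactors = FALSE
)

# Hypothetical stand-in for traits.csv (the column name `trait` is assumed)
traits <- data.frame(
  trait = c("height", "test_weight"),
  stringsAsFactors = FALSE
)

# Toy trial data with one misspelled column that is defined nowhere
trial_data <- data.frame(
  trial = "t1", variety = "v1", entry = 1, plot = 101,
  height = 80, yeild = 3.2
)

# Allowed columns are the codebook variables plus every trait name
allowed_columns <- c(
  main_codebook$variable[main_codebook$book == "trial_data"],
  traits$trait
)

# Columns present in the data but defined in neither file
unknown_columns <- setdiff(names(trial_data), allowed_columns)
unknown_columns
```

Treating traits as extra allowed columns keeps the main codebook short while still letting a single check flag unexpected column names.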
create_dm(here::here(codebook_folder, "codebooks_all_db.csv"))
The file codebooks_all_db.csv also specifies whether a variable must conform to
controlled vocabularies. If so, the controlled vocabularies are listed
in another codebook (e.g. allowed levels of nursery in trials_metadata
are defined in the nursery codebook). Let's call these 'controlled vocabulary
codebooks' to better distinguish them from the 'main codebook.'
Said another way, the 'main codebook' holds column names and the 'controlled vocabulary codebooks' hold the levels that are approved as column contents.
Just as codebooks_all_db.csv defines the columns that are present in trial_data and trials_metadata, it also defines the columns that are present in the controlled vocabulary codebooks. In the example (Fig. 1), codebooks_all_db.csv also lists the column names of the cultivar, nursery, location, and crop_market_classes controlled vocabulary codebooks.
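The relationship between the two kinds of codebook can be illustrated with a small base R sketch. Everything here is a hypothetical stand-in rather than package code: the vocabulary levels are made up, and the sketch assumes that each controlled vocabulary codebook stores its levels in a column named after the variable. The `values_defined_in` column used for the lookup is one of the main codebook columns.

```r
# Hypothetical main codebook rows for two categorical metadata variables
main_codebook <- data.frame(
  book = "trials_metadata",
  variable = c("nursery", "location"),
  value_type = "categorical",
  values_defined_in = c("nursery", "locations"),
  stringsAsFactors = FALSE
)

# Hypothetical controlled vocabulary codebooks, keyed by book name
# (the levels below are invented for illustration)
vocab_codebooks <- list(
  nursery = data.frame(nursery = c("SWW", "HRS"), stringsAsFactors = FALSE),
  locations = data.frame(location = c("Pullman", "Lind"), stringsAsFactors = FALSE)
)

# Toy metadata with one nursery value that is not an approved level
trials_metadata <- data.frame(
  trial = c("t1", "t2"),
  nursery = c("SWW", "soft white"),
  location = c("Pullman", "Lind"),
  stringsAsFactors = FALSE
)

# For each categorical variable, list the values missing from its vocabulary
invalid_levels <- lapply(seq_len(nrow(main_codebook)), function(i) {
  variable <- main_codebook$variable[i]
  vocab <- vocab_codebooks[[main_codebook$values_defined_in[i]]]
  setdiff(trials_metadata[[variable]], vocab[[variable]])
})
names(invalid_levels) <- main_codebook$variable
invalid_levels
```

The main codebook only says where the approved levels live; the levels themselves come from the controlled vocabulary codebook it points to.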
The codebooks_all_db.csv has the following (required) columns:
book: name of the codebook
variable: name of a column in the trial data, metadata, or a codebook
value_type: a level matching one of the available options. The checks tied to these options include is.character() == TRUE, %% 1 == 0, and agreement with value_range
meaning: description of the variable, including units or formatting requirements
values_defined_in: NA if value_type != "categorical". Otherwise, this field should be populated with the name of the book where the variable's controlled vocabulary is defined (e.g. for location in trials_metadata, values_defined_in is "locations")
value_range: if the variable does not have a controlled vocabulary, enter the accepted values or ranges. This formatting is used in qc_validate_fns.R to validate the data
primary_key: [??]

library(dplyr)  # provides %>% and filter()
codebooks_all <- readin.db(codebook_folder)
knitr::kable(codebooks_all$codebooks_all_db.csv %>% filter(book == "trial_data"))
knitr::kable(codebooks_all$codebooks_all_db.csv %>% filter(book == "locations"))
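The two value_type checks quoted in the column list, is.character() == TRUE and %% 1 == 0, can be tried on toy vectors. This is a minimal base R sketch with hypothetical column values. It does not show how qc_validate_fns.R applies the checks, and it leaves out value_range because its format is not specified here.

```r
# Toy columns with hypothetical values
variety <- c("v1", "v2")
entry <- c(1, 2, 3)
moisture <- c(12.5, 13)

# Character check: the column must be a character vector
variety_is_character <- is.character(variety) == TRUE

# Whole-number check: every value must leave no remainder when divided by 1
entry_is_whole <- all(entry %% 1 == 0)
moisture_is_whole <- all(moisture %% 1 == 0)

c(
  variety_is_character = variety_is_character,
  entry_is_whole = entry_is_whole,
  moisture_is_whole = moisture_is_whole
)
```

The whole-number check uses the remainder rather than is.integer() because whole numbers read from a file are often stored as doubles.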
When updating any codebook, the codebooks_all_db.csv must be updated to reflect the change.
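One way to catch a codebook that has drifted from codebooks_all_db.csv is to compare the columns in the file with the variables listed for that book. The sketch below is a hypothetical base R illustration, not a function from factcuratoR. The `locations` data frame and its `state` and `county` columns are invented for the example.

```r
# Hypothetical main codebook rows describing the locations codebook
main_codebook <- data.frame(
  book = c("locations", "locations"),
  variable = c("location", "state"),
  stringsAsFactors = FALSE
)

# Hypothetical locations codebook that gained a column after an update
locations <- data.frame(
  location = "Pullman", state = "WA", county = "Whitman",
  stringsAsFactors = FALSE
)

documented <- main_codebook$variable[main_codebook$book == "locations"]

# Columns in the codebook that the main codebook does not describe yet
undocumented <- setdiff(names(locations), documented)

# Variables the main codebook describes but the codebook no longer has
missing_from_file <- setdiff(documented, names(locations))

undocumented
missing_from_file
```

A non-empty result in either direction signals that the two files no longer agree.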