knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "README-"
)
options(dplyr.width = Inf)
Sys.setenv(TZ = 'UTC')
library(plhdbR)
load_plhdb_packages()
The functions read_bio_table and read_fert_table read CSV files of biography and fertility data, respectively, created by the download buttons for these tables on the PLHDB website. These functions strip away blank lines and header lines, parse any date/time columns, and return a well-ordered dplyr::tbl_df, an extension of R's data.frame. To pull all the data from a given table, use search criteria like 'Study.ID != 10'. Note that the data are not extensively error-checked at this stage. These functions expect the PLHDB download format, so feeding them an ordinary CSV file may fail or produce incorrectly parsed output.
# Assuming your file is called "biography_2015_05_20.csv"
lh <- read_bio_table("../data/biography_2015_05_20.csv")
summary(lh)

# Assuming your file is called "fertility_2015_05_20.csv"
fert <- read_fert_table("../data/fertility_2015_05_20.csv")
summary(fert)
The functions find_bio_errors and find_fert_errors scan the loaded biography and fertility data, respectively, looking for errors of various kinds. Each function takes as an argument the relevant dplyr::tbl_df generated by the read_..._table functions listed above. find_bio_errors will find dates that are unrealistic as well as duplicate entries for the same (Study.Id, Animal.Id) combination. These are returned as named list elements $error_dates and $error_duplicates. find_fert_errors only scans the date/time fields for errors, since there are multiple fertility entries for some individuals. The result is returned in a list with named element $error_dates.
# Check the biography data for errors
bio_errors <- find_bio_errors(lh)
bio_errors$error_dates %>% data.frame()
bio_errors$error_duplicates

# Check the fertility data for errors
fert_errors <- find_fert_errors(fert)
fert_errors$error_dates
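To make the two biography checks concrete, here is a minimal base-R sketch of the same idea on a toy table. It is not the plhdbR implementation: the toy values are invented, the Birth.Date column name is an assumption, and "unrealistic" is reduced here to "missing or in the future", which is only one of the rules the package might apply.

```r
# Toy biography table. Column names follow the text above where possible;
# Birth.Date and all values are assumptions made for illustration only.
bio <- data.frame(
  Study.Id   = c(1, 1, 1, 2),
  Animal.Id  = c("A1", "A2", "A1", "A1"),
  Birth.Date = as.Date(c("1990-05-01", "1992-03-10", "1990-05-01", "2150-01-01")),
  stringsAsFactors = FALSE
)

# Duplicates: flag every row whose (Study.Id, Animal.Id) key occurs more than once.
# Scanning from both ends marks the first occurrence as well as the repeats.
key <- bio[, c("Study.Id", "Animal.Id")]
is_dup <- duplicated(key) | duplicated(key, fromLast = TRUE)
error_duplicates <- bio[is_dup, ]

# Dates: flag rows whose date is missing or lies in the future
# (a simplified stand-in for "unrealistic").
bad_date <- is.na(bio$Birth.Date) | bio$Birth.Date > Sys.Date()
error_dates <- bio[bad_date, ]

error_duplicates
error_dates
```

Note that animal A1 in study 2 is not reported as a duplicate, because the key combines both columns.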
The function find_mom_id_errors checks whether every animal listed in Mom.Id in the biography table for a given study has a corresponding record in Animal.Id.
It is important to note that not all of these are errors! In some studies, the
mother can be known but excluded from the biography table for a variety of reasons.
It would be a good idea to double-check the cases listed below.
find_mom_id_errors(lh)
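The check described above can be sketched in base R as follows. This is an illustration on invented toy data, not the code inside find_mom_id_errors; the key-building approach and the variable names are assumptions.

```r
# Toy biography table; all values are invented for illustration.
bio <- data.frame(
  Study.Id  = c(1, 1, 1, 2),
  Animal.Id = c("A1", "A2", "A3", "B1"),
  Mom.Id    = c(NA, "A1", "A9", "A1"),
  stringsAsFactors = FALSE
)

# Build study-qualified keys so that IDs are only matched within the same study.
animal_key <- paste(bio$Study.Id, bio$Animal.Id, sep = "|")
mom_key    <- paste(bio$Study.Id, bio$Mom.Id, sep = "|")

# Only rows that actually name a mother can be checked.
has_mom <- !is.na(bio$Mom.Id) & bio$Mom.Id != ""

# Rows whose mother has no biography record of her own in that study.
missing_mom <- bio[has_mom & !(mom_key %in% animal_key), ]
missing_mom
```

In this toy table, A3 is flagged because A9 never appears as an Animal.Id, and B1 is flagged because A1 exists only in study 1, not in study 2.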
The function find_first_born_errors checks for two kinds of errors regarding first-born offspring.
first_born_errors <- find_first_born_errors(lh)
First, there should be a known Mom.Id (not blank or NA) for any animal that has "N" or "Y" in the First.Born column, because if the animal's first-born status is known ("N" or "Y"), then the researchers must know the mother. Cases that violate this rule are returned in the named list element $unknown_mother_first_born.
first_born_errors$unknown_mother_first_born
Second, any given female should have no more than one first-born offspring. Multiple offspring that are identified as first-born but attributed to the same female are returned in the named list element $multiple_first_born.
first_born_errors$multiple_first_born
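Both first-born rules can be illustrated with a short base-R sketch on a toy table. This is not the package's implementation: the data are invented, and the "U" code used for an unknown first-born status is an assumption.

```r
# Toy biography table; values are invented and "U" (unknown) is an assumed code.
bio <- data.frame(
  Study.Id   = c(1, 1, 1, 1, 1),
  Animal.Id  = c("A1", "A2", "A3", "A4", "A5"),
  Mom.Id     = c(NA, "A1", "A1", "", "A2"),
  First.Born = c("U", "Y", "Y", "N", "Y"),
  stringsAsFactors = FALSE
)

# Rule 1: a known first-born status ("Y" or "N") requires a known mother.
known_status <- bio$First.Born %in% c("Y", "N")
no_mom <- is.na(bio$Mom.Id) | bio$Mom.Id == ""
unknown_mother_first_born <- bio[known_status & no_mom, ]

# Rule 2: within a study, a female should have at most one first-born offspring.
first <- bio[bio$First.Born == "Y" & !no_mom, ]
mom_key <- paste(first$Study.Id, first$Mom.Id, sep = "|")
repeated <- duplicated(mom_key) | duplicated(mom_key, fromLast = TRUE)
multiple_first_born <- first[repeated, ]

unknown_mother_first_born
multiple_first_born
```

Here A4 violates the first rule (status "N" with a blank Mom.Id), while A2 and A3 violate the second (both marked first-born to mother A1).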