

Functions for working with the PLHDB data tables

Prepare workspace

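  # Work in UTC so that parsed date/time values do not depend on the
  # local time zone of the machine running the analysis
  Sys.setenv(TZ = 'UTC')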

  library(plhdbR)
  load_plhdb_packages()

Reading data

The functions read_bio_table and read_fert_table read CSV files of biography and fertility data, respectively, created by the download buttons for these tables on the PLHDB website. These functions strip away blank lines and header lines, parse any date/time columns, and return a well-ordered dplyr::tbl_df, an extension of R's data.frame. To download all the data from a given table, use a search criterion that matches every record, such as 'Study.ID != 10'. Note that the data are not extensively error-checked at this stage. These functions expect the exact layout of a PLHDB export; passing them an ordinary CSV file may fail or return malformed data.

Biography data

  # Assuming your file is called "biography_2015_05_20.csv"
  lh <- read_bio_table("../data/biography_2015_05_20.csv")
  summary(lh)

Fertility data

  # Assuming your file is called "fertility_2015_05_20.csv"
  fert <- read_fert_table("../data/fertility_2015_05_20.csv")
  summary(fert)
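
For readers curious about what "strip away blank lines and header lines" involves, here is a minimal base-R sketch. It is not the plhdbR implementation: it assumes, hypothetically, that the extra header lines in an export are repeated copies of the column-name row, and the helper clean_plhdb_csv and the sample lines are made up for illustration.

```r
# Hypothetical helper, NOT part of plhdbR: assumes the extra "header lines"
# in an export are repeated copies of the column-name row.
clean_plhdb_csv <- function(lines) {
  lines  <- lines[nzchar(trimws(lines))]  # drop blank lines
  header <- lines[1]                      # first non-blank line holds column names
  body   <- lines[-1]
  body   <- body[body != header]          # drop repeated header rows
  read.csv(text = c(header, body), stringsAsFactors = FALSE)
}

# Made-up lines imitating an export with blank and repeated header rows
raw <- c(
  "Study.Id,Animal.Id,Birth.Date",
  "",
  "1,A01,2001-05-03",
  "Study.Id,Animal.Id,Birth.Date",
  "1,A02,2003-11-20",
  ""
)

bio <- clean_plhdb_csv(raw)
bio$Birth.Date <- as.Date(bio$Birth.Date)  # parse the date column
bio
```

In practice, read_bio_table and read_fert_table handle this for you; the sketch only shows why an ordinary CSV file does not fit the expected layout.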

Error-checking data

Problems with dates and duplicate animals

The functions find_bio_errors and find_fert_errors scan the loaded biography and fertility data, respectively, for several kinds of errors. Each function takes as its argument the dplyr::tbl_df returned by read_bio_table or read_fert_table. find_bio_errors finds unrealistic dates as well as duplicate entries for the same (Study.Id, Animal.Id) combination, and returns them as the named list elements $error_dates and $error_duplicates. find_fert_errors checks only the date/time fields, because some individuals legitimately have multiple fertility entries, so repeated IDs are not errors in that table. Its results are returned in a list with the named element $error_dates.

  # Check the biography data for errors
  bio_errors <- find_bio_errors(lh)

  bio_errors$error_dates %>% data.frame()

  bio_errors$error_duplicates

  # Check the fertility data for errors
  fert_errors <- find_fert_errors(fert)

  fert_errors$error_dates
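
The two biography checks can be illustrated with a short base-R sketch. This is not how find_bio_errors is written, and its actual rules for an "unrealistic" date are not documented here: the 1900-01-01 lower bound and the use of today's date as the upper bound are assumptions chosen for illustration, as is the toy data.

```r
# Toy biography data (made up): A01 appears twice in study S1,
# and A02 has an implausible birth date.
bio <- data.frame(
  Study.Id   = c("S1", "S1", "S1", "S2"),
  Animal.Id  = c("A01", "A02", "A01", "A01"),
  Birth.Date = as.Date(c("2001-05-03", "1850-01-01", "2001-05-03", "2010-07-15")),
  stringsAsFactors = FALSE
)

# Duplicate check: flag every row whose (Study.Id, Animal.Id) pair occurs
# more than once. A01 in S2 is a different animal from A01 in S1.
key_cols <- c("Study.Id", "Animal.Id")
is_dup <- duplicated(bio[key_cols]) | duplicated(bio[key_cols], fromLast = TRUE)
error_duplicates <- bio[is_dup, ]

# Date check: the plausible window below is an ASSUMPTION, not plhdbR's rule.
# which() drops NA comparisons instead of producing all-NA rows.
earliest <- as.Date("1900-01-01")
latest   <- Sys.Date()
error_dates <- bio[which(bio$Birth.Date < earliest | bio$Birth.Date > latest), ]

error_duplicates
error_dates
```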

Problems with the Mom.Id field in the biography table

The function find_mom_id_errors checks whether every animal listed in the Mom.Id column of the biography table has a corresponding Animal.Id record in the same study. Note that not every case it reports is an error: in some studies, the mother is known but excluded from the biography table for a variety of reasons. Double-check each case the function returns.

  find_mom_id_errors(lh)
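
The idea behind this check can be sketched in base R by matching (Study.Id, Mom.Id) pairs against (Study.Id, Animal.Id) pairs. This is an illustration rather than the find_mom_id_errors source; it assumes, consistent with the duplicate check above, that IDs are only unique within a study, and the data are made up.

```r
# Toy data (made up). B01's mother "A01" exists only in study S1,
# so it is reported for study S2.
bio <- data.frame(
  Study.Id  = c("S1", "S1", "S1", "S2", "S2"),
  Animal.Id = c("F01", "A01", "A02", "F01", "B01"),
  Mom.Id    = c(NA, "F01", "F09", NA, "A01"),
  stringsAsFactors = FALSE
)

# Combine study and ID so that lookups stay within a study
animal_keys <- paste(bio$Study.Id, bio$Animal.Id, sep = "::")
mom_keys    <- paste(bio$Study.Id, bio$Mom.Id, sep = "::")

# Only consider rows where a mother is actually recorded
has_mom <- !is.na(bio$Mom.Id) & nzchar(bio$Mom.Id)

missing_mom <- bio[has_mom & !(mom_keys %in% animal_keys),
                   c("Study.Id", "Animal.Id", "Mom.Id")]
missing_mom
```

As the text above notes, a row in missing_mom is a candidate for review, not necessarily an error.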

Problems with animals that are supposed to be first-born offspring

The function find_first_born_errors checks for two kinds of errors regarding first-born offspring.

  first_born_errors <- find_first_born_errors(lh)

First, any animal with "N" or "Y" in the First.Born column should have a known Mom.Id (not blank or NA): if the animal's first-born status is known, the researchers must know its mother. Cases that violate this rule are returned in the named list element $unknown_mother_first_born.

  first_born_errors$unknown_mother_first_born
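
A base-R sketch of this first rule follows. It is not the find_first_born_errors implementation. The "U" value is a hypothetical stand-in for any First.Born entry other than "N" or "Y", which the sketch treats as unknown status; the data are made up.

```r
# Toy data (made up). "U" is a hypothetical code for unknown first-born status.
bio <- data.frame(
  Animal.Id  = c("A01", "A02", "A03", "A04"),
  Mom.Id     = c("F01", "", NA, NA),
  First.Born = c("Y", "N", "Y", "U"),
  stringsAsFactors = FALSE
)

# First-born status counts as known only for "N" or "Y"
status_known <- bio$First.Born %in% c("N", "Y")

# A mother is unknown if Mom.Id is NA or an empty string
mom_unknown <- is.na(bio$Mom.Id) | !nzchar(bio$Mom.Id)

unknown_mother_first_born <- bio[status_known & mom_unknown, ]
unknown_mother_first_born
```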

Second, any given female should have no more than one first-born offspring. Multiple offspring that are identified as first-born but attributed to the same female are returned in the named list element $multiple_first_born.

  first_born_errors$multiple_first_born
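
The second rule amounts to counting first-born offspring per mother. The base-R sketch below is an illustration, not the package's code. It assumes a female is identified by the (Study.Id, Mom.Id) pair, since IDs are only unique within a study, and it uses made-up data.

```r
# Toy data (made up). F01 in study S1 has two offspring marked first-born;
# F01 in study S2 is treated as a different female.
bio <- data.frame(
  Study.Id   = c("S1", "S1", "S1", "S1", "S2"),
  Animal.Id  = c("A01", "A02", "A03", "A04", "B01"),
  Mom.Id     = c("F01", "F01", "F02", "F01", "F01"),
  First.Born = c("Y", "Y", "Y", "N", "Y"),
  stringsAsFactors = FALSE
)

# Keep offspring marked first-born whose mother is recorded
fb <- bio[bio$First.Born %in% "Y" & !is.na(bio$Mom.Id) & nzchar(bio$Mom.Id), ]

# Identify each mother by study and ID, then count first-born offspring
mom_key <- paste(fb$Study.Id, fb$Mom.Id, sep = "::")
counts  <- table(mom_key)

multiple_first_born <- fb[mom_key %in% names(counts)[counts > 1], ]
multiple_first_born
```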


camposfa/plhdbR documentation built on May 13, 2019, 11:02 a.m.