knitr::opts_chunk$set(echo = TRUE,include = TRUE, results = "markup", cache = FALSE)
pdf.options(useDingbats = TRUE)
start_time <- proc.time()
library(stringi)
#library(rmsutilityr)
suppressMessages(library(ggplot2))
library(knitr)
library(nprcgenekeepr)
set_seed(1)

Introduction

This tutorial demonstrates the major functions used within the Shiny application provided by the nprcgenekeepr package and provides sufficient insight into those functions that they may be used independently.

This tutorial is primarily directed toward someone with experience using R who wants to better understand how the Shiny application works or to perform some actions not directly supported by the Shiny application.

Please provide any comments, questions, or bug reports through the GitHub issue tracker at \url{https://github.com/rmsharp/nprcgenekeepr/issues}.

Installation and Help

You can install nprcgenekeepr from GitHub with the following code.

install.packages(nprcgenekeepr)
## Use the following code to get the development version
# install.packages("devtools")
# devtools::install_github("rmsharp/nprcgenekeepr")

All missing dependencies should be automatically installed.

You can get help from the R console with

?nprcgenekeepr

The help provided by this (nprcgenekeepr.R) needs to be more complete and include links to the tutorials.

Reading in a Pedigree

A pedigrees can be imported using either Excel worksheets or text files that contain all of the pedigree information or using either Excel worksheets or text files that contain a list of focal animals with the remainder of the pedigree information is pulled in through the LabKey API.

This tutorial will use a pedigree file that can be created using the makeExamplePedigreeFile function as shown below. The function makeExamplePedigreeFile both saves a file and returns the full path name to the saved file, which we are saving into the variable pedigreeFile. Note: the user will select where to store the file.

library(nprcgenekeepr)
pedigreeFile <- makeExamplePedigreeFile()

This writes ExamplePedigree.csv to a place you select within your file system.

pedigreeFile <- "../inst/extdata/ExamplePedigree.csv"

You use the file name provided by the makeExamplePedigreeFile function to tell read.table what file to read.

breederPedCsv <- read.table(pedigreeFile, sep = ",", header = TRUE,
                            stringsAsFactors = FALSE)

Note the number of rows read. Each row represents an individual within the pedigree.

nrow(breederPedCsv)

The next step is to put the information read from the file into a pedigree object. This is done with the qcStudbook function, which examines the file contents and tests for common pedigree errors.

You can see the errors that can be detected by qcStudbook by returning the empty error list with getEmptyErrorLst(). We are not showing the output of the function call now because later in this tutorial we will explore errors in more depth.

qcStudbook can take four arguments sb, minParentAge (in years), reportChanges, and reportErrors. However, all but sb have default values and only the sb argument is required.

It is prudent to ensure that parents are at least of breeding age, which is species specific. I have used a minParentAge of 2 years.[^1]

[^1]: Setting the minParentAge to 3.5 and above will cause an error along with the creation of a file ~/lowParentAge.csv that will list the parents with low age at the birth of an offspring.

breederPed <- qcStudbook(breederPedCsv, minParentAge = 2)

If qcStudbook reports an error, change the call by adding the reportErrors argument set to TRUE and examine the returned object. More on this is presented in the Pedigree Errors section.

Identifying Focal Animals

You may want to focus your work on a focal group of animals. This can be done by reading in a list of animal IDs that make up the focal group and use that list to update the pedigree. Alternatively you can created a list of animal IDs based on criteria you have selected.

For example, to select living animals at the facility with at least one parent, the following can be used.

focalAnimals <- breederPed$id[!(is.na(breederPed$sire) & 
                                  is.na(breederPed$dam)) &
                                is.na(breederPed$exit)]
print(stri_c("There are ", length(focalAnimals), 
             " animals in the vector _focalAnimals_."))

As can be seen, these animals have at least one parent and have not left the facility.

breederPed[breederPed$id %in% 
             focalAnimals, c("id", "sire", "dam", "exit")][1:10, ]

We indicate that these are the animals of interest by using the setPopulation function. This function simply sets a column named population[^2] to the logical value of TRUE if the row represents an animal in the list and FALSE otherwise.

[^2]: The population column is created and added to the pedigree object if it does not already exist.

The first line of code below sets the population column and the second counts the number of rows where the value was set to TRUE.

breederPed <- setPopulation(ped = breederPed, ids = focalAnimals)
nrow(breederPed[breederPed$population, ])

The IDs used to populate the population flag can be used to trim the pedigree so that it contains only those individuals who are in the ID list or are ancestors of those individuals.

trimmedPed <- trimPedigree(focalAnimals, breederPed)
nrow(breederPed); nrow(trimmedPed)

The trimPedigree function has the ability to remove those ancestors that do not contribute genetic information. Uninformative founders are those individuals who are parents of only one individual and who have no parental information. (Currently genotypic information is ignored by trimPedigree).

trimmedPedInformative <- trimPedigree(focalAnimals, breederPed,
                                      removeUninformative = TRUE)
nrow(trimmedPedInformative)

We can find all of the animals that are in the trimmed pedigree but are not focal animals.

nonfocalInTrimmedPed <- trimmedPed$id[!trimmedPed$id %in% focalAnimals]
length(nonfocalInTrimmedPed)

We can see which of these r length(nonfocalInTrimmedPed) are and are not parents. We will first make sure we have all of the parents by getting our list of parents from the entire pedigree. We then demonstrate that they are all in the trimmed pedigree.

allFocalParents <- c(breederPed$sire[breederPed$id %in% focalAnimals], 
                       breederPed$dam[breederPed$id %in% focalAnimals])
trimmedFocalParents <- c(trimmedPed$sire[trimmedPed$id %in% focalAnimals], 
                       trimmedPed$dam[trimmedPed$id %in% focalAnimals])
all.equal(allFocalParents, trimmedFocalParents) # Are the IDs the same?

However, not all of the animals in the trimmed pedigree are either the focal animals or their parents. They are more distant ancestors as we will show.

notFocalNotParent <- 
  trimmedPed$id[!trimmedPed$id %in% c(focalAnimals, allFocalParents)]
length(notFocalNotParent)
allFocalGrandParents <- c(breederPed$sire[breederPed$id %in% allFocalParents],
                          breederPed$dam[breederPed$id %in% allFocalParents])
## Not all parents are known so the unknown individuals (NA) are removed.
allFocalGrandParents <- allFocalGrandParents[!is.na(allFocalGrandParents)] 
trimmedFocalGrandParents <- 
  c(trimmedPed$sire[trimmedPed$id %in% allFocalParents],
    trimmedPed$dam[trimmedPed$id %in% allFocalParents])
trimmedFocalGrandParents <- 
  trimmedFocalGrandParents[!is.na(trimmedFocalGrandParents)] 
all.equal(allFocalGrandParents, trimmedFocalGrandParents)

Since the trimming process is supposed to retain the focal animals and their ancestors, we will leave it as an exercise for you to demonstrate that at least some of the remaining animals are grandparents of the focal animals. Hint: there are r length(trimmedFocalGrandParents) grandparents in both the trimmed and the complete pedigree.

trimmedPedInformative <- trimPedigree(focalAnimals, breederPed,
                                            removeUninformative = TRUE)
uninformative <- trimmedPed$id[!trimmedPed$id %in% trimmedPedInformative$id]
notFocalInTrimmedPed <- trimmedPed$id[!trimmedPed$id %in% focalAnimals]
additionalAnimals <- nrow(trimmedPed) - length(focalAnimals)
geneticallyInformative <- 
  nrow(trimPedigree(focalAnimals, breederPed, 
                                 removeUninformative = TRUE)) - 
  length(focalAnimals)

As you can see from the number of rows in the full pedigree (r nrow(breederPed)) versus the trimmed pedigree (r nrow(trimmedPed)), trimmed pedigrees can be much smaller. Of the additional r additionalAnimals animals, r geneticallyInformative provide genetic information while the others (r length(uninformative)) are genetically uninformative.

unknownBirth <- breederPed$id[is.na(breederPed$birth)]
unknownBirthOrExit <- 
  breederPed$id[is.na(breederPed$birth) | !is.na(breederPed$exit)]
knownPed <- breederPed[!breederPed$id %in% unknownBirthOrExit, ]
otherIds <- knownPed$id[!knownPed$id %in% trimmedPed$id[is.na(trimmedPed$exit)]]

As is shown below only r length(otherIds) (r get_and_or_list(otherIds)) living animals are still in the colony but are not in the trimmed pedigree.[^3]

[^3]: All animals within the colony have a known birth date.

unknownBirth <- breederPed$id[is.na(breederPed$birth)]
knownExit <- breederPed$id[ !is.na(breederPed$exit)]
unknownBirthKnownExit <- 
  breederPed$id[is.na(breederPed$birth) | !is.na(breederPed$exit)]
knownPed <- breederPed[!breederPed$id %in% unknownBirthKnownExit, ]
otherIds <- knownPed$id[!knownPed$id %in% trimmedPed$id[is.na(trimmedPed$exit)]]
print(stri_c("The living animals in the pedigree that are not in the trimmed ",
             "pedigree are ", get_and_or_list(otherIds), "."))

Age Sex Pyramid Plot

You can examine the population structure using an age-sex pyramid plot with a single function. We will limit our view to just the focal animals and their living relatives. This is appropriate for colony management because in addition to the genetic diversity we seek, we have to remain cognizant of the age and sex distributions within the colonies we manage.

getPyramidPlot(ped = trimmedPed[is.na(trimmedPed$exit), ])

Genetic Value Analysis

Your genetic value analysis must be carefully performed. The next three commands set up the entire pedigree for analysis. The first of these three commands set all of the pedigree members to be part of the population of interest by setting the population column to TRUE for all individuals.

ped <- setPopulation(breederPed, NULL)

Note that a new pedigree object (ped) is being created.

probands <- ped$id[ped$population]
ped <- trimPedigree(probands, ped, removeUninformative = FALSE,
                    addBackParents = FALSE)

The arguments to reportGV are all optional except for ped, but you may often want to non-default values.

geneticValue <- reportGV(ped, guIter = 50,
                         guThresh = 3,
                         byID = TRUE,
                         updateProgress = NULL)
summary(geneticValue)

What happens if we limit our analysis to the trimmed pedigree? Remember that the trimmed pedigree still contains all of the genetic information that the full pedigree has for the focal animals.

trimmedGeneticValue <- reportGV(trimmedPed, guIter = 50,
                         guThresh = 3,
                         byID = TRUE,
                         updateProgress = NULL)
summary(trimmedGeneticValue)

It is clear that limiting your analysis to the animals of interest reduces the effort required to examine the animals of interest.

Detailed look at the Genetic Value Report object

The names of the object within the genetic value report object (trimmedGeneticValue) can be listed as shown in the next line of code.

names(trimmedGeneticValue)

The report object (an R dataframe) can in-turn be examined.

names(trimmedGeneticValue$report) ## column names
nrow(trimmedGeneticValue$report) ## Number of rows

The report is more conveniently used as a separate object. The next section of code rounds some of the numerical values and converts all columns to characters for display as a table where only the first 10 lines are included.

rpt <- trimmedGeneticValue[["report"]]
rpt$indivMeanKin <- round(rpt$indivMeanKin, 5)
rpt$zScores <- round(rpt$zScores, 2)
rpt$gu <- round(rpt$gu, 5)
rpt <- toCharacter(rpt)
names(rpt) <- headerDisplayNames(names(rpt))
knitr::kable(rpt[1:10, ]) # needs more work for display purposes.

We start the next lines of code by getting a fresh copy of the genetic value report since we changed all of the numeric values to characters in the last section to print the table. These lines demonstrate one way of extracting the component objects from the genetic value object.

rpt <- trimmedGeneticValue[["report"]]
kmat <- trimmedGeneticValue[["kinship"]]
f <- trimmedGeneticValue[["total"]]
mf <- trimmedGeneticValue[["maleFounders"]]
ff <- trimmedGeneticValue[["femaleFounders"]]
nmf <- trimmedGeneticValue[["nMaleFounders"]]
nff <- trimmedGeneticValue[["nFemaleFounders"]]
fe <- trimmedGeneticValue[["fe"]]
fg <- trimmedGeneticValue[["fg"]]

It is informative to examine the distribution of genetic uniqueness, mean kinship, and z-scores (normalized mean kinship values).

Creation of the boxplot for the genetic uniqueness values is shown below.

gu <- rpt[, "gu"]
guBox <- ggplot(data.frame(gu = gu), aes(x = "", y = gu)) +
  geom_boxplot(
    color = "darkblue",
    fill = "lightblue",
    notch = TRUE,
    outlier.color = "red",
    outlier.shape = 1
  ) +
  theme_classic() + geom_jitter(width = 0.2) + coord_flip() +
  ylab("Score")  + ggtitle("Genetic Uniqueness")
print(guBox)

Extraction of the individual mean kinship values and their corresponding z-scores is shown in the next code chunk.

mk <- rpt[, "indivMeanKin"]
zs <- rpt[, "zScores"]

Creation of boxplots for the mean kinship and z-scores is left as an exercise.

Breeding Group Formation

The primary purpose of nprcgenekeepr is to form breeding groups according to our best information regarding maintaining the genetic characteristics we desire and the realities associated with other animal husbandry needs.

There are several options you must consider when forming groups using nprcgenekeepr, which we will examine using code below.

You decisions regarding each of the above options are expressed in a call to the function groupAddAssign. A complete description of the function and its arguments is available using the code shown below.

?groupAddAssign

Below is the descriptions of the function parameters extracted from the documentation near the time this tutorial was prepared.

We will use the trimmedPed pedigree in our code.

For illustration purposes we are going to keep the numbers of candidates, groups, and iterations fairly small.

We will get first some animal IDs to use for our candidates by selecting animals at least 2 years old at the time this pedigree was sampled (01-01-2015).

candidates <- trimmedPed$id[trimmedPed$birth < as.Date("2013-01-01") &
                              !is.na(trimmedPed$birth) &
                              is.na(trimmedPed$exit)]
table(trimmedPed$sex[trimmedPed$id %in% candidates])

Our candidates are made up of r table(trimmedPed$sex[trimmedPed$id %in% candidates])[[1]] females and r table(trimmedPed$sex[trimmedPed$id %in% candidates])[[2]] males. The parameters currentGroups, threshold, ignore, minAge, sexRatio, withKin, and updateProgress are allowed to take their default values. The setting of the sexRatio parameter to 0 is ignored in the following call of the groupAddAssign function. This is consistent with the a value of 0 making little since in a breeding colony.

The empty seventh group at the bottom is evidence that all of the candidate animals could be placed in a group without exceeding the default value of 0.015625.

Harems

The following group assignments will be forming harem groups. This is done by setting harem to \code{TRUE}. Setting iter to 100 or more will increase optimal composition of breeding groups

haremGrp <- groupAddAssign(candidates = candidates,
                     kmat = trimmedGeneticValue[["kinship"]],
                     ped = trimmedPed,
                     iter = 10,
                     numGp = 6,
                     harem = TRUE)
haremGrp$group

We can identify and list the males in each group with the following code.

sapply(haremGrp$group, function(ids) {
  ids[ids %in% trimmedPed$id[trimmedPed$sex == "M"]]})

It is easy to notice that the male is listed first within each breeding group.

We can also see the number of animals and the sex ratios created in each group. Since these are harem groups the sex ratios are determined by the number of animals in the individual groups.

lines <- sapply(haremGrp$group, function(ids) {
  paste0("Count: ", length(ids), " Sex Ratio: ", 
         round(calculateSexRatio(ids, trimmedPed), 2))})
for (line in lines) print(line)

Examination of this table shows that of the r table(trimmedPed$sex[trimmedPed$id %in% candidates])[[1]] females r (sum(sapply(haremGrp$group[-1], function(ids) length(ids))) - length(haremGrp$group[-1])) are included.

Controlling Sex Ratios

The following group assignments will be forming harem groups. This is done by setting harem to \code{TRUE}.

sexRatioGrp <- groupAddAssign(candidates = candidates,
                     kmat = trimmedGeneticValue[["kinship"]],
                     ped = trimmedPed,
                     iter = 10,
                     numGp = 6,
                     sexRatio = 9)
sexRatioGrp$group

Again we can identify and list the males in each group with the following code.

sapply(sexRatioGrp$group, function(ids) {
  ids[ids %in% trimmedPed$id[trimmedPed$sex == "M"]]})

We can also see the number of animals and the sex ratios created in each group.

lines <- sapply(sexRatioGrp$group, function(ids) {
  paste0("Count: ", length(ids), " Sex Ratio: ", 
         round(calculateSexRatio(ids, trimmedPed), 2))})
for (line in lines) print(line)

Examination of this table shows that of the r table(trimmedPed$sex[trimmedPed$id %in% candidates])[[1]] females r (sum(sapply(sexRatioGrp$group[-1], function(ids) length(ids))) - length(sexRatioGrp$group[-1])) are included.

Pedigree Errors

As stated earlier you can see which types of errors are detected by qcStudbook by looking at names returned by getEmptyErrorLst() as shown below.

names(getEmptyErrorLst())

Each is defined below.

errorTypes <- names(getEmptyErrorLst())
errorDescriptions <- c(
  "Database connection failed: configuration or permissions are invalid",
  "Columns that must be within the pedigree file are missing.",
  "Values, which are supposed to be dates, cannot be interpreted as a date.",
  "Parents were too young on the date of birth of to have been the parent.",
  "Individuals listed as female or hermaphroditic and as a sire.",
  "Individuals are listed as male and as a dam.", 
  "Individuals who are listed as both a sire and a dam.",
  "IDs listed more than once.",
  "Fatal Errors.",
  stri_c("Columns that have been changed to conform to internal naming ",
  "conventions and what they were changed to.")
)
errorTbl <- data.frame(`Error` = errorTypes, Definition = errorDescriptions, 
                       stringsAsFactors = FALSE)
knitr::kable(errorTbl) 

We are going to use the small imaginary pedigree listed below that has multiple errors to discuss pedigree error detection and reporting. First note the birth dates of ego_id o4 (2006-04-13) and the purported sire s2 (2006-06-19). Note also the purported birth date of the d2 and the birth dates of her offspring. Obviously dates or IDs may be incorrect.

This is the pedigree. (We will discuss the column names shortly.)

knitr::kable(nprcgenekeepr::pedOne)

If we try to convert this pedigree file into the standardized studbook format, we are going to get an error message and the creation of a file in the R sessions temporary directory that lists the records that have generated the errors.

pedOne <- nprcgenekeepr::pedOne # put it in the local environment
ped <- qcStudbook(pedOne, minParentAge = 0)

The contents of lowParentAge.csv is shown below.

lowParentAge <- read.csv(paste0(tempdir(),"/lowParentAge.csv"))
knitr::kable(lowParentAge)

Examination of the ages of the parents reveals the issues being reported.

We can remove the errors by changing the birth dates of o4 from 2006-04-13 to 2015-09-16 and of d2 from 2015-09-16 to 2006-04-13.

pedOne$birth_date[pedOne$ego_id == "o4"] <- as.Date("2015-09-16")
pedOne$birth_date[pedOne$ego_id == "d2"] <- as.Date("2006-04-13")

Note the changes made to the column names between the original pedOne pedigree and the pedigree (ped) we get from qcStudbook. We have chosen to limit the displayed pedigree by selecting the ego_id's and id's in pedOne and ped respectively.

ped <- qcStudbook(pedOne, minParentAge = 0)
ped[ped$id %in% c("s2", "d2", "o3", "o4"), ]

However, the preferred method of creating the standardized studbook format with qcStudbook is to examine all errors found and correcting them before proceeding. This is done by explicitly requesting that all aspects inconsistent with the studbook format be identified by setting reportChanges and reportErrors to \code{TRUE}.

errorList <- qcStudbook(pedOne, minParentAge = 0, reportChanges = TRUE, 
                        reportErrors = TRUE)
summary(errorList)

We will discuss each of these newly identified errors in a moment, however, let's look at shortening this report, because often you will not be interested in the more trivial changes to the column names made by qcStudbook and in those cases you simply choose not to report changes to the column names as is shown here by setting reportChanges to \code{FALSE}. For this illustration, we are going to bring back the original copy of pedOne to see how the suspicious parents are reported by the summary function.

pedOne <- nprcgenekeepr::pedOne
errorList <- qcStudbook(pedOne, minParentAge = 0, reportChanges = FALSE, 
                        reportErrors = TRUE)
options(width = 90)
summary(errorList)

The first two errors mentioned are of particular interest. Currently qcStudbook automatically changes the sex of dams to F (female) and sires to M (male) when reportErrors is set to \code{FALSE}.

Genetic Loops

This feature is not supported within the Shiny application and is not fully implemented.

To use the findLoops function run the following code and select a pedigree as your input file that has a loop in it. We are continuing to use the example pedigree that comes with the software Example_Pedigree.csv.

exampleTree <- createPedTree(breederPed)
exampleLoops <- findLoops(exampleTree)

You can count how many loops you have with the following code.

length(exampleLoops[exampleLoops == TRUE])
nLoops <- countLoops(exampleLoops, exampleTree)
sum(unlist(nLoops[nLoops > 0]))

You can list the first 10 sets of ids, sires and dams in loops with the following line of code:

examplePedigree[exampleLoops == TRUE, c("id", "sire", "dam")][1:10, ]
elapsed_time <- get_elapsed_time_str(start_time)

The current date and time is r Sys.time(). The processing time for this document was r elapsed_time.

sessionInfo()


rmsharp/nprcmanager documentation built on April 24, 2021, 3:13 p.m.