R/calcGU.R

Defines functions calcGU

Documented in calcGU

#' Calculates genome uniqueness for each ID that is part of the population.
#'
## Copyright(c) 2017-2020 R. Mark Sharp
## This file is part of nprcgenekeepr
#' Part of Genetic Value Analysis
#'
#' This function calculates genome uniqueness according to the equation
#' described in Ballou & Lacy.
#'
#' Note, however, that this function differs slightly in that it does not
#' distinguish between founders and non-founders when calculating the
#' statistic.
#'
#' Ballou & Lacy describe genome uniqueness as "the proportion of simulations
#' in which an individual receives the only copy of a founder allele." This
#' can be interpreted as meaning that genome uniqueness should only be
#' calculated for living, non-founder animals, with alleles possessed by
#' living founders not considered in the calculation.
#'
#' We take a different view, since a living founder can still contribute
#' to the population.
#' The function below calculates genome uniqueness for all living animals
#' and considers all alleles. It does not ignore living founders or their
#' alleles.
#'
#' Our results for genome uniqueness will therefore differ slightly from those
#' returned by Pedscope. Pedscope calculates genome uniqueness only for
#' non-founders and ignores the contribution of any founders in the population.
#' As a result, Pedscope's genome uniqueness estimates for non-founders may be
#' slightly higher than those calculated by this function.
#'
#' The estimates of genome uniqueness for founders within the population
#' calculated by this function should match the "founder genome uniqueness"
#' measure calculated by Pedscope.
#'
#' @description {Genome Uniqueness Functions}{}
#'
#' @references Ballou JD, Lacy RC. 1995. Identifying genetically important
#' individuals for management of genetic variation in pedigreed populations,
#' p 77-111. In: Ballou JD, Gilpin M, Foose TJ, editors.
#' Population management for survival and recovery. New York (NY):
#' Columbia University Press.
#'
#'
#' @return Dataframe \code{rows: id, col: gu}
#'  A single-column table of genome uniqueness values as percentages.
#'  Rownames are set to 'id' values that are part of the population.
#'
#' @examples
#' \donttest{
#' library(nprcgenekeepr)
#' ped1Alleles <- nprcgenekeepr::ped1Alleles
#' gu_1 <- calcGU(ped1Alleles, threshold = 1, byID = FALSE, pop = NULL)
#' gu_2 <- calcGU(ped1Alleles, threshold = 3, byID = FALSE, pop = NULL)
#' gu_3 <- calcGU(ped1Alleles, threshold = 3, byID = FALSE,
#'                pop = ped1Alleles$id[20:60])
#' }
#'
#' @param alleles dataframe containing an \code{AlleleTable}. This is a
#' table of allele information produced by \code{geneDrop()}.
#' An AlleleTable contains information about alleles an ego has inherited.
#' It contains the following columns:
#' \itemize{
#'  \item {id} {--- A character vector of IDs for a set of animals.}
#'  \item {parent} {--- A factor with levels of sire and dam.}
#'  \item {V1} {--- Unnamed integer column representing allele 1.}
#'  \item {V2} {--- Unnamed integer column representing allele 2.}
#'  \item {...} {--- Unnamed integer columns representing alleles.}
#'  \item {Vn} {--- Unnamed integer column representing allele n.}}
#'
#' @param threshold an integer indicating the maximum number of copies of an
#' allele that can be present in the population for it to be considered rare.
#' Default is 1.
#' @param byID logical variable of length 1 that is passed through to
#' eventually be used by \code{alleleFreq()}, which calculates the count of each
#' allele in the provided vector. If \code{byID} is TRUE and ids are provided,
#' the function will only count the unique alleles for an individual
#' (homozygous alleles will be counted as 1).
#' @param pop character vector with animal IDs to consider as the population of
#' interest, otherwise all animals will be considered. The default is NULL.
#' @export
calcGU <- function(alleles, threshold = 1, byID = FALSE, pop = NULL) {
  if (!is.null(pop)) {
    alleles <- alleles[alleles$id %in% pop, ]
  }

  # Calculate the number of an individual's alleles that are rare in
  # each simulation and average across all simulated alleles.
  rare <- calcA(alleles, threshold, byID)
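  # Every column other than "id" and "parent" holds one simulated allele set.
  iterations <- sum(!(colnames(alleles) %in% c("id", "parent")))
  # Each animal has two alleles per simulation (one sire row, one dam row),
  # so 2 * iterations is the total number of simulated alleles per animal.
  gu <- rowSums(rare) / (2 * iterations)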

  # convert to a percentage
  gu <- gu * 100
  gu <- as.data.frame(gu)

  return(gu)
}
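
# ---------------------------------------------------------------------------
# Illustrative sketch only; not part of nprcgenekeepr. It is meant to be run
# interactively to show, on a hand-made allele table, the arithmetic that
# calcGU() applies: count each animal's rare alleles per simulation, sum
# them, divide by 2 * iterations, and scale to a percentage.
# Assumptions: the names toyAlleles, toyRare, toyIterations and toyGU are
# hypothetical, and the rare-allele count below is a simplified stand-in
# for calcA() with byID = FALSE, not its actual implementation.
# ---------------------------------------------------------------------------
toyAlleles <- data.frame(
  id = c("A", "A", "B", "B"),
  parent = factor(c("sire", "dam", "sire", "dam")),
  V1 = c(1L, 2L, 1L, 3L),  # simulation 1: alleles 2 and 3 occur once each
  V2 = c(1L, 1L, 1L, 2L),  # simulation 2: only allele 2 occurs once
  stringsAsFactors = FALSE
)
toyThreshold <- 1

# Keep only the simulated allele columns.
toySims <- toyAlleles[, !(colnames(toyAlleles) %in% c("id", "parent")),
                      drop = FALSE]

# For each simulation, flag alleles whose population count is at or below
# the threshold, then count the flagged alleles carried by each animal.
toyRare <- sapply(toySims, function(a) {
  freq <- table(a)
  isRare <- as.character(a) %in% names(freq)[freq <= toyThreshold]
  tapply(isRare, toyAlleles$id, sum)
})

# Same arithmetic as calcGU(): proportion of rare alleles as a percentage.
toyIterations <- ncol(toySims)
toyGU <- data.frame(gu = rowSums(toyRare) / (2 * toyIterations) * 100)

# Animal A carries 1 rare allele out of 4 simulated alleles (gu = 25);
# animal B carries 2 out of 4 (gu = 50).
print(toyGU)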
rmsharp/nprcmanager documentation built on April 24, 2021, 3:13 p.m.