Algorithm: Genome Uniqueness

Genome uniqueness is calculated through the use of a gene-drop simulation to estimate how frequently an animal will possess founder alleles not present in other members of the focal population, or present in a specified number or fewer.

The gene-drop simulation used by the web application is a vectorized version and is shown in the figure below. In an un-vectorized version, if 5000 gene-drop simulations are desired for the estimation process, the population had to be iterated over 5000 times. Since each iteration of the gene-drop is independent, the process can be vectorized so that each element of a vector represents 1 iteration of the gene-drop simulation. In the vectorized version, the population is iterated over once, regardless of the number of simulations desired. This drastically reduces the amount of time necessary for the program to run.

Overview

The basic steps of the gene-drop are:

  1. Each founder is assigned two unique alleles
  2. For each subsequent generation:
    a. Assign genotypes to each member of the generation
    • For each animal, find the genotypes of the parents, and select
      one allele from each parent randomly.

Once every animal has been assigned a genotype by mendelian inheritance tally the number of unique alleles possessed by each member of the focal population. In the case of this algorithm, we do allow the 'uniqueness' threshold to be adjusted so that an allele can be considered unique if it is possessed by N or fewer other members of the focal population.

Vectorized Gene-Drop Details

The vectorized gene-drop simulation follows the same basic process described above. The difference is that instead of dropping one allele at a time, and repeating the simulation N times, the vectorized version drops N independent alleles one time.

In the vectorized version, each animal has a vector of paternally inherited alleles and a vector of maternally inherited alleles. For each offspring, a random combination of these alleles is produced and dropped down to the offspring by the process below and shown in the following figure:

  1. To start the simulation, each founder is assigned two unique founding alleles.
    N-element vectors are created of these alleles, where N is the desired number of
    simulations. In the example below, this founder was assigned the unique founder
    alleles 1 & 2 and 5 simulations were desired.
  2. Each time alleles need to be dropped from parent to offspring, a unique
    transmission vector is created representing whether or not an allele
    was passed to that offspring. The vector is generated to contain a random combination
    of 0's and 1's. The animal's paternally inherited alleles are then multiplied by the
    transmission vector, while the maternally inherited alleles are multiplied by the
    compliment of the transmission vector.
  3. To generate the final set of alleles received by the offspring, the maternal and
    paternal allele vectors are added together.
  4. The result is a vector of alleles that this offspring has received from this parent.

Once allele vectors have been generated for every animal in the pedigree, the focal population can be subset out. Within this population of allele vectors, unique alleles can be determined:

For each position on the allele vectors (1:N) - Gather each animal's two alleles - If the number of other animals possessing that allele is equal to, or below the threshold, score the allele as unique (1) - Otherwise, score the allele as non-unique (0)

Once every position on each animal's two allele vector's has been scored, sum all of the scores for an animal and divide by the total number of alleles being considered (2 * number of simulations).

Generation of a vector of five gametes from one parent. Showing how the transmission vectors (row 2) determine which alleles are passed from the parental alleles or haplotypes (row 1) to form complementary vectors (row 3) that are combined by adding corresponding elements to form the final vector of transmitted alleles (row 4).



rmsharp/nprcmanager documentation built on April 24, 2021, 3:13 p.m.