In rmsharp/nprcmanager: Genetic Tools for Colony Management

Algorithm: Breeding Group Formation

The group formation process is accomplished by using an algorithm for determining the maximal independent set (MIS). In graph theory, a maximal independent set is the largest set of vertices in a graph where no two share an edge. In breeding group formation, the vertices are animals, and the edges are the kinships that need to be considered. For a given group of animals and pairwise kinships, there are potentially many maximal independent sets, depending on which animals are included or excluded from the final group. In order to effectively sample the set of MISs, we use random selection of animals and repeat the MIS generation numerous times. This allows us to sample a number of MISs and then choose the one that best fits our selection criteria. For our purposes, we want the largest group that can be formed from this set of animals, where none have concerning relatedness to each other.

The algorithm requires several pieces of information:

The candidate animals
A matrix of pairwise kinships between candidate animals
The number of groups desired from the list of candidate animals
The number of simulations to run.
* This is equivalent to the number of random MISs to generate and compare.
Information on which inter-animal relationships (if any) should be ignored.

Data Pre-processing

Before the group formation algorithm begins generating MISs, the data is pre-processed to remove any animals and pairwise kinships that should not be considered.

Specifically:

The candidate animals provided are checked, and any that were designated as low-value by the genetic value analysis will be removed from further consideration.
* This behavior can be toggled off to allow low-value animals in the formation process
The pairwise kinship data is filtered down to only the kinship between candidate animals.
If an age threshold has been set, kinships involving animals below the threshold will be filtered out.
* This allows the algorithm to ignore young animals, as young animals typically go to whatever social group their dam does.
* By default, we ignore animals under 1 year of age
Pairwise kinships below the specified level will be filtered out.
* By default, we ignore relatedness more distant than 2nd cousin
Pairwise kinships between females will be filtered out
* This allows females of the same matriline to be part of the same group like they would be in the wild.
* This behavior can be toggled off to prevent relatedness between females.

Random Maximum Independent Set Generation

After any animals and relationships that should be ignored are removed from the dataset, the algorithm begins using the remaining animals and kinship information to generate potential groups.

The algorithm proceeds by the following steps:

For I iterations:
a. Generate N empty sets, where N is the desired number of groups to be created.
b. While there are candidate animals remaining:
i. Pick an animal A randomly from the set of candidate animals
ii. Choose a group G randomly from one of the N groups, and assign A to it
iii. Remove animal A from consideration for all N groups
iv. Remove all animals related to A from consideration from for group G
c. Score the groups that were generated
i. For our purposes, we calculate the average group size
d. If the score of the new groups is higher than groups that were previously generated, save the new groups.
Return the currently saved groups
a. This should be the best groups encountered in I iterations.