bruvo.dist: Bruvo's distance for microsatellites

View source: R/bruvo.r

bruvo.dist    R Documentation

Bruvo's distance for microsatellites

Description

Calculate the average Bruvo's distance over all loci in a population.

Usage

bruvo.dist(pop, replen = 1, add = TRUE, loss = TRUE, by_locus = FALSE)

bruvo.between(
  query,
  ref,
  replen = 1,
  add = TRUE,
  loss = TRUE,
  by_locus = FALSE
)

Arguments

pop

a genind or genclone object

replen

a vector of integers indicating the length of the nucleotide repeats for each microsatellite locus. E.g. a locus with a (CAT) repeat would have a replen value of 3. (Also see fix_replen)

add

if TRUE, genotypes with zero values will be treated under the genome addition model presented in Bruvo et al. 2004. See the Note section for options.

loss

if TRUE, genotypes with zero values will be treated under the genome loss model presented in Bruvo et al. 2004. See the Note section for options.

by_locus

indicator to get the results per locus. The default setting is by_locus = FALSE, indicating that Bruvo's distance is to be averaged over all loci. When by_locus = TRUE, a list of distance matrices will be returned.

query

a genind or genclone object

ref

a genind or genclone object

Details

Bruvo's distance between two alleles is calculated as

d = 1 - 2^{-\mid x \mid},

where x is the number of repeat units between the two alleles (see the Algorithms and Equations vignette for more details). These distances are calculated over all combinations of alleles at a locus, and the minimum average distance between allele combinations is taken as the distance for that locus. The locus distances are then averaged to obtain the distance between two samples. If all alleles at a locus are missing, that locus is ignored in the average (in the same fashion as mean(c(1:9, NA), na.rm = TRUE)). See the next section for other cases.
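The calculation described above can be sketched in base R. This is only an illustration of the formula and the minimum-average step, not the routine poppr uses internally; the helper names allele_dist, permutations, and locus_dist are hypothetical, and the genotypes are assumed to be already expressed in repeat units with equal ploidy.

```r
# Distance between two alleles that are x repeat units apart.
allele_dist <- function(x) 1 - 2^(-abs(x))

# All orderings of a vector (base R only; adequate for small ploidy).
permutations <- function(v) {
  if (length(v) <= 1) return(list(v))
  out <- list()
  for (i in seq_along(v)) {
    for (p in permutations(v[-i])) out[[length(out) + 1]] <- c(v[i], p)
  }
  out
}

# Minimum average allele distance between two genotypes at one locus.
locus_dist <- function(g1, g2) {
  stopifnot(length(g1) == length(g2))
  averages <- vapply(permutations(g2),
                     function(p) mean(allele_dist(g1 - p)),
                     numeric(1))
  min(averages)
}

# Pairing 20 with 21 and 23 with 24 gives two one-step differences.
locus_dist(c(20, 23), c(21, 24))

# Loci where everything is missing (NA) drop out of the average.
mean(c(0.5, NA, 0.25), na.rm = TRUE)
```

Enumerating every ordering grows factorially with ploidy, so this sketch is meant for reading rather than for real data sets.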

Polyploids

Ploidy is irrelevant with respect to calculation of Bruvo's distance. However, since it compares all alleles at a locus, the two genotypes being compared need to have the same ploidy level. Unfortunately for polyploids, it is often difficult to fully separate distinct alleles at each locus, so you end up with genotypes that appear to have a lower ploidy level than the organism.

To help deal with these situations, Bruvo et al. (2004) suggested the following methods for handling differences in ploidy level:

  • Infinite Model - The simplest approach is to count all missing alleles as infinitely large so that the distance between a missing allele and anything else is 1. Although this is computationally simple, it tends to inflate distances between individuals.

  • Genome Addition Model - If it is suspected that the organism has gone through a recent genome expansion, the missing alleles will be replaced with all possible combinations of the observed alleles in the shorter genotype. For example, if there is a genotype of [69, 70, 0, 0] where 0 is a missing allele, the possible combinations are: [69, 70, 69, 69], [69, 70, 69, 70], [69, 70, 70, 69], and [69, 70, 70, 70]. The resulting distances are then averaged over the number of comparisons.

  • Genome Loss Model - This is similar to the genome addition model, except that it assumes that there was a recent genome reduction event and uses the observed values in the full genotype to fill the missing values in the short genotype. As with the Genome Addition Model, the resulting distances are averaged over the number of comparisons.

  • Combination Model - Combine and average the genome addition and loss models.

As mentioned above, the infinite model is biased, but it is not nearly as computationally intensive as either of the other models. The reason is that both the addition and loss models require replacement of alleles and recalculation of Bruvo's distance. The number of replacements required is equal to n^k, where n is the number of potential replacements and k is the number of alleles to be replaced. To reduce the number of calculations and avoid additional assumptions, Bruvo's distance is calculated using the largest observed ploidy in each pairwise comparison. This means that when comparing [69,70,71,0] and [59,60,0,0], they will be treated as triploids.
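To make the n^k count concrete, the following base R sketch enumerates the hypothetical genotypes for the genome addition example [69, 70, 0, 0] given above. The variable names are illustrative only, and the code shows the enumeration idea rather than how poppr implements it.

```r
# Observed genotype; zeros mark missing alleles.
short <- c(69, 70, 0, 0)
observed <- short[short != 0]
k <- sum(short == 0)     # number of alleles to replace
n <- length(observed)    # candidate replacements under genome addition

# Every way to fill the k missing slots: one column per slot, n^k rows.
fills <- expand.grid(rep(list(observed), k))

# Prepend the observed alleles to each candidate fill.
imputed <- cbind(matrix(observed, nrow = nrow(fills), ncol = n, byrow = TRUE),
                 as.matrix(fills))
dimnames(imputed) <- NULL

imputed                  # one hypothetical genotype per row
nrow(imputed) == n^k     # the count matches n^k
```

Under the genome loss model the same enumeration would apply, with the candidate replacements drawn from the alleles of the full genotype instead of the short one.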

Value

an object of class dist or a list of these objects if by_locus = TRUE

Functions

  • bruvo.between(): Bruvo's distance between a query and a reference. Only differences between query individuals and reference individuals will be reported. All other values are NaN.

Note

Do not use missingno with this function.

Missing alleles and Bruvo's distance in poppr versions < 2.5

In earlier versions of poppr, the authors had assumed that, because the calculation of Bruvo's distance does not rely on ordered sets of alleles, the imputation methods in the genome addition and genome loss models would also assume unordered alleles for creating the hypothetical genotypes. This means that the results from this imputation did not consider all possible combinations of alleles, resulting in either an over- or underestimation of Bruvo's distance between two samples with two or more missing alleles. This version of poppr considers all possible combinations when calculating Bruvo's distance for incomplete genotypes, with a negligible increase in computation time.

If you want to see the effect of this change on your data, you can use the global poppr option old.bruvo.model. By default, this option is FALSE. You can enable it with options(old.bruvo.model = TRUE), but make sure to reset it to FALSE afterwards.
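One way to avoid forgetting the reset is to scope the option change to a single call. The wrapper below is a hypothetical helper, not part of poppr; it relies only on base R's options() and on.exit(), and restores whatever value the option had before, even if the wrapped call fails.

```r
# Hypothetical helper: evaluate `expr` with old.bruvo.model = TRUE,
# then restore the previous value of the option on exit.
with_old_bruvo_model <- function(expr) {
  old <- options(old.bruvo.model = TRUE)
  on.exit(options(old), add = TRUE)
  force(expr)  # lazily evaluated here, while the option is TRUE
}

# Usage sketch (assumes poppr is attached and `pop` and `ssr` exist):
# new_d <- bruvo.dist(pop, replen = ssr)
# old_d <- with_old_bruvo_model(bruvo.dist(pop, replen = ssr))
```

Comparing new_d and old_d would then show how much the change affects genotypes with two or more missing alleles.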

Repeat Lengths (replen)

The replen argument is crucial for proper analysis of Bruvo's distance since the calculation relies on knowing the number of steps between alleles. To calculate Bruvo's distance, your raw allele calls are first divided by the repeat lengths and then rounded. This can create a problem with repeat lengths of even size because the IEC 60559 standard specifies that values ending in 0.5 are rounded to the nearest even number, so two alleles that are one step apart may appear to be exactly the same. This can be fixed by subtracting a tiny number from the repeat length with the function fix_replen. Please consider using this before running Bruvo's distance.
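The rounding problem can be reproduced in base R with two made-up allele calls. The offset 1e-5 below is an arbitrary small value chosen for illustration and should not be read as the exact adjustment fix_replen applies.

```r
alleles <- c(3, 5)   # raw allele calls, one dinucleotide step apart
replen  <- 2

# 1.5 and 2.5 both round to the nearest even number.
round(alleles / replen)

# With a slightly smaller repeat length the quotients sit just above .5,
# so they round to different integers.
round(alleles / (replen - 1e-5))
```

In the first call the two alleles collapse to the same repeat count, which would give a distance of zero between alleles that actually differ by one step.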

Model Choice

The add and loss arguments modify the model choice accordingly:

  • Infinite Model: add = FALSE, loss = FALSE

  • Genome Addition Model: add = TRUE, loss = FALSE

  • Genome Loss Model: add = FALSE, loss = TRUE

  • Combination Model (DEFAULT): add = TRUE, loss = TRUE

Details of each model choice are described in the Details section, above. Additionally, genotypes containing all missing values at a locus will return a value of NA and not contribute to the average across loci.

Repeat Lengths

If the user does not provide a vector of appropriate length for replen, it will be estimated by taking the minimum difference among represented alleles at each locus. IT IS NOT RECOMMENDED TO RELY ON THIS ESTIMATION.
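A rough sketch of such a minimum-difference estimate shows why it is fragile. The helper estimate_replen is hypothetical and is not poppr's implementation; it assumes that NA and 0 mark missing alleles.

```r
# Hypothetical helper: smallest gap between distinct observed alleles.
estimate_replen <- function(alleles) {
  sizes <- sort(unique(alleles[!is.na(alleles) & alleles != 0]))
  if (length(sizes) < 2) return(NA_real_)
  min(diff(sizes))
}

estimate_replen(c(120, 124, 128, 136))  # consistent four-base spacing
estimate_replen(c(120, 124, 126, 136))  # a single off-size call changes the estimate
estimate_replen(c(120, 128))            # sparse sampling overestimates the repeat
```

Because the estimate depends entirely on which alleles happen to be present in the sample, supplying known repeat lengths is the safer choice.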

Author(s)

Zhian N. Kamvar

David Folarin

References

Ruzica Bruvo, Nicolaas K. Michiels, Thomas G. D'Souza, and Hinrich Schulenburg. A simple method for the calculation of microsatellite genotype distances irrespective of ploidy level. Molecular Ecology, 13(7):2101-2106, 2004.

See Also

fix_replen, test_replen, bruvo.boot, bruvo.msn

Examples

# Please note that this example assumes the nancycats dataset contains
# only dinucleotide repeats; this is most likely not an accurate
# representation of the data.

# Load poppr and the nancycats dataset, then construct the repeat vector.
library(poppr)
data(nancycats)
names(alleles(nancycats)) <- locNames(nancycats) # small bug in this data set
# Assume the alleles are all dinucleotide repeats.
ssr <- rep(2, nLoc(nancycats))
test_replen(nancycats, ssr)         # Are the repeat lengths consistent?
(ssr <- fix_replen(nancycats, ssr)) # Nope. We need to fix them.

# Analyze the first population in nancycats
bruvo.dist(popsub(nancycats, 1), replen = ssr)

## Not run: 

# get the per locus estimates:
bruvo.dist(popsub(nancycats, 1), replen = ssr, by_locus = TRUE)

# View each population as a heatmap.
sapply(popNames(nancycats), function(x)
  heatmap(as.matrix(bruvo.dist(popsub(nancycats, x), replen = ssr)), symm = TRUE))

## End(Not run)

poppr documentation built on March 31, 2023, 7:15 p.m.