Find and fix inconsistent repeat lengths

Description

Attempts to fix inconsistent repeat lengths found by test_replen

Usage

fix_replen(gid, replen, e = 1e-05, fix_some = TRUE)

Arguments

gid

a genind or genclone object

replen

a numeric vector of repeat motif lengths.

e

a number to be subtracted or added to inconsistent repeat lengths to allow for proper rounding.

fix_some

if TRUE (default), when there are inconsistent repeat lengths that cannot be fixed by subtracting or adding e, those that can be fixed will be. If FALSE, the original repeat lengths will not be fixed.

Details

This function is modified from the version used in http://dx.doi.org/10.5281/zenodo.13007.
Before being fed into the algorithm to calculate Bruvo's distance, the amplicon length is divided by the repeat unit length. Because of the amplified primer sequence attached to the sequence repeat, this division does not always result in an integer, so the resulting numbers are rounded. The rounding also protects against slight mis-calls of alleles. Because we know that

((A - e) - (B - e))/r

is equivalent to

(A - B)/r

we know that the primer sequence will not alter the relationships between the alleles. Unfortunately, for nucleotide repeats whose lengths are powers of 2, rounding in R follows the IEC 60559 standard (see round), which means that a value exactly halfway between two integers (one ending in .5) is rounded to the nearest even integer. This function attempts to alleviate the problem by adding a very small amount to the repeat length so that division does not result in a 0.5. If this fails, the same amount is subtracted. If neither works, a warning is issued and it is up to the user to determine whether the fault is in the allele calls or the repeat lengths.
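The rounding behaviour described above can be reproduced in base R without poppr. The following is a minimal sketch, not the function's actual implementation: the amplicon lengths, the 42 bp primer offset, and the variable names are assumptions chosen for illustration.

```r
# Exact halves are rounded to the nearest even integer in R
round(c(0.5, 1.5, 2.5, 3.5))

# Hypothetical data: six alleles of a 4 bp repeat, one step apart,
# each carrying 42 bp of primer sequence (assumed values)
amplicon <- 1:6 * 4 + 42
replen   <- 4
e        <- 1e-5

# Dividing by the raw repeat length yields exact halves (11.5, 12.5, ...),
# so neighbouring alleles collapse onto the same even integer
naive <- round(amplicon / replen)

# Adding a tiny amount to the repeat length moves every quotient off 0.5
shifted <- round(amplicon / (replen + e))

diff(naive)    # steps are no longer all 1
diff(shifted)  # steps are all 1 again
```

Here the addition of e is enough. According to the description above, the function would try subtracting e only if adding it did not resolve the inconsistency.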

Value

a numeric vector of corrected repeat motif lengths.

Author(s)

Zhian N. Kamvar

References

Zhian N. Kamvar, Meg M. Larsen, Alan M. Kanaskie, Everett M. Hansen, & Niklaus J. Grünwald. Sudden_Oak_Death_in_Oregon_Forests: Spatial and temporal population dynamics of the sudden oak death epidemic in Oregon Forests. ZENODO, http://doi.org/10.5281/zenodo.13007, 2014.

Kamvar, Z. N., Larsen, M. M., Kanaskie, A. M., Hansen, E. M., & Grünwald, N. J. (2015). Spatial and temporal analysis of populations of the sudden oak death pathogen in Oregon forests. Phytopathology 105:982-989. doi: 10.1094/PHYTO-12-14-0350-FI

Ruzica Bruvo, Nicolaas K. Michiels, Thomas G. D'Souza, and Hinrich Schulenburg. A simple method for the calculation of microsatellite genotype distances irrespective of ploidy level. Molecular Ecology, 13(7):2101-2106, 2004.

See Also

test_replen, bruvo.dist, bruvo.msn, bruvo.boot

Examples

data(nancycats)
fix_replen(nancycats, rep(2, 9))
# Let's start with an example of a tetranucleotide repeat motif and imagine
# that there are twenty alleles all 1 step apart:
(x <- 1:20L * 4L)
# These are the true lengths of the different alleles. Now, let's add the
# primer sequence to them. 
(PxP <- x + 21 + 21)
# Now we make sure that x / 4 is equal to 1:20, in which consecutive
# values differ by exactly 1.
x/4
# Now, we divide the sequence with the primers by 4 and see what happens.
(PxPc <- PxP/4)
(PxPcr <- round(PxPc))
diff(PxPcr) # we expect all 1s

# Let's try that again by subtracting a tiny amount from 4
(PxPc <- PxP/(4 - 1e-5))
(PxPcr <- round(PxPc))
diff(PxPcr)
