Find and fix inconsistent repeat lengths
Attempts to fix inconsistent repeat lengths found by
fix_replen(gid, replen, e = 1e-05, fix_some = TRUE)
a numeric vector of repeat motif lengths.
a number to be subtracted or added to inconsistent repeat lengths to allow for proper rounding.
This function is modified from the version used in
Before being fed into the algorithm to calculate Bruvo's distance, the amplicon length is divided by the repeat unit length. Because of the amplified primer sequence attached to sequence repeat, this division does not always result in an integer and so the resulting numbers are rounded. The rounding also protects against slight mis-calls of alleles. Because we know that
((A - e) - (B - e))/r
is equivalent to
(A - B)/r
, we know that the primer sequence will not alter the relationships
between the alleles. Unfortunately for nucleotide repeats that have powers
of 2, rounding in R is based off of the IEC 60559 standard (see
round), that means that any number ending in 5 is rounded to
the nearest even digit. This function will attempt to alleviate this
problem by adding a very small amount to the repeat length so that division
will not result in a 0.5. If this fails, the same amount will be
subtracted. If neither of these work, a warning will be issued and it is up
to the user to determine if the fault is in the allele calls or the repeat
a numeric vector of corrected repeat motif lengths.
Zhian N. Kamvar
Zhian N. Kamvar, Meg M. Larsen, Alan M. Kanaskie, Everett M. Hansen, & Niklaus J. Grünwald. Sudden_Oak_Death_in_Oregon_Forests: Spatial and temporal population dynamics of the sudden oak death epidemic in Oregon Forests. ZENODO, http://doi.org/10.5281/zenodo.13007, 2014.
Kamvar, Z. N., Larsen, M. M., Kanaskie, A. M., Hansen, E. M., & Grünwald, N. J. (2015). Spatial and temporal analysis of populations of the sudden oak death pathogen in Oregon forests. Phytopathology 105:982-989. doi: 10.1094/PHYTO-12-14-0350-FI
Ruzica Bruvo, Nicolaas K. Michiels, Thomas G. D'Souza, and Hinrich Schulenburg. A simple method for the calculation of microsatellite genotype distances irrespective of ploidy level. Molecular Ecology, 13(7):2101-2106, 2004.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
data(nancycats) fix_replen(nancycats, rep(2, 9)) # Let's start with an example of a tetranucleotide repeat motif and imagine # that there are twenty alleles all 1 step apart: (x <- 1:20L * 4L) # These are the true lengths of the different alleles. Now, let's add the # primer sequence to them. (PxP <- x + 21 + 21) # Now we make sure that x / 4 is equal to 1:20, which we know each have # 1 difference. x/4 # Now, we divide the sequence with the primers by 4 and see what happens. (PxPc <- PxP/4) (PxPcr <- round(PxPc)) diff(PxPcr) # we expect all 1s # Let's try that again by subtracting a tiny amount from 4 (PxPc <- PxP/(4 - 1e-5)) (PxPcr <- round(PxPc)) diff(PxPcr)
Want to suggest features or report bugs for rdrr.io? Use the GitHub issue tracker.