Correcting rare allele frequencies

Description

The following is a set of arguments for use in rraf, pgen, and psex to correct rare allele frequencies that were lost in estimating round-robin allele frequencies.

Arguments

e

a numeric epsilon value to use for all missing allele frequencies.

d

the unit by which to take the reciprocal. d = "sample" will be 1/(n samples), d = "mlg" will be 1/(n mlg), and d = "rrmlg" will be 1/(n mlg at that locus). This is overridden by e.

mul

a multiplier for d. Default is mul = 1. This parameter is overridden by e.

sum_to_one

when TRUE, the original frequencies will be reduced so that all allele frequencies will sum to one. Default: FALSE
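
As a rough illustration of how d, mul, and e interact, the sketch below computes the value assigned to zero-frequency alleles for each choice of d. The counts (n_samples, n_mlg, n_rrmlg) are made-up numbers, and correction_value is an illustrative helper, not a poppr function. It assumes the replacement is mul times the reciprocal of the chosen count, which matches the examples where mul = 1/2 gives 1/2n.

```r
# Hypothetical counts, chosen only for illustration
n_samples <- 100                   # number of samples
n_mlg     <- 40                    # number of multilocus genotypes
n_rrmlg   <- c(L1 = 35, L2 = 38)   # round-robin MLG counts, one per locus

# Illustrative helper (not part of poppr): value given to zero-frequency alleles
correction_value <- function(count, mul = 1, e = NULL) {
  if (!is.null(e)) return(e)       # e overrides both d and mul
  mul * (1 / count)
}

by_sample <- correction_value(n_samples)             # 0.01
by_mlg    <- correction_value(n_mlg)                 # 0.025
by_rrmlg  <- correction_value(n_rrmlg)               # one value per locus
half      <- correction_value(n_samples, mul = 1/2)  # 0.005
fixed     <- correction_value(n_mlg, e = 0.001)      # 0.001
```

With these counts the per-locus values are the largest and the per-sample value is the smallest, because the reciprocal grows as the count shrinks.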

Details

By default (d = "sample", e = NULL, sum_to_one = FALSE, mul = 1), this will set all zero-valued allele frequencies to 1/(n samples). The basic formula is mul * (1/d) unless e is specified, where d stands for the count selected by the d argument. If sum_to_one = TRUE, then the frequencies will be scaled as x/sum(x) AFTER correction, so the original allele frequencies will be reduced. See the examples for details. The general pattern of correction is that the value of the MAF will be rrmlg > mlg > sample.
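
The effect of sum_to_one can be sketched in base R. This only illustrates the arithmetic described above and is not poppr's internal code. The frequencies and the sample size n are hypothetical.

```r
# Hypothetical round-robin frequencies at one locus; the third allele was lost
freqs <- c(a1 = 0.6, a2 = 0.4, a3 = 0)

# Replace zero-valued frequencies with 1/(n samples), assuming n = 50
n <- 50
corrected <- freqs
corrected[corrected == 0] <- 1 / n
total <- sum(corrected)   # 1.02, so the locus no longer sums to one

# Behaviour described for sum_to_one = TRUE: rescale AFTER correction
rescaled <- corrected / sum(corrected)
sum(rescaled)             # 1
all(rescaled < corrected) # TRUE: every frequency shrinks slightly
```

Because the rescaling happens after the correction, the replaced allele ends up slightly below 1/n as well.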

Motivation

When calculating allele frequencies from a round-robin approach, rare alleles are often lost, resulting in zero-valued allele frequencies (Arnaud-Haond et al. 2007, Parks and Werth 1993). This can be problematic when calculating values for pgen and psex because frequencies of zero will result in undefined values for samples that contain those rare alleles. The solution to this problem is to give an estimate for the frequency of those rare alleles, but the question of HOW to do that arises. These arguments provide a way to define how rare alleles are to be estimated/corrected.
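
To see why a zero frequency is a problem, consider a deliberately simplified genotype probability taken as the product of allele frequencies. This is a sketch with hypothetical values, not the full pgen calculation.

```r
# Hypothetical frequencies of the alleles carried by one sample at three loci;
# the last allele was lost in the round-robin estimate
allele_freqs <- c(0.25, 0.5, 0)

# Simplified genotype probability: product of the allele frequencies
p <- prod(allele_freqs)
p        # 0
log(p)   # -Inf

# Replacing the zero with a small value keeps the log probability finite
allele_freqs[allele_freqs == 0] <- 0.001
log_p_corrected <- log(prod(allele_freqs))
is.finite(log_p_corrected)   # TRUE
```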

Using these arguments

These arguments are for use in the functions rraf, pgen, and psex. They take the place of the dots (...) at the end of the function call. For example, to set the minor allele frequencies to a specific value (say 0.001), regardless of locus, insert e = 0.001 along with any other arguments. Because the arguments are named, their position in the call does not matter:

pgen(my_data, e = 0.001, log = FALSE) 
psex(my_data, method = "multiple", e = 0.001)
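
The way named arguments travel through the dots can be sketched with two stand-in functions. Both outer_fun and correct_freqs are hypothetical and only mimic the general R mechanism. They are not poppr's functions.

```r
# Illustrative stand-in for the correction step (not a poppr function)
correct_freqs <- function(x, e = NULL, d = "sample", mul = 1) {
  list(e = e, d = d, mul = mul)
}

# Illustrative stand-in for a caller that accepts ... (not a poppr function)
outer_fun <- function(x, log = TRUE, ...) {
  # anything not matched by outer_fun's own arguments is forwarded via ...
  correct_freqs(x, ...)
}

a <- outer_fun(1:3, e = 0.001, log = FALSE)
b <- outer_fun(1:3, log = FALSE, e = 0.001)
identical(a, b)   # TRUE: the order of named arguments makes no difference
```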

Author(s)

Zhian N. Kamvar

References

Arnaud-Haond, S., Duarte, C. M., Alberto, F., & Serrão, E. A. 2007. Standardizing methods to address clonality in population studies. Molecular Ecology, 16(24), 5115-5139.

Parks, J. C., & Werth, C. R. 1993. A study of spatial features of clones in a population of bracken fern, Pteridium aquilinum (Dennstaedtiaceae). American Journal of Botany, 537-544.

See Also

rraf, pgen, psex, rrmlg

Examples

## Not run: 

data(Pram)
#-------------------------------------

# If you set correction = FALSE, you'll notice the zero-valued alleles

rraf(Pram, correction = FALSE)

# By default, however, the data will be corrected by 1/n

rraf(Pram)

# Since this is a diploid organism, we might want to set 1/2n instead

rraf(Pram, mul = 1/2)

# To set MAF = 1/2mlg

rraf(Pram, d = "mlg", mul = 1/2)

# Another way to think about this is, since these allele frequencies were
# derived at each locus with different sample sizes, it's only appropriate to
# correct based on those sample sizes.

rraf(Pram, d = "rrmlg", mul = 1/2)

# If we were going to use these frequencies for simulations, we might want to
# ensure that they all sum to one. 

rraf(Pram, d = "mlg", mul = 1/2, sum_to_one = TRUE) 

#-------------------------------------
# When we calculate these frequencies based on population, they are heavily
# influenced by the number of observed mlgs. 

rraf(Pram, by_pop = TRUE, d = "rrmlg", mul = 1/2)

# This can be avoided by supplying a fixed value with e

rraf(Pram, by_pop = TRUE, e = 0.01)


## End(Not run)