Probability of encountering a genotype more than once by chance
1 2 
gid 
a genind or genclone object. 
pop 
either a formula to set the population factor from the

by_pop 
When this is 
freq 
a vector or matrix of allele frequencies. This defaults to

G 
an integer vector specifying the number of observed genets. If NULL,
this will be the number of original multilocus genotypes for

method 
which method of calculating psex should be used? Using

... 
options from correcting rare alleles. The default is to correct allele frequencies to 1/n 
Psex is the probability of encountering a given genotype more than once by chance. The basic equation from Parks and Werth (1993) is
psex = 1  (1  pgen)^G
where G is the number of multilocus genotypes and pgen is the
probability of a given genotype (see
pgen
for its calculation). For a given value of alpha (e.g.
alpha = 0.05), genotypes with psex < alpha can be thought of as a single
genet whereas genotypes with psex > alpha do not have strong evidence that
members belong to the same genet (Parks and Werth, 1993).
When method = "multiple"
, the method from ArnaudHaond et al. (1997)
is used where the sum of the binomial density is taken.
psex = dbinom(i, N, pgen)
where N is the number of sampling units i is the ith  1 encounter of a given genotype, and pgen is the value of pgen for that genotype. This procedure is performed for all samples in the data. For example, if you have a genotype whose pgen value was 0.0001, with 5 observations out of 100 samples, the value of psex is computed like so:
1  dbinom(0:4, 100, 0.0001)

It is possible to modify G
for single or multiple encounters. With
method = "single"
, G
takes place of the exponent, whereas
with method = "multiple"
, G
replaces N
(see above).
If you supply a named vector for G
with the population names and
by_pop = TRUE
, then the value of G
will be different for each
population.
For example, in the case of method = "multiple"
, let's say you have
two populations that share a genotype between them. The size of population
A and B are 25 and 75, respectively, The values of pgen for that genotype
in population A and B are 0.005 and 0.0001, respectively, and the number of
samples with the genotype in popualtions A and B are 4 and 6, respectively.
In this case psex for this genotype would be calculated for each population
separately if we don't specify G
:
1 2 
If we specify G = 100
, then it changes to:
1 2 
We could also specify G to be the number of genotypes observed in the population (let's say A = 10, B = 20)
1 2 
Unless freq
is supplied, the function will automatically calculate
the roundrobin allele frequencies with rraf
and G
with nmll
.
a vector of Psex for each sample.
The values of Psex represent the value for each multilocus genotype.
Additionally, when the argument pop
is not NULL
,
by_pop
is automatically TRUE
.
Zhian N. Kamvar, Jonah Brooks, Stacy A. KruegerHadfield, Erik Sotka
ArnaudHaond, S., Duarte, C. M., Alberto, F., & SerrĂ£o, E. A. 2007. Standardizing methods to address clonality in population studies. Molecular Ecology, 16(24), 51155139.
Parks, J. C., & Werth, C. R. 1993. A study of spatial features of clones in a population of bracken fern, Pteridium aquilinum (Dennstaedtiaceae). American Journal of Botany, 537544.
pgen
, rraf
, rrmlg
,
rare_allele_correction
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82  data(Pram)
Pram_psex < psex(Pram, by_pop = FALSE)
plot(Pram_psex, log = "y", col = ifelse(Pram_psex > 0.05, "red", "blue"))
abline(h = 0.05, lty = 2)
title("Probability of second encounter")
## Not run:
# With multiple encounters
Pram_psex < psex(Pram, by_pop = FALSE, method = "multiple")
plot(Pram_psex, log = "y", col = ifelse(Pram_psex > 0.05, "red", "blue"))
abline(h = 0.05, lty = 2)
title("Probability of multiple encounters")
# This can be also done assuming populations structure
Pram_psex < psex(Pram, by_pop = TRUE, method = "multiple")
plot(Pram_psex, log = "y", col = ifelse(Pram_psex > 0.05, "red", "blue"))
abline(h = 0.05, lty = 2)
title("Probability of multiple encounters\nwith pop structure")
# The above, but correcting zerovalue alleles by 1/(2*rrmlg) with no
# population structure assumed
# Type ?rare_allele_correction for details.
Pram_psex2 < psex(Pram, by_pop = FALSE, d = "rrmlg", mul = 1/2, method = "multiple")
plot(Pram_psex2, log = "y", col = ifelse(Pram_psex2 > 0.05, "red", "blue"))
abline(h = 0.05, lty = 2)
title("Probability of multiple encounters\nwith pop structure (1/(2*rrmlg))")
# We can also set G to the total population size
(G < nInd(Pram))
Pram_psex < psex(Pram, by_pop = TRUE, method = "multiple", G = G)
plot(Pram_psex, log = "y", col = ifelse(Pram_psex > 0.05, "red", "blue"))
abline(h = 0.05, lty = 2)
title("Probability of multiple encounters\nwith pop structure G = 729")
# Or we can set G to the number of unique MLGs
(G < rowSums(mlg.table(Pram, plot = FALSE) > 0))
Pram_psex < psex(Pram, by_pop = TRUE, method = "multiple", G = G)
plot(Pram_psex, log = "y", col = ifelse(Pram_psex > 0.05, "red", "blue"))
abline(h = 0.05, lty = 2)
title("Probability of multiple encounters\nwith pop structure G = nmll")
## An example of supplying previously calculated frequencies and G
# From Parks and Werth, 1993, using the first three genotypes.
# The row names indicate the number of samples found with that genotype
x < "
Hk Lap Mdh2 Pgm1 Pgm2 X6Pgd2
54 12 12 12 23 22 11
36 22 22 11 22 33 11
10 23 22 11 33 13 13"
# Since we aren't representing the whole data set here, we are defining the
# allele frequencies before the analysis.
afreq < c(Hk.1 = 0.167, Hk.2 = 0.795, Hk.3 = 0.038,
Lap.1 = 0.190, Lap.2 = 0.798, Lap.3 = 0.012,
Mdh2.0 = 0.011, Mdh2.1 = 0.967, Mdh2.2 = 0.022,
Pgm1.2 = 0.279, Pgm1.3 = 0.529, Pgm1.4 = 0.162, Pgm1.5 = 0.029,
Pgm2.1 = 0.128, Pgm2.2 = 0.385, Pgm2.3 = 0.487,
X6Pgd2.1 = 0.526, X6Pgd2.2 = 0.051, X6Pgd2.3 = 0.423)
xtab < read.table(text = x, header = TRUE, row.names = 1)
# Here we are expanding the number of samples to their observed values.
# Since we have already defined the allele frequencies, this step is actually
# not necessary.
all_samples < rep(rownames(xtab), as.integer(rownames(xtab)))
xgid < df2genind(xtab[all_samples, ], ncode = 1)
freqs < afreq[colnames(tab(xgid))] # only used alleles in the sample
pSex < psex(xgid, by_pop = FALSE, freq = freqs, G = 45)
# Note, pgen returns log values for each locus, here we take the sum across
# all loci and take the exponent to give us the value of pgen for each sample
pGen < exp(rowSums(pgen(xgid, by_pop = FALSE, freq = freqs)))
res < matrix(c(unique(pGen), unique(pSex)), ncol = 2)
colnames(res) < c("Pgen", "Psex")
res < cbind(xtab, nRamet = rownames(xtab), round(res, 5))
rownames(res) < 1:3
res # Compare to the first three rows of Table 2 in Parks & Werth, 1993
## End(Not run)

Questions? Problems? Suggestions? Tweet to @rdrrHQ or email at ian@mutexlabs.com.
Please suggest features or report bugs with the GitHub issue tracker.
All documentation is copyright its authors; we didn't write any of that.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.