Description Usage Arguments Details Value Author(s) Examples
Introduces sampling variance into allele frequency data to mimic the Pool-seq approach. Sequencing only a subset of the individuals in a population is modeled with hypergeometric sampling (mode = "individuals"). Sequencing only a fraction of all DNA fragments is modeled with binomial sampling (mode = "coverage").
sample.alleles(p, size, mode = c("coverage", "individuals"), Ncensus = NA, ploidy = 2)
p | numeric vector of relative allele frequencies, which are used as success probabilities in the sampling process.
size | numeric sample size: the sequence coverage for binomial sampling (mode = "coverage") or the number of sampled individuals for hypergeometric sampling (mode = "individuals").
mode | character string specifying the sampling mode. Possible values are "coverage" (default) and "individuals".
Ncensus | numeric specifying the census size of the entire population (before sampling).
ploidy | numeric, the ploidy of the individuals.
If mode = "coverage" and length(size) == 1, a separate sequence coverage value is drawn for each allele frequency from a Poisson distribution with lambda = size. Otherwise (length(size) > 1), the values in size are used directly and recycled if necessary. The "coverage" sampling mode applies rbinom with size equal to the sequence coverage and prob equal to the allele frequency (p).
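The "coverage" mode described above can be sketched with base R alone. This is a minimal illustration of the Poisson-then-binomial procedure, not the package's own code; the variable names (lambda, cov, reads, result) are made up for the example, and dividing the binomial counts by the coverage to get back a frequency is an assumption.

```r
# Sketch of "coverage" sampling with a scalar size (base R only; illustrative).
set.seed(1)
p <- c(0.1, 0.5, 0.9)   # true allele frequencies
lambda <- 100           # mean coverage, playing the role of a scalar `size`

# one Poisson-distributed coverage value per allele frequency
cov <- rpois(length(p), lambda = lambda)

# binomial draw of reads carrying the allele at each position
reads <- rbinom(length(p), size = cov, prob = p)

# convert counts back to frequencies (assumption: counts are divided by coverage);
# a position with zero coverage would give NaN, which is very unlikely at lambda = 100
p_smpld <- reads / cov

# mirror the documented return columns for this case
result <- data.frame(p.smpld = p_smpld, size = cov)
print(result)
```

Passing a vector of coverages instead would simply skip the rpois step and use those values as the binomial size.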
If mode = "individuals", size has to be an integer specifying the number of individuals (of the given ploidy) that are sampled from the population. Here rhyper is applied.
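The "individuals" mode can be sketched in the same way. The documentation only states that rhyper is applied; how its arguments are derived from p, size, Ncensus and ploidy is an assumption here (allele copies are counted on Ncensus * ploidy chromosomes and size * ploidy chromosomes are drawn), so treat this as an illustration rather than the package's implementation.

```r
# Sketch of "individuals" sampling (base R only; argument mapping is assumed).
set.seed(1)
p <- c(0.1, 0.5, 0.9)   # true allele frequencies
Ncensus <- 1000         # census size of the population
ploidy <- 2             # chromosome sets per individual
size <- 50              # number of individuals sampled

total <- Ncensus * ploidy   # chromosomes in the whole population
m <- round(p * total)       # copies of the focal allele in the population
k <- size * ploidy          # chromosomes drawn without replacement

# hypergeometric draw: focal-allele copies among the k sampled chromosomes
draws <- rhyper(length(p), m = m, n = total - m, k = k)

# convert counts back to frequencies
p_smpld <- draws / k
print(p_smpld)
```

Because sampling is without replacement, the sampling variance shrinks as size approaches Ncensus and vanishes when the whole population is sampled.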
A numeric vector of allele frequencies after introducing sampling variance or (if mode = "coverage" and length(size) == 1) a data.table containing the following columns:
p.smpld | allele frequencies after sampling
size | sequence coverage for each position, drawn from a Poisson distribution with lambda = size
Thomas Taus
# generate random allele frequencies
af <- runif(10000, min=0, max=1)
# introduce sampling variance to mimic Pool-seq of the entire population at 100X coverage
afSeq <- sample.alleles(af, size=100, mode="coverage")
# plot distribution of differences in allele frequency before and after sampling
# (multiplied by 100 so that the values match the percent axis label)
hist((af-afSeq$p.smpld)*100, main="Sequencing at 100X", xlab="Error in allele frequency (%)", ylab="Occurrences")