sample.alleles: Sample alleles


View source: R/simaf.R

Description

Introduce sampling variance in allele frequency data to mimic the Pool-seq approach. Two sources of sampling variance are modeled. Subjecting only a subset of the individuals in a population to Pool-seq is modeled with hypergeometric sampling (mode = "individuals"). Sequencing only a fraction of all DNA fragments is modeled with binomial sampling (mode = "coverage").

Usage

sample.alleles(p, size, mode = c("coverage", "individuals"), Ncensus = NA, ploidy = 2)

Arguments

p

numeric vector defining relative allele frequencies, which are used as success probabilities in the sampling process.

size

numeric indicating the sample size to be used for binomial (mode = "coverage") or hypergeometric sampling (mode = "individuals"), see 'Details'.

mode

character string specifying the sampling mode. Possible values are "coverage" and "individuals".

Ncensus

numeric specifying the census size of the entire population (before sampling).

ploidy

numeric, the ploidy of the individuals.

Details

If mode = "coverage" and length(size) == 1 then for each allele frequency an individual sequence coverage value will be drawn from a Poisson distribution with lambda = size. Otherwise (length(size) > 1) the values in size will be used directly and recycled if necessary. The "coverage" sampling mode applies rbinom with size equal to the sequence coverage and prob equal to the allele frequency (p).
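
The two-step draw described above can be sketched in base R. This is only an illustration of the Poisson-then-binomial idea, not the package's actual source; the variable names (target_cov, cov, p_smpld) and the example values are assumptions made for this sketch.

```r
set.seed(1)                      # reproducible illustration

p <- c(0.1, 0.5, 0.9)            # example allele frequencies (assumed values)
target_cov <- 100                # hypothetical mean sequence coverage

# Step 1: draw one coverage value per allele frequency from a Poisson distribution
cov <- rpois(n = length(p), lambda = target_cov)

# Step 2: draw allele counts from a binomial distribution at that coverage
# and convert the counts back to frequencies
# (a coverage of 0 would give NaN; this is extremely unlikely at lambda = 100)
p_smpld <- rbinom(n = length(p), size = cov, prob = p) / cov

print(data.frame(p = p, cov = cov, p_smpld = p_smpld))
```

Passing a vector of coverage values instead of a single number corresponds to skipping step 1 and using those values directly as size in step 2.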

If mode = "individuals" then size has to be an integer specifying the number of individuals with a certain ploidy that are sampled from the population. Here rhyper is applied.
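
As a rough sketch of what hypergeometric sampling of individuals could look like, the following base R code draws chromosomes without replacement from a finite population. It is written under the assumption that the population holds Ncensus * ploidy chromosomes and that size * ploidy of them are sampled; the variable names and example values are hypothetical, and the package's internal implementation may differ.

```r
set.seed(1)                      # reproducible illustration

p <- c(0.1, 0.5, 0.9)            # example allele frequencies (assumed values)
Ncensus <- 1000                  # census size of the population
n_ind <- 50                      # number of individuals sampled
ploidy <- 2                      # chromosome copies per individual

total <- Ncensus * ploidy        # chromosomes in the whole population
m <- round(p * total)            # copies of the focal allele in the population
k <- n_ind * ploidy              # chromosomes in the sample

# Draw focal-allele counts without replacement, then convert to frequencies
counts <- rhyper(nn = length(p), m = m, n = total - m, k = k)
p_smpld <- counts / k

print(data.frame(p = p, counts = counts, p_smpld = p_smpld))
```

Because sampling is without replacement, the variance shrinks as the sample approaches the census size, which is the property that distinguishes this mode from binomial sampling.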

Value

A numeric vector of allele frequencies after introducing sampling variance or (if mode = "coverage" and length(size) == 1) a data.table containing the following columns:

p.smpld

allele frequencies after sampling

size

sequence coverage for each position, drawn from a Poisson distribution with lambda = size

Author(s)

Thomas Taus

Examples

# generate random allele frequencies
af <- runif(10000, min=0, max=1)

# introduce sampling variance to mimic Pool-seq of the entire population at 100X coverage
afSeq <- sample.alleles(af, size=100, mode="coverage")

# plot distribution of differences in allele frequency before and after sampling
# (the difference is a proportion between -1 and 1, not a percentage)
hist(af - afSeq$p.smpld, main="Sequencing at 100X", xlab="Error in allele frequency", ylab="Occurrences")

ThomasTaus/poolSeq documentation built on Feb. 17, 2020, 1:52 p.m.