sample.alleles: Sample alleles

In ThomasTaus/poolSeq: Simulate and Analyze Pool-seq Data

Description

Introduce sampling variance into allele frequency data to mimic the Pool-seq approach. Two sources of sampling variance are modeled. Subjecting only a subset of the individuals in a population to Pool-seq is modeled with hypergeometric sampling (`mode = "individuals"`). Sequencing only a fraction of all DNA fragments is modeled with binomial sampling (`mode = "coverage"`).

Usage

```
sample.alleles(p, size, mode = c("coverage", "individuals"), Ncensus = NA, ploidy = 2)
```

Arguments

 `p` numeric vector defining relative allele frequencies, which are used as success probabilities in the sampling process.

 `size` numeric indicating the sample size to be used for binomial (`mode = "coverage"`) or hypergeometric sampling (`mode = "individuals"`), see 'Details'.

 `mode` character string specifying the sampling mode. Possible values are `"coverage"` and `"individuals"`.

 `Ncensus` numeric specifying the census size of the entire population (before sampling).

 `ploidy` numeric, the ploidy of the individuals.

Details

If `mode = "coverage"` and `length(size) == 1` then for each allele frequency an individual sequence coverage value will be drawn from a Poisson distribution with `lambda = size`. Otherwise (`length(size) > 1`) the values in `size` will be used directly and recycled if necessary. The `"coverage"` sampling mode applies `rbinom` with `size` equal to the sequence coverage and `prob` equal to the allele frequency (`p`).
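The two steps of the `"coverage"` mode with a scalar `size` can be sketched in base R roughly as follows. This is an illustration of the sampling scheme described above, not the package's source code; the variable names (`lambda`, `cov`, `reads`, `p_smpld`) are assumptions chosen for the example.

```r
# Illustrative sketch only; not taken from the poolSeq source.
set.seed(1)                      # make the random draws reproducible
p <- c(0.1, 0.5, 0.9)            # true allele frequencies
lambda <- 100                    # mean coverage, playing the role of a scalar `size`

# Step 1: draw one sequence coverage value per allele frequency
cov <- rpois(length(p), lambda = lambda)

# Step 2: draw the number of reads carrying the allele,
# given the coverage and the true allele frequency
reads <- rbinom(length(p), size = cov, prob = p)

# Allele frequency after sampling (NaN if a position had zero coverage)
p_smpld <- reads / cov
```

Passing a vector instead of a scalar would correspond to skipping step 1 and using the supplied coverage values directly.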

If `mode = "individuals"` then `size` has to be an integer specifying the number of individuals with a certain `ploidy` that are sampled from the population. Here `rhyper` is applied.
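A hypergeometric draw of this kind might look like the base-R sketch below. How `p`, `Ncensus`, `ploidy` and `size` map onto the arguments of `rhyper` is an assumption made for illustration, as are the rounding step and the variable names (`n_ind`, `total`, `carriers`, `k`); the package may combine them differently.

```r
# Illustrative sketch only; the parameter mapping is an assumption.
set.seed(1)                      # make the random draws reproducible
p <- c(0.1, 0.5, 0.9)            # true allele frequencies
Ncensus <- 1000                  # individuals in the whole population
ploidy <- 2                      # chromosome copies per individual
n_ind <- 50                      # sampled individuals, playing the role of `size`

total <- Ncensus * ploidy        # chromosomes in the population
carriers <- round(p * total)     # chromosomes carrying the allele
k <- n_ind * ploidy              # chromosomes drawn without replacement

# Number of sampled chromosomes that carry the allele
counts <- rhyper(length(p), m = carriers, n = total - carriers, k = k)

# Allele frequency among the sampled individuals
p_smpld <- counts / k
```

Because the draw is without replacement, the sampling variance shrinks as `n_ind` approaches `Ncensus`, which is what distinguishes this mode from binomial sampling.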

Value

A numeric vector of allele frequencies after introducing sampling variance or (if `mode = "coverage"` and `length(size) == 1`) a `data.table` containing the following columns:

 `p.smpld` allele frequencies after sampling

 `size` sequence coverage for each position, drawn from a Poisson distribution with `lambda = size`

Author(s)

Thomas Taus

Examples

```
# load the package that provides sample.alleles()
library(poolSeq)

# generate random allele frequencies
af <- runif(10000, min=0, max=1)

# introduce sampling variance to mimic Pool-seq of the entire population at 100X coverage
afSeq <- sample.alleles(af, size=100, mode="coverage")

# plot distribution of differences in allele frequency before and after sampling
hist(af-afSeq$p.smpld, main="Sequencing at 100X", xlab="Error in allele frequency (%)", ylab="Occurrences")
```

ThomasTaus/poolSeq documentation built on Oct. 22, 2018, 7:21 p.m.