
Introduce sampling variance in allele frequency data to mimic the Pool-seq approach. On the one hand, subjecting only a subset of the individuals in a population to Pool-seq is modeled with hypergeometric sampling (`mode = "individuals"`). On the other hand, the sampling variance introduced by sequencing only a fraction of all DNA fragments is modeled with binomial sampling (`mode = "coverage"`).

```
sample.alleles(p, size, mode = c("coverage", "individuals"), Ncensus = NA, ploidy = 2)
```

| Argument | Description |
|---|---|
| `p` | numeric vector defining relative allele frequencies, which are used as success probabilities in the sampling process. |
| `size` | numeric indicating the sample size to be used for binomial (`mode = "coverage"`) or hypergeometric (`mode = "individuals"`) sampling. |
| `mode` | character string specifying the sampling mode. Possible values are `"coverage"` and `"individuals"`. |
| `Ncensus` | numeric specifying the census size of the entire population (before sampling). |
| `ploidy` | numeric, the ploidy of the individuals. |

If `mode = "coverage"` and `length(size) == 1`, then for each allele frequency an individual sequence coverage value is drawn from a Poisson distribution with `lambda = size`. Otherwise (`length(size) > 1`) the values in `size` are used directly and recycled if necessary. The `"coverage"` sampling mode applies `rbinom` with `size` equal to the sequence coverage and `prob` equal to the allele frequency (`p`).
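The two-step sampling described above can be sketched in base R. This is a minimal illustration written from the description, not the package source; the variable names (`target_cov`, `cov`, `p_smpld`) and the example frequencies are assumptions chosen for the sketch.

```r
# Sketch only: follows the documented behaviour, not the poolSeq implementation.
set.seed(1)

p <- c(0.1, 0.5, 0.9)   # true allele frequencies (hypothetical values)
target_cov <- 100       # plays the role of a length-1 `size`

# Step 1: draw one sequence coverage value per allele frequency
cov <- rpois(n = length(p), lambda = target_cov)

# Step 2: binomial sampling of reads carrying the allele,
# converted back to a relative frequency
p_smpld <- rbinom(n = length(p), size = cov, prob = p) / cov

data.frame(p = p, size = cov, p.smpld = p_smpld)
```

Note that a drawn coverage of 0 would give `0 / 0`, that is `NaN`, in this sketch; with a target coverage of 100 this is extremely unlikely.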

If `mode = "individuals"`, then `size` has to be an integer specifying the number of individuals with a certain `ploidy` that are sampled from the population. Here `rhyper` is applied.
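One way to picture the hypergeometric step is as drawing chromosomes without replacement from the census population. The sketch below assumes that the population holds `Ncensus * ploidy` chromosomes and that `size * ploidy` of them are sampled; the rounding of allele counts and all variable names are assumptions made for illustration, not necessarily how the package parameterizes `rhyper`.

```r
# Sketch only: one plausible parameterization of rhyper for this setting.
set.seed(1)

p <- c(0.1, 0.5, 0.9)   # true allele frequencies (hypothetical values)
Ncensus <- 1000         # census population size (hypothetical)
ploidy <- 2
n_ind <- 100            # plays the role of `size`: sampled individuals

total_chr   <- Ncensus * ploidy        # chromosomes in the population
allele_chr  <- round(p * total_chr)    # chromosomes carrying the allele (rounding is an assumption)
sampled_chr <- n_ind * ploidy          # chromosomes drawn without replacement

# rhyper(nn, m, n, k): m "successes" and n "failures" in the urn, k draws
counts <- rhyper(nn = length(p), m = allele_chr,
                 n = total_chr - allele_chr, k = sampled_chr)

p_smpld <- counts / sampled_chr        # allele frequencies in the sampled pool
p_smpld
```

Because sampling is without replacement, the variance shrinks as `n_ind` approaches `Ncensus`, which is the main difference from the binomial model used for coverage.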

A numeric vector of allele frequencies after introducing sampling variance or, if `mode = "coverage"` and `length(size) == 1`, a `data.table` containing the following columns:

| Column | Description |
|---|---|
| `p.smpld` | allele frequencies after sampling |
| `size` | sequence coverage for each position, drawn from a Poisson distribution with `lambda = size` |

Thomas Taus

```
# generate random allele frequencies
af <- runif(10000, min=0, max=1)
# introduce sampling variance to mimic Pool-seq of the entire population at 100X coverage
# (size has length 1, so a data.table with columns p.smpld and size is returned)
afSeq <- sample.alleles(af, size=100, mode="coverage")
# plot distribution of differences in allele frequency before and after sampling
# (differences are relative frequencies, not percentages)
hist(af-afSeq$p.smpld, main="Sequencing at 100X", xlab="Error in allele frequency", ylab="Occurrences")
```
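The `"individuals"` mode can be exercised in the same way. The following is a hedged sketch based only on the signature and return value documented above: the values `size = 100` and `Ncensus = 1000` are hypothetical, and the call is guarded so that the snippet is skipped when the poolSeq package is not installed.

```r
# generate random allele frequencies
af <- runif(10000, min = 0, max = 1)

# Guarded call: only runs if poolSeq is available
if (requireNamespace("poolSeq", quietly = TRUE)) {
  # mimic pooling 100 diploid individuals out of a census population of 1000
  # (both numbers are illustrative assumptions)
  afInd <- poolSeq::sample.alleles(af, size = 100, mode = "individuals",
                                   Ncensus = 1000, ploidy = 2)
  # per the Value section, a plain numeric vector is expected in this mode
  hist(af - afInd, main = "Sampling 100 of 1000 individuals",
       xlab = "Error in allele frequency", ylab = "Occurrences")
}
```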

ThomasTaus/poolSeq documentation built on Oct. 22, 2018, 7:21 p.m.
