polyfreqs: Bayesian population genomics in autopolyploids


Description

polyfreqs implements a Gibbs sampling algorithm to perform Bayesian inference on the allele frequencies (and other quantities) in a population of autopolyploids. It is the main function for conducting inference with the polyfreqs package.
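A Gibbs sampler for this problem can alternate between two steps. First it samples each individual's genotype (the number of reference alleles, 0 to ploidy) given the current allele frequency and that individual's read counts. Then it samples the allele frequency given the genotypes. The sketch below shows one such sweep for a single locus. It is an illustration under stated assumptions, not the package's code: it assumes a Binomial(ploidy, p) genotype prior, a binomial read model with a symmetric sequencing error rate, and a Beta(1, 1) prior on p. The name gibbs_sweep is hypothetical and is not exported by polyfreqs.

```r
# Hypothetical sketch of one Gibbs sweep for a single locus (not polyfreqs' code).
# Assumed model: g ~ Binomial(ploidy, p); ref ~ Binomial(tot, g/ploidy*(1-error) +
# (1 - g/ploidy)*error); p ~ Beta(1, 1).
gibbs_sweep <- function(p, tot, ref, ploidy, error = 0.01) {
  g_vals <- 0:ploidy
  # Probability that a read carries the reference allele for each possible genotype.
  read_p <- (g_vals / ploidy) * (1 - error) + (1 - g_vals / ploidy) * error
  # Step 1: draw each individual's genotype from its full conditional distribution.
  geno <- vapply(seq_along(tot), function(i) {
    w <- dbinom(g_vals, ploidy, p) * dbinom(ref[i], tot[i], read_p)
    sample(g_vals, 1, prob = w)
  }, numeric(1))
  # Step 2: conjugate Beta update for the allele frequency given the genotypes.
  p_new <- rbeta(1, 1 + sum(geno), 1 + sum(ploidy - geno))
  list(p = p_new, geno = geno)
}

set.seed(42)
state <- gibbs_sweep(p = 0.5, tot = c(10, 12, 8), ref = c(5, 0, 8), ploidy = 4)
```

Repeating the sweep and feeding each new p back in produces a Markov chain whose draws approximate the posterior of the allele frequency under the assumed model.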

Usage

polyfreqs(tM, rM, ploidy, iter = 1e+05, thin = 100, burnin = 20,
  print = 1000, error = 0.01, genotypes = FALSE, geno_dir = "genotypes",
  col_header = "", outfile = "polyfreqs-mcmc.out", quiet = FALSE)

Arguments

tM

Total reads matrix: matrix containing the total number of reads mapping to each locus for each individual.

rM

Reference reads matrix: matrix containing the number of reference reads mapping to each locus for each individual.

ploidy

The ploidy level of individuals in the population (must be >= 2).

iter

The number of MCMC generations to run (default=100,000).

thin

Thins the MCMC output by keeping one sample every thin generations (default=100).

burnin

Percent of the posterior samples to discard as burn-in (default=20).

print

Frequency of printing the current MCMC generation to stdout (default=1000).

error

The sequencing error rate, treated as a fixed constant (default=0.01).

genotypes

Logical variable indicating whether or not to write the genotypes sampled during the MCMC to file (default=FALSE).

geno_dir

Path to the directory where polyfreqs writes the posterior samples of genotypes when genotypes=TRUE (default="genotypes").

col_header

Optional column header tag for use in running loci in parallel (default="").

outfile

The name of the output file to which samples from the posterior distribution of allele frequencies are written (default="polyfreqs-mcmc.out").

quiet

Suppress the printing of the current MCMC generation to stdout (default=FALSE).
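As a worked example of how iter, thin and burnin interact, the hypothetical helper below counts the posterior samples left at the end of a run. It assumes that one sample is stored every thin generations and that burnin is applied as a percentage of the stored samples; the package's exact bookkeeping may differ.

```r
# Hypothetical helper (not part of polyfreqs): posterior samples remaining after
# thinning and burn-in, assuming burn-in is a percentage of the stored samples.
n_retained <- function(iter = 1e5, thin = 100, burnin = 20) {
  n_stored <- iter %/% thin                  # samples stored during the run
  n_burn <- floor(n_stored * burnin / 100)   # stored samples discarded as burn-in
  n_stored - n_burn
}

n_retained()                                  # defaults: 1000 stored, 200 discarded, 800 kept
n_retained(iter = 2e4, thin = 10, burnin = 10) # 2000 stored, 200 discarded, 1800 kept
```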

Details

Data sets run through polyfreqs must be of class "matrix" with row names giving the names of the individuals sampled. The simplest way to get data into R for an analysis is to format the total read matrix and the reference read matrix as tab-delimited text files, with the first column containing the individual names followed by one column of read counts per locus. These files can then be read with the read.table function with the row.names argument set to 1. Because read.table returns a data frame, convert the result with as.matrix before passing it to polyfreqs. An optional tab-delimited list of locus names can be included as the first row and is treated as the column headers for each locus (set header=TRUE in read.table).

When running polyfreqs, a number of options control what the function returns. To estimate genotypes and write posterior genotype samples to file, set the genotypes argument to TRUE and choose a name for the output directory with geno_dir (default "genotypes"). polyfreqs also prints the current MCMC generation to the R console (at a frequency set by the print argument) so that users can track run times; this output can be turned off by setting quiet=TRUE. More details on using polyfreqs can be found in the introductory vignette.
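The data-loading steps described above can be sketched in base R as follows. The file contents are made-up read counts for illustration only; the first row lists locus names only, so it has one field fewer than the data rows, and row.names = 1 takes the individual names from the first column.

```r
# Write tiny tab-delimited example files (made-up counts, for illustration only).
tot_file <- tempfile(fileext = ".txt")
ref_file <- tempfile(fileext = ".txt")
writeLines(c("loc1\tloc2\tloc3",
             "ind1\t12\t8\t20",
             "ind2\t15\t10\t18"), tot_file)
writeLines(c("loc1\tloc2\tloc3",
             "ind1\t6\t0\t11",
             "ind2\t15\t4\t9"), ref_file)

# header = TRUE reads locus names as column names; row.names = 1 uses the first
# column as individual names; as.matrix converts the data frame to a matrix.
tM <- as.matrix(read.table(tot_file, header = TRUE, row.names = 1, sep = "\t"))
rM <- as.matrix(read.table(ref_file, header = TRUE, row.names = 1, sep = "\t"))
```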

Value

Returns a list of 3 (4 if genotypes=TRUE) items:

posterior_freqs

A matrix of posterior samples of allele frequencies. These are also written to the file named by the outfile argument.

map_genotypes

If genotypes=TRUE, a fourth item is returned: a matrix containing the maximum a posteriori genotype estimates, computed after discarding burn-in.

het_obs

Matrix of posterior samples of observed heterozygosity.

het_exp

Matrix of posterior samples of expected heterozygosity.
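Posterior summaries can be computed from the returned matrices with base R. The sketch below assumes one row per stored MCMC sample and one column per locus; that layout is an assumption, not confirmed here. summarise_posterior is a hypothetical helper, and it would apply equally to het_obs and het_exp if they share the same layout.

```r
# Hypothetical helper: per-column posterior mean and 95% credible interval,
# assuming rows are MCMC samples and columns are loci.
summarise_posterior <- function(samples, probs = c(0.025, 0.975)) {
  t(apply(samples, 2, function(x) c(mean = mean(x), quantile(x, probs))))
}

# Stand-in matrix (2 samples x 2 loci) used only to show the output shape.
demo <- matrix(c(0.1, 0.2, 0.3, 0.4), nrow = 2)
s <- summarise_posterior(demo)  # one row per locus: mean, 2.5%, 97.5%
```

With real output this would be called as, for example, summarise_posterior(res$posterior_freqs), where res is the list returned by polyfreqs.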

Author(s)

Paul Blischak

References

Blischak PD, LS Kubatko and AD Wolfe. Accounting for genotype uncertainty in the estimation of allele frequencies in autopolyploids. In revision.

Examples


pblischak/polyfreqs documentation built on May 24, 2019, 10:37 p.m.