sync_to_frequencies: Data input from a sync file


View source: R/functions-read-sync.R

Description

Reads in SNP time series data from a file with .sync format.

Usage

sync_to_frequencies(file, base.pops, header, mincov = 15)

Arguments

file

the name of the ".sync" file from which the data are read. The sync format is specified in Kofler et al. (2011). A sync file contains 3 + n columns: col 1: chromosome (reference contig), col 2: position (in the reference contig), col 3: reference allele, col >3: one sync entry per population, holding allele counts in the form A-count:T-count:C-count:G-count:N-count:deletion-count. Sync files do not have a header by default, but a header is accepted when specified with header=TRUE.
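To make the column layout concrete, the following base R sketch splits one sync line into its fields. It is only an illustration of the format, not the package's own parser; the example line, the tab separator, and the names `sync_line`, `count_names` and `counts` are assumptions made for this example.

```r
# Hypothetical sync line with two libraries (assumed to be tab-separated).
sync_line <- "2L\t1500\tA\t10:0:5:0:0:0\t3:0:12:0:0:0"

fields <- strsplit(sync_line, "\t", fixed = TRUE)[[1]]
chr <- fields[1]              # col 1: chromosome (reference contig)
pos <- as.integer(fields[2])  # col 2: position
ref <- fields[3]              # col 3: reference allele

# Split every sync entry (col > 3) into its six counts, in the order
# A:T:C:G:N:deletion described above.
count_names <- c("A", "T", "C", "G", "N", "del")
counts <- t(vapply(
  fields[-(1:3)],
  function(entry) as.integer(strsplit(entry, ":", fixed = TRUE)[[1]]),
  integer(6)
))
dimnames(counts) <- list(paste0("lib", seq_len(nrow(counts))), count_names)

print(counts)
```

Each row of `counts` then corresponds to one library and each column to one of the six count fields.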

base.pops

logical vector with the same length as the number of libraries present in the sync file. Libraries indicated with TRUE are used to identify the two main alleles (minor and major allele). Allele frequencies of all libraries are subsequently polarized for the minor allele in this specified subset.
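As a rough sketch of how base.pops might be built, the example below assumes a sync file whose first two libraries are founder libraries followed by four evolved libraries. The file name `my_experiment.sync` and the library layout are hypothetical; the call itself is only attempted if the package is installed and the file exists.

```r
# Assumed layout: 2 founder libraries followed by 4 evolved libraries.
n_founder <- 2
n_evolved <- 4
base.pops <- rep(c(TRUE, FALSE), times = c(n_founder, n_evolved))

sync_file <- "my_experiment.sync"  # hypothetical path, not shipped with the package

# Only attempt the call when the package and the file are actually available.
if (requireNamespace("haploReconstruct", quietly = TRUE) && file.exists(sync_file)) {
  freqs <- haploReconstruct::sync_to_frequencies(
    file = sync_file,
    base.pops = base.pops,
    header = FALSE,
    mincov = 15
  )
}
```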

header

logical value specifying whether a header is present in the provided sync file.

mincov

minimum coverage required to calculate allele frequencies. If the sum of the allele counts of the minor and major allele is below this threshold, the respective frequency is encoded as NA (default=15).
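The effect of the threshold can be illustrated with a few made-up counts. This is a minimal sketch of the rule as described above, not the package's internal code; the count vectors are invented for the example.

```r
# Invented minor/major allele counts for three libraries at one SNP.
minor_count <- c(4, 2, 30)
major_count <- c(16, 8, 70)
mincov <- 15

# Coverage is taken here as the sum of minor and major allele counts.
coverage <- minor_count + major_count

# Frequencies below the coverage threshold are encoded as NA.
freq <- ifelse(coverage < mincov, NA_real_, minor_count / coverage)

print(freq)
```

The second library has a coverage of 10, which is below 15, so its frequency is set to NA.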

Details

Time series data are read in from a file with sync format. The sync format is specified in Kofler et al. 2011 (PoPoolation2: identifying differentiation between populations using sequencing of pooled DNA samples (Pool-Seq)). Allele counts are read in for each library and SNP and transformed to allele frequencies. Allele frequencies are polarized for the minor and major allele of a specified (sub-)set of libraries, i.e. libraries of the experimental founder population. Frequencies are determined based only on the counts of the two most common alleles in the specified base populations base.pops. Please note: This procedure does not substitute for proper SNP calling. Provided sync files are expected to contain only positions of previously called SNPs, and at least two alleles should be present in the specified base populations.
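The polarization step can be sketched in base R for a single SNP. This illustrates the idea only; in particular, ranking alleles by their counts summed over the base populations is an assumption of this sketch, and the count matrix is invented.

```r
# Invented allele counts for one SNP in three libraries.
counts <- rbind(
  base1   = c(A = 30, T = 0, C = 10, G = 0),
  base2   = c(A = 25, T = 1, C = 15, G = 0),
  evolved = c(A = 10, T = 0, C = 40, G = 0)
)
base.pops <- c(TRUE, TRUE, FALSE)

# Assumption: rank alleles by their summed counts in the base populations.
base_totals <- colSums(counts[base.pops, , drop = FALSE])
ranked <- names(sort(base_totals, decreasing = TRUE))
majallele <- ranked[1]
minallele <- ranked[2]

# Frequency of the base-population minor allele in every library,
# using only the counts of the two most common alleles.
freqs <- counts[, minallele] / (counts[, minallele] + counts[, majallele])

print(freqs)
```

Note that the allele that is minor in the base populations (C here) can become the more frequent allele in an evolved library; its frequency is still reported relative to the base populations.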

Value

a data.table with 6 plus N columns: col 1: chr (chromosome), col 2: pos (position on respective chromosome), col 3: ref (reference allele), col 4: minallele (minor allele across all specified base populations), col 5: majallele (major allele across all specified base populations), col 6: weighted mean frequency of all specified base populations polarized for the minor allele, col >6: allele frequency of the minor allele for each library
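One plausible reading of the weighted mean in column 6 is a mean of the base-population frequencies weighted by coverage. The sketch below assumes this weighting, which the text above does not spell out, and uses invented counts.

```r
# Invented minor/major allele counts in two base populations.
minor_count <- c(base1 = 10, base2 = 15)
major_count <- c(base1 = 30, base2 = 25)

coverage <- minor_count + major_count
freq <- minor_count / coverage

# Assumption: weights are the per-library coverages.
base_freq <- weighted.mean(freq, w = coverage)

print(base_freq)
```

With coverage weights, the result equals the pooled minor allele count divided by the pooled coverage.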

Author(s)

Susanne U. Franssen

References

Franssen, Barton & Schloetterer 2016, Reconstruction of haplotype-blocks selected during experimental evolution, MBE


haploReconstruct documentation built on May 2, 2019, 1:46 p.m.