dadi_inputs | R Documentation |
Creates an input file for the program dadi, described in Gutenkunst et al. (2009). The input is biallelic genotypes or allele frequencies at SNP loci in a long-format data table.
dadi_inputs(
dat,
type,
sampCol = "SAMPLE",
popCol = "POP",
locusCol = "LOCUS",
refCol = "REF",
altCol = "ALT",
genoCol = "GT",
freqCol = "FREQ",
indsCol = "INDS",
freqMethod = "probs",
popSub = NULL,
popLevels = NULL
)
dat |
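Data table: A long-format data table of biallelic genotypes, coded as '/' separated alleles ('0/0', '0/1', '1/1'), or as counts of the Alt allele (0, 1, 2, respectively). Alternatively, a long-format data table of allele frequencies. Columns required for both genotypes and allele frequencies: the population ID (see popCol), the locus ID (see locusCol), the reference allele (see refCol), and the alternate allele (see altCol).
Columns required only for genotypes: the sample ID (see sampCol) and the genotype (see genoCol).
Columns required only for allele frequencies: the reference allele frequency (see freqCol) and the number of individuals per population pool (see indsCol).
|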
type |
Character: One of 'genos' (genotypes) or 'freqs' (allele frequencies). |
sampCol |
Character: Sample ID. Default = "SAMPLE". |
popCol |
Character: Population ID. Default = "POP". |
locusCol |
Character: Locus ID. Default = "LOCUS". |
refCol |
Character: Reference allele. Default = "REF". |
altCol |
Character: Alternate allele. Default = "ALT". |
genoCol |
Character: The genotype. Default = "GT". |
freqCol |
Character: The reference allele frequency. Default = "FREQ". |
indsCol |
Character: The number of individuals per population pool. Default = "INDS". |
freqMethod |
Character: The method to estimate the SFS from allele
frequency data. Either 'probs' or 'counts'. Default = "probs". |
popSub |
Character: The populations to subset out of dat. Default = NULL. |
popLevels |
Character: An optional vector of the population IDs used
to manually specify the first and second population order. Default = NULL. |
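The two genotype encodings accepted for dat can be illustrated with a minimal sketch. It assumes a plain data.frame as a stand-in for a long-format data table; the sample, population, and locus values are made up, and the ALT_COUNT column is a hypothetical name used only to show how '/' separated genotypes map onto Alt allele counts.

```r
# Toy long-format genotype data using the default column names.
# All values are illustrative, not taken from the package data sets.
genos <- data.frame(
  SAMPLE = c("Ind1", "Ind2", "Ind3"),
  POP    = c("Pop1", "Pop1", "Pop2"),
  LOCUS  = "Locus1",
  REF    = "A",
  ALT    = "G",
  GT     = c("0/0", "0/1", "1/1"),
  stringsAsFactors = FALSE
)

# Convert '/' separated alleles into counts of the Alt allele (0, 1, 2).
# ALT_COUNT is a hypothetical column name, not one the function expects.
genos$ALT_COUNT <- vapply(
  strsplit(genos$GT, "/", fixed = TRUE),
  function(alleles) sum(as.integer(alleles)),
  integer(1)
)

genos
```

Either encoding carries the same information, so a table with GT stored as 0, 1, 2 should be equivalent to one stored as '0/0', '0/1', '1/1'.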
Because pool-seq provides estimates of allele frequencies,
not direct observations of allele counts, the SFS has to be inferred from
the allele frequencies. How this is done is determined by the argument freqMethod.
When freqMethod=='counts', the allele counts are obtained by rounding the expected
counts to the nearest integer (e.g. 1.5 becomes 2, and 1.4 becomes 1), relative to the number of chromosomes.
The Ref allele count is calculated first, and the Alt allele count is then taken as the remainder.
For instance, if 20 diploid individuals were pooled and the Ref allele frequency was 0.82,
then of the 40 haploid chromosomes, 33 (32.8 rounded up) would be expected to carry the
Ref allele, whilst 7 (40 - 33) would be expected to carry the Alt allele. NOTE: if the
estimated count for the Ref allele is < 1 but > 0, it will always be
rounded to 1. This method produces a consistent SFS, but note that extremely low
Ref allele frequencies will tend to produce counts of 1.
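The rounding rule described above can be sketched in a few lines of R. This is only an illustration of the stated logic, not the package's internal code: counts_from_freq is a hypothetical helper, and the use of round() and a ploidy argument are assumptions.

```r
# Hypothetical helper illustrating the 'counts' logic described above.
# It is not part of genomalicious.
counts_from_freq <- function(refFreq, inds, ploidy = 2) {
  chroms   <- inds * ploidy       # haploid chromosomes in the pool
  expected <- refFreq * chroms    # expected Ref copies, e.g. 0.82 * 40 = 32.8
  refCount <- round(expected)     # nearest integer
  # Described special case: an expected count between 0 and 1 becomes 1
  if (expected > 0 && expected < 1) refCount <- 1
  c(ref = refCount, alt = chroms - refCount)
}

counts_from_freq(refFreq = 0.82, inds = 20)  # ref = 33, alt = 7
counts_from_freq(refFreq = 0.01, inds = 20)  # ref = 1,  alt = 39
```

The second call shows the special case: an expected Ref count of 0.4 would otherwise round to 0, but is set to 1.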
When freqMethod=='probs', the allele counts are derived from a binomial draw using
R's rbinom() function. Again, if the Ref allele frequency from 20 pooled diploids was
0.82, then the Ref allele count would be generated from the call rbinom(n=1, size=40, prob=0.82),
which would produce a probable number of Ref allele counts, and the Alt allele count would
be 40 minus this number. This method will not produce consistently reproducible SFSs due
to the nature of the probabilistic draws. However, it does avoid potentially biasing
the SFS from rounding errors when allele frequencies are low.
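The binomial draw can be tried directly in R. The sketch below reuses the numbers from the text; the call to set.seed() is an addition for illustration, on the assumption that fixing R's random seed before the draw makes the result repeatable within a session.

```r
# Fix the random seed so the draw below can be repeated (illustrative choice of seed).
set.seed(42)

chroms   <- 20 * 2                                     # 20 pooled diploids = 40 chromosomes
refCount <- rbinom(n = 1, size = chroms, prob = 0.82)  # probable number of Ref copies
altCount <- chroms - refCount                          # Alt copies are the remainder

c(ref = refCount, alt = altCount)
```

Repeating the draw without resetting the seed will generally give a different split, which is the source of the non-reproducibility noted above.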
Returns a data table in the dadi input format.
Gutenkunst et al. (2009) Inferring the joint demographic history of multiple populations from multidimensional SNP frequency data. PLoS Genetics: 5(10), e1000695.
library(genomalicious)
data(data_Genos)
data(data_PoolFreqs)
data(data_PoolInfo)
### Make the dadi input from genotype data
dadi_inputs(dat=data_Genos, type='genos', popSub=c('Pop1', 'Pop2'))
### Make the dadi input from allele frequency data
colnames(data_PoolFreqs)
# We need to add the $INDS column to the data, data_PoolFreqs.
# left_join() comes from dplyr, so load it first.
library(dplyr)
newFreqData <- left_join(data_PoolFreqs, data_PoolInfo)
colnames(newFreqData)
# Three
dadi_inputs(dat=newFreqData, type='freqs', freqMethod='probs')