dadi_inputs: Generate dadi input from genotype or allele frequency data
In j-a-thia/genomalicious: A smorgasbord of R functions for population genomic analyses

dadi_inputs

R Documentation

Generate dadi input from genotype or allele frequency data

Description

Creates an input file for the program dadi, described in Gutenkunst et al. (2009). The input is biallelic genotypes or allele frequencies at SNP loci in a long-format data table.

Usage

dadi_inputs(
  dat,
  type,
  sampCol = "SAMPLE",
  popCol = "POP",
  locusCol = "LOCUS",
  refCol = "REF",
  altCol = "ALT",
  genoCol = "GT",
  freqCol = "FREQ",
  indsCol = "INDS",
  freqMethod = "probs",
  popSub = NULL,
  popLevels = NULL
)

Arguments

`dat`	Data table: A long-format data table of biallelic genotypes, coded as '/' separated alleles ('0/0', '0/1', '1/1'), or counts of the Alt alleles (0, 1, 2, respectively). Alternatively, a long-format data table of allele frequencies. Columns required for both genotypes and allele frequencies: The population ID (see param `popCol`). The locus ID (see param `locusCol`). The reference allele (see param `refCol`). The alternate allele (see param `altCol`). Columns required only for genotypes: The sample ID (see param `sampCol`). The genotypes (see param `genoCol`). Columns required only for allele frequencies: The allele frequencies (see param `freqCol`). The number of individuals used to obtain the allele frequency estimate (see param `indsCol`).
`type`	Character: One of `'genos'` or `'freqs'`, to generate the dadi input from genotype or allele frequency data, respectively.
`sampCol`	Character: Sample ID. Default = `'SAMPLE'`.
`popCol`	Character: Population ID. Default = `'POP'`.
`locusCol`	Character: Locus ID. Default = `'LOCUS'`.
`refCol`	Character: Reference allele. Default = `'REF'`.
`altCol`	Character: Alternate allele. Default = `'ALT'`.
`genoCol`	Character: The genotype. Default = `'GT'`.
`freqCol`	Character: The reference allele frequency. Default = `'FREQ'`.
`indsCol`	Character: The number of individuals per population pool. Default = `'INDS'`.
`freqMethod`	Character: The method to estimate the SFS from allele frequency data. Either `'probs'` or `'counts'`. Default = `'probs'`. Only applicable when `type=='freqs'`. See Details for parameterisation.
`popSub`	Character: The populations to subset out of `popCol`. Default = `NULL`.
`popLevels`	Character: An optional vector of the population IDs used to manually specify the first and second population order. Default = `NULL`.
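The long-format layout described for `dat` can be pictured with a small toy table. This is only a sketch in base R: the sample, population and locus names and all genotype values are invented, and the `GT_COUNT` column is a hypothetical name used to show the equivalent Alt allele count coding. Before passing such a table to `dadi_inputs()` it would presumably need converting to a data table (e.g. with `data.table::as.data.table()`).

```r
# Toy long-format genotype table: one row per sample-by-locus combination.
# All names and values are invented for illustration.
samples <- data.frame(
  SAMPLE = c("Ind1", "Ind2", "Ind3", "Ind4"),
  POP    = c("Pop1", "Pop1", "Pop2", "Pop2"),
  stringsAsFactors = FALSE
)
loci <- data.frame(
  LOCUS = c("Locus1", "Locus2"),
  REF   = c("A", "C"),
  ALT   = c("G", "T"),
  stringsAsFactors = FALSE
)

# Cross every sample with every locus (by = NULL gives the Cartesian product)
toyGenos <- merge(samples, loci, by = NULL)

# Genotypes coded as '/' separated alleles
toyGenos$GT <- c("0/0", "0/1", "1/1", "0/1", "0/0", "0/0", "0/1", "1/1")

# The same genotypes coded as counts of the Alt allele (0, 1, 2)
toyGenos$GT_COUNT <- vapply(
  strsplit(toyGenos$GT, "/", fixed = TRUE),
  function(alleles) sum(as.integer(alleles)),
  integer(1)
)

toyGenos
```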

Details

Because pool-seq provides estimates of allele frequencies, not direct observations of allele counts, we have to infer the SFS from the allele frequencies. This is determined by the argument freqMethod.

When freqMethod=='counts', the expected allele counts are simply rounded to the nearest integer (e.g. 1.5 becomes 2, and 1.4 becomes 1), relative to the number of chromosomes. The Ref allele count is calculated first, and the Alt allele count is then taken as the remainder. For instance, if 20 diploid individuals were pooled and the Ref allele frequency was 0.82, then of the 40 haploid chromosomes, 33 (32.8 rounded to the nearest integer) would be expected to carry the Ref allele, whilst 7 (40 - 33) would be expected to carry the Alt allele. NOTE: if the estimated count for the Ref allele is < 1 but > 0, it will always be rounded to 1. This method will produce a consistent SFS, but note that extremely low Ref allele frequencies will tend to produce counts of 1.
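The rounding arithmetic for the 'counts' method can be sketched as follows. This is an illustration of the description above, not the package's internal code: `ref_alt_counts_rounded()` and its `ploidy` argument are hypothetical, and rounding halves upward with `floor(x + 0.5)` is an assumption (base R's `round()` sends halves to the even digit, so `round(2.5)` gives 2).

```r
# Hypothetical helper: rounded Ref and Alt allele counts for one pooled locus.
ref_alt_counts_rounded <- function(freq, inds, ploidy = 2) {
  chroms <- inds * ploidy          # number of haploid chromosomes in the pool
  expected <- freq * chroms        # expected (non-integer) Ref allele count
  # Round to the nearest integer, with halves rounded up (assumption)
  ref <- floor(expected + 0.5)
  # Expected counts above 0 but below 1 are always set to 1
  if (expected > 0 && expected < 1) ref <- 1
  # Alt count is whatever remains of the chromosomes
  c(ref = ref, alt = chroms - ref)
}

ref_alt_counts_rounded(freq = 0.82, inds = 20)  # ref = 33, alt = 7
ref_alt_counts_rounded(freq = 0.01, inds = 20)  # ref = 1, alt = 39
```

The second call shows the special case: an expected count of 0.4 would round to 0, but is set to 1 instead.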

When freqMethod=='probs', the default, the allele counts are derived from a binomial draw using R's rbinom() function. Again, if the Ref allele frequency from 20 pooled diploids was 0.82, then the Ref allele count would be drawn with the call: rbinom(n=1, size=40, prob=0.82), and the Alt allele count would be 40 minus this number. Because the draws are random, this method will not produce the same SFS on repeated runs. However, it does avoid the bias that rounding can introduce into the SFS when allele frequencies are low.
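The binomial draw for the same pooled locus can be sketched directly in base R. The seed value is arbitrary, and the variable names are chosen for illustration only. Whether calling `set.seed()` before `dadi_inputs()` makes its output repeatable is an assumption that depends on the function drawing through `rbinom()` as described above.

```r
# Fix the random number generator state so the draw can be repeated
set.seed(42)

chroms <- 20 * 2  # 20 pooled diploids = 40 haploid chromosomes

# One binomial draw for the Ref allele count; Alt is the remainder
refCount <- rbinom(n = 1, size = chroms, prob = 0.82)
altCount <- chroms - refCount

# Resetting the same seed reproduces the same draw
set.seed(42)
refCountAgain <- rbinom(n = 1, size = chroms, prob = 0.82)
identical(refCount, refCountAgain)  # TRUE
```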

Value

Returns a data table in the dadi input format.

References

Gutenkunst et al. (2009) Inferring the joint demographic history of multiple populations from multidimensional SNP frequency data. PLoS Genetics: 5(10), e1000695.

Examples

library(genomalicious)

data(data_Genos)
data(data_PoolFreqs)
data(data_PoolInfo)

### Make the dadi input from genotype data
dadi_inputs(dat=data_Genos, type='genos', popSub=c('Pop1', 'Pop2'))

### Make the dadi input from allele frequency data
colnames(data_PoolFreqs)

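# We need to add the $INDS column from data_PoolInfo to data_PoolFreqs.
# left_join() comes from the dplyr package, so it is called with its namespace.
newFreqData <- dplyr::left_join(data_PoolFreqs, data_PoolInfo)
colnames(newFreqData)

# Make the dadi input using the 'probs' method
dadi_inputs(dat=newFreqData, type='freqs', freqMethod='probs')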

j-a-thia/genomalicious documentation built on April 13, 2025, 9:41 a.m.
