allele_freqs_DT: Generate an allele frequency data table
In j-a-thia/genomalicious: A smorgasbord of R functions for population genomic analyses

allele_freqs_DT

R Documentation

Generate an allele frequency data table

Description

Takes a data.table of genotypes or allele counts and calculates the allele frequency for each allele. Can be used for multiallelic datasets.

Usage

allele_freqs_DT(
  dat,
  type,
  sampCol = "SAMPLE",
  popCol = "POP",
  locusCol = "LOCUS",
  genoCol = "GT",
  countCol = "COUNTS",
  indsCol = "INDS"
)

Arguments

`dat`	Data.table: Long-format data table of variants, e.g., as read in with `genomalicious::vcf2DT`.
`type`	Character: One of two modes: "genos" for individual genotype data, or "counts" for allele counts in populations.
`sampCol`	Character: The column with sample ID information. Default is "SAMPLE". Only needed if `type=="genos"`.
`popCol`	Character: The column with population ID information. Default is "POP".
`locusCol`	Character: The column with locus ID information. Default is "LOCUS".
`genoCol`	Character: The column with genotype information. Default is "GT". Only needed if `type=="genos"`. Genotypes must be in character format where alleles are separated by the delimiter, "/". For example, "0/1" is one Ref and one Alt allele 1; "2/2" is two Alt allele 2.
`countCol`	Character: The column with allele count information for all alleles. For example, in pool-seq of populations, the number of read counts for each allele. Default is "COUNTS". Only needed if `type=="counts"`. Counts should be separated by commas, with the Ref allele first. E.g., "20,60,4" would indicate 20, 60, and 4 counts of the Ref allele, Alt allele 1, and Alt allele 2, respectively.
`indsCol`	Character: The column with the number of sampled individuals per population. Default is "INDS".
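To make the `genoCol` format concrete, here is a minimal base R sketch of how "/"-delimited genotype strings could be tallied into allele counts and frequencies for one locus in one population. It only illustrates the input format and is not the package's internal code; the vector `gt` and its values are made up, and the function itself may tally observations differently.

```r
# Hypothetical genotypes for five diploid individuals at one locus.
gt <- c("0/0", "0/1", "1/1", "0/2", "2/2")

# Split each genotype on "/" and pool all allele copies into one vector.
alleles <- unlist(strsplit(gt, "/", fixed = TRUE))

# Tally observations of each allele (0 = Ref; 1 and 2 = Alt alleles).
allele_counts <- table(alleles)

# Frequency = copies of an allele / total allele copies (2 per diploid individual).
allele_freqs <- allele_counts / sum(allele_counts)

print(allele_counts)
print(allele_freqs)
```

Because alleles are read from the strings themselves, the same approach extends to any number of Alt alleles.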

Details

This function assumes no missing values. For type=="genos", all sampled individuals must have a genotype value for each locus. For type=="counts", all sampled populations must have count data for each locus. You could impute missing genotypes for individuals, or drop loci with missing data from individual or population datasets.
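One way to drop loci with missing data before calling the function is sketched below in base R. The data frame `dat` and its values are hypothetical, and the coding of missing genotypes as `NA` or "./." is an assumption; check how missing calls are represented in your own data.

```r
# Hypothetical long-format genotypes for three individuals at three loci.
# ASSUMPTION: missing calls are coded as NA or "./.".
dat <- data.frame(
  SAMPLE = rep(c("ind1", "ind2", "ind3"), times = 3),
  POP    = "pop1",
  LOCUS  = rep(c("loc1", "loc2", "loc3"), each = 3),
  GT     = c("0/0", "0/1", "1/1",
             "0/1", NA,    "0/0",
             "1/1", "./.", "0/1"),
  stringsAsFactors = FALSE
)

# Flag missing genotype calls.
is_missing <- is.na(dat$GT) | dat$GT == "./."

# Loci with at least one missing call in any individual.
bad_loci <- unique(dat$LOCUS[is_missing])

# Keep only loci that are complete across all individuals.
dat_complete <- dat[!dat$LOCUS %in% bad_loci, ]

print(bad_loci)
print(dat_complete)
```

The filtered data frame would still need converting to a data.table (for example with `data.table::as.data.table()`) before being passed as `dat`.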

Note that when type=="counts", allele frequencies are calculated as the proportion of counts per allele relative to the total number of observed counts at a locus. However, this function will also align the total number of sequenced individuals per population against the counts.
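The proportion described above can be worked through by hand. The following base R sketch uses the "20,60,4" count string from the `countCol` description; the variable names are illustrative and this is not the package's own code.

```r
# Count string for one locus in one pooled population:
# 20 Ref counts, 60 Alt allele 1 counts, 4 Alt allele 2 counts.
counts_str <- "20,60,4"

# Split on commas and convert to numbers (Ref allele first).
counts <- as.numeric(strsplit(counts_str, ",", fixed = TRUE)[[1]])

# Label alleles: 0 = Ref, then 1..n for the Alt alleles.
names(counts) <- seq_along(counts) - 1

# Frequency = counts of each allele / total counts at the locus.
freqs <- counts / sum(counts)

print(round(freqs, 3))
```

Here the total is 84 counts, so the Ref allele frequency is 20/84, and the frequencies across all alleles sum to 1.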

Value

Returns a long format data table with the following columns:

$POP, the population ID column.
$LOCUS, the locus ID column.
$ALLELE, the allele ID column (0 is Ref, and Alt alleles are numbered 1 to n).
$COUNTS, the number of observations of the allele: the number of individuals for genotype data, or the number of counts (e.g., reads) for population count data.
$INDS, the number of individuals sampled per population.
$FREQ, the estimated allele frequency.
$HET, the proportion of heterozygotes, calculated directly from genotype data, or estimated as the expected heterozygosity for population allele frequencies. Assumes diploid organisms.
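The two ways of obtaining $HET can be illustrated with a short base R sketch. It assumes the standard Hardy-Weinberg expectation for diploids, 1 - sum(p^2), for the expected heterozygosity; the documentation does not state the exact estimator used (for example, whether a sample-size correction is applied), so treat this as an illustration rather than the function's definition. All values are hypothetical.

```r
# Expected heterozygosity from population allele frequencies.
# ASSUMPTION: plain Hardy-Weinberg expectation, 1 - sum(p^2).
p <- c(20, 60, 4) / sum(c(20, 60, 4))
exp_het <- 1 - sum(p^2)

# Observed heterozygosity from individual genotypes:
# the proportion of individuals carrying two different alleles.
gt <- c("0/0", "0/1", "1/1", "0/2", "2/2")
is_het <- vapply(
  strsplit(gt, "/", fixed = TRUE),
  function(a) a[1] != a[2],
  logical(1)
)
obs_het <- mean(is_het)

print(exp_het)
print(obs_het)
```

For a biallelic locus the expected value reduces to the familiar 2pq.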

Examples

library(genomalicious)
library(data.table)  # for copy() and the := operator
library(magrittr)    # for the %>% pipe

# Import biallelic SNPs as genotypes or population counts
data(data_Genos)
data(data_PoolFreqs)

# On genotypes, convert the integer $GT codes (0, 1, or 2 Alt alleles)
# to "/"-delimited character genotypes.
dat_gt <- data_Genos %>%
  copy %>%
  .[, GT:=as.character(GT)] %>%
  .[GT=='0', GT:='0/0'] %>%
  .[GT=='1', GT:='0/1'] %>%
  .[GT=='2', GT:='1/1']

print(dat_gt)

allele_freqs_DT(dat=dat_gt, type='genos')

# On counts, need to make a $COUNTS column, and add in 30 individuals
# per locus per population in a new $INDS column.
dat_counts <- data_PoolFreqs %>%
  copy %>%
  .[, COUNTS:=paste(RO,AO,sep=',')] %>%
  .[, INDS:=30]

print(dat_counts)

allele_freqs_DT(dat=dat_counts, type='counts')

j-a-thia/genomalicious documentation built on April 13, 2025, 9:41 a.m.
