fstat_calc: Calculate F-statistics from genotypes or allele frequencies...
In j-a-thia/genomalicious: A smorgasbord of R functions for population genomic analyses

fstat_calc

R Documentation

Calculate F-statistics from genotypes or allele frequencies (counts)

Description

Takes genotypes or allele frequencies in a long-format data table and calculates Weir & Cockerham's F-statistics (Weir & Cockerham, 1984). Permutations can be used to test the statistical significance of F-statistics in genotype data sets. Multiallelic loci are supported. See Details for more information.

Usage

fstat_calc(
  dat,
  type,
  method,
  fstatVec = NULL,
  popCol = "POP",
  sampCol = "SAMPLE",
  locusCol = "LOCUS",
  genoCol = "GT",
  countCol = "COUNTS",
  indsCol = "INDS",
  permute = FALSE,
  keepLocus = TRUE,
  numPerms = 100,
  numCores = 1
)

Arguments

`dat`	Data table: For genotype data, a long-format data table of genotypes, coded as '/'-separated alleles ('0/0', '0/1', '1/1'). For allele frequency data, a long-format data table of allele counts. Columns required for both genotypes and allele frequencies: the population ID (see param `popCol`) and the locus ID (see param `locusCol`). Columns required only for genotypes: the sample ID (see param `sampCol`) and the genotypes (see param `genoCol`). Columns required only for allele frequencies: the allele counts (see param `countCol`) and the number of individuals used to obtain the allele frequency estimate (see param `indsCol`).
`type`	Character: One of `'genos'` or `'freqs'`, to calculate F-statistics from genotype or allele frequency data, respectively.
`method`	Character: One of `'global'` or `'pairwise'` for global or pairwise F-statistics, respectively.
`fstatVec`	Character: A vector of the F-statistics to calculate; only applicable to genotype data, `type=='genos'`. Must contain one or more of `"FST"`, `"FIS"`, and `"FIT"`.
`popCol`	Character: The column name with the population information. Default is `'POP'`.
`sampCol`	Character: The column name with the sampled individual information. Default is `'SAMPLE'`.
`locusCol`	Character: The column name with the locus information. Default is `'LOCUS'`.
`genoCol`	Character: The column name with the genotype information. Default is `'GT'`.
`countCol`	Character: The column name with the allele count information. Default is `'COUNTS'`. Counts for each allele must be separated by commas, starting with the Ref allele, followed by each subsequent Alt allele, e.g., '0,25' or '5,7,10' for a locus with 2 or 3 alleles, respectively. Alleles within a locus must be coded at the same positions in the character string across all populations.
`indsCol`	Character: The column name with the number of individuals contributing to the allele frequency estimate. Default is `'INDS'`.
`permute`	Logical: Should permutations be performed to test the statistical significance of F-statistics? Default is `FALSE`. Can only be performed on genotype data, `type=='genos'`.
`keepLocus`	Logical: Should locus-specific estimates of F-statistics be kept? Default is `TRUE`. Setting this to `FALSE` drops the locus-specific estimates, which can substantially reduce memory use and the size of the returned list.
`numPerms`	Integer: The number of permutations to perform. Default is 100.
`numCores`	Integer: The number of cores to use when running permutations. Default is 1.
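The expected layout of `dat` can be sketched with small toy tables. This is a minimal sketch under assumptions: it uses base R `data.frame`s for self-containment (the function documents data table input, so convert with `data.table::as.data.table()` if needed), all values are made up, and `counts_to_freqs()` is a hypothetical helper, not part of genomalicious, showing how a `countCol` string maps to allele frequencies.

```r
# Toy genotype input: one row per sample x locus, alleles separated by '/'.
# Column names match the defaults of popCol, sampCol, locusCol and genoCol.
toy_genos <- data.frame(
  POP    = rep(c("P1", "P2"), each = 4),
  SAMPLE = paste0("s", 1:8),
  LOCUS  = "loc1",
  GT     = c("0/0", "0/1", "0/0", "0/0", "1/1", "0/1", "1/1", "1/1")
)

# Toy allele-count input: one row per population x locus. COUNTS lists the
# Ref count first, then each Alt count; INDS is the number of individuals.
toy_freqs <- data.frame(
  POP    = c("P1", "P2"),
  LOCUS  = "loc1",
  COUNTS = c("5,7,10", "12,0,10"),
  INDS   = c(11, 11)
)

# Hypothetical helper: split a COUNTS string into allele frequencies
counts_to_freqs <- function(counts_str) {
  counts <- as.numeric(strsplit(counts_str, ",", fixed = TRUE)[[1]])
  counts / sum(counts)
}

counts_to_freqs("0,25")     # 0 1
counts_to_freqs("5,7,10")   # 5/22, 7/22, 10/22

# Every population must report the same number of alleles for a locus
n_alleles <- lengths(strsplit(toy_freqs$COUNTS, ",", fixed = TRUE))
stopifnot(length(unique(n_alleles)) == 1)
```

Because allele positions in the string carry the allele identity, a population with an unobserved allele still needs an explicit 0 in that position (as in "12,0,10").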

Details

With genotype data, the F-statistics FST, FIS, and FIT can be calculated. Only FST can be calculated from allele frequency data.

F-statistics from genotype data are calculated from the variance components 'a', 'b', and 'c', which have been standardised for observed heterozygosity. FST from allele frequency data uses an estimate of the expected heterozygosity.
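For a single biallelic locus, the standard Weir & Cockerham (1984) variance components and ratio estimators (FST = a/(a+b+c), FIT = 1 - c/(a+b+c), FIS = 1 - c/(b+c)) can be sketched in base R. This illustrates the published formulas only, assuming genotypes coded '0/0', '0/1', '1/1' and at least two populations. `wc84_components()` and `wc84_fstats()` are hypothetical names, and this is not the genomalicious implementation, which also handles multiallelic loci. Multi-locus estimates are formed by summing a, b and c over loci before taking the ratios.

```r
# Hypothetical helpers illustrating Weir & Cockerham (1984) for ONE biallelic
# locus; not the genomalicious implementation.
wc84_components <- function(gt, pop) {
  # Copies of allele '1' per individual (0, 1 or 2)
  alt  <- vapply(strsplit(gt, "/", fixed = TRUE),
                 function(al) sum(al == "1"), integer(1))
  pops <- split(alt, pop)
  n    <- lengths(pops)                                        # individuals per population
  p    <- vapply(pops, function(x) sum(x) / (2 * length(x)), numeric(1))
  h    <- vapply(pops, function(x) mean(x == 1), numeric(1))   # observed heterozygosity
  r    <- length(n)
  nbar <- sum(n) / r
  nc   <- (r * nbar - sum(n^2) / (r * nbar)) / (r - 1)
  pbar <- sum(n * p) / (r * nbar)
  s2   <- sum(n * (p - pbar)^2) / ((r - 1) * nbar)
  hbar <- sum(n * h) / (r * nbar)
  pq   <- pbar * (1 - pbar)
  va <- (nbar / nc) * (s2 - (pq - (r - 1) / r * s2 - hbar / 4) / (nbar - 1))
  vb <- (nbar / (nbar - 1)) * (pq - (r - 1) / r * s2 - (2 * nbar - 1) / (4 * nbar) * hbar)
  vc <- hbar / 2
  c(a = va, b = vb, c = vc)
}

# Ratio estimators; pass a matrix (one row per locus) to sum components over loci
wc84_fstats <- function(comp) {
  abc <- if (is.matrix(comp)) colSums(comp) else comp
  va <- abc[["a"]]; vb <- abc[["b"]]; vc <- abc[["c"]]
  c(FST = va / (va + vb + vc),
    FIT = 1 - vc / (va + vb + vc),
    FIS = 1 - vc / (vb + vc))
}

# Two populations fixed for different alleles
gt  <- c(rep("0/0", 10), rep("1/1", 10))
pop <- rep(c("A", "B"), each = 10)
wc84_fstats(wc84_components(gt, pop))   # FST = 1, FIT = 1, FIS = NaN (b + c = 0)
```

Summing components before dividing (a "ratio of averages") weights loci by their informativeness rather than averaging per-locus ratios.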

Permutation tests for genotype data randomly shuffle individuals among populations, recalculate the F-statistics, and test the hypothesis that the permuted F-statistic > the observed F-statistic. The p-value is the proportion of permutations for which this expression is TRUE. That is, if no permuted values are greater than the observed value, p = 0; if all permuted values are greater than the observed value, p = 1.
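The permutation logic described above can be sketched generically in base R. This is a minimal sketch, not genomalicious internals: `perm_pvalue()` is hypothetical, and `toy_stat()` is an assumed stand-in statistic (variance among population means), used only so the example runs without an F-statistic estimator.

```r
# Hypothetical sketch (not genomalicious internals): permutation p-value for a
# statistic computed from per-individual values and population labels.
perm_pvalue <- function(values, pop, stat_fun, num_perms = 100) {
  observed <- stat_fun(values, pop)
  permuted <- vapply(seq_len(num_perms), function(i) {
    stat_fun(values, sample(pop))   # shuffle individuals among populations
  }, numeric(1))
  # p = proportion of permutations where permuted > observed
  list(observed = observed, permuted = permuted, pval = mean(permuted > observed))
}

# Toy differentiation statistic, assumed for illustration only: variance among
# population means of alt-allele copies per individual (not an F-statistic)
toy_stat <- function(values, pop) var(as.vector(tapply(values, pop, mean)))

set.seed(1)
alt_copies <- c(rep(0, 10), rep(2, 10))   # two fully differentiated populations
pops       <- rep(c("A", "B"), each = 10)
res <- perm_pvalue(alt_copies, pops, toy_stat, num_perms = 200)
res$pval   # 0: no shuffle can exceed complete separation
```

Using a strict `>` means ties, such as a shuffle that happens to reproduce the original grouping, do not count towards p.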

Value

A list is returned with three indexes.

The first index is $genome, the genome-wide F-statistics. If global estimates were requested (method=='global'), this is a single row: the estimates across all populations. If pairwise estimates were requested (method=='pairwise'), there are also $POP1 and $POP2 columns, which identify the two populations compared.

The second index is $locus, the locus-specific F-statistics. This is a data table with a $LOCUS column; when method=='global', it holds the global estimates at each locus. If pairwise estimates were requested (method=='pairwise'), there are also $POP1 and $POP2 columns, which identify the two populations compared.
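The pairwise layout (one row per population pair, labelled by $POP1 and $POP2) can be mimicked with base R's `combn()`. This is a sketch under assumptions: `pairwise_table()`, its `STAT` column, and the toy statistic are hypothetical and only illustrate the row structure; in practice the statistic would be a two-population F-statistic such as FST.

```r
# Hypothetical sketch: apply a two-population statistic to every pair of
# populations, returning one row per pair with POP1 and POP2 columns.
pairwise_table <- function(values, pop, stat_fun) {
  pairs <- combn(sort(unique(pop)), 2)   # 2 x choose(k, 2) matrix of labels
  data.frame(
    POP1 = pairs[1, ],
    POP2 = pairs[2, ],
    STAT = apply(pairs, 2, function(pp) {
      keep <- pop %in% pp                # individuals from this pair only
      stat_fun(values[keep], pop[keep])
    }),
    stringsAsFactors = FALSE
  )
}

# Toy statistic (assumption): absolute difference in population means
toy_diff <- function(values, pop) abs(diff(as.vector(tapply(values, pop, mean))))

vals <- c(0, 0, 1, 1, 2, 2)
pops <- c("A", "A", "B", "B", "C", "C")
pairwise_table(vals, pops, toy_diff)
```

With k populations there are choose(k, 2) rows, so pairwise output grows quadratically with the number of populations.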

The third index is $permute, the permutation results. This index is NULL when frequencies are used (type=='freqs') and only contains data if type=='genos' and permute==TRUE. $permute is itself a list with two subindexes:

$fstat: The permuted F-statistics. If method=='global', this is a single row of global estimates. If method=='pairwise', this is a data table with columns $POP1, $POP2, and a column for each F-statistic.
$pval: The permutation p-values, as a long-format data table. If method=='global', there are two columns: $STAT, which contains the F-statistic, and $PVAL, which contains the global permutation p-value. If method=='pairwise', there are two additional columns, $POP1 and $POP2.
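Filtering a pairwise $pval table in the long format described above might look like the sketch below. `significant_pairs()` is a hypothetical helper written for a base `data.frame` (use `as.data.frame()` on a data table if needed), and `mock_pval` contains made-up values for illustration only, not output from `fstat_calc`.

```r
# Hypothetical helper: keep rows of a pairwise $pval table (columns POP1, POP2,
# STAT, PVAL) for one F-statistic with p-values below a threshold
significant_pairs <- function(pval_tab, stat = "FST", alpha = 0.05) {
  hits <- pval_tab[pval_tab$STAT == stat & pval_tab$PVAL < alpha, ]
  hits[order(hits$PVAL), c("POP1", "POP2", "STAT", "PVAL")]
}

# Made-up table in the documented long format (illustrative values only)
mock_pval <- data.frame(
  POP1 = c("P1", "P1", "P2", "P1"),
  POP2 = c("P2", "P3", "P3", "P2"),
  STAT = c("FST", "FST", "FST", "FIS"),
  PVAL = c(0.00, 0.32, 0.01, 0.00),
  stringsAsFactors = FALSE
)
significant_pairs(mock_pval)
```

Because each p-value is a proportion of `numPerms` permutations, it moves in steps of 1/numPerms (0.01 with the default of 100), so p = 0 means "less than 1/numPerms" rather than exactly zero.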

References

Weir & Cockerham (1984) Evolution. DOI: 10.1111/j.1558-5646.1984.tb05657.x
Weir & Hill (2002) Annual Review of Genetics. DOI: 10.1146/annurev.genet.36

Examples

library(genomalicious)
library(data.table)

data(data_Genos)
data(data_PoolFreqs)
data(data_PoolInfo)

# Convert genotypes to '/'-separated character strings (e.g., '0/1')
head(data_Genos$GT)
data_Genos[, GT:=genoscore_converter(GT)]
head(data_Genos$GT)

# Set allele counts and number of individuals for the pool-seq data
head(data_PoolFreqs)
head(data_PoolInfo)

# Combine Ref (RO) and Alt (AO) counts into a 'Ref,Alt' string
data_PoolFreqs[, COUNTS:=paste(RO, AO, sep=',')]

# Look up the number of individuals in each pool
data_PoolFreqs$INDS <- data_PoolInfo$INDS[
  match(data_PoolFreqs$POOL, data_PoolInfo$POOL)
]

head(data_PoolFreqs)

# Genotypes and global F-statistics
geno_global_f <- fstat_calc(
  dat=data_Genos,
  type='genos', method='global', fstatVec=c('FST','FIS','FIT'),
  popCol='POP', sampCol='SAMPLE',
  locusCol='LOCUS', genoCol='GT',
  permute=FALSE
)

# Genotypes and pairwise F-statistics
geno_pair_f <- fstat_calc(
  dat=data_Genos,
  type='genos', method='pairwise', fstatVec=c('FST','FIS','FIT'),
  popCol='POP', sampCol='SAMPLE',
  locusCol='LOCUS', genoCol='GT',
  permute=FALSE
)

# Allele frequencies (from counts) and global FST
freqs_global_f <- fstat_calc(
  dat=data_PoolFreqs,
  type='freqs', method='global', fstatVec=NULL,
  popCol='POP', locusCol='LOCUS',
  countCol='COUNTS', indsCol='INDS',
  permute=FALSE
)

# Allele frequencies (from counts) and pairwise FST
freqs_pair_f <- fstat_calc(
  dat=data_PoolFreqs,
  type='freqs', method='pairwise', fstatVec=NULL,
  popCol='POP', locusCol='LOCUS',
  countCol='COUNTS', indsCol='INDS',
  permute=FALSE
)

j-a-thia/genomalicious documentation built on April 13, 2025, 9:41 a.m.
