fstat_calc | R Documentation |
Takes genotypes or allele frequencies in a long-format data table and calculates Weir & Cockerham's F-statistics (Weir & Cockerham, 1984). Permutations can be used to test the statistical significance of F-statistics in genotype data sets. Can handle multiallelic data. See Details for more information.
fstat_calc(
dat,
type,
method,
fstatVec = NULL,
popCol = "POP",
sampCol = "SAMPLE",
locusCol = "LOCUS",
genoCol = "GT",
countCol = "COUNTS",
indsCol = "INDS",
permute = FALSE,
keepLocus = TRUE,
numPerms = 100,
numCores = 1
)
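dat |
Data table: For genotype data, a long-format data table of genotypes, coded as '/'-separated alleles ('0/0', '0/1', '1/1'). For allele frequency data, a long-format data table of allele counts. Columns required for both genotypes and allele frequencies: the population (popCol) and the locus (locusCol).
Columns required only for genotypes: the sampled individual (sampCol) and the genotype (genoCol).
Columns required only for allele frequencies: the allele counts (countCol) and the number of individuals (indsCol).
|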
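type |
Character: One of 'genos' or 'freqs', for genotype or allele frequency data, respectively. |
method |
Character: One of 'global' or 'pairwise', to calculate F-statistics across all populations or between each pair of populations. |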
fstatVec |
Character: A vector of F-statistics to calculate: any of 'FST', 'FIS', and 'FIT'. This is only
applicable for genotype data, i.e., type=='genos'. Default is NULL. |
popCol |
Character: The column name with the population information.
Default is "POP". |
sampCol |
Character: The column name with the sampled individual information.
Default is "SAMPLE". |
locusCol |
Character: The column name with the locus information.
Default is "LOCUS". |
genoCol |
Character: The column name with the genotype information.
Default is "GT". |
countCol |
Character: The column name with the allele count information.
Default is "COUNTS". |
indsCol |
Character: The column name with the number of individuals
contributing to the allele frequency estimate. Default is "INDS". |
permute |
Logical: Should permutations be performed to test the
statistical significance of F-statistics? Default is FALSE. |
keepLocus |
Logical: Should locus-specific estimates of F-statistics be kept? Default is TRUE. Dropping locus-specific estimates can substantially reduce memory use and the size of the returned list. |
numPerms |
Integer: The number of permutations to perform. Default is 100. |
numCores |
Integer: The number of cores to use when running permutations. Default is 1. |
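To make the expected layout of dat concrete, here is a small base R sketch that builds two toy long-format tables with the default column names (POP, SAMPLE, LOCUS, GT for genotypes; POP, LOCUS, COUNTS, INDS for allele counts). All values are invented placeholders. The comma-separated COUNTS format is assumed from the paste(RO, AO, sep=',') step in the examples, and the helper counts_to_freqs() is hypothetical, not a genomalicious function.

```r
# Toy long-format genotype table (placeholder values), default column names:
# one row per individual x locus, genotypes as '/'-separated alleles
genos <- data.frame(
  POP    = rep(c("A", "B"), each = 4),
  SAMPLE = rep(c("a1", "a2", "b1", "b2"), each = 2),
  LOCUS  = rep(c("L1", "L2"), times = 4),
  GT     = c("0/0", "0/1", "0/1", "1/1", "1/1", "1/1", "0/1", "1/1"),
  stringsAsFactors = FALSE
)

# Toy long-format allele-count table (placeholder values):
# one row per population x locus; COUNTS holds comma-separated allele counts
# (assumed format), INDS the number of individuals behind each estimate
freqs <- data.frame(
  POP    = c("A", "A", "B", "B"),
  LOCUS  = c("L1", "L2", "L1", "L2"),
  COUNTS = c("30,10", "22,10,8", "5,35", "12,20,8"),  # L2 is triallelic
  INDS   = c(20, 20, 20, 20),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not part of genomalicious): count strings -> frequencies
counts_to_freqs <- function(counts) {
  lapply(strsplit(counts, ",", fixed = TRUE), function(x) {
    n <- as.numeric(x)
    n / sum(n)
  })
}

counts_to_freqs(freqs$COUNTS)[[2]]  # 0.55 0.25 0.20
```

Each row carries exactly one observation (one genotype, or one population's counts at one locus), which is what "long format" means here; multiallelic loci simply have more comma-separated counts.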
With genotype data, the F-statistics FST, FIS, and FIT can be calculated. Only FST can be calculated from allele frequency data.
F-statistics from genotype data are calculated from the variance components 'a', 'b', and 'c', which have been standardised for observed heterozygosity. FST from allele frequency data uses an estimate of the expected heterozygosity.
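For orientation, the sketch below computes the textbook Weir & Cockerham (1984) variance components a, b, and c for a single biallelic locus, and the F-statistics derived from them. It is not genomalicious's implementation: the "standardised for observed heterozygosity" step, multiallelic handling, and the way this function combines loci are not reproduced. The function name wc_locus() is hypothetical.

```r
# Textbook Weir & Cockerham (1984) estimator for ONE biallelic locus.
# n: individuals sampled per population
# p: frequency of the focal allele per population
# h: observed proportion of heterozygotes per population
wc_locus <- function(n, p, h) {
  r     <- length(n)                                   # number of populations
  n_bar <- sum(n) / r                                  # mean sample size
  n_c   <- (r * n_bar - sum(n^2) / (r * n_bar)) / (r - 1)
  p_bar <- sum(n * p) / (r * n_bar)                    # weighted mean frequency
  s2    <- sum(n * (p - p_bar)^2) / ((r - 1) * n_bar)  # among-population variance
  h_bar <- sum(n * h) / (r * n_bar)                    # mean observed heterozygosity
  pq    <- p_bar * (1 - p_bar)

  # Variance components: among populations (a), among individuals within
  # populations (b), and within individuals (c)
  comp_a <- n_bar / n_c * (s2 - (pq - (r - 1) / r * s2 - h_bar / 4) / (n_bar - 1))
  comp_b <- n_bar / (n_bar - 1) *
    (pq - (r - 1) / r * s2 - (2 * n_bar - 1) / (4 * n_bar) * h_bar)
  comp_c <- h_bar / 2

  c(a = comp_a, b = comp_b, c = comp_c,
    FST = comp_a / (comp_a + comp_b + comp_c),
    FIT = 1 - comp_c / (comp_a + comp_b + comp_c),
    FIS = 1 - comp_c / (comp_b + comp_c))
}

# Two populations fixed for different alleles, no heterozygotes: FST = 1
# (FIS is NaN here because b + c = 0)
wc_locus(n = c(10, 10), p = c(1, 0), h = c(0, 0))
```

Weir & Cockerham suggest combining loci by summing a, b, and c across loci before forming the ratios, rather than averaging per-locus ratios; small negative estimates, as for two populations with identical frequencies, are expected from this estimator.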
Permutation tests for genotype data involve randomly shuffling individuals among populations, recalculating F-statistics, and testing the hypothesis that the permuted F-statistic > the observed F-statistic. The p-value is the proportion of permutations for which this expression is TRUE. That is, if no permuted values are greater than the observed value, p=0; if all permuted values are greater than the observed value, p=1.
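The permutation logic described above can be sketched in base R as follows. To keep it short, the recalculated statistic is a hypothetical stand-in (the variance among population means of a per-individual allele dosage), not a Weir & Cockerham F-statistic, and each individual carries a single value; with real multi-locus data, whole individuals (all their loci) would be moved between populations together. The function names are illustrative only.

```r
# Permutation p-value as described above: the proportion of permuted
# statistics that are strictly greater than the observed statistic.
perm_pval <- function(obs, perm) {
  mean(perm > obs)
}

# Hypothetical stand-in statistic: variance among population means of
# alternate-allele dosage. genomalicious recalculates F-statistics instead.
toy_stat <- function(dosage, pop) {
  var(as.vector(tapply(dosage, pop, mean)))
}

# Shuffle population labels among individuals (group sizes stay fixed),
# recompute the statistic each time, then compare with the observed value.
permute_test <- function(dosage, pop, num_perms = 100) {
  obs  <- toy_stat(dosage, pop)
  perm <- replicate(num_perms, toy_stat(dosage, sample(pop)))
  list(obs = obs, perm = perm, pval = perm_pval(obs, perm))
}

set.seed(1)
dosage <- c(0, 0, 1, 0, 1, 2, 2, 1, 2, 2)   # placeholder per-individual dosages
pop    <- rep(c("A", "B"), each = 5)
permute_test(dosage, pop, num_perms = 200)$pval
```

Shuffling labels with sample(pop) preserves the number of individuals per population, so only the assignment of individuals to populations is randomised.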
A list is returned with three indexes.
The first index is $genome, the genome-wide F-statistics. If global estimates were requested (method=='global'), then this is just a single row: the estimates across all populations. If pairwise estimates were requested (method=='pairwise'), then there are also columns $POP1 and $POP2, which represent the two populations tested.
The second index is $locus, the locus-specific F-statistics. When global estimates were requested (method=='global'), this is a data table with a $LOCUS column and the global estimates at each locus. If pairwise estimates were requested (method=='pairwise'), then there are also columns $POP1 and $POP2, which represent the two populations tested.
The third index is $permute, the permutation results. This index will be NULL when frequencies are used, i.e., type=='freqs', and will only contain data if type=='genos' and permute==TRUE.
$permute is itself a list with two subindexes:
$fstat: The permuted F-statistics. If method=='global', then this will simply be a single row of global estimates. If method=='pairwise', then this will be a data table with columns $POP1, $POP2, and a column for each F-statistic.
$pval: The permuted p-values. This is a long-format data table. If method=='global', then there are two columns: $STAT, which contains the F-statistic, and $PVAL, which contains the global permuted p-value. If method=='pairwise', then there will be two additional columns, $POP1 and $POP2.
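The pairwise $pval layout described above (columns POP1, POP2, STAT, PVAL; one row per population pair and statistic) can be filtered or reshaped with base R. The sketch below uses a placeholder table with invented values shaped like that description; it is not output from fstat_calc.

```r
# Placeholder table shaped like the pairwise $permute$pval described above:
# long format, one row per population pair x statistic (values are invented)
pval_tab <- data.frame(
  POP1 = c("A", "A", "B", "A", "A", "B"),
  POP2 = c("B", "C", "C", "B", "C", "C"),
  STAT = rep(c("FST", "FIS"), each = 3),
  PVAL = c(0.00, 0.12, 0.03, 0.40, 0.55, 0.81),
  stringsAsFactors = FALSE
)

# Pull out one statistic and flag pairs below a chosen alpha
fst_p <- pval_tab[pval_tab$STAT == "FST", ]
fst_p$SIG <- fst_p$PVAL < 0.05

# Wide layout: one row per pair, one PVAL column per statistic
wide <- reshape(pval_tab, idvar = c("POP1", "POP2"),
                timevar = "STAT", direction = "wide")
wide
```

Since the returned tables are data tables, data.table::dcast(pval_tab, POP1 + POP2 ~ STAT, value.var = "PVAL") is an alternative route to the same wide layout.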
Weir & Cockerham (1984) Evolution. DOI: 10.1111/j.1558-5646.1984.tb05657.x
Weir & Hill (2002) Annual Review of Genetics. DOI: 10.1146/annurev.genet.36
library(genomalicious)
library(data.table)  # data.table syntax (:=) is used below
data(data_Genos)
data(data_PoolFreqs)
data(data_PoolInfo)
# Set genotypes as characters
head(data_Genos$GT)
data_Genos[, GT:=genoscore_converter(GT)]
head(data_Genos$GT)
# Set allele counts and individuals in pool-seq data
head(data_PoolFreqs)
head(data_PoolInfo)
data_PoolFreqs[, COUNTS:=paste(RO,AO,sep=',')]
data_PoolFreqs$INDS <- data_PoolInfo$INDS[
match(data_PoolFreqs$POOL, data_PoolInfo$POOL)
]
head(data_PoolFreqs)
# Genotypes and global F-statistics
geno_global_f <- fstat_calc(
dat=data_Genos,
type='genos', method='global', fstatVec=c('FST','FIS','FIT'),
popCol='POP', sampCol='SAMPLE',
locusCol='LOCUS', genoCol='GT',
permute=FALSE
)
# Genotypes and pairwise F-statistics
geno_pair_f <- fstat_calc(
dat=data_Genos,
type='genos', method='pairwise', fstatVec=c('FST','FIS','FIT'),
popCol='POP', sampCol='SAMPLE',
locusCol='LOCUS', genoCol='GT',
permute=FALSE
)
# Allele frequencies (from counts) and global FST
freqs_global_f <- fstat_calc(
dat=data_PoolFreqs,
type='freqs', method='global', fstatVec=NULL,
popCol='POP', locusCol='LOCUS',
countCol='COUNTS', indsCol='INDS',
permute=FALSE
)
# Allele frequencies (from counts) and pairwise FST
freqs_pair_f <- fstat_calc(
dat=data_PoolFreqs,
type='freqs', method='pairwise', fstatVec=NULL,
popCol='POP', locusCol='LOCUS',
countCol='COUNTS', indsCol='INDS',
permute=FALSE
)