knitr::opts_chunk$set(collapse = T, comment = "#>") options(tibble.print_min = 4L, tibble.print_max = 4L)
assigner::fst_WC84
During computation, the function first starts by keeping only the polymorphic markers in common between the populations.
When your project involves > 2 populations that evolved more by neutral processes
(e.g. genetic drift) than by natural selection, filtering to keep only the markers
in common between your populations will drastically reduce the number of markers,
and consequently your genome coverage for the estimate. Strategically, you should
reduce the threshold of number of populations required to keep a marker
(e.g. use a minimum of 2 populations genotyped for the markers to keep it).
This is easily tested with radiator::filter_common_markers
or in
stacks population module.
The overall fst and fis estimates will have reduce genome-wide coverage, because the computation is conducted with shared markers accross all your populations. However, for the pairwise comparisons, the markers will be tallied automatically for the pair. This usually results in more legitimate markers (i.e. less filtering bias) being used during computation, providing better genome-wide estimates for that pair.
rm(list=ls())
if (!require("pak")) install.packages("pak") pak::pkg_install("thierrygosselin/assigner") library(assigner)
data.fst <- readr::read_tsv(file = "https://raw.githubusercontent.com/thierrygosselin/package_data/master/assigner_data_fst.tsv.gz")
assigner::fst_WC84
first run: overall and pairwise, but without boostrap (i.e. no confidence intervals).
Get the function documentation with: ??assigner::fst_WC84
.
fst <- assigner::fst_WC84( data = data.fst, pop.levels = c("pop1", "pop2", "pop3", "pop4", "pop5", "pop6", "pop7", "pop8", "pop9", "pop10", "pop11"), pairwise = TRUE, filename = "testing_fst", #will trigger the function to write the results in directory as well verbose = TRUE )
After 10 sec max you should get the Fst (overall) value of 0.05
fst
:names(fst)
Get access to specific content with $
:
$sigma.loc
: the variance components per locus:
lsiga
: among populations,
lsigb
: among individuals within populations,
lsigw
: within individuals.
$fst.markers
: the fst by markers
$fst.ranked
: the fst ranked$fst.overall
: the mean fst overall markers and number of markers $fis.markers
: the fis by markers $fis.overall
: the mean fis overall markers and number of markers $fst.plot
: the histogram of the overall Fst per markers$pairwise.fst
: the pairwise fst in long/tidy data frame and number of markers per pairwise comparison$pairwise.fst.upper.matrix
: the pairwise fst in a upper triangle matrix$pairwise.fst.full.matrix
: the pairwise fst matrix (duplicated upper and lower triangle)$pairwise.fst.ci.matrix
: matrix with pairwise fst in the upper triangle
and the confidence intervals in the lower triangle.To view the pairwise Fst values in a tibble with 4 columns:
df <- fst$pairwise.fst
The first 10 parwise comparisons in df
N_MARKERS: is the number of markers used between the pairs to calculate the Fst.
| POP1 | POP2 | FST | N_MARKERS | |:----:|:-----:|:------:|:---------:| | pop1 | pop2 | 0.0442 | 3457 | | pop1 | pop3 | 0.0507 | 3457 | | pop1 | pop4 | 0.0341 | 3457 | | pop1 | pop5 | 0.0300 | 3457 | | pop1 | pop6 | 0.0434 | 3457 | | pop1 | pop7 | 0.0367 | 3457 | | pop1 | pop8 | 0.0256 | 3457 | | pop1 | pop9 | 0.0475 | 3457 | | pop1 | pop10 | 0.0522 | 3457 | | pop1 | pop11 | 0.0473 | 3457 |
The matrix with fst values:
# to see as a tibble: fst.matrix <- tibble::as_tibble(fst$pairwise.fst.full.matrix, rownames = "POP") # to keep matrix: fst.matrix <- fst$pairwise.fst.full.matrix
pairwise.fst.ci.matrix
:fst$pairwise.fst.ci.matrix # you will get this: [1] "confidence intervals not selected"
fst_WC84
, but this time will ask for the confidence intervals and the heatmap:fst.ci <- fst_WC84(data = data.fst, pop.levels = c("pop1", "pop2", "pop3", "pop4", "pop5", "pop6", "pop7", "pop8", "pop9", "pop10", "pop11"), pairwise = TRUE, ci = TRUE, iteration.ci = 100, quantiles.ci = c(0.025, 0.975), parallel.core = 12, heatmap.fst = TRUE, filename = "testing_fst", verbose = TRUE )
`fst.ci$pairwise.fst` is the tibble with the Fst values and now with the confidence intervals.
fst.ci$pairwise.fst.ci.matrix
pairwise.fst.ci.df
:pairwise.fst.ci.df <- tibble::as_tibble(fst.ci$pairwise.fst.ci.matrix, rownames = "POP") # if you have numeric pop_id identifier you might have to do this to get proper column names: colnames(pairwise.fst.ci.df) <- colnames(fst.ci$pairwise.fst.ci.matrix) # to save: readr::write_tsv(x = pairwise.fst.ci.df, path = "pairwise.fst.ci.df.tsv")
For the next steps, we need the full distance matrix created in step 10.
# build the tree: require(ape) tree <- ape::nj(X = fst.matrix) # fst.matrix as a matrix # for annotating bootstraps values on the tree: bootstrap.value <- ape::boot.phylo( phy = tree, x = fst.matrix, FUN = function(x) ape::nj(x), block = 1, B = 10000, trees = FALSE, rooted = FALSE ) # to get percentage values bootstrap.value <- round((bootstrap.value/10000)*100, 0) bootstrap.value # to include in the tree tree$node.label <- bootstrap.value
require(stats) tree <- ape::as.phylo(stats::hclust(stats::dist(fst.matrix), method = "average")) bootstrap.value <- ape::boot.phylo(phy = tree, x = fst.matrix, FUN = function(xx) ape::as.phylo(stats::hclust(stats::dist(xx), method = "average")) , block = 1, B = 10000, trees = FALSE, rooted = TRUE) bootstrap.value <- round((bootstrap.value/10000)*100, 0) bootstrap.value tree$node.label <- bootstrap.value
# get the latest development version of ggtree: if (!require("ggtree")) install_github("GuangchuangYu/ggtree") # If it's not working, use the Bioconductor version: if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager") BiocManager::install("ggtree")
Several vignettes are available to get to know ggtree
Build a very basic tree figure:
require(ggtree) require(ggplot2) tree.figure <- ggplot2::ggplot(tree, ggplot2::aes(x, y), ladderize = TRUE) + ggtree::geom_tree() + # geom_tiplab(size = 3, hjust = -0.05, vjust = 0.5)+ # for just the tip label ggplot2::geom_text(ggplot2::aes(label = label), size = 3, hjust = -0.05, vjust = 0.5) + # for both tips and nodes ggtree::theme_tree() + ggplot2::xlim(0, 0.16) # to allocate more space for tip labels (trial/error) tree.figure ggplot2::ggsave(filename = "assigner.fst.tree.example.pdf", width = 15, height = 15, dpi = 600, units = "cm", useDingbats = FALSE)
Please send me suggestions and bug reports through github
References
Weir BS, Cockerham CC (1984) Estimating F-Statistics for the Analysis of Population Structure. Evolution, 38, 1358–1370.
G Yu, D Smith, H Zhu, Y Guan, TTY Lam, ggtree: an R package for visualization and annotation of phylogenetic tree with different types of meta-data. revised.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.