Here we'll use the ex_counts feature table included with ecodive. It contains
the number of observations of each bacterial genus in each sample. In the text
below, you can substitute the feature of interest in your own data for the
word 'genera'.
library(ecodive)

counts <- rarefy(ex_counts)
counts
#>                   Saliva Gums Nose Stool
#> Streptococcus        162  309    6     1
#> Bacteroides            2    2    0   341
#> Corynebacterium        0    0  171     1
#> Haemophilus          180   34    0     1
#> Propionibacterium      1    0   82     0
#> Staphylococcus         0    0   86     1
Beta diversity is a measure of how different two samples are. Looking at the
counts matrix above, you can easily see that saliva and gums are similar, while
saliva and stool are different. The metrics described below quantify that
difference, referred to as the "distance" or "dissimilarity" between a pair of
samples. For metrics bounded between 0 and 1, such as Bray-Curtis, the distance
is 0 for identical samples and 1 for samples that share no features.
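To make the idea concrete, the following base-R sketch recomputes the Bray-Curtis dissimilarity for two sample pairs directly from the counts shown above. It assumes the standard weighted Bray-Curtis formula, sum(|x - y|) / sum(x + y); the names counts_demo and bray_curtis_pair are hypothetical and not part of ecodive.

```r
# Rarefied counts copied from the table above (rows = genera, columns = samples).
counts_demo <- matrix(
  c(162, 309,   6,   1,
      2,   2,   0, 341,
      0,   0, 171,   1,
    180,  34,   0,   1,
      1,   0,  82,   0,
      0,   0,  86,   1),
  nrow = 6, byrow = TRUE,
  dimnames = list(
    c("Streptococcus", "Bacteroides", "Corynebacterium",
      "Haemophilus", "Propionibacterium", "Staphylococcus"),
    c("Saliva", "Gums", "Nose", "Stool")))

# Hypothetical helper: textbook Bray-Curtis for one pair of count vectors.
bray_curtis_pair <- function(x, y) sum(abs(x - y)) / sum(x + y)

bc_saliva_gums  <- bray_curtis_pair(counts_demo[, "Saliva"], counts_demo[, "Gums"])
bc_saliva_stool <- bray_curtis_pair(counts_demo[, "Saliva"], counts_demo[, "Stool"])
```

Here bc_saliva_gums works out to 294/690 (about 0.426) and bc_saliva_stool to 682/690 (about 0.988), in line with the visual impression that saliva resembles gums far more than stool.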
The classic algorithms all run in weighted mode by default. Specifying
weighted = FALSE, e.g. canberra(counts, weighted = FALSE), will switch them to
unweighted mode.
Classic algorithms: bray_curtis(), canberra(), euclidean(), gower(), jaccard(), kulczynski(), manhattan()
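As a rough illustration of the weighted/unweighted distinction, the base-R sketch below compares Bray-Curtis on raw counts with Bray-Curtis on presence/absence data for the saliva and gums samples. It assumes that "unweighted" means each feature is reduced to present (1) or absent (0) before the distance is computed; that reading is an assumption here, not something confirmed against ecodive's internals, and bray_curtis_pair is a hypothetical helper.

```r
# Saliva and gums counts copied from the rarefied table above.
saliva <- c(162, 2, 0, 180, 1, 0)
gums   <- c(309, 2, 0,  34, 0, 0)

# Hypothetical helper: textbook Bray-Curtis for one pair of vectors.
bray_curtis_pair <- function(x, y) sum(abs(x - y)) / sum(x + y)

# Weighted: abundances contribute to the distance.
weighted_bc <- bray_curtis_pair(saliva, gums)

# Unweighted (ASSUMED definition): only presence/absence contributes.
unweighted_bc <- bray_curtis_pair(as.numeric(saliva > 0), as.numeric(gums > 0))
```

Under this assumption the two samples differ in only one of their detected genera, so unweighted_bc is 1/7, noticeably smaller than the weighted value of 294/690.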
For the UniFrac algorithms, unweighted_unifrac() is unweighted and all the
others are weighted.
Unweighted: unweighted_unifrac()
Weighted: weighted_unifrac(), weighted_normalized_unifrac(), generalized_unifrac(), variance_adjusted_unifrac()
The default value of pairs = NULL in ecodive's beta diversity functions
returns an all-vs-all distance matrix that is completely filled in.
bray_curtis(counts)
#>          Saliva      Gums      Nose
#> Gums  0.4260870
#> Nose  0.9797101 0.9826087
#> Stool 0.9884058 0.9884058 0.9913043
If you are doing a reference-vs-all comparison, you can use the pairs
parameter to skip unwanted calculations and save some CPU time. The larger the
dataset, the more noticeable the improvement will be.
bray_curtis(counts, pairs = 1:3)
#>          Saliva Gums Nose
#> Gums  0.4260870
#> Nose  0.9797101   NA
#> Stool 0.9884058   NA   NA
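To get a feel for how much work a reference-vs-all comparison can skip, the short sketch below counts the pairs involved for a few dataset sizes. It is plain arithmetic in base R (n choose 2 versus n - 1) and does not measure ecodive's actual run time; the variable names are illustrative.

```r
# Number of samples in a few hypothetical datasets.
n_samples <- c(4, 100, 1000)

# All-vs-all: every unordered pair of samples.
all_vs_all <- choose(n_samples, 2)

# Reference-vs-all: one reference sample against each of the others.
ref_vs_all <- n_samples - 1

data.frame(n_samples, all_vs_all, ref_vs_all)
```

For the four-sample example this is 6 pairs versus 3, while at 1000 samples it is 499500 pairs versus 999.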
The pairs argument can be a numeric vector of pair indices, a logical vector
with one value per pair, or a function(i, j) that returns whether columns i
and j should be compared. Therefore, all of the following are equivalent:
bray_curtis(counts, pairs = 1:3)
bray_curtis(counts, pairs = c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE))
bray_curtis(counts, pairs = function (i, j) i == 1)
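One way to see why these three calls select the same comparisons is to reduce each form to a logical mask over the combn() pairings. The helper as_pair_mask below is hypothetical and only a sketch of that idea in base R; it is not how ecodive necessarily processes the argument internally.

```r
# Hypothetical helper: turn any accepted form of `pairs` into a logical mask
# with one entry per column of combn(n, 2).
as_pair_mask <- function(pairs, n) {
  cmb <- combn(n, 2)
  if (is.null(pairs))     return(rep(TRUE, ncol(cmb)))   # all-vs-all
  if (is.function(pairs)) return(as.logical(mapply(pairs, cmb[1, ], cmb[2, ])))
  if (is.logical(pairs))  return(pairs)                  # already a mask
  seq_len(ncol(cmb)) %in% pairs                          # numeric pair indices
}

n <- 4  # four samples: Saliva, Gums, Nose, Stool
mask_idx <- as_pair_mask(1:3, n)
mask_lgl <- as_pair_mask(c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE), n)
mask_fun <- as_pair_mask(function(i, j) i == 1, n)
```

All three masks come out as TRUE TRUE TRUE FALSE FALSE FALSE, i.e. the three pairings whose first member is column 1 (Saliva).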
The ordering of pairs follows the pairings produced by combn().
# Column index pairings
combn(ncol(counts), 2)
#>      [,1] [,2] [,3] [,4] [,5] [,6]
#> [1,]    1    1    1    2    2    3
#> [2,]    2    3    4    3    4    4

# Sample name pairings
combn(colnames(counts), 2)
#>      [,1]     [,2]     [,3]     [,4]   [,5]    [,6]
#> [1,] "Saliva" "Saliva" "Saliva" "Gums" "Gums"  "Nose"
#> [2,] "Gums"   "Nose"   "Stool"  "Nose" "Stool" "Stool"
So, for instance, to use gums as the reference sample:
my_combn <- combn(colnames(counts), 2)
my_pairs <- my_combn[1,] == 'Gums' | my_combn[2,] == 'Gums'
my_pairs
#> [1]  TRUE FALSE FALSE  TRUE  TRUE FALSE

bray_curtis(counts, pairs = my_pairs)
#>          Saliva      Gums Nose
#> Gums  0.4260870
#> Nose         NA 0.9826087
#> Stool        NA 0.9884058   NA
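The same pattern can be wrapped in a small function so that any sample can serve as the reference. The helper reference_pairs below is a hypothetical convenience written in base R, not an ecodive function; it simply repeats the combn() logic used above.

```r
# Hypothetical helper: logical mask selecting every pairing that involves `ref`.
reference_pairs <- function(sample_names, ref) {
  stopifnot(ref %in% sample_names)
  cmb <- combn(sample_names, 2)
  cmb[1, ] == ref | cmb[2, ] == ref
}

samples     <- c("Saliva", "Gums", "Nose", "Stool")
gums_pairs  <- reference_pairs(samples, "Gums")
stool_pairs <- reference_pairs(samples, "Stool")
```

Since the pairs argument accepts a logical vector, a mask such as gums_pairs could then be supplied as pairs = gums_pairs, with colnames(counts) taking the place of the hard-coded sample names.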