Introduction to ecodive
In ecodive: Parallel and Memory-Efficient Ecological Diversity Metrics

Introduction

Ecodive calculates ecological diversity metrics. Alpha diversity metrics describe the diversity within a single sample, whereas beta diversity metrics measure how different two samples are from each other.

In this guide, we'll use the ex_counts dataset included with ecodive. ex_counts is a feature table that records how many times each bacterial genus was observed at different body sites.

library(ecodive)

ex_counts
#>                   Saliva Gums Nose Stool
#> Streptococcus        162  793   22     1
#> Bacteroides            2    4    2   611
#> Corynebacterium        0    0  498     1
#> Haemophilus          180   87    2     1
#> Propionibacterium      1    1  251     0
#> Staphylococcus         0    1  236     1

In this example, the 'features' in our feature table are genera. However, your own dataset can use whatever features make sense: species, OTUs, ASVs, or even something completely unrelated to ecology.
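As a rough sketch, here is one way to hold your own data in the same layout as ex_counts, with features as rows, samples as columns, and integer counts in the cells. The feature IDs, sample names, and the commented-out CSV file are hypothetical placeholders; check the ecodive documentation for the exact input types each function accepts.

```r
# Hypothetical feature table laid out like ex_counts:
# rows = features (made-up ASV IDs), columns = samples, cells = counts.
my_counts <- matrix(
  c(10L, 0L, 3L,   # SampleA (matrix() fills column by column)
     5L, 7L, 0L),  # SampleB
  nrow = 3,
  dimnames = list(c("ASV_1", "ASV_2", "ASV_3"), c("SampleA", "SampleB"))
)

# From a CSV whose first column holds feature IDs (hypothetical file name):
# my_counts <- as.matrix(read.csv("my_counts.csv", row.names = 1))

colSums(my_counts)  # sampling depth of each sample
```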

Alpha Diversity

Alpha diversity metrics summarize the genera present within a single sample. Depending on the metric, this can account for the number of unique genera (richness), how evenly the population is split among those genera (evenness), or how distantly related the genera are (phylogenetic diversity).

Classic metrics: chao1(), shannon(), simpson(), inv_simpson()
Phylogenetic metrics: faith()
Further reading: vignette('adiv')
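To make richness and evenness concrete, the sketch below computes three textbook quantities in base R for a single count vector. These are standard formulas, not ecodive's implementation. Gini-Simpson (1 - sum(p^2)) is one common form of the Simpson index, and it may not be the exact form that simpson() or inv_simpson() return. For the rarefied Saliva counts used later in this guide, shannon_index() should agree with the shannon() value ecodive reports, which suggests a natural-log Shannon index.

```r
# Number of features observed at least once
richness <- function(x) sum(x > 0)

# Shannon index with natural log, using only observed features
shannon_index <- function(x) {
  p <- x[x > 0] / sum(x)   # relative abundances
  -sum(p * log(p))
}

# Gini-Simpson: probability that two draws (with replacement) differ
gini_simpson <- function(x) {
  p <- x / sum(x)
  1 - sum(p^2)
}

saliva <- c(162, 2, 0, 180, 1, 0)  # rarefied Saliva counts from this guide
richness(saliva)                   # 4 genera observed
shannon_index(saliva)
gini_simpson(saliva)
```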

Beta Diversity

Beta diversity metrics describe how different two samples are, based on the genera observed in each. These values are also known as "distances" or "dissimilarities". UniFrac metrics additionally incorporate a phylogenetic tree into the calculation.

Classic metrics: bray_curtis(), canberra(), euclidean(), gower(), jaccard(), kulczynski(), manhattan()
Phylogenetic metrics: unweighted_unifrac(), weighted_unifrac(), weighted_normalized_unifrac(), generalized_unifrac(), variance_adjusted_unifrac()
Further reading: vignette('bdiv') and vignette('unifrac').
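As an illustration of what a classic beta diversity metric computes, here is the textbook Bray-Curtis dissimilarity between two count vectors in base R: the summed absolute differences divided by the summed totals. This is a sketch of the formula, not ecodive's code. Applied to the rarefied Saliva and Gums counts used later in this guide, it reproduces the 0.4260870 that bray_curtis() reports for that pair.

```r
# Bray-Curtis dissimilarity between two samples' count vectors
bray_curtis_pair <- function(x, y) {
  sum(abs(x - y)) / sum(x + y)
}

saliva <- c(162, 2, 0, 180, 1, 0)  # rarefied counts, same genus order
gums   <- c(309, 2, 0, 34, 0, 0)
bray_curtis_pair(saliva, gums)     # 294 / 690
```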

Example

Rarefaction

The ex_counts feature table has 345 observations for Saliva but 1011 for Nose. This unequal sampling depth can cause systematic biases. Specifically, rare genera are more likely to be detected in samples with greater sampling depth, which artificially inflates their observed richness.

The first step, then, is to rarefy ex_counts so that all samples have the same number of observations. Rarefying randomly removes observations from the deeper samples until each one matches the target depth. Here every sample ends up with 345 observations, the depth of the smallest sample (Saliva).

colSums(ex_counts)
#> Saliva   Gums   Nose  Stool 
#>    345    886   1011    615 

counts <- rarefy(ex_counts)

colSums(counts)
#> Saliva   Gums   Nose  Stool 
#>    345    345    345    345 

counts
#>                   Saliva Gums Nose Stool
#> Streptococcus        162  309    6     1
#> Bacteroides            2    2    0   341
#> Corynebacterium        0    0  171     1
#> Haemophilus          180   34    0     1
#> Propionibacterium      1    0   82     0
#> Staphylococcus         0    0   86     1
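The idea behind rarefaction can be sketched in base R by subsampling each column without replacement. This is one common approach, assumed here for illustration; rarefy()'s internal algorithm and random number generation may differ, so these draws will not reproduce the counts above. The helper names rarefy_column() and rarefy_table() are hypothetical.

```r
# Rarefy one sample: keep `depth` randomly chosen observations.
rarefy_column <- function(x, depth) {
  stopifnot(depth <= sum(x))
  obs <- rep(seq_along(x), times = x)              # one entry per observation
  picked <- obs[sample.int(length(obs), size = depth)]  # no replacement
  tabulate(picked, nbins = length(x))              # recount per feature
}

# Rarefy every column to a common depth (default: smallest sample).
rarefy_table <- function(m, depth = min(colSums(m))) {
  out <- apply(m, 2, rarefy_column, depth = depth)
  dimnames(out) <- dimnames(m)
  out
}

ex <- matrix(
  c(162,   2,   0, 180,   1,   0,    # Saliva
    793,   4,   0,  87,   1,   1,    # Gums
     22,   2, 498,   2, 251, 236,    # Nose
      1, 611,   1,   1,   0,   1),   # Stool
  nrow = 6,
  dimnames = list(
    c("Streptococcus", "Bacteroides", "Corynebacterium",
      "Haemophilus", "Propionibacterium", "Staphylococcus"),
    c("Saliva", "Gums", "Nose", "Stool")))

set.seed(42)
sub <- rarefy_table(ex)
colSums(sub)  # every sample now has 345 observations
```

A feature can only lose observations, never gain them, so each rarefied count is at most the original count.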

Classic Metrics

These alpha and beta diversity metrics have been in use for more than 50 years and don't require a phylogenetic tree. The beta diversity functions accept a weighted = FALSE argument, which uses only presence/absence information instead of relative abundances.

## Alpha Diversity -------------------

shannon(counts)
#>     Saliva       Gums       Nose      Stool 
#> 0.74119910 0.35692121 1.10615349 0.07927797 


## Beta Diversity --------------------

bray_curtis(counts)
#>          Saliva      Gums      Nose
#> Gums  0.4260870                    
#> Nose  0.9797101 0.9826087          
#> Stool 0.9884058 0.9884058 0.9913043

bray_curtis(counts, weighted = FALSE)
#>          Saliva      Gums      Nose
#> Gums  0.1428571                    
#> Nose  0.5000000 0.7142857          
#> Stool 0.3333333 0.2500000 0.3333333
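With presence/absence data, Bray-Curtis reduces to 1 - 2S / (A + B), where A and B are the numbers of genera observed in each sample and S is the number they share. This is also known as the Sørensen dissimilarity. The base-R sketch below (the helper name is hypothetical) reproduces the weighted = FALSE values above, for example 1/7 for Saliva vs Gums, where 4 and 3 genera are observed and 3 are shared.

```r
# Presence/absence Bray-Curtis (Sorensen dissimilarity)
bray_curtis_binary <- function(x, y) {
  a <- x > 0
  b <- y > 0
  shared <- sum(a & b)
  1 - 2 * shared / (sum(a) + sum(b))
}

saliva <- c(162, 2, 0, 180, 1, 0)   # rarefied counts from above
gums   <- c(309, 2, 0, 34, 0, 0)
nose   <- c(6, 0, 171, 0, 82, 86)
bray_curtis_binary(saliva, gums)    # 1 - 6/7 = 1/7
bray_curtis_binary(gums, nose)      # 1 - 2/7 = 5/7
```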

Phylogenetic Metrics

A phylogenetic tree enables alpha and beta diversity metrics to take into account evolutionary relatedness between the observed genera, generally giving higher diversity values for samples with more distantly related genera. Faith (for alpha diversity) and UniFrac (for beta diversity) are examples of phylogenetic metrics.

The ex_tree object included with ecodive provides the phylogenetic tree for the genera in ex_counts. For your own datasets, you can use ecodive's read_tree() function to import a phylogenetic tree from a Newick-formatted string or file.

## Alpha Diversity -------------------

faith(counts, tree = ex_tree)
#> Saliva   Gums   Nose  Stool 
#>    180    155    101    202 


## Beta Diversity --------------------

weighted_normalized_unifrac(counts, tree = ex_tree)
#>          Saliva      Gums      Nose
#> Gums  0.4328662                    
#> Nose  0.7928701 0.6767840          
#> Stool 0.9677535 0.9829736 0.9936121
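Faith's phylogenetic diversity is usually defined as the total length of the tree branches connecting a sample's observed features to the root. The sketch below computes this on a small hypothetical tree stored as a parent vector. It does not use ex_tree or ecodive's tree format, and conventions such as root handling may differ from faith().

```r
# Hypothetical toy tree: node 1 is the root, nodes 2-3 are internal,
# nodes 4-7 are tips. parent[i] is node i's parent; branch_length[i] is
# the length of the edge above node i.
parent        <- c(NA, 1, 1, 2, 2, 3, 3)
branch_length <- c(0, 1, 2, 3, 4, 5, 6)
tip_node      <- c(t1 = 4, t2 = 5, t3 = 6, t4 = 7)

faith_pd <- function(counts, tip_node, parent, branch_length) {
  on_path <- logical(length(parent))   # edges already counted
  for (tip in names(counts)[counts > 0]) {
    node <- tip_node[[tip]]
    # Walk toward the root; stop at the root or at an already-counted edge,
    # because every edge above that point has been counted too.
    while (!is.na(parent[node]) && !on_path[node]) {
      on_path[node] <- TRUE
      node <- parent[node]
    }
  }
  sum(branch_length[on_path])
}

# t1 and t3 observed: edges 3 + 1 (via node 2) and 5 + 2 (via node 3)
faith_pd(c(t1 = 5, t2 = 0, t3 = 2, t4 = 0), tip_node, parent, branch_length)  # 11
```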

Distance Matrices

Beta diversity functions return a dist object. You can convert this to a standard R matrix with the as.matrix() function.

dm <- bray_curtis(counts, weighted = FALSE)
dm
#>          Saliva      Gums      Nose
#> Gums  0.1428571                    
#> Nose  0.5000000 0.7142857          
#> Stool 0.3333333 0.2500000 0.3333333

mtx <- as.matrix(dm)
mtx
#>           Saliva      Gums      Nose     Stool
#> Saliva 0.0000000 0.1428571 0.5000000 0.3333333
#> Gums   0.1428571 0.0000000 0.7142857 0.2500000
#> Nose   0.5000000 0.7142857 0.0000000 0.3333333
#> Stool  0.3333333 0.2500000 0.3333333 0.0000000

mtx['Saliva', 'Nose']
#> [1] 0.5
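Besides as.matrix(), it can help to reshape a dist object into a long table with one row per sample pair, for example for plotting. A dist object stores only the lower triangle, column by column, which is the same order that combn() lists pairs. The base-R sketch below shows this on a small made-up matrix. Because ecodive's beta diversity functions return a dist object, the same steps should apply to dm.

```r
# A small made-up symmetric dissimilarity matrix
m <- matrix(c(0.0, 0.2, 0.5,
              0.2, 0.0, 0.4,
              0.5, 0.4, 0.0),
            nrow = 3,
            dimnames = list(c("A", "B", "C"), c("A", "B", "C")))
d <- as.dist(m)
length(d)  # 3 pairs: n * (n - 1) / 2

# One row per pair, in the order the dist object stores its values
labs  <- attr(d, "Labels")
pairs <- t(combn(labs, 2))
long  <- data.frame(sample1 = pairs[, 1], sample2 = pairs[, 2],
                    distance = as.vector(d))
long
```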