knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
This is a brief introduction to AMBI index calculations with ambiR.
Species counts (or abundance) should be organized in a data frame arranged in long format. That is, species names are in one column and species counts in another column. If the data represents several stations and/or if there are replicates, then the data should include columns with this information.
Looking at the test_data example dataset included with the ambiR package,
one can see how the data should be arranged:
library(ambiR) head(test_data)
If your data look like this, you can go directly to [Calculate AMBI]. If not, the following examples show how to reorganize your data.
Here is an example dataframe where there are counts for species in separate columns:
library(dplyr) library(tidyr) wide_data_species <- test_data %>% pivot_wider(names_from = "species", values_from = "count", values_fill = 0)
head(wide_data_species)
To arrange the data in the correct form, use tidyr::pivot_longer():
# columns 1 and 2 contain station and replicate information # so, select all columns from 3 to be pivoted long_data <- wide_data_species %>% pivot_longer(cols = 3:ncol(wide_data_species), names_to = "species", values_to = "count") head(long_data)
Here is an example where each column contains species counts for a station and replicate. The first row of the table contains the station ID 1, 2, ... and the second row contains the replicate ID a, b, .... The first column of the table contains species names.
df <- test_data %>% pivot_wider(names_from = c("station","replicate"), values_from = "count", values_fill = 0) stns <- names(df)[2:ncol(df)] stns <- stns %>% sapply(strsplit,"_") reps <- stns %>% sapply(function(x){x[2]}) stns <- stns %>% sapply(function(x){x[1]}) n <- 1+length(stns) df2 <- matrix(nrow=2, ncol=n) %>% as.data.frame() df2[1,2:n] <- stns df2[2,2:n] <- reps names(df) <- names(df2) df <- df %>% mutate(across(all_of(names(df)), as.character)) df <- df2 %>% bind_rows(df) names(df) <- rep("", ncol(df)) wide_data_stns <- df
head(wide_data_stns)
Note that if the if the observation data only has stations or replicates, then rearranging the data can be done in one step, as in the previous example. But in this case, the station and replicate information are in separate rows. The restructuring process is a little more complicated.
Before we can use a pivot function, we need to combine the station ID
and replicate ID for each column into a single value. In this example,
each station ID and replicate ID are joined into a single character
value, with an underscore to separate them _. The underscore will be
used again after pivoting the table to identify where to split the
combined station/replicate values back into separate values again.
If there are station names which already contain underscores, then another suitable character should be used when joining and splitting station/replicate IDs.
sep_character <- "_" # get the station IDs from row 1, excluding column 1 (this contains species names) stations <- wide_data_stns[1, 2:ncol(wide_data_stns)] # get the replicate IDs from row 2 replicates <- wide_data_stns[2, 2:ncol(wide_data_stns)] # combine the station and replicate for each column using an underscore station_replicate <- paste(stations, replicates, sep=sep_character) # now we have extracted the station/replicate information, we can drop the # first two rows of the data frame wide_data_stns <- wide_data_stns[3:nrow(wide_data_stns),] # apply the station_replicates as column names names(wide_data_stns) <- c("species", station_replicate) head(wide_data_stns)
We can see that the column names now contain the combined station and replicate information. We are ready to transpose the data.
# column 1 contains species names # so, select all columns from 2 to be pivoted long_data <- wide_data_stns %>% pivot_longer(cols = 2:ncol(wide_data_stns), names_to = "stn_rep", values_to = "count") head(long_data)
Now we can split the stn_rep column into separate columns for station and replicate, We will use the underscore we introduced earlier to identify where the split should occur:
long_data <- long_data %>% separate_wider_delim(cols="stn_rep", delim = sep_character, names=c("station", "replicate")) head(long_data)
Now we are ready to calculate AMBI...
We have now ensured that our species abundance/count data have the
correct structure, as in the example test_data provided:
head(test_data)
Call the AMBI() function:
res <- AMBI(test_data, by="station", var_rep="replicate", var_species="species", var_count="count")
The AMBI() function returns a list of three dataframes:
.$AMBI.$AMBI_rep.$matched.$AMBI- the main results with the AMBI index calculated for each
unique combination of by variables, in our case the results are per
station.
In addition to the AMBI index, the results also include the Shannon diversity
index H' and the Species richness S, the three
metrics which are necessary to calculate M-AMBI.
res$AMBI
.$AMBI_rep - if the observations include replicates, then the
function also returns results calculated for each replicate, within each
unique combination of by variables:
res$AMBI_rep
.$matched - for each observation, this dataframe shows which species
in the AMBI list the observed species was matched with, if any. It
also shows the AMBI species group assigned. This dataframe has the
same number of rows as the input data.
head(res$matched)
For more information about the principles underlying the AMBI calculations and
the grouping of species according to sensitivity to pollution,
see vignette("background").
Calculate M-AMBI the multivariate AMBI index, based on the three separate species diversity metrics:
AMBI.HS.All three indices required to calculate MAMBI()are included in the
results returned by the AMBI() function.
In addition to index values calculated from observed species data, the M-AMBI factor analysis requires values defining the limits for the three metrics, corresponding to the best and worst possible conditions.
See @MUXIKA200716 for more details.
The default limit values used by MAMBI() are:
limits_AMBI <- c("bad" = 6, "high" = 0) limits_H <- c("bad" = 0, "high" = NA) limits_S <- c("bad" = 0, "high" = NA)
By default, upper limit values for H and S are not provided (= NA).
If they are not provided explicitly, then the maximum values found in
the input data will be used. The results returned by MAMBI() show the
limits used.
MAMBI() also returns the status class (Bad, Poor, Moderate, Good or
High) for each M-AMBI index value. To do this, it compares the
calculated M-AMBI index value with values defining the boundaries
between status classes.
PB - Poor / BadMP - Moderate / PoorGM - Good / ModerateHG - High / GoodThe default values for the class boundaries are:
bounds <-c("PB" = 0.2, "MP" = 0.39, "GM" = 0.53, "HG" = 0.77)
MAMBI()We call MAMBI() using the previously generated AMBI results:
res_mambi <- MAMBI(res$AMBI, var_H = "H", var_S = "S", var_AMBI = "AMBI")
In addition to retaining the values for AMBI. H and S, the
results include the following information:
x, y, z - factor scores giving coordinates in the new factor
space.MAMBI - the calculated M-AMBI index valueStatus - the status class for the M-AMBI index valueEQR - the normalised EQR indexThe dataframe returned contains 2 more rows than the input data. Our
input data contained 3 rows (1 for each station). The results contain
5 rows. The 2 additional rows show the limit values for each of the
three metrics used in the M-AMBI calculations.
res_mambi %>% select(station, AMBI, H, S, x, y, z, MAMBI, Status, EQR)
As well as the status class, the function also returns a normalised
index value from 0 to 1 (EQR), calculated using the bounds boundary
values for the M-AMBI index. The following EQR values correspond to
status class boundaries:
0.2 - Poor / Bad0.4 - Moderate / Poor0.6 - Good / Moderate0.8 _ High / GoodFor example, the M-AMBI value for station 3 is 0.478. This lies
between the M-AMBI value corresponding to the Moderate/Poor status
class boundary ("MP" = 0.39) and the M-AMBI value corresponding to the
Good/Moderate status class boundary ("GM" = 0.53).
The normalised EQR value is given by linear interpolation: $$ EQR = 0.4 + (0.6 - 0.4) \frac{0.478 - 0.39}{0.53 - 0.39 } $$
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.