Nothing
#' Calculates AMBI, the AZTI Marine Biotic Index
#'
#' @description
#' [AMBI()] matches a list of species counts with the official AMBI species list
#' and calculates the AMBI index.
#'
#' @references
#' Borja, Á., Franco, J., Pérez, V. (2000). “A Marine Biotic Index to Establish the Ecological Quality of Soft-Bottom Benthos Within European Estuarine and Coastal Environments.” *Marine Pollution Bulletin* 40 (12) 1100–1114. \doi{doi:10.1016/S0025-326X(00)00061-8}.
#'
#' @details
#'
#' The theory behind the AMBI index calculations and details of the method, as
#' developed by [Borja et al. (2000)](#references),
#'
#' ## AMBI method
#'
#' Species can be matched to one of five groups, the distribution of individuals
#' between the groups reflecting different levels of stress on the ecosystem.
#'
#' * _Group I_. Species very sensitive to organic enrichment
#' and present under unpolluted conditions (initial state). They include the
#' specialist carnivores and some deposit- feeding _tubicolous polychaetes_.
#'
#' * _Group II_. Species indifferent to enrichment, always present in low densities with
#' non-significant variations with time (from initial state, to slight unbalance).
#' These include suspension feeders, less selective carnivores and scavengers.
#'
#' * _Group III_. Species tolerant to excess organic matter enrichment. These species
#' may occur under normal conditions, but their populations are stimulated by
#' organic enrichment (slight unbalance situations). They are surface
#' deposit-feeding species, as _tubicolous spionids_.
#'
#' * _Group IV_. Second-order opportunistic species (slight to pronounced unbalanced
#' situations). Mainly small sized _polychaetes_: subsurface deposit-feeders,
#' such as _cirratulids_.
#'
#' * _Group V_. First-order opportunistic species (pronounced unbalanced
#' situations). These are deposit- feeders, which proliferate in reduced
#' sediments.
#'
#' The distribution of individuals between these ecological groups, according
#' to their sensitivity to pollution stress, gives a biotic index ranging
#' from 0.0 to 6.0.
#'
#' \eqn{Biotic\ Index = 0.0 * f_{I} + 1.5 * f_{II} + 3.0 * f_{III} + 4.5 * f_{IV} + 6.0 * f_V}
#'
#' where:
#'
#' \eqn{f_i} = fraction of individuals in Group \eqn{i \in\{I, II, III, IV, V\}}
#'
#' Under certain circumstances, the AMBI index should not be used:
#'
#' * The percentage of individuals not assigned to a group is higher than 20%
#' * The (not null) number of species is less than 3
#' * The (not null) number of individuals is less than 6
#'
#' In these cases the function will still perform the calculations but will
#' also return a warning.(see below)
#'
#' ## Results
#'
#' The output of the function consists of a list of at least three dataframes:
#'
#' * `AMBI` containing the calculated `AMBI` index, as well as other information.
#' * (`AMBI_rep`) generated only if replicates are used, showing the `AMBI` index
#' for each replicate.
#' * `matched` showing the species matches used.
#' * `warnings` containing any warnings generated regarding numbers of of species
#' or numbers of individuals.
#'
#' ## Species matching and interactive mode
#'
#' The function will check for a species list supplied in the function call
#' using the argument `df_species`, if this is specified. The function will
#' also search for names in the AMBI standard list. After this, if no match
#' is found in either, then the species will be recorded with a an `NA`value
#' for species group and will be ignored in calculations.
#'
#' By calling the function once and then checking the output from this first
#' function call, the user can identify species names which were not matched.
#' Then, if necessary, they can provide or update a dataframe with a list of
#' user-defined species group assignments, before running the function a
#' second time.
#'
#' ### Conflicts
#'
#' If there is a conflict between a user-provided group assignment for a species
#' and the group specified in the AMBI species group information, only one of
#' them will be selected. The outcome depends on a number of things:
#'
#' * some species in the AMBI list are considered _reallocatable_ (RA) - that is,
#' there can be disagreement about which species group they should belong to.
#' For these species, any user-specified groups will replace the default group.
#' * if a species is not _reallocatable_, then any user-specified groups will
#' _by default_ be ignored. However, if the function is called with the argument
#' `groups_strict = FALSE` then the user-specified groups will override AMBI
#' species groups.
#'
#' Any conflicts and their outcomes will be recorded in
#' the `matched` output.
#'
#' ### _interactive_ mode
#'
#' If the function is called using the argument `interactive = TRUE` then the
#' user has an opportunity to _manually_ assign species groups
#' (_I, II, III, IV, V_) for any species names which were not identified.
#' The user does this by typing `1`, `2`, `3`, `4` or `5` and pressing _Enter_.
#' Alternatively, the user can type `0` to mark
#' the species as recognised but not assigned to a group. By typing _Enter_ without
#' any number the species will be recorded as unidentified (`NA`). This is the
#' same result which would have been returned when calling the function in
#' non-interactive mode. There are two other options: typing `s` will display a
#' list of 10 species names which occur close to the unrecognised name when names
#' are sorted in alphabetical order. Entering `s` a second time will display the
#' next 10 names, and so on. Finally, entering `x` will abort the interactive
#' species assignment process. Any species groups assigned manually at this point
#' will be discarded and the calculations will process as in the non-interactive mode.
#'
#' Any user-provided group information will be recorded in the `matched` results.
#'
#' See `vignette("interactive")` for an example.
#'
#' @param df a dataframe of species observations
#'
#' @param by a vector of column names found in `df` by which calculations
#' should be grouped _e.g. c("station","date")_
#'
#' @param var_rep _optional_ column name in `df` which contains the name of
#' the column identifying replicates. If replicates are used,
#' the AMBI index will be calculated for each replicate before
#' an average is calculated for each combination of `by`
#' variables. If the Shannon diversity index `H` is calculated
#' this will be done for species counts collected within `by`
#' groups without any consideration of replicates.
#'
#' @param var_species name of the column in `df` containing species names
#'
#' @param var_count name of the column in `df` containing count/density/abundance
#'
#' @param df_species _optional_ dataframe of user-specified species groups. By default,
#' the function matches species in `df` with the official species
#' list from AZTI. If a dataframe with a user-defined list of
#' species is provided, then a search for species groups will
#' also be made in this list. _see [Details](#species-matching-and-interactive-mode)_.
#'
#' @param var_group_AMBI _optional_ name of the column in `df_species`
#' containing the groups for the AMBI index calculations. These
#' should be specified as integer values from 1 to 7. Any other
#' values will be ignored. If `df_species` is not specified
#' then `var_group_AMBI` will be ignored.
#'
#' @param groups_strict By default, any user-assigned species group which
#' conflicts with an original AMBI group assignment will be
#' ignored and the original group remains unchanged. If the argument
#' `groups_strict = FALSE` is used then user-assigned groups
#' will always override AMBI groups in case of conflict.
#' _DO NOT use this option unless you are sure you know what
#' you are doing! It could invalidate your results._
#'
#' @param quiet warnings about low numbers of species and/or individuals
#' are contained in the `warnings` dataframe. By default
#' (`quiet = FALSE`) these warnings are also shown in the console.
#' If the function is called with the parameter `quiet = TRUE`
#' then warnings will not be displayed in the console.
#'
#' @param interactive (default `FALSE`) if a species name in the input data is not
#' found in the AMBI species list, then this will be seen in
#' the output dataframe `matched`. If _interactive_ mode is
#' selected, the user will be given the opportunity to assign
#' _manually_ a species group (_I, II, III, IV, V_) or to
#' mark the species as _not assigned_ to a species group (see
#' details).
#'
#' @param format_pct (_optional_) By default, frequency results including the
#' fraction of total numbers within each species group are
#' expressed as real numbers . If this is argument is
#' given a positive integer value (e.g. `format_pct = 2`)
#' then the fractions are expressed as percentages
#' with the number of digits shown after the decimal point
#' equal to the number specified. _NOTE_ by formatting as
#' percentages, values are converted to text and may lose
#' precision.
#'
#' @param show_class (default `TRUE`). If `TRUE` then the `AMBI` results will
#' include a column showing the AMBI disturbance classification
#' _Undisturbed_, _Slightly disturbed_, _Moderately disturbed_,
#' or _Heavily disturbed_.
#'
#' @param exact_species_match by default, a family name without _sp._ will
#' be matched with a family name on the AMBI (or
#' user-specified) species list which includes _sp._. If
#' the option `exact_species_match = TRUE` is used, species
#' names will be matched only with identical names.
#'
#' @return a list of dataframes:
#'
#' * `AMBI` : results of the AMBI index calculations. For each unique
#' combination of `by` variables, the following values are calculated:
#' - `AMBI` : the AMBI index value
#' - `AMBI_SD` : sample standard deviation of AMBI _included only
#' when replicates are used_
#' has specified `var_rep`.
#' - `N` : number of individuals
#' - `S` : number of species
#' - `H` : Shannon diversity index *H'*
#' - `fNA` : fraction of individuals _not assigned_, that is, matched to
#' a species in the AMBI species list with *Group 0*. Note that this is
#' different from the number of rows where no match was found. Species not
#' matched are excluded from the totals.
#'
#' * `AMBI_rep` : results of the AMBI index calculations _per replicate_. This
#' dataframe is present only if the observation data includes replicates and
#' the user has specified `var_rep`. Similar to the main `AMBI` result but does
#' not include results for `H` (Shannon diversity index) or for `AMBI_SD`
#' (sample standard deviation of AMBI) which are not estimated at replicate level.
#'
#'
#' * `matched` : the original dataframe with columns added from the species list.
#' Contains the following columns:
#' - `group` : showing the species group. Any species/taxa in `df` which were not
#' matched will have an `NA` value in this column.
#' - `RA` : a value of `1` indicates that the species is _reallocatable_ according to the
#' AMBI list. That is, it could be re-assigned to a different species group.
#' - `source` : this column is included only if a user-specified list was
#' provided `df_species`, or if species groups were assigned interactively.
#' An `"I"` in this column indicates that the group was assigned interactively.
#' A `"U"` shows that the group information came from a user-provided species
#' list. An `NA` value indicates that no interactive or user-provided changes
#' were applied.
#'
#' * `warnings` : a dataframe showing warnings for any combination of `by`
#' variables a warning where
#' - The percentage of individuals not assigned to a group is higher than 20%
#' - The (not null) number of species is less than 3
#' - The (not null) number of individuals is less than 6
#'
#' @seealso [MAMBI()] which calculates _M-AMBI_ the multivariate AMBI
#' index using results of `AMBI()`.
#'
#' @import tidyr
#' @import dplyr
#' @import cli
#' @import utils
#' @importFrom stats sd
#'
#' @examples
#'
#' # example (1) - using test data included with package
#'
#' AMBI(test_data, by = c("station"), var_rep = "replicate")
#'
#'
#' # example (2)
#'
#' df <- data.frame(station = c("1", "1", "2", "2", "2"),
#' species = c("Acidostoma neglectum",
#' "Acrocirrus validus",
#' "Acteocina bullata",
#' "Austrohelice crassa",
#' "Capitella nonatoi"),
#' count = c(2, 4, 5, 3, 7))
#'
#' \donttest{ AMBI(df, by = c("station"))}
#'
#'
#' # example (3) - conflict with AZTI species group
#'
#' df_user <- data.frame(
#' species = c("Cumopsis fagei"),
#' group = c(1))
#'
#' \donttest{AMBI(test_data, by = c("station"), var_rep = "replicate", df_species = df_user)}
#'
#'
#' @export
AMBI <- function(df, by = NULL,
var_rep = NA_character_,
var_species = "species",
var_count = "count",
df_species = NULL,
var_group_AMBI = "group",
groups_strict = TRUE,
quiet = FALSE,
interactive = FALSE,
format_pct = NA,
show_class = TRUE,
exact_species_match = FALSE
){
if(!interactive()){
# if R is not running interactively, then we cannot interact with the user
interactive <- FALSE
}
AMBI_SD <- H <- N <- NNA <- fGroup <- fNA <- NULL
sum_count <- wt <- f <- n_species <- count_0 <- NULL
species <- S <- ambi_group <- species_1 <- NULL
RA <- U <- NULL
var_species_matched <- "species_matched"
fill_val <- 0
format_pct <- ifelse(is.numeric(format_pct),format_pct,NA)
format_pct <- ifelse(as.integer(format_pct)==format_pct,format_pct,NA)
if(!is.na(format_pct)){
fill_val <- percent(fill_val, digits = format_pct)
}
if(is.null(df)){
msg <- paste0("No observation data was provided")
stop(msg)
}
if(!"data.frame" %in% class(df)){
msg <- paste0("AMBI() was expecting the argument df to be a data.frame. You provided a an object of class '", class(df),"'")
stop(msg)
}
missing <- c()
if(is.na(var_rep)){
var_check <- c(by, var_species, var_count)
vars_group <- c(var_group_AMBI, by)
vars_group_f <- vars_group
}else{
var_check <- c(var_rep, by, var_species, var_count)
vars_group <- c(var_rep, var_group_AMBI, by)
vars_group_f <- c(var_group_AMBI, by)
}
for(var in var_check){
if(!var %in% names(df)){
missing <- c(missing, var)
}
}
if(length(missing)>0){
msg <- paste0(missing, collapse="','")
msg <- paste0(length(missing)," column(s) not found in observation data: '", msg, "'")
stop(msg)
}
# eliminate duplicates - this should really be fixed in the data
df_ambi <- AMBI_species() %>%
group_by(species) %>%
slice(1) %>%
ungroup()
names(df_ambi)[names(df_ambi)=="species"] <- var_species
names(df_ambi)[names(df_ambi)=="group"] <- var_group_AMBI
if(is.null(df_species)){
df_species <- df_ambi
}else{
if(!"data.frame" %in% class(df_species)){
msg <- paste0("AMBI() was expecting the argument df_species to be a data.frame. You provided an object of class '", class(df_species),"'")
stop(msg)
}
# checking for required columns in df_species
missing <- c()
for(var in c(var_species, var_group_AMBI)){
if(!var %in% names(df_species)){
missing <- c(missing, var)
}
}
if(length(missing)>0){
msg <- paste0(missing, collapse="','")
msg <- paste0("A user-specified species list was supplied but ",
length(missing),
" column(s) were not found: '",
msg, "'")
stop(msg)
}
# checking if df_species contains "RA" column
# this conflicts with the official species list
if("RA" %in% names(df_species)){
df_species <- df_species %>%
select(-RA)
}
df_species <- df_species %>%
distinct(across(dplyr::all_of(c(var_species,var_group_AMBI))))
df_dup <- df_species %>%
species_check_duplicates(var_species=var_species,
var_group_AMBI=var_group_AMBI)
if(nrow(df_dup)>0){
duplicates <- df_dup %>%
mutate(duplicates=paste0(!!as.name(var_species),
" [Groups ", groups, "]")) %>%
pull("duplicates")
# allow user to correct this is if running interactively?
msg <- paste0(duplicates, collapse=", ")
msg <- paste0("More than one user-assigned species group was specified for ",
ifelse(length(duplicates)>1, length(duplicates), "a"),
" species: ", msg)
stop(msg)
}
df_species <- df_species %>%
mutate(source="U") %>%
bind_rows(df_ambi %>%
mutate(source="AMBI"))
species_conflict <- df_species %>%
distinct(dplyr::across(dplyr::all_of(c(var_species,"group")))) %>%
dplyr::group_by(dplyr::across(dplyr::all_of(c(var_species)))) %>%
summarise(n=n(),.groups="drop") %>%
filter(n>1) %>%
pull(var_species)
df_conflict <- df_species %>%
filter(!!as.name(var_species) %in% species_conflict)
df_species <- df_species %>%
mutate(source=ifelse(source=="AMBI",NA,source))
if(nrow(df_conflict)>0){
df_species <- df_species %>%
filter(!(!!as.name(var_species)) %in% species_conflict)
df_conflict <- df_conflict %>%
group_by(across(all_of(var_species))) %>%
mutate(RA=max(RA,na.rm=T)) %>%
ungroup()
df_conflict <- df_conflict %>%
pivot_wider(names_from = "source", values_from = "group")
df_changed <- df_conflict %>%
filter(RA==1)
df_conflict <- df_conflict %>%
filter(RA==0)
if(nrow(df_changed)>0){
df_changed <- df_changed %>%
mutate(group=U, group_note=paste0("original group: ",AMBI), source="U") %>%
mutate(msg=paste0(col_green(symbol$tick)," {.emph ",species,"} (",
roman(AMBI),")",symbol$arrow_right,"(",
roman(U),")"))
msgs <- df_changed$msg
if(quiet==F){
cli::cli_alert_info("User-assigned group change{?s} applied for {length(msgs)} AMBI species:")
cli::cli_bullets(msgs)
cli::cli_text("")
}
}
if(nrow(df_conflict)>0){
if(groups_strict==FALSE){
res_msg <- col_red(
"{length(msgs)} user-assigned group{?s} override{?s/} existing AMBI group{?s}:")
res_symbol <- col_green(symbol$tick)
df_conflict <- df_conflict %>%
mutate(group=U, group_note=paste0("Overriding AMBI group: ",AMBI), source="U")
}else{
res_msg <- "{length(msgs)} user-assigned group{?s} in conflict with AMBI {?was/were} ignored:"
res_symbol <- col_red(symbol$cross)
df_conflict <- df_conflict %>%
mutate(group=AMBI, group_note=paste0("ignored user group: ",U), source=NA)
}
df_conflict <- df_conflict %>%
rowwise() %>%
mutate(group=ifelse(groups_strict==FALSE,U,AMBI)) %>%
mutate(msg=paste0(res_symbol," {.emph ",!!as.name(var_species),"} (",
roman(AMBI),")",symbol$arrow_right,"(",
roman(U),")")) %>%
ungroup()
msgs <- df_conflict$msg
if(groups_strict==FALSE){
cli::cli_alert_warning(res_msg)
}else{
cli::cli_alert_info(res_msg)
}
cli::cli_bullets(msgs)
cli::cli_text("")
}
df_changed <- df_changed %>%
bind_rows(df_conflict) %>%
select(-c(AMBI,U, msg))
df_species <- df_species %>%
bind_rows(df_changed)
} # nrow(df_conflict)>0
df_species <- df_species %>%
dplyr::group_by(dplyr::across(dplyr::all_of(var_species))) %>%
dplyr::arrange(dplyr::across(dplyr::all_of("source"))) %>%
slice(1) %>%
ungroup()
}
unmatched <- df %>%
distinct(!!as.name(var_species)) %>%
.species_match(df_species,
var_species_obs=var_species,
var_species = var_species,
exact = exact_species_match) %>%
filter(is.na(!!as.name(var_species_matched))) %>%
dplyr::pull(var_species)
unmatched <- sort(unmatched)
user_entry <- rep(NA, length(unmatched))
if(length(unmatched)>0){
if(interactive){
list_ok <- c("I","II","III","IV","V")
msg <- c("{length(unmatched)} species name{?s} {?was/were} not recognised.
These will now be displayed, one at a time.",
"For each species, you can assign it to one of the five
AMBI categories ({.emph I, II, III, IV, V}).
Do this by entering an integer value from {.emph 1} to {.emph {length(list_ok)}}.
Alternatively, you can assign a value of {.emph 0}.
This indicates that the species name is recognised but it is
not possible to assign it to one of the five categories.",
"If {.emph Enter} is pressed, without providing a value,
no change will be made. The species name in question
will be treated as unrecognised.",
"Enter {.emph s} at the prompt to see a list of similar species
names and their corresponding AMBI categories.",
"Enter {.emph x} to abort the interactive species selection.
Any entries made up to that point will be discarded. All
{length(unmatched)} species will be treated as unrecognised.")
cli::cli_par()
cli::cli_h1("Assigning unrecognised species")
cli::cli_end()
for(msgi in msg){
cli::cli_par()
ecolor <- ifelse(length(grep("similar", msgi))>0, "green", "blue")
ecolor <- ifelse(length(grep("abort", msgi))>0, "red", ecolor)
cli_div(theme = list(span.emph = list(color = ecolor)))
cli::cli_ul(msgi)
cli::cli_end()
cli::cli_end()
}
for(ispecies in 1:length(unmatched)){
res <- get_user_entry(ispecies, unmatched[ispecies], list_ok,
df_species, var_species, var_group_AMBI)
if(res >= 0){
user_entry[ispecies] <- res
}else if(res == -99){
# user selected "x" to abort
break
}
}
}else{
# we are not running interactively but if quiet == FALSE then we should
# show unrecognised species names in the console
if(quiet==F){
cli::cli_alert_info("{length(unmatched)} species name{?s} {?was/were} not recognised:")
unmatched <- paste0("{.emph ", unmatched,"}")
cli::cli_ol(unmatched)
}
}
}
unmatched <- unmatched[!is.na(user_entry)]
user_entry <- user_entry[!is.na(user_entry)]
if(length(user_entry)>0){
df_user <- data.frame(v1=unmatched,
v2=user_entry)
names(df_user) <- c(var_species, var_group_AMBI)
df_species <- df_user %>%
mutate(source="I") %>%
bind_rows(df_species) %>%
dplyr::group_by(dplyr::across(dplyr::all_of(var_species))) %>%
dplyr::arrange(dplyr::across(dplyr::all_of("source"))) %>%
slice(1) %>%
ungroup()
}
df <- df %>%
.species_match(df_species,
var_species_obs = var_species,
var_species = var_species,
exact = exact_species_match) %>%
relocate(!!var_species_matched, .after = !!var_species)
df <- df %>%
dplyr::left_join(df_species,
by=dplyr::join_by(!!var_species_matched==!!var_species))
df_matched <- df
df <- df %>%
filter(!is.na(!!as.name(var_group_AMBI)))
# vars_group <- c(var_group_AMBI, by)
if(nrow(df)==0){
df <- NULL
msg <- paste0("No recognised species were found. Please check the ",
cli::style_italic(cli::col_br_blue("matched")), " results or try to run with ",
cli::style_italic(cli::col_br_blue("interactive = "), cli::col_red("TRUE")))
cli::cli_alert_warning(msg)
return(list(matched=df_matched))
}else{
# calculate H'
dfH <- Hdash(df, by=by,
var_species=var_species,
var_count=var_count,
check_species = F)$H
df <- df%>%
mutate(species_1 = ifelse(!!as.name(var_count) > 0, 1, 0))
df <- df %>%
dplyr::group_by(dplyr::across(dplyr::all_of(vars_group))) %>%
dplyr::summarise(sum_count = sum(!!as.name(var_count)),
n_species = sum(species_1, na.rm=T),
.groups="drop") %>%
dplyr::arrange(dplyr::across(dplyr::all_of(by)))
# apply the weighting for Groups I - V
df_multipliers <- data.frame(
ambi_group = c(0,1,2,3,4,5),
wt = c(0, 0, 1.5, 3, 4.5, 6)
)
df <- df %>%
left_join(df_multipliers, by=dplyr::join_by(!!var_group_AMBI==ambi_group))
if(!is.na(var_rep)){
by_rep <- c(by, var_rep)
}else{
by_rep <- by
}
# if a species is not recognised, it is not counted AT ALL
# if a species is recognised (matched to the list of species)
# but does not have a category, then it counts towards the fNA
df <- df %>%
dplyr::mutate(
count_0 = ifelse(is.na(!!as.name(var_group_AMBI)), 0,
ifelse(!!as.name(var_group_AMBI)==0,
sum_count, 0))) %>%
dplyr::group_by(dplyr::across(dplyr::all_of(by_rep))) %>%
dplyr::mutate(f = wt * sum_count / (sum(sum_count, na.rm = T) -
sum(count_0, na.rm = T)))
if(!is.na(var_rep)){
dfFrep <- df %>%
dplyr::group_by(dplyr::across(dplyr::all_of(vars_group))) %>%
dplyr::summarise(
N = sum(sum_count, na.rm = T),
NNA = sum(count_0, na.rm = T),
.groups="drop")
dfFrep <- dfFrep %>%
dplyr::group_by(dplyr::across(dplyr::all_of(by_rep))) %>%
mutate(fGroup = N / (sum(N, na.rm=T)-sum(NNA, na.rm=T))) %>%
ungroup()
dfall_rep <- data.frame(x = c(1,2,3,4,5))
names(dfall_rep) <- var_group_AMBI
dfall_rep <- dfFrep %>%
distinct(dplyr::across(dplyr::all_of(by_rep))) %>%
merge(dfall_rep, all=T)
dfFrep <- dfall_rep %>%
left_join(dfFrep, by=dplyr::all_of(vars_group))
dfFrep <- dfFrep %>%
mutate(NNA=ifelse(is.na(NNA),0,NNA)) %>%
filter(NNA==0) %>%
mutate(fGroup=ifelse(is.na(fGroup),0,fGroup)) %>%
rowwise() %>%
mutate(!! var_group_AMBI := roman(!! as.name(var_group_AMBI))) %>%
ungroup()
if(!is.na(format_pct)){
dfFrep <- dfFrep %>%
mutate(fGroup = percent(fGroup, digits=format_pct))
}
dfFrep <- dfFrep %>%
select(-c(N,NNA)) %>%
pivot_wider(names_from = var_group_AMBI, values_from = fGroup,
values_fill=fill_val)
}
dfF <- df %>%
dplyr::group_by(dplyr::across(dplyr::all_of(vars_group_f))) %>%
dplyr::summarise(
N = sum(sum_count, na.rm = T),
NNA = sum(count_0, na.rm = T),
.groups="drop")
dfF <- dfF %>%
dplyr::group_by(dplyr::across(dplyr::all_of(by))) %>%
mutate(fGroup = N / (sum(N, na.rm=T)-sum(NNA, na.rm=T))) %>%
ungroup()
dfall <- data.frame(x = c(1,2,3,4,5))
names(dfall) <- var_group_AMBI
dfall <- dfF %>%
distinct(dplyr::across(dplyr::all_of(by))) %>%
merge(dfall, all=T)
dfF <- dfall %>%
left_join(dfF, by=dplyr::all_of(vars_group_f))
dfF <- dfF %>%
mutate(NNA=ifelse(is.na(NNA),0,NNA)) %>%
filter(NNA==0) %>%
mutate(fGroup=ifelse(is.na(fGroup),0,fGroup)) %>%
rowwise() %>%
mutate(!! var_group_AMBI := roman(!! as.name(var_group_AMBI))) %>%
ungroup()
if(!is.na(format_pct)){
dfF <- dfF %>%
mutate(fGroup = percent(fGroup, digits=format_pct))
}
dfF <- dfF %>%
select(-c(N,NNA)) %>%
pivot_wider(names_from = var_group_AMBI, values_from = fGroup,
values_fill=fill_val)
# explanation needed in documentation
# that f I-V is excluding NA group values
# fNA is based on all counts
df <- df %>%
dplyr::summarise(
AMBI = sum(f, na.rm=T),
N = sum(sum_count, na.rm = T),
S = sum(n_species, na.rm=T),
fNA = sum(count_0, na.rm = T) / sum(sum_count, na.rm = T),
.groups="drop") %>%
dplyr::mutate(AMBI=ifelse(S==0, 7, AMBI))
if(!is.na(var_rep)){
dfrep <- df %>%
left_join(dfFrep, by=all_of(by_rep)) %>%
relocate(S, fNA, N, .after = AMBI)
if(!is.na(format_pct)){
dfrep <- dfrep %>%
mutate(fNA = percent(fNA, digits=format_pct))
}
df <- df %>%
dplyr::group_by(dplyr::across(dplyr::all_of(by))) %>%
dplyr::summarise(
AMBI_SD = stats::sd(AMBI, na.rm=T),
AMBI = mean(AMBI, na.rm=T),
fNA = sum(fNA * N, na.rm = T) / sum(N, na.rm = T),
.groups="drop")
}else{
dfrep <- NULL
df <- df %>%
select(dplyr::all_of(c(by, "AMBI", "fNA")))
}
if(is.null(by)){
df <- df %>%
bind_cols(dfH)
df <- df %>%
bind_cols(dfF)
}else{
df <- df %>%
left_join(dfH, by=all_of(by))
df <- df %>%
left_join(dfF, by=all_of(by))
}
df <- df %>%
relocate(H, S, .after = AMBI)
if("AMBI_SD" %in% names(df)){
df <- df %>%
relocate(AMBI_SD, .after=AMBI)
}
dfwarn <- ambi_warnings(df, by, quiet=quiet)
if(!is.na(format_pct)){
df <- df %>%
mutate(fNA = percent(fNA, digits=format_pct))
}
} # no species matched
if(show_class==T){
df <- df %>%
mutate(Disturbance=.ambi_class(AMBI))
}
if(!is.null(dfwarn)){
if(is.na(var_rep)){
return(list(AMBI=df, matched=df_matched, warnings=dfwarn))
}else{
return(list(AMBI=df, AMBI_rep=dfrep,
matched=df_matched, warnings=dfwarn))
}
}else{
if(is.na(var_rep)){
return(list(AMBI=df, matched=df_matched))
}else{
return(list(AMBI=df, AMBI_rep=dfrep,
matched=df_matched))
}
}
}
# auxiliary functions
species_check_duplicates <- function(df,
var_species,
var_group_AMBI){
df <- df %>%
dplyr::distinct(across(dplyr::all_of(c(var_species, var_group_AMBI))))
duplicates <- df %>%
dplyr::group_by(across(dplyr::all_of(var_species))) %>%
dplyr::summarise(n=n(), .groups="drop") %>%
filter(n>1) %>%
select(-n) %>%
pull(var_species)
if(length(duplicates)>0){
df <- df %>%
filter(!!as.name(var_species) %in% duplicates) %>%
arrange(across(dplyr::all_of(c(var_species, var_group_AMBI))))
df <- df %>%
group_by(across(dplyr::all_of(var_species))) %>%
dplyr::summarise(groups = paste0(!!as.name(var_group_AMBI), collapse=", "),
n=n(), .groups="drop")
}else{
df <- data.frame()
}
return(df)
}
list_similar <- function(df, species, n0, n, var_species, var_group_AMBI){
spc <- X <- NULL
maxchar <- df %>% pull(var_species) %>% nchar() %>% max()
species <- strsplit(species, " ") %>% unlist()
species <- species[1]
df_match <- data.frame(v1 = species, v2="X")
names(df_match) <- c(var_species, "X")
df <- df %>%
bind_rows(df_match) %>%
dplyr::arrange(dplyr::across(dplyr::all_of(var_species)))
df$row <- 1:nrow(df)
id <- df %>%
filter(X=="X") %>%
pull("row")
n0 <- ifelse(n0<=1,0,1) + id + n0
df <- df %>%
filter(is.na(X)) %>%
filter(row>=n0) %>%
slice(1:n) %>%
select(dplyr::all_of(c(var_species, var_group_AMBI)))
if(nrow(df)>0){
df <- df %>%
mutate(n = nchar(!!as.name(var_species)))
n1 <- max(df$n) + 1
df <- df %>%
mutate(n= 1+ maxchar -n)
df <- df %>%
rowwise() %>%
mutate(spc = strrep("\u00a0", n)) %>%
mutate(s = paste0(" {.emph ", !!as.name(var_species),"}", spc,
" (Group ",!!as.name(var_group_AMBI),")"))
similar <- df$s
}else{
similar <- c()
}
return(similar)
}
get_user_entry <- function(i, name, list, df_species,
var_species="species",
var_group_AMBI="group",
nsimilar=10){
n0similar <- -1
x <- NA
list_ok <- 1:length(list)
cli::cli_h2("{i} {.emph {name}}")
msg <- c("enter an integer value from {.emph 0} to {.emph {length(list)}} or press {.emph Enter} to skip. ",
"{.emph ", col_green("s"), "} - see similar names. ",
"{.emph ", col_red("x"), "} - abort interactive assignment.")
list_ok <- c(0, list_ok)
list_ok <- as.character(list_ok)
cli_div(theme = list(span.emph = list(color = "blue")))
cli::cli_alert_info(msg)
cli::cli_end()
while(is.na(x)){
x <- readline(prompt = ">> ")
if(tolower(x)=="s"){
# user entered "s" - show similar species
similar <- list_similar(df_species, name,
n0similar, nsimilar,
var_species, var_group_AMBI)
if(length(similar)>0){
n0similar <- n0similar + nsimilar
if(n0similar <= 0){
cli::cli_inform("Names occurring close to {.emph {name}} when arranged alphabetically...")
}
cli::cli_ul(similar)
if(length(similar) < nsimilar){
cli::cli_inform("...reached end of list")
}
}
}
if(x==""){
# user pressed Enter without entering any other text
cli::cli_alert_info("{.emph {name}} - skipped")
x <- -1
}else if(tolower(x)=="x"){
# user entered "x" - this aborts interactive species assignment
cli::cli_alert_warning(col_red("interactive species assignment cancelled"))
x <- -99
}else{
if(x %in% list_ok){
x <- as.numeric(x)
selected <- c("Not assigned", list)[x+1]
selected <- ifelse(x==0, selected, paste0("Category ",selected))
msg_ok <- "{.emph {name}} - {selected}"
cli::cli_alert_success(msg_ok)
}else{
if(tolower(x)!="s"){
cli_div(theme = list(span.emph = list(color = "red")))
cli::cli_alert_danger("invalid selection: {.emph {x}}")
cli::cli_end()
}
cli_div(theme = list(span.emph = list(color = "blue")))
cli::cli_alert_info(msg)
cli::cli_end()
x <- NA
}
}
}
return(x)
}
# auxiliary function to convert numeric group to roman numerals
roman <- function(n, roman_numbers=c("I","II","III","IV","V"),
zero="not assigned"){
if(n==0){
return(zero)
}else{
return(roman_numbers[n])
}
}
# auxiliary function to give warnings for numbers of individuals and species
ambi_warnings <- function(df, by, quiet=F){
fNA <- S <- N <- warnpctNA <- warnS <- warnN <- warningtype <- NULL
warningNA = "The percentage of individuals not assigned to a group is higher than 20%"
warningS = "The (not null) number of species is less than 3"
warningN = "The (not null) number of individuals is less than 6"
df <- df %>%
mutate(warnpctNA = ifelse(fNA>0.2,
paste0(warningNA, " [", round(100*fNA, 1),"%]."),
NA_character_)) %>%
mutate(warnS = ifelse(S<3,paste0(warningS," [",S,"]."),
NA_character_)) %>%
mutate(N = N * (1-fNA)) %>%
mutate(warnN = ifelse(N<6, paste0(warningN," [",N,"]."),
NA_character_))
by <- c(by, "warnpctNA", "warnS", "warnN")
df <- df %>%
select(all_of(c(by))) %>%
pivot_longer(cols=c(warnpctNA, warnS, warnN), names_to = "warningtype",
values_to = "warning", names_prefix = "warn")
df <- df %>%
select(-warningtype) %>%
filter(!is.na(warning))
nc <- ncol(df)
if(nrow(df)>0){
if(quiet!=T){
for(i in 1:nrow(df)){
info <- rep(NA_character_, nc-1)
for(j in 1:(nc-1)){
info[j] <- paste0(names(df)[j], " ", df[i, j])
}
info <- paste0(info, collapse = ", ")
cli::cli_warn(paste0(info,": ",df[i, "warning"]))
}
}
return(df)
}else{
return(NULL)
}
}
# return the ambi status description for a single ambi index value
.ambi_class_single <- function(ambi){
ambi <- as.numeric(ambi)
classes <- c("Undisturbed", "Slightly disturbed", "Moderately disturbed", "Heavily disturbed")
bounds <- c(1.2, 3.3, 5)
n <- 1 + length(bounds[bounds < ambi])
class <- ifelse(is.na(ambi), NA_character_, classes[n])
return(class)
}
# wrapper to return ambi status for a vector of ambi indices
.ambi_class <- function(ambi){
ambi_class <- lapply(ambi, .ambi_class_single)
ambi_class <- unlist(ambi_class)
return(ambi_class)
}
utils::globalVariables(c(":="))
percent <- function(x, digits = 3, format = "f", ...) {
# Create user-defined function
paste0(formatC(x * 100, format = format, digits = digits, ...), "%")
}
# ---------------- species name matching ----------------------------
.species_match_single<- function(species, species_list, exact=F){
res <- NULL
species <- ifelse(is.na(species),"",species)
res <- species_list[species_list==species]
if(length(res)==0){
if(exact!=T){
species_list2 <- .firstCap(species_list)
species_list2 <- sub(" sp\\.","", species_list2)
species <- .firstCap(species)
species2 <- sub(" sp\\.","", species)
species2 <- sub(" spp\\.","", species2)
res <- species_list[species_list2==species2]
if(length(res)==0){
res <- NA_character_
}
}else{
res <- NA_character_
}
}
if(length(res)>1){
n_res <- length(res)
msg <- paste0(res, collapse = ", ")
msg <- paste0(species, " had ", n_res, " matches: ", msg)
cli::cli_warn(msg)
res <- res[1]
}
return(res)
}
.species_match <- function(df_obs, df_species,
var_species_obs="species",
var_species="species",
exact=F){
species_match <- NULL
species_list <- NULL
species_list <- df_species %>%
pull(var_species) %>%
unique()
species_match <- df_obs %>%
pull(var_species_obs) %>%
lapply(.species_match_single, species_list, exact) %>%
unlist()
df_obs$species_matched <- species_match
return(df_obs)
}
.firstCap <- function(x) {
paste0(toupper(substring(x,1,1)),
tolower(substring(x,2)))
}
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.