# This script defines the function "est_iec" which calculates Index of
# Ecological Condition (IEC) scores. The function "est_iec"
# ("iec_site_score" at the time) was originally used for Erin E. G. Giese's
# thesis (2012). Giese's thesis used the script
# "IEC_Application_NoForLoopExp20120328.r" which this current function is based
# on. The purpose of this script is to estimate IEC scores based on Biotic
# Response (BR) functions of species or other taxa for many sites at once.
# BR functions can be calculated using function "est_brc".
#
# File used for E. Giese's thesis: "IEC_Application_NoForLoopExp20120328.r"
# File "IEC_Builder20120328.r" was based on March Excel files but most closely
# resembles April files (right? based on BRC script comments).
#
# Input:
# "method" is used to set the criteria to use for estimating IEC. "method"
# can be set to "pa" (default), or "q". Method "pa" is used
# where present/absence data are available, and method "q" is used if
# probability data are available.
# Note that method "pa" corresponds to the Excel tab "CalcPA" (formerly
# "CalcLik"), and method "q" corresponds to Excel tab "CalcQ" (formerly
# "CalcLSq").
#
# Note: Future versions of "est_iec" may be written so that only species in
# "brc" are used for scoring sites.
#
# Structure of function "est_iec":
# * "Error checking" ensures that input variables "sp" and "brc"
# are both data frames, and that they contain the same species. If any of
# these criteria are not met, an error is raised and the function exits with
# a message describing the issue.
# * "Declarations" contains variables that will be used later in the
# function. In general, these are the only variables you are likely to
# modify in this script.
# * "Main Function" contains function "f" which returns the likelihood
# function which is minimized by "nlminb". Note that Dr. Howe's MS Excel
# based versions of this function maximize this function. Because "nlminb"
# can only minimize functions, the function here has the sign reversed
# during analysis, but the sign is corrected before adding it to the output
# so the numbers generated are comparable to those generated by Solver in
# Excel.
# * "for loop over each site" contains a for loop that processes
# each taxa in turn. Each taxa has function "nlminb" called on it the
# number of times specified by input variable "n_reps". "nlminb" tries to
# minimize the likelihood function returned by function "f".
#
# Authors: Nicholas G. Walton and Robert W. Howe
#
# Created: Mar 2013(?)
# Last modified: 25 Sept 2015
#' Estimate IEC site scores.
#'
#' \code{est_iec} estimates the Index of Ecological Condition (IEC) for each
#' site in \code{sp} based on the biotic response curves (BRC) in \code{brc}.
#'
#' An Index of Ecological Condition (IEC) score is estimated for each site (row)
#' in data frame \code{sp}. This score is based on the Biotic Response functions
#' (curves) in \code{brc} (output from \code{\link{est_brc}}). The column names
#' in \code{sp} and column 1 of \code{brc} must contain the same taxa in the
#' same order.
#'
#' The default method treats observations as presence/absence (\code{method =
#' "pa"}). When using \code{pa}, it is essential to use BR functions based on
#' probability or probability-like (e.g., scaled to 0-1) observation data
#' (\code{\link{est_brc}}). Note that input records do not need to be coded as
#' presence/absence as the method simply checks for 0 or > 0.
#'
#' When quantitative observations are available and were used to generate BR
#' functions, set \code{method} to \code{"q"}. As with \code{"pa"} it is
#' important to match the type of data used to score the sites with the type of
#' data used to estimate the BR functions. For example, if the data used to
#' construct the BR functions in \code{est_brc} were probabilities, then the
#' data used to score the sites (\code{sp}) should also be probabilities If BR
#' functions were constructed using log transformed (e.g.
#' \code{\link[base]{log1p}}) count data, then the observations used to score
#' the sites in \code{est_iec} should be transformed in the same way.
#'
#' \code{n_reps} determines the number of times the
#' optimization function (\code{\link[stats]{nlminb}}) is run with different
#' starting values. \code{n_reps} can be set to any positive integer, but the
#' authors recommend against using values of less than 10 except for testing
#' purposes. Larger values will take longer to process. The default value is
#' 30. If \code{keep_zeros = TRUE} (default), all taxa in \code{brc} are used
#' for scoring the sites. But if \code{keep_zeros = FALSE}, only those taxa
#' detected at a given site are used to score that site. With the latter
#' option, the number of taxa used to score each site may vary.
#'
#' @param brc BRC data frame generated by \code{\link{est_brc}}.
#' @param method one of \code{"pa"} (default; CalcPA) or \code{"q"} (CalcQ).
#' @param n_reps scalar integer setting the number of random starts for
#' optimization (default is 30).
#' @param keep_zeros logical indicating if all taxa are used in scoring
#' (\code{TRUE}; default) or only detected taxa.
#' @inheritParams est_brc
#' @return A data frame containing the IEC scores for each site in \code{sp} and
#' the corresponding likelihood of the score.
#' @references Gnass Giese, E.E., R.W. Howe, A.T. Wolf, N.A. Miller, and N.G.
#' Walton. 2015. Sensitivity of breeding birds to the "human footprint" in
#' western Great Lakes forest landscapes. Ecosphere 6(6):90.
#' http://dx.doi.org/10.1890/ES14-00414.1
#'
#' Howe, R.W., R. R. Regal, J.M. Hanowski, G.J. Niemi, N.P. Danz, and C.R.
#' Smith. 2007a. An index of ecological condition based on bird assemblages
#' in Great Lakes coastal wetlands. Journal of Great Lakes Research 33
#' (Special Issue 3): 93-105.
#'
#' Howe, R.W., R. R. Regal, G.J. Niemi, N.P. Danz, J.M. Hanowski. 2007b. A
#' probability-based indicator of ecological condition. Ecological Indicators
#' 7:793-806.
#' @seealso \code{\link{est_brc}} to generate biotic response curves.
est_iec <- function(sp, brc, method = "pa", n_reps = 30, keep_zeros = TRUE) {
# The input "sp" is a data frame containing the observations for each
# species or other taxa (abundance or presence/absence) at each site.
# The column names of "sp" must be the species or other taxa.
# All rows must contain the observations of each species at each site
# (one column per species). Row names are site names.
# Example of "sp":
# row.names Sp(1) Sp(2) Sp(?) Sp(n)
# Site(1) 0 0 0 0
# Site(2) 1 0 1 0
# Site(?) 0 0 0 0
# Site(n) 0 3 0 0
## Error checking ----
# Check that the input "sp" and "brc" are both data frames.
if (!is.data.frame(sp) | !is.data.frame(brc)) {
stop("Inputs 'sp' and 'brc' must be data frames.")
}
# Check that "sp" and "brc" contain the same number of taxa.
if (ncol(sp) != nrow(brc)) {
stop("Species observation and BRC data frames differ in number of taxa.")
}
# Check that "sp" and "brc" contain the same taxa.
if (!identical(as.character(names(sp)), as.character(brc[, 1]))) {
stop(paste("Data frames 'sp' and 'brc' do not contain the same taxa \n",
"or are not ordered the same."))
}
# Check that "method" has been set correctly.
if (!method %in% c("q", "pa")) {
stop("Method must be 'pa' (default) or 'q'.")
}
## Declarations ----
# Generate a data frame to hold the output IEC site score data;
# an empty data frame with 3 columns - data type defined.
iec_scores <- data.frame(character(0), rep(list(numeric(0)), 2),
stringsAsFactors = FALSE)
names(iec_scores) <- c("Site", "IEC", "LogLik")
# Make an original copy of "brc" if zeros are to be dropped from each site
if (!keep_zeros) brc_raw <- brc
# Function to generate stratified random selection of starting IEC values.
get_strat <- function(n) {
# Generalized to allow for other numbers to be set for "n" ("n_reps").
# "n" does not have to be a multiple of 10.
# Note that it's ok to set "n_reps" as low as 10,
# and it can be set even lower if you're just running tests.
if (n < 10) {
int <- 10/n
llimit <- sapply(0:(n - 1), function(x) x * int)
strat <- sapply(llimit, function(x) runif(1, min = x, max = x + int))
} else {
# Lower limit for each stratum (upper will be llimit + 1).
llimit <- 0:9
# This is the number of random values that will be selected from
# each stratum.
pick <- n %/% length(llimit)
# Select stratified random values.
strat <- sapply(llimit, function(x) runif(pick, min = x, max = x + 1))
# If n has been set to a value that is not a multiple of 10,
# the remainder will be filled using random values in [0, 10].
# Note that only this part will be used to assign starting IEC values
# if n is set to less then length(llimit) (AKA 10).
if (length(strat) < n) {
remain <- n - length(strat)
strat[(length(strat) + 1):n] <- runif(remain, min = 0, max = 10)
}
}
strat # Return the vector of starting values for "nlminb".
}
# Set method/criteria for estimating IEC
# Return function for "q" or "pa" based on "method"
# Method must be set to "q" or "pa"
if (method == "q") {
criteria <- function(pc, observed) {
# Least-squares method (AKA "CalcQ").
# Note that currently the output from this will have the
# opposite sign of that from the Excel spread sheet.
# This needs to be fixed, but will also require modifications to "pa".
numerator <- (observed - pc) ^ 2
sum(numerator / pc) # (Obs-Exp)^2 / Exp
}
} else {
criteria <- function(pc, observed) {
# Likelihood method (AKA "pa").
# Calculate lack-of-fit expression ("lof") for each species.
lof <- rep(NA, length(pc)) # set lof to length of pc.
# Set pc > 1 to 1
pc[pc > 1] <- 1
# If species is present at the current site, log set to Log P(C).
lof[observed > 0] <- log10(pc)[observed > 0]
# If species is absent, set to Log (1-P(C)).
# Per the Excel file, it's really Log(1.0001-P(C)) to avoid log of 0.
lof[is.na(lof)] <- log10(1.0001 - pc)[is.na(lof)]
# Return the negative sum of the lof.
# I'm using the negative because the original function in Excel was
# set to maximize, but "nlminb" can only minimize a function.
# The result is the same.
-sum(lof)
}
}
## Main Function ----
f <- function(x) {
# This function returns the function which
# will be minimized with function "nlminb".
# "x" is the current IEC score being tried by "nlminb".
# Note that "observed" is called from outside the function, but there
# doesn't seem to be a better way to do this with "nlminb".
# Calculate P(C) for each species.
# In the original version, P(C) was constrained to > 0 and <= 1.
# P(C) is the probability or value of each species at given IEC.
# Input values "Mean", "SD", and "H" are part of data frame "brc".
pc <- with(brc, {dnorm(x, mean = Mean, sd = SD) * H})
pc[pc < .001] <- .001 # Set 0 probabilities to .001.
# criteria is set in the Declarations section of this script.
# It is set to either "pa" (default) or "q".
criteria(pc, observed)
}
## for loop over each site ----
# This for loop iteratively processes each site in the data frame "sp".
for (site in 1:nrow(sp)) {
# A list to contain the best solution found by "nlminb".
best <- NULL
# Extract the "observed" responses for the current site.
# If "keep_zeros" is TRUE (default), all taxa are used for scoring.
# Else, only those taxa that were detected are used for scoring.
if (keep_zeros) {
observed <- sp[site, ]
} else {
# "keep" contains the indices for detected species.
keep <- which(sp[site, ] > 0)
# If there are no species detected, set iec attributes to NA and
# go to the next site
if (length(keep) == 0) {
iec_scores[site, ] <- list(row.names(sp)[site], NA_real_, NA_real_)
next
}
# subset "observed" and "brc" to the detected species
observed <- sp[site, keep]
brc <- brc_raw[keep, ]
}
# Function "get_strat" is defined in the "Declarations" section.
iec_start <- get_strat(n_reps)
# Estimate IEC for current site using "n_reps" different starting values.
for (i in 1:n_reps) {
# "n_reps" is set in the call to function "siteScore" and
# defaults to 30.
# For help with function "nlminb" type "?nlminb" in R.
# This is the optimization function.
# The result is constrained to be between 0 and 10 inclusive.
result <- nlminb(start = iec_start[i], objective = f,
lower = 0, upper = 10)
# Check if the current random starting value resulted in an
# improved estimate of IEC.
if (i == 1) {
# On the 1st iteration set "best" to "result".
best <- result
} else if (result$objective < best$objective) {
# Otherwise check if newest "result" is better than the current
# best result ("best").
# If the new "result" is better than the current best,
# update "best" to "result".
best <- result
}
}
# After fitting a best solution, add the solution to the
# data frame "iec_scores".
# best$objective is multiplied by -1 to put it on the same scale as
# the original Excel based versions.
iec_scores[site, ] <- list(row.names(sp)[site], best$par, -best$objective)
}
# Return "iec_scores"
iec_scores
}
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.