require(dplyr) knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
RFPM
provides an implementation of the Floating Percentile Model (FPM) in the R statistical environment. The FPM was originally developed by others on behalf of the Washington State Department of Ecology (Avocet 2003) and has since been recommended for use in Washington, Oregon, and Idaho (Ecology 2011). The FPM currently exists as a Microsoft Excel macro (written with Visual Basic for Applications). RFPM
represents an independent effort to port the FPM to R with the purpose of:
- correcting a data handling error
- streamlining the chemical selection and sediment quality benchmark calculation procedures
- providing new tools to rapidly evaluate and refine benchmarks
- improving several aspects of the FPM algorithm
- improving the availability, accessibility, and transparency of the FPM tool
The purpose of this vignette is to lead the reader through an example analysis of real-world data using the core functions of RFPM
rather than go into great detail on any particular function. For more function details, see the help files by using ?
or help()
(or by navigating in RStudio to the Packages tab, selecting RFPM from the list of available packages, and then selecting the function of interest).
The main user interface within RFPM
is via the FPM
function. To use this function, the user must first prepare a sediment chemistry and toxicity data.frame that contains one or more chemical columns and a single toxicity column called Hit
. Several example datasets are provided with RFPM
with this structure, for example h.northport
, which we will evaluate repeatedly in this vignette. Other datasets include the empirical c.northport
and h.tristate
as well as simulated FPM data perfect
, lowNoise
, and highNoise
.
Other key functions that the user is directed to include optimFPM
, cvFPM
, and chemVI
which can help the user to explore and refine FPM inputs and optimize FPM outputs.
In this vignette, we focus on evaluating a case study dataset called h.northport
, which includes a data.frame of bulk sediment chemistry metals concentrations (mg/kg) and Hit
data. Hits are logical results indicating whether a particular sample was deemed toxic (TRUE
) or not (FALSE
) based on the Hyalella azteca biomass endpoint in 28-day laboratory exposures. The definition of a toxic Hit is up to the practitioner and not a part of the model. Data must be input to FPM
in a cross-tab (wide) format, meaning each sediment chemical has a unique column and each sample has just one row.
library(RFPM)
head(h.northport)
To run FPM
, you will need to specify, at a minimum, the data.frame object and the column names associated with sediment chemistry data. As noted above, FPM
also requires that the data.frame include a column called Hit
. Columns other than Hit
and those specified using the paramList
argument are ignored.
p.northport <- names(h.northport)[1:10] ## all chemical column names FPM(data = h.northport, paramList = p.northport) ## minimum input - dataset and chemical column names
The output of FPM
is a list with at least 1 item called FPM
, which displays three types of data:
- The sediment quality benchmark for each of the chemicals selected by FPM
as significant (meaning that concentrations when Hit == TRUE
are significantly greater than when Hit == FALSE
).
- Selected FN_crit
value, the user-defined limit on benchmark conservatism (default = 0.2
)
- Classification metrics: e.g., false negatives rate (pFN
), false positive rate (pFP
), sensitivity (sens
), specificity (spec
), overall reliability (OR
), Fowlkes-Mallows Index (FM
), and Matthew's Correlation Coefficient (MCC
). These provide an indication of the relative fit of the FPM benchmarks to the underlying toxicity dataset.
This is technically all that was required to run the FPM on this example dataset. There is of course a lot going on behind the scenes. Functions that were used within FPM include:
- chemSig
: chemical selection algorithm, which evaluates distribution assumptions of normality and equal variance, then applies appropriate hypothesis test methods to identify significant differences between concentrations when Hit == TRUE
and Hit == FALSE
. The output of chemSig
is a logical vector indicating which chemicals are significant (therefore selected by FPM
for generating benchmarks).
- chemSigSelect
: a function that uses chemSig
but instead returns the chemistry data for significant chemicals and has options for plotting chemistry data distributions to compare Hit == TRUE
and Hit == FALSE
subsets. Setting plot = TRUE
in FPM
turns on plot outputs (by passing that argument to chemSigSelect
).
As an example, see the following figures generated by plot.chemSigSelect
(identical to those generated by FPM
when plot = TRUE
) for a significant chemical (Cr
), shown with blue points, and a non-significant chemical (Al
), shown with green points:
plot(chemSigSelect(h.northport[, c("Al", "Cr", "Hit")], paramList = c("Al", "Cr")), type = "boxplot")
There are also many arguments that can be adjusted in FPM
that affect how chemSig
and chemSigSelect
function or how the FPM algorithm itself functions. Users can adjust the test methods/assumptions for selecting chemicals, force the FPM to accept the assumptions of the original Excel-based function (i.e., ExcelMode == TRUE
), etc. As noted above, the output of FPM
is a list; additional items can be appended to the list if the user sets the densInfo
, lockInfo
, or hitInfo
arguments equal to TRUE
. Respectively, these provide information about 1) how much each of the benchmarks varied within the FPM algorithm, 2) what caused each benchmark to lock into place and in what order, and 3) what the predicted Hit
values would be based on the calculated benchmarks. We have been interested in each of these components at various times of model development, but expect the typical user to not need this information.
See ?FPM
and ?chemSig
for more information.
In RFPM
, there are currently two optimization functions available:
- optimFPM
is intended to find an optimal input (to FPM
) of the FN_crit
and/or alpha.test
parameter. These optimal levels are based on the full dataset (input as data
argument) and, therefore, may be over-specific to that data. The FN_crit
and alpha.test
arguments should be input either as ranges or single values depending on how you want the optimization to run. Inputting a single value for FN_crit
and a range for alpha.test
would optimize the alpha.test
value only and vice-versa. Inputting ranges for both values results in slightly more complex output owing to the 2-dimensional optimization.
- cvFPM
is similarly parameterized to optimize FN_crit
and/or alpha.test
inputs using a cross-validation (CV) approach. The results are "best" values that account for uncertainty in future data. The user can adjust the k
parameter to change the number of folds in the CV algorithm (up to k = n, where n is the number of the samples). This affects the smoothness of the mean/median optimization curves by generating more hypothetical datapoints; it also is likely to expand the min/max range of optimization values by providing more chances for extreme outcomes. By using larger training datasets (i.e., with larger k
inputs), the CV-based optimum will converge toward the empirical optimization curve generated by optimFPM
(and shown in the output of cvFPM
). While we generally recommend using cvFPM
over optimFPM
when possible, there are cases when cvFPM
cannot be used. These include the analysis of small datasets and/or when k
is set to a small number; in either of these cases, FPM
may fail because of a lack of significant chemicals being identified chemSig
(dependent on N).
This is an example of the empirical optimization approach with optimFPM
:
## one-way optimization of the FN Limit - vertical lines show best values based on two metrics optimFPM(h.northport, p.northport, FN_crit = seq(0.1, 0.9, 0.05), alpha.test = 0.05) ## two-way optimization of both FN Limit and alpha - black squares show best values based on two metrics optimFPM(h.northport, p.northport, FN_crit = seq(0.1, 0.9, 0.05), alpha.test = seq(0.01, 0.2, 0.01))
The resulting plots show a mix of potential outcomes. When only one parameter is being optimized (i.e., length(alpha.test) == 1 | length(FN_crit) == 1
), then the first plot is produced, showing fit statistics over the evaluated range. Vertical lines identify optimal values in OR
, sensitivity/specificity ratio (equal to min(c(sens, spec))/max(c(sens, spec))
, FM
, or MCC
.
In the case that two parameters are being optimized together (i.e., length(alpha.test) > 1 & length(FN_crit) > 1
), then a matrix of results is provided representing with heat colors (by default) to indicate the optimization outcome (yellow = suboptimal, red = optimal). Squares indicate the selected values, which are also output into the console.
This is an example of the CV-based optimization approach with cvFPM
and 10-folds:
cvFPM(h.northport, p.northport, k = 10, FN_crit = seq(0.1, 0.9, 0.05), alpha.test = 0.05, which = 2)
Optimization statistics from multiple CV runs are shown (as well as the empirical "all data" result from optimFPM
) as lines. When run with multiple alpha.test
and FN_crit
inputs, cvFPM
will generate a heat map simliar to optimFPM
with values representing on the mean result from CV runs.
Based on the output from either optimFPM
or cvFPM
, the practitioner may decide to adjust and rerun the FPM
using optimized inputs. The decision of whether this is appropriate will depend on the specific management scenario. For example, the optimal FN Limit could end up being 80%, but that lack of conservatism may be unacceptable to a regulator. Similarly, the optimal alpha.test could be 0.5, meaning that the regulator would need to accept a 50% probability of error when selecting significant chemicals; this is a very high uncertainty, 10-fold higher than typical. Regardless, optimization provides useful context to the RFPM
user.
The chemVI
function outputs several potentially useful metrics for evaluating individual metals in the sediment quality benchmark set. These metrics are:
- chemDensity
: how little the benchmark "floated" within the FPM algorithm. High density values correspond to benchmarks that remained at a relatively low concentrations (i.e., did not float up), suggesting relative importance
- MADP
: average change in other chemical's benchmarks resulting from removal of a chemical from data
- dOR
: change in the overall reliability resulting from removal of the chemical from data
- dFM
: change in the Folkes-Mallows Index resulting from removal of the chemical from data
- dMCC
: change in the Matthew's COrrelation Coefficient resulting from removal of the chemical from data
The MADP
and dOR
metrics are the more useful for identifying important chemicals; for example:
chemVI(h.northport, p.northport)
Here we see that copper (Cu
) has an MADP
, dOR
, dFM
and dMCC
equal to 0
, meaning that there was no change in the benchmarks or their ability to predict toxicity after removing copper. Clearly copper is, therefore, not important in this case and could be excluded from further consideration. However, that isn't to say that copper isn't related to toxicity; it's very important to keep in mind that the FPM is a classification tool that attempts to accurately predict Hit
values. The significant chemicals in this list are mostly well correlated, so toxicity could be related to one, all, or perhaps none of the chemicals (i.e., an unmeasured but related chemical or non-chemical stressor). The inclusion of new chemicals in (or exclusion of important chemicals from) data
can result in markedly different benchmarks and chemVI
metrics (particularly for the chemDensity
metric). Therefore, we recommend pre-screening data
to remove chemicals that are unlikely to cause toxicity (e.g., due to poor bioavailability, low toxicity, or uniformly low concentrations). These chemicals, while potentially "significant" with respect to having higher concentrations when Hit == TRUE
, may lead to problems down the road when predicting toxicity more generally. In other words, try to the extent possible to establish a weight of evidence pointing toward causation rather than letting pure correlation lead your analysis. In the current example, we perhaps should have pre-screened chemicals like iron (Fe
) and aluminum (Al
), which we expected to have low toxicity at natural levels. Aluminum was ultimately screened out by FPM
as non-significant, but iron was significant and retained.
chemVI
can be helpful with respect to developing a parsimonious benchark set by identifying benchmarks that do not affect prediction at the site. In our h.northport
example, Cu
and Fe
will be removed because they don't affect toxicity predictions (either positively or negatively). Whenever any chemicals would be removed from consideration through an iterative FPM benchmark development process, FPM
must be rerun to generate new benchmarks. We caution the user to drop only one chemical at a time, rerunning FPM
each time. It is possible that dropping one chemical can result in unexpected results, such as other chemicals becoming more important.
Compare the two outputs:
FPM(data = h.northport, paramList = p.northport) FPM(data = h.northport, paramList = c("Cr", "Zn"))
For the FPM benchmarks generated for h.northport
, the MADP
for Fe
was >0, and the benchmark for Zn
shifted after removing Fe
from consideration. Although the benchmark changed, the classification errors did not, as reflected by the unchanged OR
. The final FPM benchmark set including only Cr
and Zn
is, therefore, the most parsimonious, by which we mean that it classifies Hit
values just as accurately as the larger set of benchmarks but does so with fewer benchmarks. We recommend using a parsimonious set of benchmarks on a site-specific basis, whenever possible, because this should avoid problems of "overfitting" that might cause confounding variables to trigger erroneous conclusions.
You may have noticed that we didn't use optimized inputs in the example above for chemVI
. We omitted that to provide an example where relatively unimportant chemicals could be screened out using the metrics generated by chemVI
. If we instead change the inputs of chemVI
to the optimized FN_crit
and alpha.test
(using the OR
optimization approach), the chemVI
results look like this instead:
FPM(h.northport, p.northport, FN_crit = 0.25, alpha.test = 0.15)$FPM chemVI(h.northport, p.northport, FN_crit = 0.25, alpha.test = 0.15)
In this case, adjusting alpha.test
has allowed for 2 new chemicals to be added, aluminum and lead (Pb
), which were not selected when alpha.test = 0.05
. After including these chemicals (and slightly adjusting the FN_crit
), the final OR
increased to 67%, a 10% improvement in accuracy over the default values (not shown). MCC
increased by a similar margin. Based on the results of chemVI
shown above, there is a marked shift in the relative importance of metals for toxicity predictions. Removing any of the chemicals would result in a roughly 10% reduction in accuracy (based on the dOR
metric), and removing Cu
now would have an effect on the other benchmarks values (as shown by the MADP
metric >0). The chemDensity
value for Cr
has dropped to <1%, meaning that the benchmark is now close to the maximum possible value, whereas it had one of the highest chemDensity
values before.
As you can see, variable importance is dependent on the set of benchmarks, not individual benchmarks in isolation. The addition of new chemicals impacted not only on the relative importance of each chemical but the overall reliability of the set of chemicals and on the FPM benchmark values themselves.
We want to emphasize several findings from our analyses of the characteristics, behavior, and sensitivity of the FPM
(the subject of two anticipated manuscripts).
FPM benchmarks should be treated as a set of values, not values to be used singularly or mixed and matched between runs of the FPM
using different species, toxicity test endpoints, sites, etc. There may be a desire or an inclination to select the most sensitive species and test endpoints among chemicals. Doing so, however, would invalidate the various assumptions made when generating FPM benchmarks, including the projected FN Limit (i.e., intended FN_crit
and actual pFN
) and the accuracy of benchmarks (i.e., OR
, FM
, and MCC
). If there are multiple species that need to be accounted for by the model, consider different approaches to defining Hit
values that capture multiple species and/or endpoints prior to running FPM
. For example, assign Hit == TRUE
if any of several endpoints indicates toxicity in a given sample. This approach will result in a single benchmark set for all endpoints rather than generating multiple sets of benchmarks (as was done by Ecology 2011). The resulting benchmarks may be conservative, but they would not deviate from the intended statistical underpinnings of FPM outputs in the same way that mixing-and-matching multiple benchmark sets would.
FPM benchmarks should be considered as predictive of toxicity classifications (i.e., Hit
values) rather than continuous toxicity test results (e.g., 10% mortality, 27% biomass, etc.). For this reason, we would generally argue against the application of FPM benchmarks within a Natural Resource Damage Assessment (NRDA) context. We recognize that future approaches might be proposed for applying FPM benchmarks in a NRDA context; while theoretically possible, such attempts should be considered carefully, particularly with regard to how Hits are defined (and whether the definition reasonably corresponds with definitions of injury).
FPM benchmarks are most powerful when they are site-specific. By using site toxicity and sediment chemistry data, and by implicitly considering factors like bioavailability, FPM benchmarks should improve upon existing default benchmarks (Ecology 2011). Therefore, we recommend using RFPM
and site-specific data to the extent practicable to derive new benchmarks.
While it is most common to generate benchmarks based on bulk sediment concentrations, it is not necessary that this be the case. For example, benchmarks could reflect organic carbon-normalized sediment concentrations, pore water concentrations, etc. These measurements might improve the relevance of toxicity predictions by taking bioavailability more explicitly into account. Attempts to generate benchmarks based on these types of data simply complicate the application of the benchmarks by requiring all future samples be analyzed and concentrations reported in the same way. Moreover, some approaches will not be possible such as the simultaneously extracted metal - acid volatile sulfide method, which can generate non-positive data (that cannot be handled by FPM
).
While there may be a concern about mixing chemicals of different types and mechanisms/modes of action (e.g., metals and polycyclic aromatic hydrocarbons) when generating FPM benchmarks, it does not seem to us that such a distinction should matter. From a conceptual standpoint, the FPM can be used to predict any variable that is distilled to a TRUE
/FALSE
classification from a set of one or more independent variables (chemical or otherwise). The key limitations that arise when mixing chemicals (and/or non-chemical stressors) are:
FPM
via data
need to have a common directionality with respect to Hits
. In other words, toxicity is expected to increase as, for example, copper increases. Therefore, you could not also include pore water acidity, which increases toxicity at both high and low pH. This limitation is related to the chemical selection process, which (by default) assumes that concentrations will be higher among toxic samples. It also relates to the iteratively increasing nature of the FPM
algorithm; concentrations start at a low level and then shift in only one direction (up/increasing). The inclusion of many correlated variables may increase the number of irrelevant benchmarks. For example, in the analysis of h.northport
dataset presented above (using default FPM
inputs), iron was correlated with other metals (not shown) and, as a result, was selected for inclusion among the benchmarks despite being relatively non-toxic and contributing not at all to the predictive accuracy of the benchmark set. As described above, chemVI
can help to identify potential simplifications to the benchmark set when correlated variables lead to unnecessary complexity. The user is also free to pre-screen their data to remove redundant variables.
FPM
has been found to work poorly or not at all with missing data. Consider removing analytes with incomplete datasets, omit samples with partial analysis, and/or impute values to fill data gaps.
FPM
does not currently address non-detection in a meaningful way. As the model is largely based on percentiles and ranges rather than averages, this shouldn't generally lead to major problems as long as there isn't a high degree of non-detection (e.g., >20%). We leave it to the user to address how non-detects will be handled and to use FPM benchmarks based on non-detected data with caution. High degrees of non-detection of a chemical is a good reason for excluding it from data
(or paramList
), as there is not a reasonable way to use such a chemical to predict Hits. The use of more advanced censored data imputation methods (e.g., Kaplan-Meier or Regression on Order Statistics [ROS]) may provide useful, but should also be applied with care; using these methods (in addition to treating non-detects as half of detection limits or as zero) have the potential to result in benchmarks that fall below detection limits. We would argue that it is unreasonable to apply benchmarks that fall at or below detection limits.
As with other toxicity-based benchmarks, it may be important to consider background chemical concentrations and reference area toxicity levels when developing and applying FPM
benchmarks. Reference area toxicity can be incorporated explicitly into Hits
definitions by normalizing toxicity data to the reference condition prior to classifying Hits
, and background can be used to qualify chemical exceedances of FPM benchmarks that fall below background levels. Examples of applying a reference condition would include dividing site toxicity by mean reference area toxicity (then applying a default effect threshold like 80%) or using a percentile of reference toxicity as the defining threshold for Hits (e.g., site toxicity >80th percentile of reference effect).
Avocet. 2003. Development of freshwater sediment quality values for use in Washington State. Phase II report: Development and recommendation of SQVs for freshwater sediments in Washington State. Publication No. 03-09-088. Prepared for Washington Department of Ecology. Avocet Consulting, Kenmore, WA.
Ecology. 2011. Development of benthic SQVs for freshwater sediments in Washington, Oregon, and Idaho. Publication no. 11-09-054. Toxics Cleanup Program, Washington State Department of Ecology, Olympia, WA.
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.