library(EML) library(emld) library(lubridate) library(tibble) library(dplyr) library(tidyr) library(ggplot2) library(EFIstandards)
Visualizing the simple, example forecast of population growth of two interacting species
========================================================
First load the data using EML
.
eml_filename <- system.file("extdata", "forecast-eml.xml", package="EFIstandards") md <- read_eml(eml_filename) #you could replace the variable here with a filename form a downloaded data package from a repository
Then to examine the details, use EML::eml_get
. First, the title and abstract and who made it.
eml_get(md, "title") eml_get(md, "abstract") eml_get(md, "creator")
Then diving in, the coverage and keywords
eml_get(md, "coverage") eml_get(md, "keywordSet")
Looks like it's a population level timeseries forecast of r eml_get(md, "geographicDescription")[[1]]
for two species:
# these seem a bit of a pain to pull out maybe there's an easier way spps <- eml_get(md, "taxonRankValue") sp1 <- paste(spps[1:2], collapse=" ") sp2 <- paste(spps[3:4], collapse=" ") sp1 sp2
Load the additional metadata (see documentation of EFI forecasting standard):
methods_md <- eml_get(md, "additionalMetadata") methods_md
This metadata contains details about the system forecast including number of state variables and parameters, and the methods used, including propagation of uncertainty. For example, details of the treatment of process error in this model are given by
methods_md %>% eml_get("process_error") # note could do this directly on \`md` object n_ensembles <- methods_md %>% eml_get("process_error") %>% eml_get("size") n_ensembles <- n_ensembles[[1]]
From this metadata, the output of this forecast should have error propagation via an ensemble of size r n_ensembles
, process error for two state varible dimensions with no covariance between these. Further, one can check if the forecast used data assimilation (no):
eml_get(md, "assimilation")
Finally, I'll dive into the metadata for the output data itself. Wait, which output data? All we've seen so far is metadata!
First remember that EML specifies a tag "physical" for the location of an actual data file. There's plenty of metadata on the file itself including authentication format, etc
dt_md <- eml_get(md, "dataset") eml_get(dt_md, "physical")
To get the data parse the filename and load it!
datafilename <- eml_get(dt_md, "objectName")[[1]] # this steps are necessary because this is in an R package if you have a downloaded data repository # you coudl just pass 'datafilename' into read.csv here datafile <- system.file("extdata", datafilename, package="EFIstandards") dt <- as_tibble(read.csv(datafile, row.names=1)) dt
OK! This has the column names and actual data. To understand these column names,
recall the output dataset metadata, which was loaded earlier. In particular the
attributeList
contains fine details like units and definitions for each column:
dt_md_cols <- dt_md %>% eml_get("attributeList") %>% get_attributes() dt_md_cols$attributes
Cross-referencing this documentation of EFI forecasting standards, note that
data_assimilation
column is zeroed out so this performed no data assimilation
and forecast
column is zeroed out so this was a hindcast. As expected, there are
r length(unique(dt$ensemble))
ensemble members in the forecast. As consumer of
this type of forecast this is the only way to understand the uncertainty
in the forecast.
The output includes densities (number per meter squared) for both species 1 and
species 2. I assume these are in the same order as the entries in
taxononmicCoverage
: r sp1
and r sp2
at several depths measured in meters.
Glancing at column depth
there are only three depths represented. While
reshaping the data, I will rename columns with useful information about their units
and construct a categorical variable with the actual species names:
dtl <- pivot_longer(dt, c("species_1", "species_2"), values_to="density (m^-2)") %>% mutate(time = as.POSIXct(time), name=recode(name, species_1=sp1, species_2=sp2), `depth (m)` = depth)
Given the ensembles, order statitics or quantiles seem like a natural choice to summarize uncertainty, so compute the median and upper and lower quartiles:
dtl <- dtl %>% group_by(name, depth, time) %>% mutate(density_50 = median(`density (m^-2)`), density_25 = quantile(`density (m^-2)`, .25), density_75 = quantile(`density (m^-2)`, .75) )
Putting these pieces together to visualize and describe the forecast and its uncertainty:
ggplot(dtl) + geom_ribbon(aes(time, ymin=density_25, ymax=density_75, fill=name), alpha=0.3) + geom_line(aes(time, `density (m^-2)`, color=name, group=interaction(name, ensemble)), alpha=0.5) + #geom_smooth(aes(group=name), method='gam', formula= y ~ s(x, bs = "cs", k=30)) + geom_line(aes(time, density_50, color=name), size=1.25) + facet_grid(`depth (m)` ~ ., labeller=label_both) + guides(color=guide_legend(title='Median density'), fill=guide_legend(title='25th-75th quantile density')) + labs(title=eml_get(md, 'title')[[1]], subtitle=paste("Forecast from", n_ensembles, "ensembles (thin lines), with uncertainty from quantiles."), caption=paste("Demo visualization from EFIstandards package for an \nimaginary forecast at", eml_get(md, "geographicDescription")[[1]]))
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.