knitr::opts_chunk$set(echo = TRUE)
library(MATSS)
library(dplyr)
library(tidyr)
library(ggplot2)
library(drake)
library({{{package}}})
This report was compiled and generated by the MATSS R package [@{{{matss_ref_key}}}].
As a result of running the Drake pipeline, we have objects for each of the "targets" in the Drake cache. First, let's examine what the names of those targets are:
cached()
Note that we have two sets of results, one for running the compute_simpson_index() function on each dataset, and one for running the compute_linear_trend() function on each dataset (producing results for each time series in each dataset).
We can use loadd() to load specific targets (or all of them) into the R environment, similar to the base load() function for loading .Rdata files.
Alternatively, we can use readd() to return a target directly, similar to the base readRDS() function for reading .RDS files.
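The analogy with the base functions can be seen in a minimal sketch that uses only base R and temporary files. It does not touch the drake cache, and the object names `x` and `y` are made up for illustration.

```r
# Base-R analogy only: no drake cache is involved here.
x <- c(1, 2, 3)

# save()/load() restore an object under its original name,
# much as loadd() places a target into the environment.
rdata_file <- tempfile(fileext = ".RData")
save(x, file = rdata_file)
rm(x)
load(rdata_file)  # `x` exists again

# saveRDS()/readRDS() return the value, so the caller picks the name,
# much as readd() returns a target directly.
rds_file <- tempfile(fileext = ".rds")
saveRDS(x, file = rds_file)
y <- readRDS(rds_file)

identical(x, y)
```

In practice, readd() tends to be the more convenient choice inside a report, because the returned value can be assigned to any name without side effects on the environment.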
First, let's look at the compute_simpson_index() results:
results_simpson_index <- readd("results_simpson_index")
results_simpson_index
The object is a tibble with the output of the calculations stored in the results column.
Because compute_simpson_index() returns a numeric vector of Simpson's index values, one per time step, each element of the results list-column is one such vector.
Related information about the dataset, along with any additional arguments, is stored in the other columns of this tibble.
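To make "Simpson's index, computed at each time step" concrete, here is a minimal base-R sketch of the 1 - D form of the index (the form named in the plot axis label later in this report). The helper `simpson_1_minus_d()` and the toy abundance matrix are hypothetical; this is not the MATSS implementation, which may differ in detail.

```r
# Hypothetical helper, not the MATSS implementation.
# abundance: matrix or data.frame, rows = time steps, columns = species.
simpson_1_minus_d <- function(abundance) {
  abundance <- as.matrix(abundance)
  # Proportional abundance of each species within each time step.
  p <- abundance / rowSums(abundance)
  # 1 - D, where D is the sum of squared proportions.
  1 - rowSums(p^2)
}

# Made-up data: 3 time steps, 2 species.
toy_abundance <- rbind(c(10, 10), c(20, 0), c(5, 15))
simpson_1_minus_d(toy_abundance)
```

The result has one value per row of the input, which is why each element of the results list-column is a vector rather than a single number. Time steps with zero total abundance would yield NaN in this sketch.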
Now, let's look at the compute_linear_trend() results:
results_linear_trend <- readd("results_linear_trend")
results_linear_trend
Again, the object is a tibble with a structure similar to the previous one, with the results list-column containing the tibble outputs from compute_linear_trend().
We encourage the use of Tidyverse for extracting and handling the output. In particular, there are some useful examples of dealing with complex output structures in list-columns described in https://github.com/jennybc/row-oriented-workflows.
Our goal is to plot a time series of Simpson's Index for each separate dataset.
This suggests the following processing procedure:
- extract the values for each dataset, to make a single long-format data.frame
- construct a "time" variable, which will serve as the x-axis in plotting
to_plot <- results_simpson_index %>%
    select(dataset, results) %>%
    unnest(cols = results) %>%
    rename(value = results) %>%
    group_by(dataset) %>%
    mutate(t = row_number()) %>%
    ungroup()
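The effect of each step is easier to see on a small stand-in. The sketch below applies the same verbs to a toy tibble; the dataset names and values are made up and do not come from the drake cache.

```r
library(dplyr)
library(tidyr)

# Toy stand-in for results_simpson_index; names and values are made up.
toy_results <- tibble(
  dataset = c("a", "b"),
  results = list(c(0.5, 0.6, 0.7), c(0.2, 0.3))
)

toy_long <- toy_results %>%
  unnest(cols = results) %>%    # one row per time step
  rename(value = results) %>%
  group_by(dataset) %>%
  mutate(t = row_number()) %>%  # time index restarts within each dataset
  ungroup()

toy_long
```

Grouping before row_number() is what makes the time index restart at 1 for every dataset; without it, the index would run continuously across all rows.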
Plotting is mostly straightforward. Note that we allow the x-axis scale to vary for each dataset separately, because the number of time points varies across datasets, and having them all be aligned across facets would give the (false) impression of synchronicity.
ggplot(to_plot, aes(x = t, y = value)) +
    geom_line() +
    facet_wrap(~dataset, scales = "free_x") +
    theme_bw() +
    labs(x = "Time", y = "Simpson's Index (1-D)")
Our goal is to plot the distribution of species trends over time in each dataset.
Processing is a bit simpler, since we really only need to subset the results that we want:
to_plot <- results_linear_trend %>%
    select(dataset, results) %>%
    unnest(cols = results) %>%
    select(dataset, id, t)
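The selected column t is what gets plotted as the slope of each trendline. One plausible reading, stated here as an assumption rather than a description of the MATSS internals, is that it holds the coefficient on a time variable named t from a linear model. The sketch below shows how such a coefficient gets its name, using made-up data.

```r
# Illustration only; the data are made up and this is not MATSS code.
toy_series <- data.frame(t = 1:5, abundance = c(2, 4, 6, 8, 10))

# Regress abundance on time.
fit <- lm(abundance ~ t, data = toy_series)

# coef() returns a named vector with elements "(Intercept)" and "t".
coef(fit)

# The element named "t" is the slope per time step.
slope <- coef(fit)[["t"]]
slope
```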
When plotting, we note that the range in slopes varies a lot, and depends on abundance scaling, so we allow the facets to have different y-axis scales, making sure to add in the 0 line, and jittering the points for the raw data, too.
ggplot(to_plot, aes(x = 0, y = t)) +
    geom_violin() +
    geom_point(position = position_jitter(width = 0.1, height = 0), shape = 5) +
    geom_hline(mapping = aes(yintercept = 0), size = 1, lty = 2, color = "red") +
    facet_wrap(~dataset, scales = "free_y") +
    theme_bw() +
    labs(x = "Density", y = "Slope of Population Trendline")
cat(paste("1.", readd(citations)), sep = "\n")