tidyindex: A Tidy Data Pipeline to Construct, Compare, and Analyse Indexes

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

library(tidyindex)
library(dplyr)
library(lubridate)
library(lmomco)
library(ggplot2)
library(tsibble)

The tidyindex package provides functionality to construct indexes in a data pipeline, aligned with the tidyverse paradigm. The pipeline approach is designed to apply to indexes of many kinds. It breaks an index down into a set of defined building blocks (modules), and so provides a way to standardise the workflow used to construct, compare, and analyse indexes.

Decomposing an index into steps

Here we present an example that calculates one of the most widely used drought indexes: the Standardised Precipitation Index (SPI). The index is composed of three steps:

step 1: aggregate the precipitation series in a rolling window
step 2: fit a distribution (usually gamma), per month, to the aggregated precipitation
step 3: normalise the fitted values to a standard normal distribution as the index
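
The three steps above can be sketched in base R without the package. This is only a minimal illustration under stated assumptions: the precipitation series is simulated, the window length k is arbitrary, and the gamma parameters are estimated by the method of moments for brevity, which is not necessarily how tidyindex fits the distribution.

```r
# Minimal base-R sketch of the three SPI steps on simulated data.
# Assumptions (not taken from tidyindex): simulated precipitation,
# a 12-month window, and method-of-moments gamma estimates.
set.seed(1)
n_years <- 40
prcp  <- rgamma(n_years * 12, shape = 2, scale = 30)  # hypothetical monthly totals
month <- rep(1:12, times = n_years)

# Step 1: aggregate precipitation with a trailing rolling sum of k months.
# The first k - 1 values are NA because the window is incomplete.
k <- 12
agg <- as.numeric(stats::filter(prcp, rep(1, k), sides = 1))

# Step 2: fit a gamma distribution per calendar month and evaluate
# its cumulative distribution function at each aggregated value.
fit <- rep(NA_real_, length(agg))
for (m in 1:12) {
  idx <- which(month == m & !is.na(agg))
  x <- agg[idx]
  shape <- mean(x)^2 / var(x)  # method-of-moments estimates
  rate  <- mean(x) / var(x)
  fit[idx] <- pgamma(x, shape = shape, rate = rate)
}

# Step 3: map the fitted probabilities to standard normal quantiles.
spi <- qnorm(fit)
```

Negative values of spi indicate drier-than-usual conditions for that calendar month, and positive values indicate wetter-than-usual conditions.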

Pipeline design

These three steps correspond to three modules in the tidyindex pipeline: temporal_aggregate(), distribution_fit(), and normalise(). Each module takes its steps in a dplyr::mutate()-style syntax, where the name on the left is the new variable and the expression on the right is the step that computes it. For example, the following code fits a gamma distribution to the variable .agg. Several distributions are available, each prefixed with dist_, and users can add further distributions by following the style of the existing dist_*() steps. A dist_*() step can also be evaluated on its own, in which case it acts as a recipe for the step:

distribution_fit(.fit = dist_gamma(...))

dist_gamma(var = ".agg")

Standardised Precipitation Index (SPI): An example

Here we select a single station, Texas Post Office in Queensland, Australia, which was heavily impacted during the 2019/20 bushfire season, to demonstrate the calculation.

texas_post_office <- queensland %>% 
  filter(name == "TEXAS POST OFFICE") %>% 
  mutate(month = lubridate::month(ym)) 

dt <- texas_post_office |>
  init(id = id, time = ym, group = month) |> 
  temporal_aggregate(.agg = temporal_rolling_window(prcp, scale = 24)) |> 
  distribution_fit(.fit = dist_gamma(var = ".agg")) |>
  tidyindex::normalise(.index = norm_quantile(.fit))
dt

The results contain a summary of the steps used and the data with intermediate variables (.agg, .fit, and .fit_obj) and the index (.index). We can plot the result using ggplot2 as:

dt$data |> 
  ggplot(aes(x = ym, y = .index)) + 
  geom_hline(yintercept = -2, color = "red", linewidth = 1) + 
  geom_line() + 
  scale_x_yearmonth(name = "Year", date_breaks = "2 years", date_labels = "%Y") +
  theme_bw() +
  facet_wrap(vars(name), ncol = 1) + 
  theme(panel.grid = element_blank(), 
        legend.position = "bottom") + 
  ylab("SPI")
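
The red line in the plot marks an index value of -2. To list the months that fall below such a threshold, a small helper along the following lines may be useful. It is a hypothetical base-R function, not part of tidyindex, and the toy data frame is made up for illustration; the only assumption carried over from the pipeline is that the index is stored in a column named .index.

```r
# Hypothetical helper (not part of tidyindex): keep the rows whose
# index value falls below a threshold, ignoring missing values.
below_threshold <- function(data, threshold = -2, index_col = ".index") {
  stopifnot(index_col %in% names(data))
  keep <- !is.na(data[[index_col]]) & data[[index_col]] < threshold
  data[keep, , drop = FALSE]
}

# Made-up values, used only to show the behaviour.
toy <- data.frame(
  ym = c("2019-10", "2019-11", "2019-12", "2020-01"),
  .index = c(-1.2, -2.3, NA, -2.6)
)
dry <- below_threshold(toy)
dry$ym
```

With the pipeline result above, the same helper could be called as below_threshold(dt$data).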

What's more

There are many different things you can do with the package, for example:

to switch from SPI to the Standardised Precipitation-Evapotranspiration Index (SPEI), simply add a variable transformation step that computes potential evapotranspiration from temperature data: variable_trans(.pet = trans_thornthwaite(.tavg = tavg, .lat = lat))
a set of existing drought indexes are available as idx_spi(), idx_spei(), idx_edi(), and idx_rdi()
to compute multiple indexes at once, check compute_indexes()
to quantify parameter uncertainty in the distribution fit, check the .n_boot argument of distribution_fit()