knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) lds <- ldsr:::NPlds cv <- ldsr:::NPcv u <- v <- t(ldsr::NPpc)
This package implements the Linear Dynamical System Expectation Maximization (LDS-EM) algorithm presented in Nguyen and Galelli (2018) to reconstruct streamflow (and possibily other climate variables) from paleoclimate proxies. The streamflow-proxy relationship is modeled as a linear dynamical system (LDS), following the set of equations
$$ \begin{align} x_{t+1} &= Ax_t + Bu_t + q_t\ y_t &= Cx_t + Dv_t + r_t\ q_t &\sim \mathcal{N}(0,Q)\ r_t &\sim \mathcal{N}(0,R)\ x_1 &\sim \mathcal{N}(\mu_1, V_1) \end{align} $$ where $x_t$ is the system state (the flow regime of the catchment), $y_t$ the (log-transformed) centralized streamflow, $u_t$ and $v_t$ the exogenous inputs, and $q_t$ and $r_t$ white noises. The system parameters are $\theta = (A, B, C, D, Q, R, \mu_1, V_1)$. Often, $u$ and $v$ are taken to be the same. For detail, please refer to Nguyen and Galelli (2018).
This package is the key workshorse behind Nguyen and Galelli (2018) and Nguyen et al (in prep). ldsr stands for Linear Dynamical System Reconstruction.
We will demonstrate the package using a part of Nguyen et al (in prep). Here, we reconstruct streamflow for the station Nakhon Phanom located along the Mekong River. The climate proxy is the portion of the Monsoon Asia Drought Atlas (MADA) (Cook et al, 2010) version 2 (Cook et al, 2015). The necessary data are already included in the package:
NPannual
is a data frame recording annual streamflow measured at station P1, which was obtained from the Thai Royal Irrigation Department. This record spans the period 1960--2005.NPpc
is the three principal components (PCs 1, 9, and 13) selected from the MADA region surrounding Nakhon Phanom, following the procedure described in Nguyen et al (in prep). This record spans the period 1200--2012.Some preparations:
We load the packages that are used frequently in this vignette. Other packages will be referred to with ::
when necessary.
library(ldsr) # This package library(data.table) # Data wrangling library(ggplot2) # Plotting library(patchwork) # Arranging multiple plots
Preview data
head(NPannual)
NPpc
Since EM is a local search routine, we run it with multiple restarts, each of which has a different initial condition. From our experience, about 20-50 restarts is sufficient. Computations can be sped up using parallel computing, and users can setup any parallel backend according to their system. We recommend the doFuture
backend. On a 3.4 GHz quad-core desktop, the training procedure takes about a second with 20 restarts.
# Setup doFuture as the parallel computing backend doFuture::registerDoFuture() future::plan(future::multiprocess) # Learn LDS u <- v <- t(NPpc) lds <- LDS_reconstruction(NPannual, u, v, start.year = 1200, num.restarts = 20)
lds$theta
p1 <- ggplot(lds$rec[year %in% NPannual$year]) + geom_ribbon(aes(year, ymin = Ql, ymax = Qu), fill = 'gray90') + geom_line(aes(year, Q, colour = 'LDS')) + geom_line(aes(year, Qa, colour = 'Observation'), data = NPannual) + scale_colour_manual(name = NULL, values = c('black', 'darkorange')) + labs(x = NULL, y = 'Mean annual flow [m\u00b3/s]') + theme_classic() + theme(axis.ticks.x = element_blank(), axis.line.x = element_blank(), axis.text.x = element_blank()) p2 <- ggplot(lds$rec[year %in% NPannual$year]) + geom_ribbon(aes(year, ymin = Xl, ymax = Xu), fill = 'gray90') + geom_line(aes(year, X)) + geom_hline(yintercept = 0) + theme_classic() + labs(x = 'Year', y = 'Catchment state [-]') p1 / p2 + plot_layout(heights = c(1, 0.6))
The river has gone through distinct wet and dry epochs.
p1 <- ggplot(lds$rec) + geom_ribbon(aes(year, ymin = Ql, ymax = Qu), fill = 'gray90') + geom_hline(aes(yintercept = mean(Q)), colour = 'salmon') + geom_line(aes(year, Q)) + labs(x = NULL, y = 'Mean annual flow [m\u00b3/s]') + theme_classic() + theme(axis.ticks.x = element_blank(), axis.line.x = element_blank(), axis.text.x = element_blank()) p2 <- ggplot(lds$rec) + geom_ribbon(aes(year, ymin = Xl, ymax = Xu), fill = 'gray90') + geom_hline(yintercept = 0, colour = 'salmon') + geom_line(aes(year, X)) + theme_classic() + labs(x = 'Year', y = 'Catchment state [-]') p1 / p2 + plot_layout(heights = c(1, 0.6))
Make a set of cross-validation folds.
set.seed(100) Z <- make_Z(NPannual$Qa, nRuns = 30, frac = 0.25, contiguous = TRUE)
Run cross-validation
cv <- cvLDS(NPannual, u, v, start.year = 1600, num.restarts = 20, Z = Z)
Cross-validation scores
cv$metrics
Since LDS is a new method, which has not been through the test of time, we encourage users to thoroughly check the results, including comparing it against linear regressin. The package has some functions to do reconstruct streamflow with principal component linear regression (PCR).
# Build PCR model pcr <- PCR_reconstruction(NPannual, NPpc, start.year = 1200) # Cross validate with the same folds as before cvpcr <- cvPCR(NPannual, NPpc, start.year = 1200, Z = Z, metric.space = 'original')
Mean performance scores
rbind(lds = cv$metrics, pcr = cvpcr$metrics)
Performance scores over all cross-validation runs
dt1 <- as.data.table(cvpcr$metrics.dist) dt1[, model := 'PCR'] dt2 <- as.data.table(cv$metrics.dist) dt2[, model := 'LDS'] dt <- rbind(dt1, dt2) dt <- melt(dt, id.vars = 'model', variable.name = 'metric') ggplot(dt, aes(model, value)) + geom_boxplot() + stat_summary(geom = 'point', fun = mean, colour = 'red') + facet_wrap(vars(metric), scales = 'free', nrow = 1) + labs(x = NULL, y = NULL) + theme_classic() + theme(strip.background = element_rect(fill = 'gray90', colour = NA))
p1 <- ggplot(lds$rec[year %in% NPannual$year]) + geom_ribbon(aes(year, ymin = Ql, ymax = Qu), fill = 'gray90') + geom_line(aes(year, Q, colour = 'LDS', linetype = 'LDS')) + geom_line(aes(year, Q, colour = 'PCR', linetype = 'PCR'), data = pcr$rec[year %in% NPannual$year]) + geom_line(aes(year, Qa, colour = 'Observation', linetype = 'Observation'), data = NPannual) + scale_colour_manual(name = NULL, values = c('black', 'darkorange', 'black')) + scale_linetype_manual(name = NULL, values = c(1, 1, 2)) + labs(x = NULL, y = 'Mean annual flow [m\u00b3/s]') + theme_classic() + theme(axis.ticks.x = element_blank(), axis.line.x = element_blank(), axis.text.x = element_blank()) p2 <- ggplot(lds$rec[year %in% NPannual$year]) + geom_ribbon(aes(year, ymin = Xl, ymax = Xu), fill = 'gray90') + geom_line(aes(year, X)) + geom_hline(yintercept = 0) + theme_classic() + labs(x = 'Year', y = 'Catchment state [-]') p1 / p2 + plot_layout(heights = c(1, 0.6))
p1 <- ggplot(lds$rec) + geom_ribbon(aes(year, ymin = Ql, ymax = Qu), fill = 'gray90') + geom_line(aes(year, Q, colour = 'LDS', linetype = 'LDS')) + geom_line(aes(year, Q, colour = 'PCR', linetype = 'PCR'), data = pcr$rec) + scale_colour_manual(name = NULL, values = c('black', 'steelblue')) + scale_linetype_manual(name = NULL, values = c(1, 2)) + labs(x = NULL, y = 'Mean annual flow [m\u00b3/s]') + theme_classic() + theme(axis.ticks.x = element_blank(), axis.line.x = element_blank(), axis.text.x = element_blank()) p2 <- ggplot(lds$rec) + geom_ribbon(aes(year, ymin = Xl, ymax = Xu), fill = 'gray90') + geom_line(aes(year, X)) + geom_hline(yintercept = 0) + theme_classic() + labs(x = 'Year', y = 'Catchment state [-]') p1 / p2 + plot_layout(heights = c(1, 0.6))
An advantage of the LDS model is that it can be used readiliy as a stochastic streamflow generator.
Generate stochastic replicates
set.seed(100) reps <- LDS_rep(lds$theta, u, v, years = lds$rec$year, mu = mean(log(NPannual$Qa)))
Plot the replicates
# Plot streamflow p <- ggplot(reps) + geom_line(aes(year, simQ, group = rep), colour = 'gray80') + geom_line(aes(year, Q), data = lds$rec, colour = 'black') + labs(x = 'Year', y = 'Q [m\u00b3/s]') + theme_classic() # Plot catchment state q <- ggplot(reps) + geom_line(aes(year, simX, group = rep), colour = 'gray80') + geom_line(aes(year, X), data = lds$rec, colour = 'black') + labs(x = 'Year', y = 'Catchment state [-]') + theme_classic() p / q + plot_layout(heights = c(1, 0.6))
Nguyen, H. T. T., & Galelli, S. (2018). A linear dynamical systems approach to streamflow reconstruction reveals history of regime shifts in northern Thailand. Water Resources Research, 54, 2057–2077. \url{https://doi.org/10.1002/2017WR022114}
Nguyen, H. T. T., Turner, S. W., Buckley, B. M., & Galelli, S. (in prep). Coherent streamflow variability in Monsoon Asia over the past eight centuries---links to oceanic drivers. https://doi.org/10.31223/osf.io/5tg68
Cook, E.R., Anchukaitis, K.J., Buckley, B.M., D'Arrigo, R.D., Jacoby, G.C., and Wright, W.E. (2010). Asian monsoon failure and megadrought during the last millennium. Science, 328, 486-489.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.