In samdchamberlain/salinity_study: Reproducible manuscript evaluating salinity impacts to Sacramento-San Joaquin Delta restored wetlands

^1^Department of Environmental Science, Policy, and Management, University of California, Berkeley, California, USA

#setwd to main package directory
knitr::opts_knit$set(root.dir = normalizePath("../"))
options(digits=2)

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>")

knitr::opts_chunk$set(echo = FALSE, warning = FALSE, message = FALSE)

# packages used
library(knitr)
library(tidyverse)
library(lubridate)
library(gridExtra)
library(scales)
library(zoo)
library(caret)
library(forecast)
library(entropy)
library(stringr)

# mutual info and transfer entropy functions
source("R/MI_functions.R")
source("R/TR_functions.R")
source("R/transformations.R")
source("R/signficance_tests.R")

Abstract

Salinity gradients across estuaries influence wetland carbon storage, methane (CH~4~) biogeochemistry, and plant productivity. Estuarine freshwater wetlands may experience increases in salinity during drought; however, the impact of salinization on greenhouse gas (GHG) emissions is uncertain. We measured ecosystem-scale GHG emissions from a wetland experiencing salinization during the 2011-2017 California drought, and used information theory analyses to quantify couplings and information transfer between environmental drivers and CH~4~ fluxes. Machine learning models were used to estimate salinization-induced changes in CH~4~ fluxes and plant productivity. We observed dynamic CH~4~ flux-driver relationships across the dynamic disturbance time frame, where temperature connections strengthened and productivity connections dampened during salinization. Annual photosynthesis reduced 58% during peak salinization, whereas annual CH~4~ emissions reduced 10%, suggesting that other CH~4~ substrate sources compensated for reductions in photosynthates. Our results demonstrate the value of applying these approaches to ecological analyses and suggest drought-induced salinization may increase GHG emissions from estuarine freshwater wetlands. (153 words)

Body: 3907/4000 words

Keywords

eddy covariance, carbon dioxide, productivity, peatland, transfer entropy, complex systems, saltwater intrusion

1. Introduction

Peatland restoration is proposed as a natural climate mitigation strategy due to long-term sequestration of carbon dioxide (CO~2~) in anoxic sediments [@griscom_natural_2017; @leifeld_underappreciated_2018], which also inhibits CO~2~ emissions from the drained peat soils [@hatala_greenhouse_2012; @knox_agricultural_2015]. From a climate mitigation perspective, it is often assumed that methane (CH~4~) emissions from re-flooded landscapes are offset by carbon (C) storage over multi-century time scales [@mitsch_wetlands_2013; @griscom_natural_2017; @leifeld_underappreciated_2018]. Despite long-term climate benefits, CH~4~ emissions are critical to assessing the greenhouse gas (GHG) benefit of wetland restoration given the short-term GHG forcing strength of CH~4~ emissions relative to CO~2~ uptake [@neubauer_moving_2015] and the rapid response of radiative forcing to global CH~4~ emissions [@feldman_observationally_2018]. A growing literature, where ecosystem fluxes of CO~2~ and CH~4~ are measured in situ by eddy covariance [@baldocchi_measuring_2014], demonstrate that many wetlands are net GHG sources due to CH~4~ emissions over 20 to 100 year timescales [@petrescu_uncertain_2015; @hemes_compromise_2018]. To engineer systems with a limited climate footprint, we need to better understand wetland responses to disturbance and management.

Estuarine wetlands and river deltas have experienced large-scale drainage and land surface subsidence [@syvitski_sinking_2009], and wetland restoration reverses this subsidence [@miller_subsidence_2008] while promoting C sequestration [@kroeger_restoring_2017]. The Sacramento-San Joaquin Delta in the San Francisco Estuary of California has experienced substantial subsidence following drainage [@deverel_historic_2010], and wetlands are being restored to promote multiple ecosystem services, including sediment accretion, GHG emission reductions, and habitat creation [@deverel_implications_2017]. These freshwater restored wetlands generally experience stable, low salinity levels due to upstream water infrastructure that stores runoff for release during dry periods. However, salinity levels increase during droughts when freshwater resources are limited and saltwater intrudes further up the San Francisco Bay and into the Delta [@enright_salinity_2009].

The ecological impacts of saltwater intrusion on wetlands are far reaching [@herbert_global_2015], impacting C cycling and export [@chambers_effect_2013; @ardon_drought_2016], nutrient cycling and biogeochemistry [@morrissey_salinity_2014; @dijk_salinization_2015], plant productivity [@glenn_effects_1995], and biological community composition [@nielsen_modified_2009]. Salinity gradients in San Francisco Bay are the dominant control of wetland plant communities, where species distributions respond directly to changes in salinity [@watson_abundance_2009]. Methane emissions are also responsive to changing salinity, as CH~4~ production is inhibited in the presence of sulfate in seawater [@poffenbarger_salinity_2011]. For this reason, tidal wetlands are commonly the focus of "blue carbon" restoration efforts, where enhanced C storage [@krauss_role_2018] and limited CH~4~ emissions yield a limited GHG footprint compared to freshwater wetlands [@kroeger_restoring_2017]. However, increasing salinity may increase overall GHG emissions due to stress-induced reductions in photosynthesis that offset reduced CH~4~ emissions [@krauss_component_2016]. Interpreting wetland responses to salinization therefore requires direct observation of multiple C fluxes and their interactions, as gross ecosystem photosynthesis (GEP) provides substrate for wetland CH~4~ production [@hatala_gross_2012], yet the synergistic effects on methane biogeochemistry and productivity are not well understood.

Understanding such responses can be challenging, as ecosystems are complex systems where non-linear, lagged, and feedback interactions are common [@ruddell_ecohydrologic_2009; @sturtevant_identifying_2016]. In some cases, findings from small-scale experiments may not be relevant to whole ecosystem dynamics [@schindler_dilemma_2012], requiring the quantification of dominant or causal interactions in situ. A number of statistical approaches common to neuroscience and econometrics, yet relatively novel to ecology, are well-suited to this task [@rinderer_assessing_2018] and are useful for identifying key processes to include in ecosystem models [@larsen_appropriate_2016; @getz_wayne_m_making_2017]. These include non-parametric information theory metrics, such as mutual information (MI), that can identify dominant, lagged and non-linear relationships in flux time series [@sturtevant_identifying_2016; @chamberlain_soil_2018; @knox_direct_2018]. Other metrics aimed at identifying causal relationships, such as Granger causality, have been used to identify directed relationships between GHG fluxes in forest [@detto_causality_2012] and rice ecosystems [@hatala_gross_2012]. Transfer entropy (TE), a non-parametric measure equivalent to Granger causality under Gaussian conditions [@barnett_granger_2009], has also been applied to understand functional changes in ecosystems during drought [@ruddell_ecohydrologic_2009; @ruddell_ecohydrologic_2009-1] and how C cycling responds to restoration practices [@larsen_disrupted_2017]. Transfer entropy provides a clear benefit over Granger causality when applied to complex, non-linear, as observed in CH~4~ flux timeseries, due to its non-parametric nature and relaxed underlying assumptions. These methods provide a powerful means to understand and predict ecosystem responses to change through observational data.

The objectives of this study were to determine the effect of drought-induced salinization on GHG emissions from restored freshwater wetlands in the Sacramento-San Joaquin Delta of California, USA. We measured ecosystem-scale CO~2~ and CH~4~ fluxes by eddy covariance from a restored wetland that experienced a six-fold rise in salinity over the course of the 2011-2017 California drought, ranging from near-freshwater oligohaline to mesohaline brackish waters. We used (1) information theory metrics to identify how couplings and causal relationships between temperature, GEP, and CH~4~ fluxes (F~CH4~) varied over the course of wetland salinization, and (2) random forest models to quantify the overall reduction in GEP and CH~4~ fluxes due to salinization. We focused on the relative role GEP and temperature play in determining CH~4~ fluxes because these two variables are dominant biophysical drivers of CH~4~ emissions [@bridgham_methane_2013; @yvon-durocher_methane_2014], and previous studies within this region have demonstrated that wetland CH~4~ fluxes scale with both GEP [@hatala_gross_2012; @hemes_compromise_2018] and temperature [@knox_biophysical_2016]. This work builds upon previous literature that has denomstrated the dominant interactions governing CH~4~ fluxes and the time scales at which they occur [@sturtevant_identifying_2016; @chamberlain_soil_2018], and aims to assess the dominant causal driver of CH~4~ production rates, and consequent emissions, within these wetlands. We focus on GEP and temperature, as they are direct controls of CH~4~ production rates, rather than additional variables, such as evapotranspriation, that are indicators of CH~4~ transport processes. We hypothesized that both GEP and CH~4~ fluxes would be reduced during peak salinization, where GEP would decline as salt-intolerant plant communities experienced stress. We expected to observed a similar, or larger, relative reduction in CH~4~ emissions, as GEP provides substrate for CH~4~ production and sulfate within intruding saltwater directly inhibits methanogenesis. Understanding and predicting the response of wetland GHG exchange to salinization events is important from a policy perspective, as wetland restoration in the region is funded through California's Cap-and-Trade Program [@deverel_implications_2017; @oikawa_evaluation_2017] and freshwater is projected to become an increasingly limited resource in the state's future [@vicuna_sebastian_sensitivity_2007].

2. Methods

2.1 Study Site and Flux Measurements

We measured ecosystem-scale fluxes of CO~2~ and CH~4~ for seven years from a restored wetland in the Sacramento-San Joaquin Delta of California, USA (Ameriflux site Us-Myb; N 38.0498, W 121.7650). The wetland was restored in 2010 on a former pasture, and our measurements began at the point of initial restoration and re-flooding. The site is predominantly vegetated with cattails (Typha spp.) and some tules (Schoenoplectus acutus), and the water table is managed to stay above the land surface. We compared findings from this site to a nearby reference wetland site (Ameriflux site Us-Tw1; N 38.1074, W 121.6469), restored in 1999, that did not experience an increase in salinity. These wetlands are impounded basins and do not experience tidal flows found in the adjacent estuary. Detailed site descriptions can be found in @eichelmann_effect_2018. The canopy height of the salinization wetland was, on average, 3.4 m and the canopy height of the reference wetland was 2.6 m.

Fluxes were measured by eddy covariance and integrated over half-hour periods from the covariance of vertical wind velocity and gas concentrations measured at 20 Hz frequency using 3-dimensional sonic anemometers (Gill WindMaster; Gill Instruments Ltd, Lymington, Hampshire, England) and open-path infrared gas analyzers (LI-7500A & LI-7700; LI-COR, Lincoln, NE, USA). Standardized flux corrections and quality control were applied, including high-frequency data despiking, 2-D coordinate rotations, density corrections, and friction velocity (u$^\ast$) filtering. Air density corrections to CH~4~ fluxes typically account for a 10% adjustment in flux magnitudes [@chamberlain_evaluation_2017]. Data gaps were filled using artificial neural networks (ANNs), and net CO~2~ exchange was partitioned into ecosystem respiration (ER) and GEP by training an ANN on nighttime CO~2~ fluxes and predicting ER at all times. GEP was estimated as the difference between NEE and ER. Site instrumentation is described in greater detail in @eichelmann_effect_2018, quality control is described in more detail in @knox_agricultural_2015 and @chamberlain_evaluation_2017, and gap-filling and partitioning in @baldocchi_does_2015. Water column conductivity was measured using stainless-steel electrodes (CS547A-L; Campbell Scientific, Logan, UT, USA) and air temperature using shielded and aspirated thermistor and capacitance sensors (HMP45; Viasala, Vantaa, Finland). Conductivity was converted to practical salinity units (psu) using relationships with water temperature based on the Practical Salinity Scale [@fofonoff_seawater_1983]. Flux quality is periodically compared to reference standards from the Ameriflux network.

2.2 Information Theory Methods

We used mutual information (MI) to identify undirected couplings between variables and transfer entropy (TE) to infer directed, or causal, interactions between variables. Mutual information quantifies the amount of information shared between two variables X and Y, or the reduction of uncertainty given knowledge of another variable, described using the following equation: $$MI_{X,Y}=H_X+H_Y-H_{X,Y}$$ where H~X~ and H~Y~ are marginal Shannon entropies of each variable, and H~X,Y~ is the joint Shannon entropy of X and Y. Marginal entropies are calculated from the probability distributions of a variable as follows: $$H_Y=-\sum_{y_t}p(y_t)log_2p(y_t)$$ And joint entropies are calculated from the joint probability distributions of two variables as follows: $$H_{X,Y}=-\sum_{x_t}\sum_{y_t}p(x_t,y_t)log_2p(x_t,y_t)$$ where x~t~ and y~t~ are states within the overall distribution of X and Y.

We calculated mutual information (MI) for one-month blocks across the entire flux time series to quantify the change in couplings across the salinity disturbance. For each month, we calculated probability distributions by discretizing continuous data into 10 bins of equivalent length across each variables range. Mutual information was calculated for each variable pair of interest (Ta,F~CH4~ and GEP,F~CH4~), and significance thresholds were computed from the 95^th^ percentile (P < 0.05) of 1000 Monte Carlo reshufflings of the independent variable (Ta or GEP). Because the reshuffled values (MI') vary with the number of CH~4~ flux observations per month (Figure S1) we present MI-MI', similar to @larsen_disrupted_2017, such that any values in Figs. 1 and 2 greater than zero are significant couplings.

Transfer entropy quantifies directional information transfer, or the extent to which knowledge of a variable X reduces the uncertainty of the present state of Y given knowledge of the history of Y. If a value of TE is above some significance threshold, it is deduced that information flows from X to the current state of Y [@ruddell_ecohydrologic_2009]. Transfer entropy can also be described by joint and marginal entropies incorporating time lags as follows [@knuth_revealing_2013; @ruddell_ecohydrologic_2009]: $$T(X>Y, \tau) = H(X_{t-\tau}, Y_{t-1}) + H(Y_{t}, Y_{t-1}) - H(Y_{t-1}) - H(X_{t-\tau}, Y_{t}, Y_{t-1})$$ In the above formulation, we assume the immediate history of Y (t-1; i.e. the preceding half-hour in eddy flux time series) contributes the most information to the current state of Y, and we calculate TE across multiple lags of X ($\tau$) to identify dominant timescale of interactions [@ruddell_ecohydrologic_2009; @larsen_disrupted_2017]. More detail on applying transfer entropy (TE) to ecosystem-atmosphere fluxes can be found in @ruddell_ecohydrologic_2009, and we follow the basic protocols outlined in this work. This includes building discrete probability distributions using 10 bins for each variable as described above, transforming variables prior to TE calculations using a 5-day anomaly filter, and again establishing significance thresholds using Monte Carlo reshuffling of the flux time series. We calculate TE for information flow from Ta to F~CH4~ and GEP to F~CH4~, and did so across multiple lags of the independent variable up to 60 hours in order to assess whether environmental conditions from previous days impacted current F~CH4~ rates. Similarly, we calculated significance thresholds for TE using 100 reshufflings (due to computational demands) at each lag. Calculations of transfer entropy are data intensive [@ruddell_ecohydrologic_2009], so we calculate TE for entire growing seasons each year (DOY 100-300) similar to @sturtevant_identifying_2016 and @chamberlain_soil_2018. We present both MI and TE in their relative form, normalized by the entropy of the Y variable (F~CH4~), so that values represent reductions in the uncertainty of CH~4~ fluxes [@larsen_disrupted_2017]. Both mutual information and transfer entropy have been applied understand process interactions in eddy covariance flux time series in wetland [@sturtevant_identifying_2016; @chamberlain_soil_2018] and grassland ecosystems [@ruddell_ecohydrologic_2009]. Additionally, information theory methods have been applied extensively to study ecosystem process from eddy covariance sites across the globe[@yu_global_2019]. All information theory calculations were conducted using R 3.3.2 [@r_citation] with the 'entropy' package [@entropy] and home built functions to calculate TE and significance thresholds available online with this manuscript (https://github.com/samdchamberlain/salinity_study).

2.3 Salinity Impacts on Annual Fluxes

To quantify the reduction in GEP and F~CH4~ due to salinity effects, we trained random forest models on fluxes measured during years prior to the salinity rise (2012-2014) using common meteorological drivers of each flux. Predictions are therefore representative of salinity 'naive' ecosystem function. Random forests are an ensemble tree-based method where numerous trees are constructed using bootstrap samples from the training dataset, and the trees are averaged to generate a final prediction. For each tree, a random subset of the predictors (m~try~) are chosen at each split to reduce overfitting [@breiman_random_2001]. We used an ensemble of 500 trees, and the m~try~ value (ranging from 2-4) was chosen by using the model formulation with the lowest root mean squared error following five-fold cross-validation [@kuhn_applied_2013]. Any missing values in the predictor data were imputed using medians.

We trained the models using common meteorological predictor variables that would not be affected by changes in local salinity, with the exception of GEP in the CH~4~ flux model based on previous findings in this region [@hatala_gross_2012]. For the GEP model, this included Ta, photosynthetically active radiation (PAR), vapor pressure deficit (VPD), air pressure (PA), and friction velocity (u$^\ast$), and the CH~4~ flux model included GEP, Ta, VPD, PA, and u$^\ast$. The tuned random forest models were then used to generate predictions of CH~4~ flux and GEP during the high salinity years (2015-2016), and we quantified the effect of salinization as the difference between observed fluxes and model predictions. Random forests were fit using the 'caret' [@caret] and 'randomForest' [@randomForest] packages. All additional data processing, statistical analysis, and visualization were conducted using the 'tidyverse' [@tidyverse], 'zoo' [@zoo], 'gridExtra' [@gridExtra], 'lubridate' [@lubridate], and 'scales' [@scales] packages. This manuscript is reproducible via R Markdown, and its code can be found at https://github.com/samdchamberlain/salinity_study.

3. Results

#load and process wetland sites into half-hourly, daily, and annual sums
source("R/peat6_processing.R") #wetland experiencing salinity rise
source("R/peat19_processing.R") #reference wetland w/o salinity rise

We observed a substantial increase in salinity over the course of the observation period, where waters were predominately fresh, oligohaline the first year following restoration (r mean(peat6_daily$Sal[which(peat6_daily$year == 2012)], na.rm = T) psu in 2012) and salinity levels rose thereafter to a peak in 2016 (r max(peat6_daily$Sal[which(peat6_daily$year == 2016)], na.rm = T) psu), at which point waters were mesohaline brackish (Figures 1 and 2). Salinity levels were relatively stable from 2012 through 2013, began to rise through 2014, peaked in 2016, and reduced rapidly in 2017 when managers flushed the wetland with freshwater (Figure 1). Flushing reduced salinity levels to r min(peat6_daily$Sal[which(peat6_daily$year == 2017)], na.rm = T) psu, within the range of freshwater wetlands. Substantial inter-annual variability was observed in GEP and ER fluxes, where the first-year post-restoration (2012) fluxes were very large, followed by low fluxes in 2013. Methane fluxes were more stable but reduced across the measurement period (Figure 1). In 2016 when salinity peaked, all late season fluxes were low compared to previous years (Figure 1). Fluxes remained low into 2017, and late season fluxes increased substantially coincident with freshwater flushing (Figure 1).

time_series <- theme_bw() +
  theme(panel.grid.minor=element_blank(), panel.grid.major.y = element_blank(),
        axis.title.x = element_blank(), axis.title.y = element_text(size=10))

vegetated_monthly <- subset(peat6_monthly, year > 2011)
vegetated_monthly$Year <- as.factor(vegetated_monthly$year)

a <- ggplot(vegetated_monthly, aes(x=month, y=gCH4, color=Year)) +
  geom_line() +
  scale_x_continuous(breaks = seq(2, 12, by=2)) +
  ylab(expression(F[CH4]~" "~"("~g~C~m^{-2}~mn^{-1}~")")) +
  time_series + theme(legend.position = "none")

b <- ggplot(vegetated_monthly, aes(x=month, y=gGEP, color=Year)) +
  geom_line() +
  scale_x_continuous(breaks = seq(2, 12, by=2)) +
  ylab(expression("GEP "~"("~g~C~m^{-2}~mn^{-1}~")")) +
  time_series + theme(legend.position = "none")

c <- ggplot(vegetated_monthly, aes(x=month, y=gER, color=Year)) +
  geom_line() +
  scale_x_continuous(breaks = seq(2, 12, by=2)) +
  ylab(expression("ER "~"("~g~C~m^{-2}~mn^{-1}~")")) +
  time_series +
  theme(legend.margin = margin(0, 0, 0, 0),
        legend.position=c(.14, .62), legend.key.size =  unit(0.14, "in"),
        legend.text = element_text(size=8), legend.title = element_blank()) 

d <- ggplot(vegetated_monthly, aes(x=month, y=mSal, color=Year)) +
  geom_line() +
  scale_x_continuous(breaks = seq(2, 12, by=2)) +
  ylab("Salinty (psu)") + xlab("Month") +
  theme_bw() +
  theme(panel.grid.minor=element_blank(), panel.grid.major.y = element_blank(),
        axis.title.y = element_text(size=10), legend.position = "none")

#Multiple plots aligned
iA <- ggplotGrob(a)
iB <- ggplotGrob(b)
iC <- ggplotGrob(c)
iD <- ggplotGrob(d)

maxWidth = grid::unit.pmax(iA$widths[2:5], iB$widths[2:5], iC$widths[2:5], iD$widths[2:5])
iA$widths[2:5] <- as.list(maxWidth); iB$widths[2:5] <- as.list(maxWidth); 
iC$widths[2:5] <- as.list(maxWidth); iD$widths[2:5] <- as.list(maxWidth);

grid.arrange(iA, iB, iC, iD, ncol=1)

Couplings between driver variables (GEP and Ta) and F~CH4~ were similar in the pre-disturbance period (2012 through 2014) where coupling stregth, determined via MI, varied seasonally and was of similar magnitude for MI~Ta,FCH4~ and MI~GEP,FCH4~ (Figure 2). During 2014, couplings between CH~4~ and driver variables did not display a similar seasonality, presumably due to a large-scale insect outbreak that affected the wetland, though a late season increase in coupling strength was still observed for both MI~Ta,FCH4~ and MI~GEP,FCH4~. In 2015 and 2016 during peak salinization, we observed a strong decoupling between GEP and F~CH4~, and a strengthened coupling between Ta and F~CH4~ (Figure 2). The decoupling of GEP and F~CH4~ was not driven by reduced GEP, as GEP magnitudes in 2015 were similar to previous years (Figure 1). During 2017 when salinity levels dropped steeply, couplings between GEP and F~CH4~ began to recover but were highly variable throughout the season (Figure 2). Increasing GEP to F~CH4~ couplings continued into the early 2018 growing season, where coupling strength was similar in magnitude to those observed in early 2013 (Figure 2). The loss of GEP couplings in 2015 and 2016 were coincident with the transition from oligohaline (< 3 psu) to mesohaline (> 3 psu) brackish water conditions. These dynamics are not clearly observable in conventional parametric and non-parameteric correlation statistics, such as Pearson's correlation and Kendall's tau, where seasonal patterns are obscured by large negative coefficients in winter periods (Figure S2).

## Processing for high salinity wetland
#create a growing month stamp
#peat6_all <- subset(peat6_all, dday < 3043)
peat6_all$month <- month(peat6_all$datetime)
peat6_all$m_yr <- paste(peat6_all$year, peat6_all$month, sep = "")

#create data list by month
by_month <- peat6_all %>%
  filter(year > 2011) %>%
  group_by(m_yr) %>%
  nest()

#plot that needs to be dealt with later
clipped_monthly <- subset(peat6_monthly, year > 2011)
clipped_monthly$date <- as.Date(clipped_monthly$datetime)
clipped_monthly$GPP_CH4_mi <- mi_timeseries(gpp_ANNnight, wm, data_list = by_month, bins = 10)
clipped_monthly$Tair_CH4_mi <- mi_timeseries(TA.x, wm, data_list = by_month, bins = 10)

# Monte Carlo significance thresholds (dont run everytime as they take a while)
sig_gpp <- conf_series(gpp_ANNnight, wm, data_list = by_month, type = "MI",
                       runs = 10, alpha = 0.05, bins = 10)
sig_Tair <- conf_series(TA.x, wm, data_list = by_month, type = "MI",
                        runs = 10, alpha = 0.05, bins = 10)

# Correct MI values of bias due to sample size, now anything over zero is significant at p < 0.05
clipped_monthly$TairMI_corrected <- clipped_monthly$Tair_CH4_mi - sig_Tair
clipped_monthly$gppMI_corrected <- clipped_monthly$GPP_CH4_mi - sig_gpp

a <- ggplot(clipped_monthly) +
  geom_point(aes(x = date, y = TairMI_corrected), color = "red", alpha = 0.5) +
  geom_line(aes(x = date, y = ma(TairMI_corrected, 3)), color = "red", size = 1) +
  geom_point(aes(x = date, y = gppMI_corrected), color = "dark green", alpha = 0.5) +
  geom_line(aes(x = date, y = ma(gppMI_corrected, 3)), color = "dark green", size = 1) +  
  scale_x_date(date_breaks = "1 year", labels = date_format("%Y")) +
  scale_y_continuous(limits = c(0, 0.3)) +
  ylab(expression(bolditalic(MI)~"(x,"~F[CH4]~")")) + xlab("") + ggtitle("Salinization wetland") +
  theme_bw()

b <- ggplot(subset(peat6_daily, year > 2011), aes(x=as.Date(datetime), y=Sal)) +
  geom_line(color = "blue") +
  scale_y_continuous(limits = c(0, 7)) +
  scale_x_date(date_breaks = "1 year", labels = date_format("%Y")) +
  ylab("Salinity (psu)") + xlab("Year") +
  theme_bw()

#Multiple plots aligned
iA <- ggplotGrob(a)
iB <- ggplotGrob(b)

maxWidth = grid::unit.pmax(iA$widths[2:5], iB$widths[2:5])
iA$widths[2:5] <- as.list(maxWidth); iB$widths[2:5] <- as.list(maxWidth); 

grid.arrange(iA, iB, ncol=1)

These dynamics stand in contrast to the stable couplings between driving variables and F~CH4~ observed at the nearby wetland that did not experience salinization, where we observed similar magnitudes and patterns for MI~Ta,FCH4~ and MI~GEP,FCH4~ across all years (Figure 3). No decoupling of F~CH4~ and GEP occurred during 2015 and 2016 when salinity levels did not increase at this wetland. Flux magnitudes were often similar to or larger than those at the salinization wetland, particularly during the high salinity periods (Figure S3). These wetlands area located within 13 km of one another and experience the same climate [@chamberlain_soil_2018].

#create a growing month stamp
peat19_all$month <- month(peat19_all$datetime)
peat19_all$m_yr <- paste(peat19_all$year, peat19_all$month, sep = "")

#create data list by month
by_month <- peat19_all %>%
  filter(year > 2012 & year < 2018) %>%
  group_by(m_yr) %>%
  nest()

#plot that needs to be dealt with later
clipped_monthly <- subset(peat19_monthly, year > 2012 & year < 2018)
clipped_monthly$date <- as.Date(clipped_monthly$datetime)
clipped_monthly$GPP_CH4_mi <- mi_timeseries(gpp_ANNnight, wm, data_list = by_month, bins = 10)
clipped_monthly$Tair_CH4_mi <- mi_timeseries(TA.x, wm, data_list = by_month, bins = 10)

# Monte Carlo significance thresholds (dont run everytime as they take a while)
sig_gpp <- conf_series(gpp_ANNnight, wm, data_list = by_month, type = "MI", 
                       runs = 1000, alpha = 0.05, bins = 10)
sig_Tair <- conf_series(TA.x, wm, data_list = by_month, type = "MI",
                        runs = 1000, alpha = 0.05, bins = 10)

# Correct MI values of bias due to sample size, now anything over zero is significant at p < 0.05
clipped_monthly$TairMI_corrected <- clipped_monthly$Tair_CH4_mi - sig_Tair
clipped_monthly$gppMI_corrected <- clipped_monthly$GPP_CH4_mi - sig_gpp

a <- ggplot(clipped_monthly) +
  geom_point(aes(x = date, y = TairMI_corrected), color = "red", alpha = 0.5) +
  geom_line(aes(x = date, y = ma(TairMI_corrected, 3)), color = "red", size = 1) +
  geom_point(aes(x = date, y = gppMI_corrected), color = "dark green", alpha = 0.5) +
  geom_line(aes(x = date, y = ma(gppMI_corrected, 3)), color = "dark green", size = 1) +  
  scale_x_date(date_breaks = "1 year", labels = date_format("%Y")) +
  scale_y_continuous(limits = c(0, 0.3)) +
  ylab(expression(bolditalic(MI)~"(x,"~F[CH4]~")")) + xlab("") + ggtitle("Reference wetland") +
  theme_bw()

b <- ggplot(subset(peat19_daily, year > 2012 & year < 2018), aes(x=as.Date(datetime), y=Sal)) +
  geom_line(color = "blue") +
  scale_x_date(date_breaks = "1 year", labels = date_format("%Y")) +
  scale_y_continuous(limits = c(0, 7)) +
  ylab("Salinity (psu)") + xlab("Year") +
  theme_bw()

#Multiple plots aligned
iA <- ggplotGrob(a)
iB <- ggplotGrob(b)

maxWidth = grid::unit.pmax(iA$widths[2:5], iB$widths[2:5])
iA$widths[2:5] <- as.list(maxWidth); iB$widths[2:5] <- as.list(maxWidth); 

grid.arrange(iA, iB, ncol=1)

We observed a similar transition between the low salinity period (2012-2013) and the high salinity period (2015-2016) in TE from both drivers to F~CH4~ (Figure 4). Significant TE from driver variables to F~CH4~ can be interpreted as causal, or directed, interactions between drivers and F~CH4~, rather than more traditional correlation analyses that do not determine the direction of interactions. During the low salinity years, we observed similar magnitude causal interactions from GEP to F~CH4~ (TE(GEP > F~CH4~)) and from Ta to F~CH4~ (TE(Ta > F~CH4~)). Significant TE(GEP > F~CH4~) occurred between 0-12 hour time lags and at similar lag periods from the previous two days (Figure 4). These transfers were still observable during 2015, though over condensed lag periods. During peak salinity in 2016, only weak significant TE(GEP > F~CH4~) was observed at 0-3 hour lags and a small window near 48 hours (Figure 4). Significant TE spikes should be interpreted with caution because ~ 6 values should occur by chance at P < 0.05 across the 120 lags evaluated. These analyses suggest that current F~CH4~ rates are affected by recent GEP, as well as GEP that occurred during the previous day. We assessed time lags for over two days to assess whether GEP and temperature fluctuations from previous days was predictive of current F~CH4~ rates.

We observed a very different dynamic in TE(Ta > F~CH4~) that mirrors how coupling strengths varied across time, where TE(Ta > F~CH4~) was of similar magnitude to TE(GEP > F~CH4~) during the low salinity years, and TE(Ta > F~CH4~) strengthened during the high salinity years (Figure 4). Under most conditions, Ta at lags up to ~30 hours affected current CH~4~ fluxes, with additional significant transfers around 48 hours (Figure 4). Transfer entropy from 2014 and 2017 is not presented because the wetland experienced an insect outbreak and flushing disturbance, respectively, and obscure our ability to evaluate the effect of salinization alone. We observed no significant transfer of information from F~CH4~ to Ta (Figure S4), as expected since biological trace gas fluxes should not alter regional meteorological conditions. These analyses suggest that temperature became a more dominant control of wetland F~CH4~ as the salinity disturbance progressed.

#apply anomaly filter
anom_data <- peat6_all %>%
  arrange(year, time) %>%
  mutate(CH4_anom = anom_filter(wm_gf),
         TA_anom = anom_filter(TA.x),
         GPP_anom = anom_filter(gpp_ANNnight))

anom_data <- arrange(anom_data, datetime)
anom_data$CH4na_anom <- ifelse(is.na(anom_data$wm), NA, anom_data$CH4_anom) # re-insert missing values

# Focus only on high v. low salinity years without additional disturbance
# other disturbance includes the 2014 insect outbreak and 2017 flushing events
anom_data <- subset(anom_data, year %in% c(2012, 2013, 2015, 2016))

#create data list by annual growing season
by_year <- anom_data %>%
  filter(DOY > 100 & DOY < 300) %>%
  group_by(year) %>%
  nest()

#calculate lagged transfer entropy time series (up to 2 day lag at half-hourly timestep)
# for GPP -> CH4
tr_wm_gpp <- tr_timeseries(GPP_anom, CH4na_anom, data_list = by_year, type = "observed",
                           lags = 120, bins = 10, normalize = T)

mc_wm_gpp <- tr_timeseries(GPP_anom, CH4na_anom, data_list = by_year, type = "shuffled",
                           lags = 120, bins = 10, runs = 10, alpha = 0.05, normalize = T)
# for TA -> CH4
tr_wm_ta <- tr_timeseries(TA_anom, CH4na_anom, data_list = by_year, type = "observed",
                           lags = 120, bins = 10, normalize = T)

mc_wm_ta <- tr_timeseries(TA_anom, CH4na_anom, data_list = by_year, type = "shuffled",
                           lags = 120, bins = 10, runs = 10, alpha = 0.05, normalize = T)

#unlist data to tidy dataframe
gpp.wm_df <- tidy_list(tr_wm_gpp); gpp.wm_shuf <- tidy_list(mc_wm_gpp)
ta.wm_df <- tidy_list(tr_wm_ta); ta.wm_shuf <- tidy_list(mc_wm_ta)

#add mean annual conductivity to dataframes by year
mean_sal <- anom_data %>%
  filter(DOY > 100 & DOY < 300) %>%
  group_by(year) %>%
  summarise(`Sal. (psu)` = mean(sal, na.rm = T))

gpp.wm_df <- merge(gpp.wm_df, mean_sal, all.x = T)
ta.wm_df <- merge(ta.wm_df, mean_sal, all.x = T)

a <- ggplot() +
  geom_line(data = gpp.wm_df, aes(x = hour, y = tr)) +
  geom_line(data = gpp.wm_shuf, aes(x = hour, y = tr), color = "red") +
  facet_wrap(~year, nrow = 1) +
  scale_x_continuous(breaks = seq(0, 60, by = 12)) +
  scale_y_continuous(limits = c(0.005, 0.043)) +
  xlab("") + ylab(expression(bolditalic(TE)~"("~GEP %->% F[CH4]~")")) +
  theme(legend.margin = margin(0, 0, 0, 0),
        legend.position=c(.82, .7), legend.key.size =  unit(0.1, "in"),
        legend.text = element_text(size=8), legend.title = element_text(size=8.5, face = "bold"),
        legend.background = element_rect(fill = "transparent"))

b <- ggplot() +
  geom_line(data = ta.wm_df, aes(x = hour, y = tr)) +
  geom_line(data = ta.wm_shuf, aes(x = hour, y = tr), color = "red") +
  facet_wrap(~year, nrow = 1) +
  scale_y_continuous(limits = c(0.005, 0.043)) +
  scale_x_continuous(breaks = seq(0, 60, by = 12)) +
  xlab("Lag (hours)") + ylab(expression(bolditalic(TE)~"("~T[a] %->% F[CH4]~")")) +
  theme(legend.position = "none")

#Multiple plots aligned
iA <- ggplotGrob(a)
iB <- ggplotGrob(b)

maxWidth = grid::unit.pmax(iA$widths[2:5], iB$widths[2:5])
iA$widths[2:5] <- as.list(maxWidth); iB$widths[2:5] <- as.list(maxWidth);

grid.arrange(iA, iB, ncol=1)

# train nn model on pre-salinity rise years and use to calculate reduction in component fluxes 
pre_rise <- peat6_daily %>%
  filter(year > 2011 & year < 2015) %>%
  select(mgCH4, gGEP, gER, mET, PAR, H, Tair, Tw, Ts, mVPD, PA, u., Rain, WTD)

set.seed(1000)
# Random forest model for GEP trained pre-salinity rise
pre_GEP <- train(gGEP ~ Tair + PAR + mVPD + PA + u., data = pre_rise,
                      method = "rf",
                      trControl = trainControl(method = "cv", number = 5),
                      preProcess = c("medianImpute"),
                      tuneGrid = expand.grid(.mtry = c(2, 3, 4)),
                      na.action = na.pass)

set.seed(2000)
# Random forest model for FCH4 trained pre-salinity rise
pre_CH4 <- train(mgCH4 ~ gGEP + Tair + mVPD + PA + u., data = pre_rise,
                      method = "rf",
                      trControl = trainControl(method = "cv", number = 5),
                      preProcess = c("medianImpute"),
                      ntree = 1000,
                      tuneGrid = expand.grid(.mtry = c(2, 3, 4)),
                      na.action = na.pass)

# extract data to make predictions on the salinity rise years
rise_years <- peat6_daily %>%
  filter(year > 2014 & year < 2017) %>%
  select(datetime, year, mgCH4, gGEP, gER, mET, PAR, H, Tair, Tw, Ts, mVPD, PA, u., Rain, WTD)

# generate predictions for high salinity years for GEP and FCH4
rise_years$GEPpred <- predict.train(pre_GEP, rise_years, na.action = na.pass)
rise_years$CH4pred <- predict.train(pre_CH4, rise_years, na.action = na.pass)

#plotting all the measures
a <- ggplot(rise_years) +
  geom_line(aes(x=datetime, y=mgCH4)) +
  geom_line(aes(x=datetime, y=CH4pred), color = "red") +
  ylab(expression(F[CH4]~" (mg C"~m^{-2}~d^{-1}~")")) + xlab("") +
  ggtitle("High salinity period (2015-2016)") +
  theme_classic()

b <- ggplot(rise_years) +
  geom_line(aes(x=datetime, y=gGEP*-1)) +
  geom_line(aes(x=datetime, y=GEPpred*-1), color = "red") +
  ylab(expression("GEP "~"("~g~C~m^{-2}~d^{-1}~")")) + xlab("Date") +
  theme_classic()

grid.arrange(a, b, ncol = 1)

#calculate annual budgets for RF predictions and observations during salinity years
annual_budgets <- rise_years %>%
  group_by(year) %>%
  summarise(CH4yr = mean(mgCH4)*365/1000,
            CH4yr_rf = mean(CH4pred)*365/1000,
            GEPyr = mean(gGEP)*365,
            GEPyr_rf = mean(GEPpred)*365) %>%
  mutate(CH4pc = abs(CH4yr - CH4yr_rf)*100/CH4yr,
         GEPpc = abs(GEPyr - GEPyr_rf)*100/abs(GEPyr))

Salinity 'naive' model predictions during the high salinity period demonstrate that CH~4~ fluxes were not substantially lower than expected if the impact of salinity was not taken into account (Figure 5). During 2015, annual CH~4~ flux observations were only r annual_budgets$CH4pc[annual_budgets$year == 2015]% below the model predictions, though during 2016, observed annual budgets were r annual_budgets$CH4pc[annual_budgets$year == 2016]% below the model predictions. A much larger effect was observed for GEP, where GEP was substantially below salinity 'naive' model predictions (Figure 5). During 2015, annual GEP budgets was r annual_budgets$GEPpc[annual_budgets$year == 2015]% lower than model predictions, and during 2016 annual GEP was r annual_budgets$GEPpc[annual_budgets$year == 2016]% lower than model predictions. The GEP model explained r max(pre_GEP$results$Rsquared)*100% of observed variance (r^2^; m~try~ = 4), and the F~CH4~ model explained r max(pre_CH4$results$Rsquared)*100% of observed variance (r^2^; m~try~ = 3) in the training data. Observed annual GEP budgets were r peat6_yearly$tGEP[which(peat6_yearly$year == 2015)] and r peat6_yearly$tGEP[which(peat6_yearly$year == 2016)] g C m^-2^ in 2015 and 2016, respectively, and annual CH~4~ budgets were r peat6_yearly$tCH4[which(peat6_yearly$year == 2015)] and r peat6_yearly$tCH4[which(peat6_yearly$year == 2016)] g C m^-2^ in 2015 and 2016, respectively.

4. Discussion

Our study suggests that episodic salinization affecting freshwater restored wetlands disproportionately reduces GEP compared to CH~4~ fluxes (Figure 5), leading to an overall increase in wetland GHG emissions during peak salinity [@chamberlain_soil_2018]. The dominant plant species in these wetlands, Typha spp., are generally found in low salinity reaches of the San Francisco Estuary [@watson_abundance_2009], and their growth and recruitment is considerably reduced at the salinity levels observed here [@beare_cattail_1987; @glenn_effects_1995]. These large reductions in productivity caused the wetland to become a CO~2~ source in 2016 (r peat6_yearly$tNEE[which(peat6_yearly$year == 2016)] g C m^-2^ yr^-1^), standing in contrast to persistent CO~2~ uptake at a nearby mesohaline brackish tidal marsh (-225 g C m^-2^ yr^-1^), vegetated with a more salt-tolerant plant community, that experienced similar salinities [@knox_direct_2018]. Additionally, the most pronounced effects of salinity on both GEP and CH~4~ fluxes in the 2016 growing season (Figure 5) notably lagged the initial salinity rise during 2015, demonstrating potential slow responses of ecosystem fluxes to salinity disturbance. However, we also observed a rapid increase in CH~4~ and GEP flux rates following the managed freshwater flushing in 2017 to flux rates similar to the undistrubed wetland (Figure S3), suggesting that these freshwater wetland ecosystems may be faster to regain normal ecosystem function following such salinity disturbances.

Salinity levels were also within the range where suppression of CH~4~ emissions is expected in seawater, as observed by eddy covariance [@holm_ecosystem_2016] and chamber studies [@poffenbarger_salinity_2011], suggesting the driver of wetland salinization was enhanced evaporation rather than continuous saltwater intrusion that would introduce sulfate to inhibit CH~4~ production. Characterizing the ionic composition of wetland waters over the disturbance would allow for a more direct identification of salinization drivers. Regardless, our findings suggest that GHG emissions from these restored wetlands may increase with drought-induced salinization due to small decreases in CH~4~ emissions relative to reductions in CO~2~ uptake.

Despite small changes in CH~4~ flux magnitudes, we observed strong changes in couplings (Figure 2) and causal interactions (Figure 4) between driving processes and CH~4~ fluxes during salinization. During high salinity, couplings and information transfer from Ta to F~CH4~ strengthened, while coupling and causal interactions from GEP to F~CH4~ declined (Figures 2 and 4, respectively). Similar dynamics were not observed at the reference wetland, where GEP,F~CH4~ couplings were dominant to Ta,F~CH4~ across all years (Figure 3). These observations, taken with our findings of small reductions in daily to annual F~CH4~ (Figure 5), suggest that either (1) microbial utilization of other wetland C substrates, such as soil organic C, compensated for reduced availability of recent photosynthates, or (2) recent photosynthates were never dominant driver of CH~4~ emissions within these wetlands. These conclusion stands in contrast to previous eddy covariance studies from rice within the region, where GEP was the dominant driver of CH~4~ fluxes compared to temperature [@hatala_gross_2012; @knox_biophysical_2016]. However, such CH~4~ flux temperature dependence is consistent with findings across a number of ecosystems and observation scales [@yvon-durocher_methane_2014]. The contrast between our findings and rice suggests that dominant biogeochemical processes may vary across ecosystem types, and caution should be taken in extrapolating a mechanistic understanding from one system to another. In a similar vein, our previous work has demonstrated highly divergent ecosystem CH~4~ flux controls [@chamberlain_soil_2018] and GEP, CH~4~ flux scaling relationships [@hemes_compromise_2018] driven by spatially variable soil properties across these restored wetlands. Similarly, we observed CH~4~ flux-driver relationships conditional on environmental conditions, where GEP couplings reduced as the wetland transitioned from oligohaline to high salinity, mesohaline conditions (Figure 2).

This study also highlights the utility of applying information theory and machine learning approaches to understanding ecological responses in situ. Information theory metrics are particularly suited to resolve connectivity and causation in complex systems due to their lack of parametric assumptions and ability to resolve scale-emergent interactions [@sturtevant_identifying_2016; @larsen_disrupted_2017; @chamberlain_soil_2018; @knox_direct_2018; @rinderer_assessing_2018]. Similarly, machine learning approaches, such as random forests applied here and ANNs used elsewhere, have increasingly been applied to quantify dominant ecosystem greenhouse gas exchange controls where non-linear, complex responses are expected [@knox_biophysical_2016; @albert_climate_2017; @stocker_quantifying_2018]. All of the above approaches are generally data intensive [@ruddell_ecohydrologic_2009; @getz_wayne_m_making_2017], though large observational datasets are becoming publicly available through data sharing communities [@hampton_big_2013], such as FluxNet [@chu_fluxes_2017]. These tools provide a strong foundation for assessing connectivity and functional relationships in natural ecosystems, where ecosystem-scale experiments may be impractical or impossible. Instead, a combination of observational networks, natural experiments, and statistical approaches can be used to understand ecosystem-scale responses to environmental change.

It is important to note that these techniques are no substitute of an ecological understanding of one's system, as spurious transfer entropy has been documented across multiple studies [@rinderer_assessing_2018], often related to multivariate interactions masked in the bivariate statistic [@runge_escaping_2012]. For example, we observed no information transfer from F~CH4~ to Ta (Figure S4), as expected, but did observe some spurious information transfer from F~CH4~ to GEP (Figure S5). This is likely due to the presence of a shared driver, such as soil temperature, or unobserved processes that covary with CH~4~ production, like decomposition-driven soil priming. Multivariate approaches have been proposed to identify spurious links through conditional mutual information [@runge_escaping_2012; @runge_identifying_2015], and these techniques are likely better suited to systems where a priori knowledge is limited.

Our findings demonstrate dynamic controls to wetland CH~4~ emissions across a disturbance regime, where flux-driver relationships were conditional on salinity levels, and temperature controls compensate for reduced connectivity to GEP during salinization Conceptual diagram of how wetland fluxes and process interactions varied across the salinity disturbance. Wetland salinization caused large declines in GEP fluxes with notably smaller reductions in methane emission due to increased temperature dependence on methane emission rates, making up for losses in GEP dependencies. . We estimate only a small reduction in annual CH~4~ budgets, relative to annual GEP (Figure 5), suggesting that GEP substrate additions are not the dominant driver of CH~4~ production and emission. Instead, other substrates likely compensate for this reduction or recent photosynthates are not an important substrate in these wetlands. Rising salinity in freshwater-adapted wetlands does not necessarily lead to reductions in CH~4~ emissions, and consequently GHG budgets, suggesting a risk of increasing wetland GHG emissions due to salinization as water resources become more limiting in Mediterranean regions.

Acknowledgements

This research was supported in part by the U.S. Department of Energy’s Office of Science and its funding of Ameriflux core sites (Ameriflux contract 7079856), and the California Division of Fish and Wildlife, through a contract of the California Department of Water Resources (Award 4600011240).