In stineb/fvar:

devtools::load_all(".")
library(tidyverse)
library(caret)

Usage example

Get FLUXNET 2015 data

Training data is to be read in as a data frame with column names corresponding to the variable names defined by settings (see below).

Read observational data including soil moisture for one site (see column sitename) from FLUXNET 2015 data. The following loads data extracted from original files by data-raw/prepare_data_example.R.

load("./data/df_fluxnet.Rdata")

Define settings

This named list is used by several functions of the package.

settings <- list( 
  target        = "GPP_NT_VUT_REF", 
  predictors    = c("temp_day","vpd_day", "ppfd", "fpar", "swin", "netrad"), 
  varnams_soilm = "wcont_splash",
  rowid         = "date",
  nnodes_pot    = 15,
  nnodes_act    = 20,
  nrep          = 3,
  package       = "nnet"
  )

The settings are a named list that must contain the following variables:

target : Character, target variable for the NN model, must correspond to the respective column name in the training data.
predictors : Vector of characters, predictor variables (excluding soil moisture) for the NN model, must correspond to the respective column name in the training data.
varnams_soilm : (Vector of) character(s). Name of the soil moisture variable(s). Must correspond to the respective column name in the training data.
nnodes : Number of hidden nodes of the NN.
nrep : Number of repetitions of NN model training. fvar is calculated as a mean across repetitions.
package : R package name that implements the NN (for now, use "nnet", which is used in combination with caret)
rowid : Column name for uniquely identifying data points (rows)

Prepare data

To prepare training data, removing NAs and outliers, in a standardized way.

df_train <- prepare_trainingdata_fvar( df_fluxnet, settings )

Train models and predict fvar

Train models and get fvar (is returned as one column in the returned data frame). Here, we specify the soil moisture threshold to separate training data into moist and dry days to 0.6 (fraction of water holding capacity). Deriving an optimal threshold is implemented by the functions profile_soilmthreshold_fvar() and get_opt_threshold() (see below).

out <- train_predict_fvar( 
  df_train,
  settings,
  soilm_threshold    = 0.6,
  weights            = NA, 
  verbose = TRUE
  )

Test performance of NN models

Multiple aspects need to be satisfied:

NN$_\text{act}$ has no systematic bias related to the level of soil moisture.

df_test <- out$df_all %>% 
  mutate(bias_act = nn_act - obs,
         bias_pot = nn_pot - obs,
         soilm_bin = cut(soilm, 10)
         ) 
df_test %>% 
  tidyr::pivot_longer(cols = c(bias_act, bias_pot), names_to = "source", values_to = "bias") %>% 
  ggplot(aes(x = soilm_bin, y = bias, fill = source)) +
  geom_boxplot() +
  geom_hline(aes(yintercept = 0.0), linetype = "dotted") +
  labs(title = "Bias vs. soil moisture")

This can be tested using a linear regression model of bias vs. soil moisture, testing whether the slope is significantly different from zero.

linmod <- lm(bias_act ~ soilm, data = df_test)
testsum <- summary(linmod)
slope_mid <- testsum$coefficients["soilm","Estimate"]
slope_se  <- testsum$coefficients["soilm","Std. Error"]
passtest_bias_vs_soilm <- ((slope_mid - slope_se) < 0 && (slope_mid + slope_se) > 0)
print(passtest_bias_vs_soilm)
df_test %>% 
  ggplot(aes(x = soilm, y = bias_act)) +
  geom_point() +
  geom_smooth(method = "lm")

NN$\text{pot}$ and NN$\text{act}$ have no bias during moist days.

df_test %>% 
  tidyr::pivot_longer(cols = c(bias_act, bias_pot), names_to = "source", values_to = "bias") %>% 
  dplyr::filter(moist) %>% 
  ggplot(aes(y = bias, fill = source)) +
  geom_boxplot() +
  geom_hline(aes(yintercept = 0), linetype = "dotted")
df_test %>% 
  dplyr::filter(moist) %>%
  summarise(bias_act = mean(bias_act, na.rm = TRUE), bias_pot = mean(bias_pot, na.rm = TRUE))