model_trend: Modeling of indicator trends
In INDperform: Evaluation of Indicator Performances for Assessing Ecosystem States


The function models the long-term trend of each indicator (IND) based on Generalized Additive Models (GAM) and returns a tibble with IND-specific GAM outputs.

model_trend(
  ind_tbl,
  time,
  train = 1,
  random = FALSE,
  k = 4,
  family = stats::gaussian()
)

`ind_tbl`	A data frame, matrix or tibble containing only the (numeric) IND variables. Single indicators should be coerced into a data frame to keep the indicator name. If kept as a vector, the default name will be 'ind'.
`time`	A vector containing the actual time steps (e.g. years); these should correspond to the observations in the IND data.
`train`	The proportion of observations that should go into the training data on which the GAMs are fitted. Has to be a numeric value between 0 and 1; the default is 1 (i.e. the full time series is fitted).
`random`	logical; should the observations for the training data be randomly chosen? Default is FALSE.
`k`	Choice of knots (for the smoothing function `s`); the default is 4.
`family`	A description of the error distribution and link to be used in the GAM. This needs to be defined as a family function (see also `family`). All standard family functions can be used, as well as some of the distribution families in the mgcv package (see `family.mgcv`; e.g. `negbin` or `nb`).
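As an illustration of how `train` and `random` could interact, the sketch below selects training observations in base R. It does not reproduce the package's internal code: the rounding rule and the choice to take the first observations when `random = FALSE` are assumptions, and the object names are hypothetical.

```r
set.seed(42)                 # only to make the random draw reproducible

time  <- 1979:2008           # hypothetical time vector (30 years)
train <- 0.5                 # proportion of observations used for training

# Number of training observations (rounding rule is an assumption)
n_train <- round(length(time) * train)

# random = TRUE: draw the training observations at random, keep time order
train_id_random <- sort(sample(seq_along(time), n_train))

# random = FALSE: assumed here to mean the first n_train observations
train_id_ordered <- seq_len(n_train)

time[train_id_random]
time[train_id_ordered]
```

With `train = 1` both variants would select the full time series.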

To test for linear or non-linear long-term changes, each indicator (IND) in the ind_tbl is modeled as a smoothing function of the time vector (usually years) using the gam function. The trend can be tested for the full time series (i.e. all observations are used as training data) or for a random or selected subset.

The GAMs are built using the default settings of the gam function and the smooth term function s(). However, the user can adjust the distribution and link by modifying the family argument, as well as the maximum level of non-linearity by setting the number of knots (k):

gam(ind ~ s(time, k = k), family = family, data = training_data)
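For readers who want to see this call in isolation, here is a minimal sketch that fits the same model formula with mgcv on simulated data. The data and the object names (`training_data`, `fit`, `p_val`, `edf`) are made up for illustration; only `gam()`, `s()` and `summary()` are taken from mgcv.

```r
library(mgcv)  # provides gam() and s()

set.seed(1)
# Hypothetical data: 30 yearly observations of a single indicator
training_data <- data.frame(time = 1981:2010)
training_data$ind <- sin((training_data$time - 1981) / 6) +
  rnorm(30, sd = 0.2)

k <- 4                       # limits how wiggly the smooth can become
family <- stats::gaussian()  # error distribution and link

fit <- gam(ind ~ s(time, k = k), family = family, data = training_data)

# p value and effective degrees of freedom of the smooth term s(time)
p_val <- summary(fit)$s.table[, "p-value"]
edf   <- summary(fit)$s.table[, "edf"]
```

An `edf` close to 1 suggests an approximately linear trend, while larger values indicate a non-linear one; with `k = 4` the smooth can use at most 3 effective degrees of freedom.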

The function returns a tibble (a trimmed-down version of a data.frame) that includes the following columns:

ind_id: Indicator IDs.
ind: Indicator names. These might be modified to exclude any characters that are not allowed in a model formula (e.g. hyphens, brackets, etc. are replaced by an underscore; variable names starting with a number get an x before the number).
p_val: The p values for the smoothing term (here time).
model: A list-column of indicator-specific gam objects.
ind_train: A list-column with indicator values of the training data.
time_train: A list-column with the time values (e.g. years) of the training data.
pred: A list-column with indicator values predicted from the GAM for the training period.
ci_up: A list-column with the upper limit of the 95% confidence interval of the predicted indicator values.
ci_low: A list-column with the lower limit of the 95% confidence interval of the predicted indicator values.
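To make the meaning of pred, ci_up and ci_low more concrete, the following sketch derives comparable quantities from a GAM fitted with mgcv. It assumes the interval is the prediction plus or minus 1.96 standard errors; whether the package computes its limits in exactly this way is not stated here. The data and object names are hypothetical.

```r
library(mgcv)

set.seed(1)
# Hypothetical indicator with a weak linear increase over 30 years
dat <- data.frame(time = 1981:2010)
dat$ind <- 0.05 * (dat$time - 1981) + rnorm(30, sd = 0.3)

fit <- gam(ind ~ s(time, k = 4), data = dat)

# Predictions for the training period together with their standard errors
p <- predict(fit, newdata = dat, se.fit = TRUE)

pred   <- as.numeric(p$fit)
ci_up  <- pred + 1.96 * as.numeric(p$se.fit)  # assumed upper 95% limit
ci_low <- pred - 1.96 * as.numeric(p$se.fit)  # assumed lower 95% limit
```

For a non-Gaussian family the limits would typically be computed on the link scale and then back-transformed with the inverse link function.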

plot_diagnostics for assessing model diagnostics, plot_trend for trend visualization, tibble and vignette("tibble") for more information on tibbles, and gam for more information on GAMs.

library(INDperform)
# Using the Baltic Sea demo data in this package
ind_tbl <- ind_ex[, -1] # excluding the year
time <- ind_ex$Year
# Using the default settings
trend_tbl <- model_trend(ind_tbl, time)
# Change the training and test data assignment
model_trend(ind_tbl, time, train = .5, random = TRUE)
# To keep the name when testing only one indicator, coerce vector to data frame
model_trend(data.frame(MS = ind_tbl$MS), time, train = .5, random = TRUE)