model_trend: Modeling of indicator trends


View source: R/model_trend.R

Description

The function models the long-term trend of each indicator (IND) based on Generalized Additive Models (GAM) and returns a tibble with IND-specific GAM outputs.

Usage

model_trend(
  ind_tbl,
  time,
  train = 1,
  random = FALSE,
  k = 4,
  family = stats::gaussian()
)

Arguments

ind_tbl

A data frame, matrix or tibble containing only the (numeric) IND variables. A single indicator should be coerced into a data frame to keep its name. If it is passed as a vector, the default name 'ind' is used.

time

A vector containing the actual time steps (e.g. years), one per row of ind_tbl; the same time steps apply to all INDs.

train

The proportion of observations that should go into the training data on which the GAMs are fitted. Has to be a numeric value between 0 and 1; the default is 1 (i.e. the full time series is fitted).

random

logical; should the observations for the training data be randomly chosen? Default is FALSE.

k

Number of knots (the basis dimension) of the smoothing function s; the default is 4.

family

A description of the error distribution and link function to be used in the GAM. This needs to be defined as a family function (see also family). All standard family functions can be used, as well as some of the distribution families in the mgcv package (see family.mgcv; e.g. negbin or nb).
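How train and random might split a time series into training and test observations can be pictured with the minimal sketch below. The helper split_train_test is hypothetical and not part of INDperform; in particular, the assumption that the first (earliest) observations form the training set when random = FALSE is illustrative only.

```r
# Hypothetical helper (not part of INDperform): split observation indices
# into training and test sets for a given training proportion.
split_train_test <- function(n, train = 1, random = FALSE) {
  stopifnot(is.numeric(train), train > 0, train <= 1)
  n_train <- round(n * train)
  if (random) {
    # randomly chosen training observations, kept in time order
    train_idx <- sort(sample(seq_len(n), n_train))
  } else {
    # assumption: without randomization the first observations are used
    train_idx <- seq_len(n_train)
  }
  list(train = train_idx, test = setdiff(seq_len(n), train_idx))
}

set.seed(1)
idx <- split_train_test(n = 10, train = 0.5, random = TRUE)
idx$train
idx$test
```

With train = 1 the test set is empty and the full series is used for fitting, which matches the default behaviour described for the train argument.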

Details

To test for linear or non-linear long-term changes, each indicator (IND) in the ind_tbl is modeled as a smoothing function of the time vector (usually years) using the gam function. The trend can be tested for the full time series (i.e. all observations are used as training data) or for a random or selected subset.

The GAMs are built using the default settings of the gam function and the smooth term function s. However, the user can adjust the distribution and link function by modifying the family argument, as well as the maximum level of non-linearity by setting the number of knots:

gam(ind ~ s(time, k = k), family = family, data = training_data)
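The per-indicator fit in the formula above can be reproduced with mgcv directly. This is a minimal sketch assuming the mgcv package; the helper fit_one and the simulated data are hypothetical, and reading the p-value from the s.table of summary.gam is one plausible way to obtain p_val, not necessarily what model_trend does internally.

```r
library(mgcv)

# Simulated example data (hypothetical; not the package's ind_ex data)
set.seed(42)
time <- 1979:2008
ind_tbl <- data.frame(
  TZA = sin(seq_along(time) / 5) + rnorm(length(time), sd = 0.2),
  MS  = rnorm(length(time))
)

# Hypothetical helper: fit one indicator as a smooth function of time
fit_one <- function(ind, time, k = 4, family = gaussian()) {
  training_data <- data.frame(ind = ind, time = time)
  model <- gam(ind ~ s(time, k = k), family = family, data = training_data)
  # approximate p-value of the smooth term s(time) from summary.gam
  p_val <- summary(model)$s.table[1, "p-value"]
  list(model = model, p_val = p_val)
}

# one GAM per indicator column
fits <- lapply(ind_tbl, fit_one, time = time)
sapply(fits, `[[`, "p_val")
```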

Value

The function returns a tibble (a trimmed-down version of a data.frame) with the following columns:

ind_id

Indicator IDs.

ind

Indicator names. These may be modified to remove characters that are not allowed in a model formula (e.g. hyphens and brackets are replaced by an underscore, and names starting with a number are prefixed with an x).

p_val

The p-values of the smoothing term (here time).

model

A list-column of indicator-specific gam objects.

ind_train

A list-column with indicator values of the training data.

time_train

A list-column with the time values (e.g. years) of the training data.

pred

A list-column with indicator values predicted from the GAM for the training period.

ci_up

A list-column with the upper bound of the 95% confidence interval of the predicted indicator values.

ci_low

A list-column with the lower bound of the 95% confidence interval of the predicted indicator values.
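Values like pred, ci_up and ci_low can be derived from a fitted gam object roughly as follows. This sketch assumes mgcv and that the intervals are computed on the link scale as the fit +/- 1.96 standard errors and then back-transformed with the inverse link; the data are simulated and the package may compute its intervals differently.

```r
library(mgcv)

# Simulated training data (hypothetical)
set.seed(1)
time <- 1:25
training_data <- data.frame(time = time, ind = log(time) + rnorm(25, sd = 0.3))
model <- gam(ind ~ s(time, k = 4), data = training_data)

# predictions and standard errors on the scale of the linear predictor
pr <- predict(model, newdata = training_data, type = "link", se.fit = TRUE)
ilink <- model$family$linkinv

# back-transform; bounds keep their order for increasing inverse links
# (e.g. identity, log, logit)
pred   <- ilink(pr$fit)
ci_up  <- ilink(pr$fit + 1.96 * pr$se.fit)
ci_low <- ilink(pr$fit - 1.96 * pr$se.fit)
head(cbind(pred, ci_low, ci_up))
```

Building the interval on the link scale keeps it within the valid range of the response (e.g. positive values for a log link) after back-transformation.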

See Also

plot_diagnostics for assessing model diagnostics, plot_trend for trend visualization, tibble and the vignette("tibble") for more information on tibbles, gam for more information on GAMs

Examples

# Using the Baltic Sea demo data in this package
ind_tbl <- ind_ex[ ,-1] # excluding the year
time <- ind_ex$Year
# Using the default settings
trend_tbl <- model_trend(ind_tbl, time)
# Change the training and test data assignment
model_trend(ind_tbl, time, train = .5, random = TRUE)
# To keep the name when testing only one indicator, coerce vector to data frame
model_trend(data.frame(MS = ind_tbl$MS), time, train = .5, random = TRUE)

saskiaotto/INDperform documentation built on Oct. 27, 2021, 10:33 p.m.