apply_gam: Apply GAM to time series and assess emerging status
In trias-project/trias: Process Data for the Project Tracking Invasive Alien Species (TrIAS)

apply_gam

R Documentation

Apply GAM to time series and assess emerging status

Description

This function applies generalized additive models (GAM) to assess emerging status for a certain time window.

Usage

apply_gam(
  df,
  y_var,
  eval_years,
  year = "year",
  taxonKey = "taxonKey",
  type_indicator = "observations",
  baseline_var = NULL,
  p_max = 0.1,
  taxon_key = NULL,
  name = NULL,
  df_title = NULL,
  x_label = "year",
  y_label = "Observations",
  saveplot = FALSE,
  dir_name = NULL,
  width = NULL,
  height = NULL,
  verbose = FALSE
)

Arguments

`df`	df. A dataframe containing temporal data.
`y_var`	character. Name of column containing variable to model. It has to be passed as string, e.g. `"occurrences"`.
`eval_years`	numeric. Temporal value(s) when emerging status has to be evaluated.
`year`	character. Name of column containing temporal values. It has to be passed as string, e.g. `"time"`. Default: `"year"`.
`taxonKey`	character. Name of column containing taxon IDs. It has to be passed as string, e.g. `"taxon"`. Default: `"taxonKey"`.
`type_indicator`	character. One of `"observations"`, `"occupancy"`. Used in title of the output plot. Default: `"observations"`.
`baseline_var`	character. Name of the column containing values to use as additional covariate. Such covariate is introduced in the model to correct research effort bias. Default: `NULL`. If `NULL` internal variable `method_em = "basic"`, otherwise `method_em = "correct_baseline"`. Value of `method_em` will be part of title of output plot.
`p_max`	numeric. A value between 0 and 1. Default: 0.1.
`taxon_key`	numeric, character. Taxon key the timeseries belongs to. Used exclusively in graph title and filename (if `saveplot = TRUE`). Default: `NULL`.
`name`	character. Species name the timeseries belongs to. Used exclusively in graph title and filename (if `saveplot = TRUE`). Default: `NULL`.
`df_title`	character. Any string you would like to add to graph titles and filenames (if `saveplot = TRUE`). The title is always composed of: `"GAM"` + `type_indicator` + `method_em` + `taxon_key` + `name` + `df_title` separated by underscore ("_"). Default: `NULL`.
`x_label`	character. x-axis label of output plot. Default: `"year"`.
`y_label`	character. y-axis label of output plot. Default: `"number of observations"`.
`saveplot`	logical. If `TRUE` the plots are automatically saved. Default: `FALSE`.
`dir_name`	character. Path of directory where saving plots. If path doesn't exists, directory will be created. Example: "./output/graphs/". If `NULL` and `saveplot` is `TRUE`, plots are saved in current directory. Default: `NULL`.
`width`	numeric. Plot width in pixels. Values are passed to ggsave. Ignored if `saveplot` = `FALSE`. If `NULL` and `saveplot` is `TRUE`, `width` is set to 1680 and a message is returned. Default: `NULL`.
`height`	numeric. Plot height in pixels. Values are passed to ggsave. Ignored if `saveplot` = `FALSE`. If `NULL` and `saveplot` is `TRUE`, `height` to 1200 and a message is returned. Default: `NULL`.
`verbose`	logical. If `TRUE` status of processing and possible issues are returned. Default: `FALSE`.

Details

The GAM modelling is performed using the mgcvb::gam(). To use this function, we pass:

a formula
a family object specifying the distribution
a smoothing parameter estimation method

For more information about all other arguments, see ⁠[mgcv::gam()]⁠.

If no covariate is used (baseline_var = NULL), the GAM formula is: n ~ s(year, k = maxk, m = 3, bs = "tp"). Otherwise the GAM formula has a second term, s(n_covariate) and so the GAM formula is n ~ s(year, k = maxk, m = 3, bs = "tp") + s(n_covariate).

Description of the parameters present in the formula above:
k: dimension of the basis used to represent the smooth term, i.e. the number of knots used for calculating the smoother. We #' set k to maxk, which is the number of decades in the time series. If less than 5 decades are present in the data, maxk is #' set to 5.
bs indicates the basis to use for the smoothing: we uses the default penalized thin plate regression splines.
m specifies the order of the derivatives in the thin plate spline penalty. We use m = 3, the default value.

We use ⁠[mgcv::nb()]⁠, a negative binomial family to perform the GAM.

The smoothing parameter estimation method is set to REML (Restricted maximum likelihood approach). If the P-value of the GAM smoother(s) is/are above threshold value p_max, GAM is not performed and the next warning is returned: "GAM output cannot be used: p-values of all GAM smoothers are above {p_max}" where p_max is the P-value used as threshold as defined by argument p_max.

If the mgcv::gam() returns an error or a warning, the following message is returned to the user: "GAM ({method_em}) cannot be performed or cannot converge.", where method_em is one of "basic" or "correct_baseline". See argument baseline_var.

The first and second derivatives of the smoother is calculated using function gratia::derivatives() with the following hard coded arguments:
type: the type of finite difference used. Set to "central".
order: 1 for the first derivative, 2 for the second derivative
level: the confidence level. Set to 0.8

eps: the finite difference. Set to 1e-4.

For more details, please check derivatives.

The sign of the lower and upper confidence levels of the first and second derivatives are used to define a detailed emergency status (em) which is internally used to return the emergency status, em_status, which is a column of the returned data.frame em_summary.

ucl-1	lcl-1	ucl-2	lcl-2	em	em_status
+	+	+	+	4	3 (emerging)
+	+	+	-	3	3 (emerging)
+	+	-	-	2	2 (potentially emerging)
-	+	+	+	1	2 (potentially emerging)
+	-	+	-	0	1 (unclear)
+	-	-	-	-1	0 (not emerging)
-	-	+	+	-2	0 (not emerging)
-	-	+	-	-3	0 (not emerging)
-	-	-	-	-4	0 (not emerging)

Value

list with six slots:

em_summary: df. A data.frame summarizing the emerging status outputs. em_summary contains as many rows as the length of input variable eval_year. So, if you evaluate GAM on three years, em_summary will contain three rows. It contains the following columns:
- "taxonKey": column containing taxon ID. Column name equal to value of argument taxonKey.
- "year": column containing temporal values. Column name equal to value of argument year. Column itself is equal to value of argument eval_years. So, if you evaluate GAM on years 2017, 2018 (eval_years = c(2017, 2018)), you will get these two values in this column.
- em_status: numeric. Emerging statuses, an integer between 0 and 3.
- growth: numeric. Lower limit of GAM confidence interval for the first derivative, if positive. It represents the lower guaranteed growth.
- method: character. GAM method, One of: "correct_baseline" and "basic". See details above in description of argument use_baseline.
model: gam object. The model as returned by gam() function. NULL if GAM cannot be applied.
output: df. Complete data.frame containing more details than the summary em_summary. It contains the following columns:
- all columns in df.
- method: character. GAM method, One of: "correct_baseline" and "basic". See details above in description of argument use_baseline.
- fit: numeric. Fit values.
- ucl: numeric. The upper confidence level values.
- lcl: numeric. The lower confidence level values.
- em1: numeric. The emergency value for the 1st derivative. -1, 0 or +1.
- em2: numeric. The emergency value for the 2nd derivative: -1, 0 or +1.
- em: numeric. The emergency value: from -4 to +4, based on em1 and em2. See Details.
- em_status: numeric. Emerging statuses, an integer between 0 and 3. See Details.
- growth: numeric. Lower limit of GAM confidence interval for the first derivative, if positive. It represents the lower guaranteed growth.
first_derivative: df. Data.frame with details of first derivatives. It contains the following columns:
- smooth: smooth identifier. Example: s(year).
- derivative: numeric. Value of first derivative.
- se: numeric. Standard error of derivative.
- crit: numeric. Critical value required such that derivative + (se * crit) and derivative - (se * crit) form the upper and lower bounds of the confidence interval on the first derivative of the estimated smooth at the specific confidence level. In our case the confidence level is hard-coded: 0.8. Then crit <- qnorm(p = (1-0.8)/2, mean = 0, sd = 1, lower.tail = FALSE).
- lower_ci: numeric. Lower bound of the confidence interval of the estimated smooth.
- upper_ci: numeric. Upper bound of the confidence interval of the estimated smooth.
- value of argument year: column with temporal values.
- value of argument baseline_var: column with the fitted values for the baseline. If baseline_var is NULL, this column is not present.
second_derivative: df. Data.frame with details of second derivatives. Same columns as first_derivatives.
plot: a ggplot2 object. Plot of observations with GAM output and emerging status. If emerging status cannot be assessed only observations are plotted.

Examples

library(dplyr)
df_gam <- tibble(
  taxonKey = rep(3003709, 24),
  canonicalName = rep("Rosa glauca", 24),
  year = seq(1995, 2018),
  n = c(
    1, 1, 0, 0, 0, 2, 0, 0, 1, 3, 1, 2, 0, 5, 0, 5, 4, 2, 1,
    1, 3, 3, 8, 10
  ),
  n_class = c(
    229, 555, 1116, 939, 919, 853, 442, 532, 623, 1178, 732, 371, 1053,
    1001, 1550, 1142, 1076, 1310, 922, 1773, 1637,
    1866, 2234, 2013
  )
)
# apply GAM to n without baseline as covariate
apply_gam(df_gam,
  y_var = "n",
  eval_years = 2018,
  taxon_key = 3003709,
  name = "Rosa glauca",
  verbose = TRUE
)
# apply GAM using baseline data in column n_class as covariate
apply_gam(df_gam,
  y_var = "n",
  eval_years = 2018,
  baseline_var = "n_class",
  taxon_key = 3003709,
  name = "Rosa glauca",
  verbose = TRUE
)
# apply GAM using n as occupancy values, evaluate on two years. No baseline
apply_gam(df_gam,
  y_var = "n",
  eval_years = c(2017, 2018),
  taxon_key = 3003709,
  type_indicator = "occupancy",
  name = "Rosa glauca",
  y_label = "occupancy",
  verbose = TRUE
)
# apply GAM using n as occupancy values and n_class as covariate (baseline)
apply_gam(df_gam,
  y_var = "n",
  eval_years = c(2017, 2018),
  baseline_var = "n_class",
  taxon_key = 3003709,
  type_indicator = "occupancy",
  name = "Rosa glauca",
  y_label = "occupancy",
  verbose = TRUE
)
# How to use other arguments
apply_gam(df_gam,
  y_var = "n",
  eval_years = c(2017, 2018),
  baseline_var = "n_class",
  p_max = 0.3,
  taxon_key = "3003709",
  type_indicator = "occupancy",
  name = "Rosa glauca",
  df_title = "Belgium",
  x_label = "time (years)",
  y_label = "area of occupancy (km2)",
  saveplot = TRUE,
  dir_name = "./data/",
  verbose = TRUE
)
# warning returned if GAM cannot be applied and plot with only observations
df_gam <- tibble(
taxonKey = rep(3003709, 24),
canonicalName = rep("Rosa glauca", 24),
year = seq(1995, 2018),
obs = c(
  1, 1, 0, 0, 0, 2, 0, 0, 1, 3, 1, 2, 0, 5, 0, 5, 4, 2, 1,
  1, 3, 3, 8, 10
),
cobs = rep(0, 24)
)

# if GAM cannot be applied a warning is returned and the plot mention it
## Not run: 
no_gam_applied <- apply_gam(df_gam,
                            y_var = "obs",
                            eval_years = 2018,
                            taxon_key = 3003709,
                            name = "Rosa glauca",
                            baseline_var = "cobs",
                            verbose = TRUE
)
no_gam_applied$plot

## End(Not run)

trias-project/trias documentation built on April 5, 2025, 11:26 p.m.

trias-project/trias index

README.md

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

trias-project/trias
Process Data for the Project Tracking Invasive Alien Species (TrIAS)

apply_gam: Apply GAM to time series and assess emerging status
In trias-project/trias: Process Data for the Project Tracking Invasive Alien Species (TrIAS)

Apply GAM to time series and assess emerging status

Description

Usage

Arguments

Details

Value

Examples

Related to apply_gam in trias-project/trias...

R Package Documentation

Browse R Packages

We want your feedback!

trias-project/trias Process Data for the Project Tracking Invasive Alien Species (TrIAS)

apply_gam: Apply GAM to time series and assess emerging status In trias-project/trias: Process Data for the Project Tracking Invasive Alien Species (TrIAS)

Apply GAM to time series and assess emerging status

Description

Usage

Arguments

Details

Value

Examples

Related to apply_gam in trias-project/trias...

R Package Documentation

Browse R Packages

We want your feedback!

trias-project/trias
Process Data for the Project Tracking Invasive Alien Species (TrIAS)

apply_gam: Apply GAM to time series and assess emerging status
In trias-project/trias: Process Data for the Project Tracking Invasive Alien Species (TrIAS)