apply_gam | R Documentation |
This function applies generalized additive models (GAM) to assess emerging status for a certain time window.
apply_gam(
df,
y_var,
eval_years,
year = "year",
taxonKey = "taxonKey",
type_indicator = "observations",
baseline_var = NULL,
p_max = 0.1,
taxon_key = NULL,
name = NULL,
df_title = NULL,
x_label = "year",
y_label = "Observations",
saveplot = FALSE,
dir_name = NULL,
width = NULL,
height = NULL,
verbose = FALSE
)
df |
df. A dataframe containing temporal data. |
y_var |
character. Name of column containing variable to model. It has
to be passed as string, e.g. |
eval_years |
numeric. Temporal value(s) when emerging status has to be evaluated. |
year |
character. Name of column containing temporal values. It has to
be passed as string, e.g. |
taxonKey |
character. Name of column containing taxon IDs. It has to be
passed as string, e.g. |
type_indicator |
character. One of |
baseline_var |
character. Name of the column containing values to use as
additional covariate. Such covariate is introduced in the model to correct
research effort bias. Default: |
p_max |
numeric. A value between 0 and 1. Default: 0.1. |
taxon_key |
numeric, character. Taxon key the timeseries belongs to.
Used exclusively in graph title and filename (if |
name |
character. Species name the timeseries belongs to. Used
exclusively in graph title and filename (if |
df_title |
character. Any string you would like to add to graph titles
and filenames (if |
x_label |
character. x-axis label of output plot. Default: |
y_label |
character. y-axis label of output plot. Default: |
saveplot |
logical. If |
dir_name |
character. Path of directory where saving plots. If path
doesn't exists, directory will be created. Example: "./output/graphs/". If
|
width |
numeric. Plot width in pixels. Values are passed to
ggsave. Ignored if |
height |
numeric. Plot height in pixels. Values are passed to
ggsave. Ignored if |
verbose |
logical. If |
The GAM modelling is performed using the mgcvb::gam()
. To use this
function, we pass:
a formula
a family object specifying the distribution
a smoothing parameter estimation method
For more information about all other arguments, see [mgcv::gam()]
.
If no covariate is used (baseline_var
= NULL), the GAM formula is: n ~ s(year, k = maxk, m = 3, bs = "tp")
. Otherwise the GAM formula has a
second term, s(n_covariate)
and so the GAM formula is n ~ s(year, k = maxk, m = 3, bs = "tp") + s(n_covariate)
.
Description of the parameters present in the formula above:
k
: dimension of the basis used to represent the smooth term, i.e. the
number of knots used for calculating the smoother. We #' set k
to
maxk
, which is the number of decades in the time series. If less than 5
decades are present in the data, maxk
is #' set to 5.
bs
indicates the basis to use for the smoothing: we uses the default
penalized thin plate regression splines.
m
specifies the order of the derivatives in the thin plate spline
penalty. We use m = 3
, the default value.
We use [mgcv::nb()]
, a negative binomial family to perform the GAM.
The smoothing parameter estimation method is set to REML (Restricted
maximum likelihood approach). If the P-value of the GAM smoother(s) is/are
above threshold value p_max
, GAM is not performed and the next warning is
returned: "GAM output cannot be used: p-values of all GAM smoothers are
above {p_max}" where p_max
is the P-value used as threshold as defined
by argument p_max
.
If the mgcv::gam()
returns an error or a warning, the following message
is returned to the user: "GAM ({method_em}) cannot be performed or cannot
converge.", where method_em
is one of "basic"
or "correct_baseline"
.
See argument baseline_var
.
The first and second derivatives of the smoother is calculated using
function gratia::derivatives()
with the following hard coded arguments:
type
: the type of finite difference used. Set to "central"
.
order
: 1 for the first derivative, 2 for the second derivative
level
: the confidence level. Set to 0.8
eps
: the finite difference. Set to 1e-4.
For more details, please check derivatives.
The sign of the lower and upper confidence levels of the first and second
derivatives are used to define a detailed emergency status (em
) which is
internally used to return the emergency status, em_status
, which is a
column of the returned data.frame em_summary
.
ucl-1 | lcl-1 | ucl-2 | lcl-2 | em | em_status |
+ | + | + | + | 4 | 3 (emerging) |
+ | + | + | - | 3 | 3 (emerging) |
+ | + | - | - | 2 | 2 (potentially emerging) |
- | + | + | + | 1 | 2 (potentially emerging) |
+ | - | + | - | 0 | 1 (unclear) |
+ | - | - | - | -1 | 0 (not emerging) |
- | - | + | + | -2 | 0 (not emerging) |
- | - | + | - | -3 | 0 (not emerging) |
- | - | - | - | -4 | 0 (not emerging) |
list with six slots:
em_summary
: df. A data.frame summarizing the emerging status
outputs. em_summary
contains as many rows as the length of input variable
eval_year
. So, if you evaluate GAM on three years, em_summary
will
contain three rows. It contains the following columns:
"taxonKey"
: column containing taxon ID. Column name equal to value of
argument taxonKey
.
"year"
: column containing temporal values. Column name equal
to value of argument year
. Column itself is equal to value of
argument eval_years
. So, if you evaluate GAM on years 2017, 2018
(eval_years = c(2017, 2018)
), you will get these two values in this
column.
em_status
: numeric. Emerging statuses, an integer
between 0 and 3.
growth
: numeric. Lower limit of GAM confidence interval for the first
derivative, if positive. It represents the lower guaranteed growth.
method
: character. GAM method, One of: "correct_baseline"
and
"basic"
. See details above in description of argument use_baseline
.
model
: gam object. The model as returned by gam()
function.
NULL
if GAM cannot be applied.
output
: df. Complete data.frame containing more details than the
summary em_summary
. It contains the following columns:
all columns in df
.
method
: character. GAM method, One of: "correct_baseline"
and
"basic"
. See details above in description of argument use_baseline
.
fit
: numeric. Fit values.
ucl
: numeric. The upper confidence level values.
lcl
: numeric. The lower confidence level values.
em1
: numeric. The emergency value for the 1st derivative. -1, 0 or +1.
em2
: numeric. The emergency value for the 2nd derivative: -1, 0 or +1.
em
: numeric. The emergency value: from -4 to +4, based on em1
and
em2
. See Details.
em_status
: numeric. Emerging statuses, an integer
between 0 and 3. See Details.
growth
: numeric. Lower limit of GAM confidence interval for the first
derivative, if positive. It represents the lower guaranteed growth.
first_derivative
: df. Data.frame with details of first derivatives.
It contains the following columns:
smooth
: smooth identifier. Example: s(year)
.
derivative
: numeric. Value of first derivative.
se
: numeric. Standard error of derivative
.
crit
: numeric. Critical value required such that
derivative + (se * crit)
and derivative - (se * crit)
form
the upper and lower bounds of the confidence interval on the first
derivative of the estimated smooth at the specific confidence level. In our
case the confidence level is hard-coded: 0.8.
Then crit <- qnorm(p = (1-0.8)/2, mean = 0, sd = 1, lower.tail = FALSE)
.
lower_ci
: numeric. Lower bound of the confidence interval of the
estimated smooth.
upper_ci
: numeric. Upper bound of the
confidence interval of the estimated smooth.
value of argument year
: column with temporal values.
value of argument baseline_var
: column with the fitted values for the
baseline. If baseline_var
is NULL
, this column is not present.
second_derivative
: df. Data.frame with details of second
derivatives. Same columns as first_derivatives
.
plot
: a ggplot2 object. Plot of observations with GAM output and
emerging status. If emerging status cannot be assessed only observations are
plotted.
library(dplyr)
df_gam <- tibble(
taxonKey = rep(3003709, 24),
canonicalName = rep("Rosa glauca", 24),
year = seq(1995, 2018),
n = c(
1, 1, 0, 0, 0, 2, 0, 0, 1, 3, 1, 2, 0, 5, 0, 5, 4, 2, 1,
1, 3, 3, 8, 10
),
n_class = c(
229, 555, 1116, 939, 919, 853, 442, 532, 623, 1178, 732, 371, 1053,
1001, 1550, 1142, 1076, 1310, 922, 1773, 1637,
1866, 2234, 2013
)
)
# apply GAM to n without baseline as covariate
apply_gam(df_gam,
y_var = "n",
eval_years = 2018,
taxon_key = 3003709,
name = "Rosa glauca",
verbose = TRUE
)
# apply GAM using baseline data in column n_class as covariate
apply_gam(df_gam,
y_var = "n",
eval_years = 2018,
baseline_var = "n_class",
taxon_key = 3003709,
name = "Rosa glauca",
verbose = TRUE
)
# apply GAM using n as occupancy values, evaluate on two years. No baseline
apply_gam(df_gam,
y_var = "n",
eval_years = c(2017, 2018),
taxon_key = 3003709,
type_indicator = "occupancy",
name = "Rosa glauca",
y_label = "occupancy",
verbose = TRUE
)
# apply GAM using n as occupancy values and n_class as covariate (baseline)
apply_gam(df_gam,
y_var = "n",
eval_years = c(2017, 2018),
baseline_var = "n_class",
taxon_key = 3003709,
type_indicator = "occupancy",
name = "Rosa glauca",
y_label = "occupancy",
verbose = TRUE
)
# How to use other arguments
apply_gam(df_gam,
y_var = "n",
eval_years = c(2017, 2018),
baseline_var = "n_class",
p_max = 0.3,
taxon_key = "3003709",
type_indicator = "occupancy",
name = "Rosa glauca",
df_title = "Belgium",
x_label = "time (years)",
y_label = "area of occupancy (km2)",
saveplot = TRUE,
dir_name = "./data/",
verbose = TRUE
)
# warning returned if GAM cannot be applied and plot with only observations
df_gam <- tibble(
taxonKey = rep(3003709, 24),
canonicalName = rep("Rosa glauca", 24),
year = seq(1995, 2018),
obs = c(
1, 1, 0, 0, 0, 2, 0, 0, 1, 3, 1, 2, 0, 5, 0, 5, 4, 2, 1,
1, 3, 3, 8, 10
),
cobs = rep(0, 24)
)
# if GAM cannot be applied a warning is returned and the plot mention it
## Not run:
no_gam_applied <- apply_gam(df_gam,
y_var = "obs",
eval_years = 2018,
taxon_key = 3003709,
name = "Rosa glauca",
baseline_var = "cobs",
verbose = TRUE
)
no_gam_applied$plot
## End(Not run)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.