| ind_init | R Documentation |
ind_init combines the time vector and the indicator (IND) and pressure data into
one tibble with defined training and test observations. All INDs are combined
with all pressures provided as input.
ind_init(ind_tbl, press_tbl, time, train = 0.9, random = FALSE)
| ind_tbl | A data frame, matrix or tibble containing only the (numeric) IND variables. Single indicators should be coerced into a data frame to keep the indicator name. If kept as a vector, the default name will be 'ind'. |
| press_tbl | A data frame, matrix or tibble containing only the (numeric) pressure variables. Single pressures should be coerced into a data frame to keep the pressure name. If kept as a vector, the default name will be 'press'. |
| time | A vector containing the actual time steps (e.g. years; these should be the same as in the IND and pressure data). |
| train | The proportion of observations that should go into the training data on which the GAMs are later fitted. Has to be a numeric value between 0 and 1; the default is 0.9. |
| random | logical; should the observations for the training data be randomly chosen? Default is FALSE, so that the last time units (years) are chosen as test data. |
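The interplay of train and random can be illustrated with a minimal base R sketch. The helper split_train is hypothetical and not part of the package; the rounding rule and the made-up time vector are assumptions, so the actual split performed by ind_init may differ in detail.

```r
# Hypothetical helper (not the package's implementation): returns a logical
# vector marking which of n observations would go into the training data.
split_train <- function(n, train = 0.9, random = FALSE) {
  stopifnot(is.numeric(train), train > 0, train < 1)
  n_train <- round(n * train)  # assumed rounding rule
  is_train <- logical(n)       # all FALSE to begin with
  if (random) {
    # training observations drawn at random across the whole series
    is_train[sample.int(n, n_train)] <- TRUE
  } else {
    # first part of the series is training, the last time units are test
    is_train[seq_len(n_train)] <- TRUE
  }
  is_train
}

time <- 1979:2008  # made-up time vector of 30 years
set.seed(1)

# Default behaviour: the last years form the test data
time[!split_train(length(time), train = 0.9)]

# Random assignment: test years are scattered across the series
time[!split_train(length(time), train = 0.5, random = TRUE)]
```

With random = TRUE the test observations can fall inside the training period, which is why the output also tracks them in train_na.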
ind_init will combine every column in ind_tbl with every column in press_tbl
so that each row will represent one IND~press combination. The input data will be
split into a training and a test data set. The returned tibble is the basis for all
IND~pressure modeling functions.
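The "every IND with every pressure" layout can be pictured with a short base R sketch. The variable names and the row ordering are assumptions made for illustration; the sketch only mimics the id, ind and press columns and leaves out the list-columns that ind_init adds.

```r
# Made-up variable names standing in for the columns of ind_tbl and press_tbl
ind_names <- c("TZA", "MS")
press_names <- c("Tsum", "Swin", "Fcod")

# expand.grid varies its first argument fastest, so each indicator is
# paired with all pressures before the next indicator starts
grid <- expand.grid(press = press_names, ind = ind_names,
  stringsAsFactors = FALSE)

combos <- data.frame(
  id = seq_len(nrow(grid)),
  ind = grid$ind,
  press = grid$press,
  stringsAsFactors = FALSE
)

combos
nrow(combos)  # number of indicators times number of pressures
```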
If not all IND~pressure combinations should be modeled,
the respective rows can simply be removed from the output tibble. Alternatively,
ind_init can be applied several times to data subsets and the output tibbles
merged afterwards, e.g. with bind_rows.
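Both routes to a reduced set of combinations can be sketched with a toy data frame. It is only a stand-in for the real output: the names are made up and the list-columns are omitted. Base R's rbind is used here so the sketch runs without extra packages; on real tibbles, bind_rows plays the same role.

```r
# Toy stand-in for the tibble returned by ind_init (list-columns omitted)
init_tbl <- data.frame(
  id = 1:4,
  ind = c("MS", "MS", "TZA", "TZA"),
  press = c("Tsum", "Swin", "Tsum", "Swin"),
  stringsAsFactors = FALSE
)

# Route 1: drop the rows of combinations that should not be modeled
keep <- !(init_tbl$ind == "TZA" & init_tbl$press == "Swin")
init_sub <- init_tbl[keep, ]

# Route 2: build the pieces separately and stack them afterwards
part_a <- init_tbl[init_tbl$ind == "MS", ]
part_b <- init_tbl[init_tbl$ind == "TZA" & init_tbl$press == "Tsum", ]
init_merged <- rbind(part_a, part_b)

init_sub
init_merged
```

When the pieces come from separate ind_init calls, their id values may overlap, so it is worth checking whether the ids need renumbering after merging.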
The function returns a tibble, a trimmed-down version of
data.frame(), with the following elements:
| id | Numerical IDs for the IND~press combinations. |
| ind | Indicator names. These may be modified to remove characters that cannot be used in the model formula: hyphens, brackets, etc. are replaced by an underscore, and names starting with a number get an x before the number. |
| press | Pressure names. These may be modified to remove characters that cannot be used in the model formula: hyphens, brackets, etc. are replaced by an underscore, and names starting with a number get an x before the number. |
| ind_train | A list-column with indicator values of the training data. |
| press_train | A list-column with pressure values of the training data. |
| time_train | A list-column with the time steps of the training data. |
| ind_test | A list-column with indicator values of the test data. |
| press_test | A list-column with pressure values of the test data. |
| time_test | A list-column with the time steps of the test data. |
| train_na | logical; indicates the joint missing values in the training IND and pressure data. This includes the original NAs as well as randomly selected test observations that lie within the training period. This vector is needed later to determine temporal autocorrelation. |
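The renaming described for ind and press can be approximated with two regular expressions. The helper clean_name is hypothetical, and the exact set of characters the package replaces is an assumption (here everything except letters, digits, underscores and dots), so the real output names may differ.

```r
# Hypothetical helper (not the package's implementation) that makes
# variable names safe to use in a model formula
clean_name <- function(x) {
  # assumed rule: anything but letters, digits, "_" and "." becomes "_"
  x <- gsub("[^A-Za-z0-9_.]", "_", x)
  # names starting with a digit get an "x" in front
  x <- sub("^([0-9])", "x\\1", x)
  x
}

clean_name(c("cod-SSB", "temp(surface)", "0group", "MS"))
# "cod_SSB" "temp_surface_" "x0group" "MS"
```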
See tibble and vignette("tibble") for more
information on tibbles.
Other IND~pressure modeling functions:
find_id(),
model_gam(),
model_gamm(),
plot_diagnostics(),
plot_model(),
scoring(),
select_model(),
test_interaction()
# Using the Baltic Sea demo data in this package
press_tbl <- press_ex[ ,-1] # excl. Year
ind_tbl <- ind_ex[ ,-1] # excl. Year
time <- ind_ex[ ,1]
# Randomly assign 50% of the observations as training data and
# the other 50% as test data
ind_init(ind_tbl, press_tbl, time, train = 0.5, random = TRUE)
# To keep the names when testing only one indicator and one pressure,
# coerce both vectors into data frames
ind_init(ind_tbl = data.frame(MS = ind_tbl$MS),
  press_tbl = data.frame(Tsum = press_tbl$Tsum),
  time, train = 0.5, random = TRUE)