knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) library(portalcasting)
The portalcasting package provides the ability to add functions both to a local copy of the repository for testing as well as to contribute to the base set of models provided within the package (and thus executed in the main repository). Similarly, users may often want or need to analyze the data in a slightly different configuration than what is already available. Here, we walk through the steps to add user-defined models and data sets to the directory for local use or for integration within the production pipeline.
For the purposes here, consider that you are interested in adding a model named "newmod" to the forecasting suite. While "newmod" can work on the existing data, you are are really interested in a slightly different configuration (perhaps only the long-term kangaroo rat exclosures) of the data you call "newdata".
Here, we assume the user has already run through a basic installation, set up, and evaluation of the package, as covered in the Getting Started vignette.
Unlike previous versions of portalcasting, starting with v0.9.0, there are very few formal requirements for the output of a model. The requirements have been further relaxed starting with v0.51.0, where we now leverage the model controls lists to track and document most functionality.
The forecasting pipeline will work with basically any univariate model fitting function that will run in R, and produce an object that is then capable of being processed by the forecasting function to forecast the model's predictions, so long as they can be processed via process_model_output()
, which only requires mean
, lower
, and upper
elements. The vast majority of the information saved about the forecasts is located in the metadata files and controls lists, resulting in the model itself not needing to produce much specific output to be valid.
Models can be based on either pre-existing functions (e.g., the AutoArima
model uses auto.arima
from the forecasting package) or specialized functions (such as those designed for the runjags
models in portalcasting, which are collated into fit_runjags
). Each model-dataset-species combination is run in cast()
wrapped in a tryCatch()
call, which softens any errors in implementation. The cast()
call runs a do.call
implementation of the fit and forecast functions from the model controls list and then passes the output of the two functions to process_model_output
.
To allow for a more flexible environment, here we use the setup_sandbox()
function to make and fill our directory, which we house at "~/sandbox"
:
library(portalcasting) main <- "~/sandbox" setup_sandbox(main = main)
The controls list for the standard, prefabricated ("prefab") models that come with the portalcasting package is automatically added to the directory's models
sub folder, as are the model script files for the runjags
models.
Similarly, the prefabricated data sets that come with the portalcasting package are automatically added to the data
subdirectory.
A new model is added to the pipeline via (at least) additional model controls (with additional files if needed). A starting template for model controls is provided in the portalcasting core files and accessible via model_controls_template()
:
model_controls_template()
One can create custom controls using a suite of new_model_<>
functions, each of which wraps a call to a specific component of the model_controls_template
inside an update_list()
call:
new_model_controls() new_model_metadata() new_model_metadata(name = "newmod") new_model_controls(metadata = new_model_metadata(name = "newmod", print_name = "New Model"))
At the very least, each new model will need a fitting function, which can be provided using the new_model_fit()
, with elements for the function and arguments:
new_model_fit(fun = "arima", args = list(x = "abundance")) new_model_controls(metadata = new_model_metadata(name = "newmod", print_name = "New Model"), fit = new_model_fit(fun = "arima", args = list(x = "abundance")))
Because the arima
function already has a defined forecast
method, we do not need to update the cast
element of the model controls, as it is already pre-loaded:
new_model_forecast()
By default, any new model is set to run for the three prefab datasets (all, exclosures, controls), for each relevant species (e.g, not including the kangaroo rats in the exclosures):
new_model_datasets()
If the specific model needs to use interpolated data, the new_model_interpolate()
function should be given an argument needed = TRUE
, and then a function input, which allows for invoking specialized functions. A simple example that is used for the prefab models is round_na.interp()
, which would be implemented as:
new_model_interpolate(needed = TRUE, fun = "round_na.interp")
which would just be added as the interpolate
element in new_model_controls
.
The default, however, is needed = FALSE
:
new_model_interpolate()
The response distribution for a model is a key aspect to scoring it properly. We store information about the response in the list for evaluation purposes:
new_model_controls()$response
The options for each component are as follows:
link
: normal, negative_binomial, poisson
type
: distribution, empirical
scoring_family
: normal, nbinom, poisson, sample
new_model_response(link = "normal", type = "distribution", scoring_family = "normal")
The specific updates can be called together to generate the model controls list for newmod:
new_controls <- new_model_controls(metadata = new_model_metadata(name = "newmod", print_name = "New Model"), fit = new_model_fit(fun = "arima", args = list(x = "abundance")), response = new_model_response(link = "normal", type = "distribution", scoring_family = "normal"))
Given that the directory is already established, one can use the add_new_model()
function to add the model to the directory at main, via the controls list:
new_controls <- new_model_controls(metadata = new_model_metadata(name = "newmod", print_name = "New Model"), fit = new_model_fit(fun = "arima", args = list(x = "abundance")), response = new_model_response(link = "normal", type = "distribution", scoring_family = "normal")) added <- add_new_model(main = main, new_model_controls = new_controls) names(read_models_controls(main = main)) #> [1] "AutoArima" "sAutoArima" "ESSS" #> [4] "NaiveArima" "sNaiveArima" "nbGARCH" #> [7] "nbsGARCH" "pGARCH" "psGARCH" #> [10] "pevGARCH" "jags_RW" "jags_logistic" #> [13] "jags_logistic_covariates" "jags_logistic_competition" "jags_logistic_competition_covariates" #> [16] "newmod"
And the model is directly ready to be portalcast
ed:
portalcast(main = main, models = "newmod", datasets = "all", species = c("DM", "PP", "total")) #> ------------------------------------------------------------ #> Forecasting models... #> ------------------------------------------------------------ #> This is portalcasting v0.51.0 #> ------------------------------------------------------------ #> #> - newmod for all DM #> |++++| successful |++++| #> - newmod for all PP #> |++++| successful |++++| #> - newmod for all total #> |++++| successful |++++| #> ------------------------------------------------------------ #> ...forecasting complete. #> ------------------------------------------------------------
setup
StageIncorporating the model in the establishment of the directory requires adding the controls list to the initial setup_<>
call:
main2 <- "~/sandbox2" setup_sandbox(main = main2, new_models_controls = list(newmod = new_controls), models = c(prefab_models(), "newmod"))
A new dataset is added to the pipeline much in the same way as a new model, but with only a single generating function and fewer elements in the controls list:
dataset_controls_template()
One can create custom controls using a suite of new_data_<>
functions, each of which wraps a call to a specific component of the data_controls_template
inside an update_list()
call:
new_dataset_controls() new_dataset_metadata() new_dataset_metadata(name = "newdata") new_dataset_controls(metadata = new_dataset_metadata(name = "newdata"))
All of the existing datasets use the same generating function prepare_dataset()
, which is quite flexible and ports arguments directly to the generalized summarize_rodent_data()
from the portalr package. We therefore include this function as the default in new dataset controls:
new_dataset_fun()
although that can be changed to whatever generating function a user may want to implement.
The arguments to the function can be updated via the new_dataset_args()
function:
new_dataset_args(name = "newdata")
These can all be wrapped up together in a call to new_dataset_controls()
to create the controls list that is then passed into add_new_dataset()
:
new_controls <- new_dataset_controls(metadata = new_dataset_metadata(name = "newdata"), args = new_dataset_args(name = "newdata", filename = "rodents_newdata.csv"))
Given that the directory is already established, one can use the add_new_dataset()
function to add the dataset controls to the directory at main, via the controls list, noting which existing models should have the new dataset added to their controls list:
new_controls <- new_dataset_controls(metadata = new_dataset_metadata(name = "newdata"), args = new_dataset_args(name = "newdata", filename = "rodents_newdata.csv")) added <- add_new_dataset(main = main, new_dataset_controls = new_controls, models = "AutoArima")) names(read_datasets_controls(main = main)) #> [1] "all" "controls" "exclosures" "newdata"
And the dataset can then be forecast with:
portalcast(main = main, models = "AutoArima", datasets = "newdata", species = c("DM", "PP", "total")) #> ------------------------------------------------------------ #> Forecasting models... #> ------------------------------------------------------------ #> This is portalcasting v0.51.0 #> ------------------------------------------------------------ #> #> - AutoArima for newdata DM #> |++++| successful |++++| #> - AutoArima for newdata PP #> |++++| successful |++++| #> - AutoArima for newdata total #> |++++| successful |++++| #> ------------------------------------------------------------ #> ...forecasting complete. #> ------------------------------------------------------------
setup
StageIncorporating the dataset in the establishment of the directory requires adding the controls list to the initial setup_<>
call:
main3 <- "~/sandbox3" setup_sandbox(main = main3, new_datasets_controls = list(newdata = new_controls), datasets = c(prefab_datasets(), "newdata"))
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.