View source: R/model_validation.R

Description
This function takes a few more arguments than 'epidemiar::run_epidemia()' to generate statistics on model validation. The function will evaluate a number of weeks ('total_timesteps') starting from a specified week ('date_start'), look at the n-week ahead forecasts (1 to 'timesteps_ahead' number of weeks), and compare the forecast values to the observed number of cases. An optional 'reporting_lag' argument will censor the most recent known data by that number of weeks. The validation statistics include Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), and an R-squared statistic, both in total and per geographic grouping (if present).
Usage

run_validation(
  date_start = NULL,
  total_timesteps = 26,
  timesteps_ahead = 2,
  reporting_lag = 0,
  per_timesteps = 12,
  skill_test = TRUE,
  epi_data = NULL,
  env_data = NULL,
  env_ref_data = NULL,
  env_info = NULL,
  casefield = NULL,
  groupfield = NULL,
  populationfield = NULL,
  obsfield = NULL,
  valuefield = NULL,
  fc_model_family = NULL,
  report_settings = NULL,
  ...
)
Arguments

date_start
    Date to start testing for model validation.

total_timesteps
    Number of weeks from (and including) 'date_start' to run validation tests.

timesteps_ahead
    Number of weeks for testing the n-week ahead forecasts. Results will be generated from 1 week ahead through 'timesteps_ahead' number of weeks.

reporting_lag
    Number of timesteps to simulate reporting lag. For instance, if you have weekly data, a reporting_lag of 1 week with a timesteps_ahead of 1 week is functionally equivalent to a reporting_lag of 0 with a timesteps_ahead of 2 weeks. That is, you are forecasting next week, but you do not yet know this week's data, only last week's numbers.

per_timesteps
    When creating a timeseries of validation results, create a moving window with a width of per_timesteps time points. Should be a minimum of 10 timesteps. In beta testing.

skill_test
    Logical parameter indicating whether or not to also run validations on two naive models for a skill test comparison. The naive models are "persistence", where the last known value (case counts) is carried forward, and "average week", where the predicted value is the average of that week of the year, as calculated from historical data.

epi_data
    Epidemiological data with case numbers per week, with date field "obs_date".

env_data
    Daily environmental data for the same groupfields and date range as the epidemiological data. It may contain extra data (other districts or date ranges). The data must be in long format (one row for each date and environmental variable combination), and must start at absolute minimum

env_ref_data
    Historical averages by week of year for environmental variables. Used to extend environmental data into the future for long forecast periods, to calculate anomalies in the early detection period, and to display on timeseries in reports.

env_info
    Lookup table for environmental data: reference creation method (e.g. sum or mean), report labels, etc.

casefield
    The column name of the field that contains disease case counts (unquoted field name).

groupfield
    The column name of the field for district or geographic area unit division names of epidemiological AND environmental data (unquoted field name). If there are no groupings (all one area), the user should give a field that contains the same value throughout.

populationfield
    Column name of the optional population field giving population numbers over time (unquoted field name). Used to calculate incidence if

obsfield
    Field name of the environmental data variables (unquoted field name).

valuefield
    Field name of the value of the environmental data variable observations (unquoted field name).

fc_model_family
    The

report_settings
    This is a named list of all the report, forecasting, event detection, and other settings. All of these have defaults, but they are likely not the defaults needed for your system, so each of these should be reviewed.

...
    Accepts other arguments that may normally be part of 'run_epidemia()', but which are ignored for validation runs.
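As an illustrative sketch only: the datasets ('obs_cases', 'env_daily', 'env_ref', 'env_info_table', 'my_report_settings') and the unquoted field names ('cases', 'district', 'environ_var_code', 'obs_value'), as well as the 'fc_model_family' value, are hypothetical placeholders assumed for this example, not values supplied by the package. A validation run might look like:

```r
## Not run:
library(epidemiar)

validation_results <- run_validation(
  date_start      = as.Date("2018-01-01"), # first week to validate
  total_timesteps = 26,                    # validate 26 weeks from date_start
  timesteps_ahead = 2,                     # evaluate 1- and 2-week-ahead forecasts
  reporting_lag   = 0,                     # no simulated reporting delay
  skill_test      = TRUE,                  # also run the two naive comparison models
  epi_data        = obs_cases,             # hypothetical weekly case data
  env_data        = env_daily,             # hypothetical daily environmental data
  env_ref_data    = env_ref,               # hypothetical historical weekly averages
  env_info        = env_info_table,        # hypothetical environmental lookup table
  casefield       = cases,                 # unquoted field names
  groupfield      = district,
  obsfield        = environ_var_code,
  valuefield      = obs_value,
  fc_model_family = "gaussian",            # assumed value; check package docs
  report_settings = my_report_settings     # hypothetical pre-built settings list
)

# Inspect overall skill scores and per-model validation statistics
validation_results$skill_scores$skill_overall
validation_results$validations
## End(Not run)
```

The '## Not run:' wrapper follows the usual R help-page convention, since the call requires the epidemiar package and user-supplied data.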
Value

Returns a nested list of validation results. Statistics are calculated on the n-week ahead forecasts and the actual observed case counts. The statistics returned are Mean Absolute Error (MAE), Root Mean Squared Error (RMSE), and an R-squared statistic. The first object is 'skill_scores', which contains 'skill_overall' and 'skill_grouping'. The second is 'validations', which contains lists per model run (the forecast model, and then optionally the naive models). Within each, 'validation_overall' contains the overall results, 'validation_grouping' the results per geographic grouping, and 'validation_perweek' the raw statistics per week. Lastly, a 'metadata' list contains the important parameter settings used to run the validation and when the results were generated.
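For reference, the summary statistics named above can be computed from paired observed and predicted values as in this minimal base-R sketch. The vectors are made-up illustrative data, not package output, and the R-squared shown is one common definition (1 minus the ratio of residual to total sum of squares); the package's exact formulas may differ:

```r
# Made-up observed weekly case counts and n-week-ahead predictions
obs  <- c(12, 15, 9, 22, 18, 14)
pred <- c(10, 17, 11, 20, 15, 16)

# Mean Absolute Error: average magnitude of the forecast errors
mae <- mean(abs(obs - pred))

# Root Mean Squared Error: penalizes large errors more heavily
rmse <- sqrt(mean((obs - pred)^2))

# R-squared: proportion of variance in the observations explained
rsq <- 1 - sum((obs - pred)^2) / sum((obs - mean(obs))^2)

round(c(MAE = mae, RMSE = rmse, R2 = rsq), 3)
# MAE ~ 2.167, RMSE ~ 2.198, R2 ~ 0.721
```

A smaller MAE/RMSE and an R-squared closer to 1 indicate a closer match between forecasts and observed cases, which is the basis of the skill-test comparison against the naive models.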