as_forecast | R Documentation |
Create a forecast object
Process and validate a data.frame (or similar) with forecasts and observations. If the input passes all input checks, it will be converted to a forecast object. The class of that object depends on the forecast type of the input. See the details section below for more information on the expected input formats.
as_forecast() gives users some control over how their data is parsed. Using the arguments observed, predicted, and model, users can rename existing columns of their input data to match the required columns for a forecast object. Using the argument forecast_unit, users can specify the columns that uniquely identify a single forecast (and remove the others, see set_forecast_unit() for details).
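The renaming described above can be sketched as follows. This is an illustrative base-R sketch, not the package's implementation: the toy data and the column names truth, pred, and method are made up, and the commented call assumes the as_forecast() interface documented on this page.

```r
# Toy data with non-standard column names (all values are invented).
df <- data.frame(
  method = c("m1", "m1", "m2", "m2"),
  target_date = c("2024-01-06", "2024-01-13", "2024-01-06", "2024-01-13"),
  truth = c(10, 12, 10, 12),
  pred = c(9.5, 12.4, 11.0, 11.8)
)

# With the interface documented here, the renaming would be requested as:
# fc <- as_forecast(df, observed = "truth", predicted = "pred", model = "method")

# Base-R equivalent of the renaming step alone (no validation is performed):
rename_map <- c(truth = "observed", pred = "predicted", method = "model")
renamed <- df
idx <- match(names(rename_map), names(renamed))
names(renamed)[idx] <- unname(rename_map)

names(renamed)
```

After the renaming, the data carries the three columns that every forecast type requires; the remaining column, target_date, would become part of the forecast unit.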
as_forecast(data, ...)
## Default S3 method:
as_forecast(
  data,
  forecast_unit = NULL,
  forecast_type = NULL,
  observed = NULL,
  predicted = NULL,
  model = NULL,
  quantile_level = NULL,
  sample_id = NULL,
  ...
)
data |
A data.frame (or similar) with predicted and observed values.
See the details section of |
... |
Additional arguments |
forecast_unit |
(optional) Name of the columns in |
forecast_type |
(optional) The forecast type you expect the forecasts
to have. If the forecast type as determined by |
observed |
(optional) Name of the column in |
predicted |
(optional) Name of the column in |
model |
(optional) Name of the column in |
quantile_level |
(optional) Name of the column in |
sample_id |
(optional) Name of the column in |
Depending on the forecast type, an object of the following class will be returned:
forecast_binary for binary forecasts
forecast_point for point forecasts
forecast_sample for sample-based forecasts
forecast_quantile for quantile-based forecasts
Several forecast types / forecast formats are supported. At the moment, those are:
Point forecasts
Binary forecasts ("soft binary classification")
Probabilistic forecasts in a quantile-based format (a forecast is represented as a set of predictive quantiles)
Probabilistic forecasts in a sample-based format (a forecast is represented as a set of predictive samples)
Forecast types are determined based on the columns present in the input data. Here is an overview of the required format for each forecast type:
All forecast types require a data.frame or similar with columns observed, predicted, and model.
Point forecasts require a column observed of type numeric and a column predicted of type numeric.
Binary forecasts require a column observed of type factor with exactly two levels and a column predicted of type numeric with probabilities, corresponding to the probability that observed is equal to the second factor level. See details here for more information.
Quantile-based forecasts require a column observed of type numeric, a column predicted of type numeric, and a column quantile_level of type numeric with quantile levels (between 0 and 1).
Sample-based forecasts require a column observed of type numeric, a column predicted of type numeric, and a column sample_id of type numeric with sample indices.
For more information see the vignettes and the example data (example_quantile, example_sample_continuous, example_sample_discrete, example_point, and example_binary).
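As a rough illustration of the formats listed above, the following base-R sketch builds a small quantile-based and a small binary data set and checks the stated requirements by hand. The data values are invented, and the checks are a simplified stand-in for whatever validation as_forecast() actually performs.

```r
# Quantile-based format: one row per quantile level of a forecast.
quantile_df <- data.frame(
  model = "m1",
  target_date = "2024-01-06",
  observed = 12,
  quantile_level = c(0.25, 0.5, 0.75),
  predicted = c(9, 11, 14)
)

# Binary format: observed is a two-level factor, predicted is the
# probability that observed equals the second level (here "yes").
binary_df <- data.frame(
  model = "m1",
  observed = factor(c("no", "yes", "yes"), levels = c("no", "yes")),
  predicted = c(0.2, 0.7, 0.9)
)

# Hand-written checks mirroring the requirements stated above.
quantile_ok <- all(c("observed", "predicted", "model", "quantile_level") %in%
                     names(quantile_df)) &&
  is.numeric(quantile_df$observed) &&
  is.numeric(quantile_df$predicted) &&
  all(quantile_df$quantile_level >= 0 & quantile_df$quantile_level <= 1)

binary_ok <- is.factor(binary_df$observed) &&
  nlevels(binary_df$observed) == 2 &&
  all(binary_df$predicted >= 0 & binary_df$predicted <= 1)
```

Both quantile_ok and binary_ok evaluate to TRUE for these toy inputs; changing quantile_level to percentages such as 25, 50, 75 would make the first check fail.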
In order to score forecasts, scoringutils needs to know which of the rows of the data belong together and jointly form a single forecast. This is easy e.g. for point forecasts, where there is one row per forecast. For quantile or sample-based forecasts, however, there are multiple rows that belong to a single forecast.
The forecast unit or unit of a single forecast is then described by the
combination of columns that uniquely identify a single forecast.
For example, we could have forecasts made by different models in various
locations at different time points, each for several weeks into the future.
The forecast unit could then be described as forecast_unit = c("model", "location", "forecast_date", "forecast_horizon").
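To make the notion of a forecast unit concrete, here is a minimal base-R sketch that groups rows by the unit from the example above and counts how many rows belong to each forecast. The data are invented for illustration; the grouping is done by hand and is not how scoringutils works internally.

```r
forecast_unit <- c("model", "location", "forecast_date", "forecast_horizon")

# Toy quantile-format data: every forecast consists of three quantile rows.
df <- expand.grid(
  model = c("m1", "m2"),
  location = c("DE", "FR"),
  forecast_date = "2024-01-01",
  forecast_horizon = 1:2,
  quantile_level = c(0.25, 0.5, 0.75),
  stringsAsFactors = FALSE
)

# Build one key per row from the forecast unit columns and count the rows.
unit_key <- do.call(paste, c(df[forecast_unit], sep = " | "))
rows_per_unit <- table(unit_key)

n_units <- length(rows_per_unit)  # number of distinct forecasts
```

Here there are 2 models x 2 locations x 1 forecast date x 2 horizons = 8 forecasts, each made up of 3 rows (one per quantile level).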
scoringutils automatically tries to determine the unit of a single forecast. It uses all existing columns for this, which means that the data must not contain any columns that are unrelated to the forecast unit. As a very simplistic example, if you had an additional column, "even", that is one if the row number is even and zero otherwise, then this would mess up scoring, as scoringutils would treat this column as relevant in defining the forecast unit.
In order to avoid issues, we recommend setting the forecast unit explicitly, usually through the forecast_unit argument in as_forecast(). This will drop unneeded columns, while making sure that all necessary 'protected columns' like "predicted" or "observed" are retained.
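The "even" example can be reproduced in a few lines of base R. This is a simplified sketch: the helper infer_unit() and the short list of protected columns are hypothetical stand-ins for illustration, not the package's actual logic or its full set of protected columns.

```r
# Hypothetical, abbreviated list of protected columns (assumption).
protected <- c("observed", "predicted", "quantile_level", "sample_id")

# One quantile forecast made of four rows (values are invented).
df <- data.frame(
  model = "m1",
  location = "DE",
  observed = 12,
  quantile_level = c(0.25, 0.5, 0.75, 0.9),
  predicted = c(9, 11, 14, 17)
)
# Unrelated column: 1 if the row number is even, 0 otherwise.
df$even <- as.integer(seq_len(nrow(df)) %% 2 == 0)

# Hypothetical helper: treat every non-protected column as part of the unit.
infer_unit <- function(data, protected) setdiff(names(data), protected)

inferred_unit <- infer_unit(df, protected)
n_inferred <- nrow(unique(df[inferred_unit]))   # "even" splits the forecast

explicit_unit <- c("model", "location")
n_explicit <- nrow(unique(df[explicit_unit]))   # the single intended forecast

# Setting the unit explicitly amounts to keeping unit + protected columns.
trimmed <- df[c(explicit_unit, intersect(protected, names(df)))]
```

With the stray column present, the single forecast is split into two apparent forecasts (n_inferred is 2), while the explicit unit recovers the intended single forecast (n_explicit is 1) and trimmed no longer contains the even column.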
as_forecast(example_binary)
as_forecast(
  example_quantile,
  forecast_unit = c("model", "target_type", "target_end_date",
                    "horizon", "location")
)