Data Requirements
In nixtlar: A Software Development Kit for 'Nixtla''s 'TimeGPT'

library(httptest2)
.mockPaths("../tests/mocks")
start_vignette(dir = "../tests/mocks")

original_options <- options("NIXTLA_API_KEY"="dummy_api_key", digits=7)

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>", 
  fig.width = 7, 
  fig.height = 4
)

library(nixtlar)

This vignette explains the data requirements for using any of the core functions of nixtlar:

# Core functions of `nixtlar` 
- nixtlar::nixtla_client_forecast()
- nixtlar::nixtla_client_historic()
- nixtlar::nixtla_client_detect_anomalies()
- nixtlar::nixtla_client_cross_validation()
- nixtlar::nixtla_client_plot()

1. Input Requirements

nixtlar now supports the following data structures: data frames, tibbles, and tsibbles. The output format will always be a data frame.

Regardless of your data structure, the following two columns must always be included when using any core functions of nixtlar:

Date Column: This column must contain timestamps formatted as YYYY-MM-DD or YYYY-MM-DD hh:mm:ss, stored either as character strings or as date-time objects. For date-time objects, we recommend using the as.POSIX* functions from base R, although as.Date is also supported. The default name for this column is ds. If your dataset uses a different name, specify it by setting the parameter time_col="your_time_column_name".
Target Column: This column must contain the numeric target variable to forecast. The default name for this column is y. If your dataset uses a different name, specify it by setting the parameter target_col="your_target_column_name".
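
The two column requirements above can be checked before any data is sent to the API. The following is a minimal sketch in base R; the data frame `sales`, its column names, and the helper `check_input_columns()` are hypothetical and not part of nixtlar.

```r
# Hypothetical single-series data that uses non-default column names
sales <- data.frame(
  timestamp = seq(as.POSIXct("2024-01-01 00:00:00", tz = "UTC"),
                  by = "hour", length.out = 48),
  value = as.numeric(1:48)
)

# Hypothetical helper (not a nixtlar function): checks that the time column
# holds date-time objects or correctly formatted strings, and that the
# target column is numeric
check_input_columns <- function(df, time_col = "ds", target_col = "y") {
  stopifnot(is.data.frame(df),
            time_col %in% names(df),
            target_col %in% names(df))
  ds <- df[[time_col]]
  pattern <- "^[0-9]{4}-[0-9]{2}-[0-9]{2}( [0-9]{2}:[0-9]{2}:[0-9]{2})?$"
  ok_time <- inherits(ds, c("POSIXt", "Date")) ||
    (is.character(ds) && all(grepl(pattern, ds)))
  ok_target <- is.numeric(df[[target_col]])
  c(time = ok_time, target = ok_target)
}

checks <- check_input_columns(sales, time_col = "timestamp", target_col = "value")
print(checks)

# With a valid API key, the same column names would then be passed on, e.g.:
# fc <- nixtlar::nixtla_client_forecast(
#   sales, h = 8, id_col = NULL,
#   time_col = "timestamp", target_col = "value"
# )
```

The forecast call is left commented out because it requires a valid API key; the value of `h` there is arbitrary.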

2. Multiple Series

If you are working with multiple series, you must include a column with a unique identifier for each series. This column can contain characters or integers, and its default name is unique_id. If your dataset uses a different name for the identifier column, please specify it by setting the parameter id_col="your_id_column_name". If your dataset contains only one series and does not need an identifier, set id_col to NULL.

Please be aware that in earlier versions of nixtlar, the default value of id_col was NULL, but it is now unique_id.

# sample valid input 
df <- nixtlar::electricity
head(df)
str(df)
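
When the series come from separate sources, they need to be stacked into one long data frame with an identifier column. A minimal sketch in base R follows; the helper `make_series()`, the identifiers, and the values are made up for illustration.

```r
# Hypothetical helper that builds one daily series with a given identifier
make_series <- function(id, n) {
  data.frame(
    unique_id = id,
    ds = seq(as.Date("2024-01-01"), by = "day", length.out = n),
    y = as.numeric(seq_len(n))
  )
}

# Stack two series of different lengths into a single long data frame
long_df <- rbind(make_series("store_a", 10), make_series("store_b", 12))

# Number of observations per series
print(table(long_df$unique_id))

# Each (unique_id, ds) pair should appear only once
has_duplicates <- any(duplicated(long_df[, c("unique_id", "ds")]))
print(has_duplicates)
```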

3. Exogenous Variables

When using exogenous variables, nixtlar distinguishes between historical and future exogenous variables:

Historical Exogenous Variables: These should be included in the input data immediately following the id_col, ds, and y columns. If your dataset contains additional columns that are not exogenous variables, you must remove them before using any core functions of nixtlar.
Future Exogenous Variables: These correspond to the X_df parameter and should cover the entire forecast horizon. This dataset must include columns with the appropriate timestamps and, if applicable, unique identifiers, formatted as described in the previous sections.

# sample valid input with exogenous variables 
df <- nixtlar::electricity_exo_vars
head(df)

future_exo_vars <- nixtlar::electricity_future_exo_vars
head(future_exo_vars)
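
The requirement that the future exogenous variables cover the whole forecast horizon can be verified with a few lines of base R. This is only a sketch on synthetic data; the column `temperature`, the horizon `h`, and all values are assumptions for illustration.

```r
h <- 3  # assumed forecast horizon

# Historical data: identifier, date, target, then the exogenous variable
hist_df <- data.frame(
  unique_id = "A",
  ds = seq(as.Date("2024-01-01"), by = "day", length.out = 5),
  y = c(10, 12, 11, 13, 14),
  temperature = c(20, 21, 19, 22, 23)
)

# Future values of the exogenous variable for the next h days
future_df <- data.frame(
  unique_id = "A",
  ds = seq(max(hist_df$ds) + 1, by = "day", length.out = h),
  temperature = c(24, 22, 21)
)

# The exogenous columns should match between the two data frames
exo_cols <- setdiff(names(hist_df), c("unique_id", "ds", "y"))
same_exo <- identical(exo_cols, setdiff(names(future_df), c("unique_id", "ds")))

# Every series should have exactly h future rows
covers_horizon <- all(table(future_df$unique_id) == h)

print(c(same_exo = same_exo, covers_horizon = covers_horizon))
```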

To learn more about how to use exogenous variables, please refer to the Exogenous variables vignette.

4. Missing Values

When using TimeGPT via nixtlar, ensure the following:

No Missing Values in the Target Column: The target column must not contain any missing values (NA).
Continuous Date Sequence: The dates must be continuous, without any gaps, from the start date to the end date, matching the frequency of the data.

Currently, nixtlar does not provide any functionality to fill missing values or dates. To learn more about this, please refer to the vignette on Special Topics.
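
Both conditions can be checked in base R before calling any core function. The sketch below uses a small made-up daily series with one missing target value and one missing date; it only detects the problems and does not fill them.

```r
# Hypothetical daily series: one NA in the target, and 2024-01-03 is absent
df <- data.frame(
  ds = as.Date(c("2024-01-01", "2024-01-02", "2024-01-04", "2024-01-05")),
  y = c(1.5, NA, 2.0, 2.5)
)

# Count missing values in the target column
n_missing_target <- sum(is.na(df$y))

# Build the complete date sequence at the data's frequency and
# list the dates that do not appear in the data
expected <- seq(min(df$ds), max(df$ds), by = "day")
missing_dates <- expected[!expected %in% df$ds]

print(n_missing_target)
print(missing_dates)
```

For other frequencies, the `by` argument of `seq()` would need to match the frequency of the data (for example `"hour"` or `"month"`).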

5. Minimum Data Requirements

The minimum size per series needed to obtain results from nixtlar::nixtla_client_forecast is one observation, regardless of the frequency of the data. Keep in mind, however, that forecasts produced from so little data will have limited accuracy.

For certain scenarios, more than one observation may be necessary:

When using the parameters level, quantiles, or finetune_steps.
When incorporating exogenous variables.
When including historical forecasts by setting add_history=TRUE.

In these cases, the minimum data requirement varies with the frequency of the data, as detailed in the official TimeGPT documentation.
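
Once the applicable minimum is known, series that fall short of it can be identified by counting observations per series. The following is a sketch in base R; the threshold `min_required` is a placeholder, not a value taken from the TimeGPT documentation.

```r
# Hypothetical data: series "A" has 30 observations, series "B" only 5
df <- data.frame(
  unique_id = rep(c("A", "B"), times = c(30, 5)),
  y = as.numeric(1:35)
)

# Placeholder threshold; look up the real value for your data's frequency
min_required <- 10

# Observations per series, and the series below the threshold
obs_per_series <- table(df$unique_id)
too_short <- names(obs_per_series)[obs_per_series < min_required]

print(too_short)
```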

When using nixtlar::nixtla_client_cross_validation, you also need to consider the forecast horizon (h), the number of windows (n_windows) and the step size (step_size). The formula for the minimum data points required per series is:

\begin{equation} \text{Min per series} = \text{Min per frequency}+h+\text{step_size}*(\text{n_windows}-1) \end{equation}

Here, $\text{Min per frequency}$ refers to the values specified in the table from the official documentation.
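
As a worked example of the formula, the calculation can be wrapped in a small function. The function `min_cv_points()` is a hypothetical helper, and the value used for the minimum per frequency is a placeholder rather than a figure from the official table.

```r
# Hypothetical helper implementing the formula above
min_cv_points <- function(min_per_freq, h, step_size, n_windows) {
  min_per_freq + h + step_size * (n_windows - 1)
}

# Placeholder inputs: minimum per frequency of 48, horizon of 24,
# step size of 24, and 3 cross-validation windows
required <- min_cv_points(min_per_freq = 48, h = 24, step_size = 24, n_windows = 3)
print(required)  # 48 + 24 + 24 * (3 - 1) = 120
```

With a single window (`n_windows = 1`) the last term vanishes, so the requirement reduces to the minimum per frequency plus the horizon.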

options(original_options)
end_vignette()

nixtlar documentation built on Oct. 30, 2024, 5:07 p.m.
