model_missing_data: Model missing time series data

View source: R/model_missing_data.R

model_missing_data    R Documentation

Description

Returns an object of class "tsrobprep" which contains the original data and the modelled missing values to be imputed. The function model_missing_data models missing values in time series data using a robust time series decomposition combined with either the weighted lasso or quantile regression. The model uses autoregressive lags of the time series, together with any provided external variables, as explanatory variables. The function is designed for numerical data only.

Usage

model_missing_data(
  data,
  S,
  tau = NULL,
  no.of.last.indices.to.fix = S[1],
  indices.to.fix = NULL,
  replace.recursively = TRUE,
  p = NULL,
  mirror = FALSE,
  lags = NULL,
  extreg = NULL,
  n.best.extreg = NULL,
  use.data.as.ext = FALSE,
  lag.externals = FALSE,
  consider.as.missing = NULL,
  whole.period.missing.only = FALSE,
  debias = FALSE,
  min.val = -Inf,
  max.val = Inf,
  Cor_thres = 0.5,
  digits = 3,
  ICpen = "BIC",
  decompose.pars = list(),
  ...
)

Arguments

data

an input vector, matrix or data frame of dimension nobs x nvars containing missing values; each column is a variable.

S

a number or vector describing the seasonalities (S_1, ..., S_K) in the data, e.g. c(24, 168) if the data consists of 24 observations per day and there is a weekly seasonality in the data.

tau

the quantile(s) of the missing values to be estimated in the quantile regression. tau accepts values in the open interval (0, 1). If NULL, the weighted lasso regression is performed instead.

no.of.last.indices.to.fix

the number of observations at the tail of the data to be fixed; by default, the first element of S.

indices.to.fix

indices of the data to be fixed. If NULL, then it is calculated based on the no.of.last.indices.to.fix parameter. Otherwise, the no.of.last.indices.to.fix parameter is ignored.

replace.recursively

if TRUE, the algorithm uses already replaced values to model the remaining missing values.

p

a number or vector of length(S) = K indicating the order of a K-seasonal autoregressive process to be estimated. If NULL, it is chosen based on the data.

mirror

if TRUE, autoregressive lags up to order p are not only added to the seasonalities but also subtracted from them.

lags

a numeric vector with the lags to use in the autoregression. Negative values are accepted, in which case "future" observations are also used for modelling. If not NULL, p and mirror are ignored.
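
The effect of positive and negative lags can be sketched in base R as follows. This is only an illustration of the idea, not the package's internal code: the helper make_lag_matrix is hypothetical, and it assumes that a positive lag looks back in time while a negative lag looks forward.

```r
# Hypothetical helper (not part of tsrobprep): build a matrix whose columns
# are lagged copies of x. Positive lags look back in time; negative lags
# look forward, i.e. they use "future" observations.
make_lag_matrix <- function(x, lags) {
  n <- length(x)
  out <- sapply(lags, function(l) {
    idx <- seq_len(n) - l           # position of the observation l steps back
    idx[idx < 1 | idx > n] <- NA    # outside the sample: no value available
    x[idx]
  })
  colnames(out) <- paste0("lag_", lags)
  out
}

x <- c(10, 20, 30, 40, 50)
lag_mat <- make_lag_matrix(x, lags = c(1, 2, -1))
print(lag_mat)
```

Rows near the start of the series have no value for positive lags, and rows near the end have none for negative lags, which is why both directions can help when a gap sits close to one edge of the data.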

extreg

a vector, matrix or data frame of data containing external regressors; each column is a variable.

n.best.extreg

a numeric value specifying the maximum number of best-correlated external regressors to consider (selected in decreasing order of correlation). If NULL, all variables in extreg are used for modelling.
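
A minimal sketch of this kind of selection, in base R, is given below. It is an assumption-laden illustration rather than the package's implementation: the helper select_best_extreg is hypothetical, and ranking by absolute Pearson correlation on pairwise complete observations is an assumed choice.

```r
# Hypothetical helper (not part of tsrobprep): return the column indices of
# the n.best external regressors most correlated with y.
# Assumption: ranking uses absolute Pearson correlation.
select_best_extreg <- function(y, extreg, n.best = NULL) {
  extreg <- as.matrix(extreg)
  if (is.null(n.best)) return(seq_len(ncol(extreg)))  # NULL: keep all columns
  cors <- abs(cor(y, extreg, use = "pairwise.complete.obs"))[1, ]
  head(order(cors, decreasing = TRUE), n.best)
}

set.seed(1)
y <- rnorm(200)
extreg <- cbind(
  strong = y + rnorm(200, sd = 0.1),  # closely tracks y
  weak   = y + rnorm(200, sd = 2),    # loosely tracks y
  noise  = rnorm(200)                 # unrelated to y
)
y[c(5, 50)] <- NA                     # missing values in the target series
best <- select_best_extreg(y, extreg, n.best = 2)
colnames(extreg)[best]
```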

use.data.as.ext

logical specifying whether to use the remaining variables in the data as external regressors or not.

lag.externals

logical specifying whether to lag the external regressors or not. If TRUE, then the algorithm uses the lags specified in parameter lags.

consider.as.missing

a vector of numerical values which are considered as missing in the data.

whole.period.missing.only

if FALSE, all observations equal to one of the values in consider.as.missing are treated as missing. If TRUE, only runs of consecutive such observations of a specified length are treated as missing (the length is defined by the first element of S).
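
The difference between the two settings can be sketched in base R. The helper flag_missing is hypothetical and not part of tsrobprep, and it assumes that "of specified length" means runs of at least that many consecutive observations.

```r
# Hypothetical helper (not part of tsrobprep): mark observations as missing.
# Assumption: with whole.period.only = TRUE, a run counts only if it spans
# at least `period` consecutive observations.
flag_missing <- function(x, consider.as.missing, period,
                         whole.period.only = FALSE) {
  hit <- x %in% consider.as.missing
  if (!whole.period.only) return(hit)
  runs <- rle(hit)
  keep <- runs$values & runs$lengths >= period
  rep(keep, times = runs$lengths)
}

x <- c(5, 0, 7, 0, 0, 0, 9)
any_zero  <- flag_missing(x, consider.as.missing = 0, period = 3)
long_runs <- flag_missing(x, consider.as.missing = 0, period = 3,
                          whole.period.only = TRUE)
print(any_zero)
print(long_runs)
```

In this toy series the isolated zero at position 2 is flagged only in the first call, while the run of three zeros is flagged in both.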

debias

if TRUE, the recursive replacement is additionally debiased.

min.val

a single value or a vector of length nvars providing the minimum possible value of each variable in the data. If a single value, then it applies to all variables. By default set to -Inf.

max.val

a single value or a vector of length nvars providing the maximum possible value of each variable in the data. If a single value, then it applies to all variables. By default set to Inf.

Cor_thres

a single value providing the correlation threshold from which external regressors are considered in the quantile regression.

digits

integer indicating the number of decimal places allowed in the data, by default set to 3.

ICpen

the information criterion penalty used to choose lambda in the glmnet algorithm. It can be one of the strings "BIC", "HQC" or "AIC", or a fixed number.

decompose.pars

named list containing additional arguments for the robust_decompose function.

...

additional arguments for the glmnet or rq.fit.fnb algorithms.

Details

The function uses robust time series decomposition with weighted lasso or quantile regression to model missing values and prepare them for imputation. For this purpose, the robust_decompose function is used together with glmnet in the case of mean regression, i.e. tau = NULL. In the case of quantile regression, i.e. tau != NULL, the robust_decompose function is used together with the rq.fit.fnb function. The modelled values can be imputed using the impute_modelled_data function.

Value

An object of class "tsrobprep" which contains the original data, the indices of the data that were modelled, the given quantile values, a list of sparse matrices with the modelled data to be imputed and a list of the numbers of models estimated for every variable.

See Also

robust_decompose, impute_modelled_data, detect_outliers, auto_data_cleaning

Examples

## Not run: 
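# Model the missing values in GBload (first column dropped), treating
# zeros as missing and bounding the modelled values below by 0
model.miss <- model_missing_data(
    data = GBload[, -1], S = c(48, 7 * 48),
    no.of.last.indices.to.fix = nrow(GBload), consider.as.missing = 0,
    min.val = 0
)
# Number of models estimated for every variable
model.miss$estimated.models
# Indices of the data that were modelled
model.miss$replaced.indices
# Impute the modelled values into the data
new.GBload <- impute_modelled_data(model.miss)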

## End(Not run)

tsrobprep documentation built on March 18, 2022, 6:09 p.m.