interpol_splines: interpolated splines algorithm to fill missing values

Description

This main function fills gaps in monovariate or multivariate data by SVD imputation, a method closely related to the expectation-maximization (EM) algorithm, combined with spline interpolation.

Usage

interpol_splines(y, nembed = 1, nsmo = 0, ncomp = 0, threshold1 = 1e-05,
  niter = 30, displ = F)

Arguments

y

a numeric data.frame or matrix of data with gaps

nembed

integer value controlling embedding dimension (must be > 1 for monovariate data)

nsmo

integer value controlling cutoff time scale in number of samples. Set it to 0 if only one single time scale is desired.

ncomp

integer value controlling the number of significant components. It must be specified to run in automatic mode; the default (0) leads to manual selection while the algorithm runs.

threshold1

numeric value controlling when the iterations stop: they end once the relative energy change is < threshold1

niter

numeric value controlling the maximum number of iterations

displ

logical value controlling whether some information is displayed in the console while the algorithm runs
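The roles of ncomp, threshold1 and niter can be illustrated with a minimal sketch of iterative SVD imputation. This is not the package's implementation: the helper name svd_impute_sketch, the column-mean initial guess, and the definition of "energy" as the sum of squares of the filled-in values are all assumptions made for illustration, and the embedding and spline filtering steps are left out.

```r
# Hypothetical helper, NOT part of the package: iterative truncated-SVD imputation.
svd_impute_sketch <- function(y, ncomp = 1, threshold1 = 1e-05, niter = 30) {
  y <- as.matrix(y)
  gaps <- is.na(y)

  # Initial guess (assumption): fill each gap with the mean of its column
  col_means <- colMeans(y, na.rm = TRUE)
  filled <- y
  filled[gaps] <- col_means[col(y)[gaps]]

  # "Energy" (assumption): sum of squares of the values placed in the gaps
  energy_old <- sum(filled[gaps]^2)

  for (i in seq_len(niter)) {
    # Keep only the ncomp leading singular triplets
    s <- svd(filled, nu = ncomp, nv = ncomp)
    low_rank <- s$u %*% diag(s$d[seq_len(ncomp)], nrow = ncomp) %*% t(s$v)

    # Update the gaps only; observed values are never modified
    filled[gaps] <- low_rank[gaps]

    # Stop once the relative energy change falls below threshold1
    energy_new <- sum(filled[gaps]^2)
    rel_change <- abs(energy_new - energy_old) /
      max(energy_old, .Machine$double.eps)
    if (rel_change < threshold1) break
    energy_old <- energy_new
  }
  filled
}

# Toy usage: a rank-1 matrix with a single gap
truth <- outer(1:10, c(1, 2, 3))
y_toy <- truth
y_toy[4, 2] <- NA
filled_toy <- svd_impute_sketch(y_toy, ncomp = 1, threshold1 = 1e-10, niter = 200)
```

In this sketch a larger ncomp keeps more structure from the data but also more noise, which is one reason its choice matters for the result.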

Details

The method decomposes the data into two time scales, which are processed separately and then merged at the end. The cutoff time scale (nsmo) is expressed in number of samples. A spline "filter" is used for filtering. Monovariate data must be embedded first (nembed > 1). In the initial data set, gaps must be marked with NA.
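The two preprocessing ideas described above, embedding and the split into two time scales, can be sketched with base R on a toy series. This is only an illustration under stated assumptions: embed() and smooth.spline() from the stats package stand in for whatever the function does internally, and the value df = 10 is arbitrary, since how nsmo maps onto the spline smoothing is not documented here.

```r
# Hypothetical illustration, NOT the package's internal code.
set.seed(42)
x <- sin(2 * pi * (1:200) / 100) + 0.3 * rnorm(200)  # toy monovariate series

# 1. Embedding: build a matrix of lagged copies of the series (nembed = 3 here).
#    Row t holds x[t], x[t-1], x[t-2], so a single series becomes multivariate.
nembed <- 3
x_embedded <- embed(x, nembed)

# 2. Two time scales: a smoothing spline extracts the slow part,
#    and the residual is the fast part. Their sum restores the series.
idx <- seq_along(x)
fit <- smooth.spline(idx, x, df = 10)   # df is an arbitrary choice (assumption)
x_slow <- predict(fit, idx)$y
x_fast <- x - x_slow
```

Note that smooth.spline() does not accept NA, so in a sketch like this the split would have to be applied to a series whose gaps already hold provisional values.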

The three tuneable (hyper)parameters are:

ncomp
nsmo
nembed

Value

A list with the following elements:

y.filled

The same dataset as y but with gaps filled

w.distSVD

The distribution of the weights of the SVD

errorByComp

Numeric vector containing the errors. Whether it holds one value per iteration (length niter) or one per component is not specified.

Of the three tuneable parameters (ncomp, nsmo, nembed), only the first, ncomp, really affects the outcome. A separation into only two scales (with a cutoff between 50 and 100 days) is enough to properly capture both short- and long-term evolutions, and embedding dimensions of D = 2 to 5 are usually adequate for reconstructing daily averages. The optimum parameters are preferably determined, and the results validated, by cross-validation.
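One possible way to set up such a cross-validation is to hide a fraction of the known values, fill them, and measure the error on the hidden entries. The sketch below assumes this hold-out scheme; the helpers cv_fill_error and mean_fill are hypothetical names, and the column-mean filler is only a stand-in so that the code runs without the package.

```r
# Hypothetical helper, NOT part of the package:
# score a gap-filling function by hiding values that are actually known.
cv_fill_error <- function(y, fill_fun, frac = 0.1, seed = 1) {
  y <- as.matrix(y)
  set.seed(seed)
  observed <- which(!is.na(y))
  n_hide <- max(1, floor(frac * length(observed)))
  held_out <- sample(observed, size = n_hide)

  y_masked <- y
  y_masked[held_out] <- NA              # create artificial gaps
  y_filled <- fill_fun(y_masked)

  # Root-mean-square error on the artificially hidden entries
  sqrt(mean((y_filled[held_out] - y[held_out])^2))
}

# Stand-in filler (column means), used only to make the sketch runnable
mean_fill <- function(m) {
  gaps <- is.na(m)
  m[gaps] <- colMeans(m, na.rm = TRUE)[col(m)[gaps]]
  m
}

toy <- outer(1:20, c(1, 2, 3))
rmse_mean <- cv_fill_error(toy, mean_fill)
```

With the package loaded, fill_fun could be something like function(m) interpol_splines(m, nembed = 2, nsmo = 8, ncomp = k)$y.filled, evaluated for several values of k to compare the resulting errors.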

Author(s)

Antoine Pissoort, antoine.pissoort@student.uclouvain.be

Examples

# Take this for input, as advised in the test.m file
y <- sqrt(data.mat2.fin + 1) # Selected randomly here, for testing

options(mc.cores = parallel::detectCores()) # use all available cores
z_splines <- interpol_splines(y, nembed = 2, nsmo = 8, ncomp = 4,
                              niter = 30, displ = FALSE)
# About 80 sec for the whole dataset
z_splines <- z_splines$y.filled
z_splines <- z_splines * z_splines - 1 # invert the sqrt(x + 1) transform
z_splines[z_splines < 0] <- 0          # clip negative values to zero
ssn_splines <- z_splines

proto4426/ValUSunSSN documentation built on May 26, 2019, 10:31 a.m.