interpolsvd_em: interpolated SVD EM algorithm to fill missing values


Description

This is the main function. It fills gaps in monovariate or multivariate data by SVD imputation, a technique closely related to the expectation-maximization (EM) algorithm.
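To make the idea of SVD imputation concrete, here is a minimal sketch in base R. It is not the code of interpolsvd_em: the name svd_impute, the column-mean starting guess, and the exact definition of the "energy" used in the stopping rule are all assumptions chosen for illustration.

```r
# Sketch of iterative SVD imputation (hypothetical helper, not the package code).
svd_impute <- function(y, ncomp = 1, niter = 30, tol = 1e-5) {
  y <- as.matrix(y)
  miss <- is.na(y)
  # Starting guess (assumption): fill each gap with its column mean
  col_means <- colMeans(y, na.rm = TRUE)
  y[miss] <- col_means[col(y)[miss]]
  energy_old <- sum(y[miss]^2)
  for (i in seq_len(niter)) {
    s <- svd(y, nu = ncomp, nv = ncomp)
    # Rank-ncomp reconstruction of the current estimate
    approx <- s$u %*% diag(s$d[seq_len(ncomp)], nrow = ncomp) %*% t(s$v)
    # Only the gaps are updated; observed values are never modified
    y[miss] <- approx[miss]
    # Stopping rule (assumption): relative change in the energy of the gaps
    energy_new <- sum(y[miss]^2)
    if (abs(energy_new - energy_old) <=
        tol * max(energy_old, .Machine$double.eps)) break
    energy_old <- energy_new
  }
  y
}

# Usage: a rank-1 matrix with a single gap
full <- outer(1:6, c(1, 2, 3))
gappy <- full
gappy[2, 3] <- NA
filled <- svd_impute(gappy, ncomp = 1, niter = 200, tol = 1e-12)
```

Each pass replaces the gaps with their low-rank reconstruction and then recomputes the SVD, which is the alternation that gives the method its EM flavour.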

Usage

interpolsvd_em(y, nembed = 1, nsmo = 0, ncomp = 0, threshold1 = 1e-05,
  niter = 30, displ = F)

Arguments

y

a numeric data.frame or matrix of data with gaps

nembed

integer value controlling embedding dimension (must be > 1 for monovariate data)

nsmo

integer value controlling the cutoff time scale, in number of samples. Set it to 0 if only a single time scale is desired.

ncomp

integer value controlling the number of significant components. It must be specified to run in automatic mode; the default (0) leads to manual selection during the algorithm.

threshold1

numeric value controlling when the iterations stop: they end once the relative energy change falls below threshold1

niter

numeric value controlling the maximum number of iterations

displ

logical value controlling whether some information is printed to the console while the algorithm runs

Details

The method decomposes the data into two time scales, which are processed separately and then merged at the end. The cutoff time scale (nsmo) is expressed in number of samples. A Gaussian filter is used for the filtering. Monovariate data must be embedded first (nembed > 1). In the initial data set, gaps must be coded as NA.
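The two preprocessing ideas described above, the split into two time scales and the embedding of a monovariate series, can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's internals: gauss_smooth is a hypothetical helper, using nsmo directly as the standard deviation of the Gaussian kernel is an assumption, and stats::embed is only one possible way to build lagged copies of a series.

```r
# Hypothetical helper: Gaussian-weighted moving average that skips NA values.
# Assumption: the kernel standard deviation is set equal to nsmo.
gauss_smooth <- function(x, nsmo) {
  n <- length(x)
  ok <- !is.na(x)
  vapply(seq_len(n), function(i) {
    w <- dnorm(seq_len(n) - i, sd = nsmo)
    # Renormalise the weights over the observed samples only
    sum(w[ok] * x[ok]) / sum(w[ok])
  }, numeric(1))
}

# Toy monovariate series with a slow and a fast oscillation, plus gaps
t_idx <- 1:200
x <- sin(2 * pi * t_idx / 50) + 0.2 * sin(2 * pi * t_idx / 5)
x[c(40, 41, 120)] <- NA

slow <- gauss_smooth(x, nsmo = 10)  # long time scale
fast <- x - slow                    # short time scale, NA where x is NA

# Embedding: turn the single series into a matrix of lagged copies
# (row 1 holds x[3], x[2], x[1] for dimension = 3)
emb <- stats::embed(x, dimension = 3)
```

Embedding is what gives the SVD something to work with for a single series: the lagged columns are strongly correlated, so a low-rank reconstruction can borrow information from neighbouring samples.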

The three tunable (hyper)parameters are:

ncomp
nsmo
nembed

Value

A list with the following elements:

y.filled

The same dataset as y but with gaps filled

w.distSVD

The distribution of the weights of the initial SVD

Of the three tunable parameters (ncomp, nsmo, nembed), only the first one, ncomp, really affects the outcome. A separation into only two scales (with a cutoff between 50 and 100 days) is enough to properly capture both short- and long-term evolutions, and embedding dimensions of D = 2 to 5 are usually adequate for reconstructing daily averages. The optimum parameters are preferably determined, and the results validated, by cross-validation.
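A cross-validation of the kind recommended here can be sketched as follows: hide some observed values, fill them, and measure the error on the hidden entries. The names cv_rmse and fill_colmeans are hypothetical, and the column-mean filler is only a stand-in so that the sketch runs without the package.

```r
# Hypothetical helper: hide a random fraction of the observed entries,
# fill them with fill_fun, and return the RMSE on the hidden entries.
cv_rmse <- function(y, fill_fun, frac = 0.1) {
  y <- as.matrix(y)
  observed <- which(!is.na(y))
  hidden <- sample(observed, size = max(1, floor(frac * length(observed))))
  y_masked <- y
  y_masked[hidden] <- NA
  y_hat <- fill_fun(y_masked)
  sqrt(mean((y_hat[hidden] - y[hidden])^2))
}

# Stand-in filler (assumption): replace each gap by its column mean
fill_colmeans <- function(m) {
  mu <- colMeans(m, na.rm = TRUE)
  gaps <- is.na(m)
  m[gaps] <- mu[col(m)[gaps]]
  m
}

set.seed(42)
toy <- matrix(rnorm(200, mean = 5), nrow = 50, ncol = 4)
toy[sample(length(toy), 10)] <- NA
err <- cv_rmse(toy, fill_colmeans, frac = 0.1)
```

To score interpolsvd_em itself, one could pass a wrapper such as function(m) interpolsvd_em(m, nembed = 2, nsmo = 81, ncomp = k)$y.filled for each candidate k and keep the value of ncomp with the lowest error.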

Author(s)

Antoine Pissoort, antoine.pissoort@student.uclouvain.be

Examples

# Take this as input, as advised in the test.m file
# (data.mat2.fin is assumed to be already loaded in the session)
y <- sqrt(data.mat2.fin+1) # Selected randomly here, for testing

options(mc.cores=parallel::detectCores()) # all available cores

z <- interpolsvd_em(y, nembed = 2, nsmo = 81, ncomp = 4,
                    niter = 30, displ = FALSE)
# 393 sec for the whole dataset (with some stations discarded)

# Then apply the inverse transformation to obtain the final dataset with filled values
z <- z$y.filled
z_final <- z*z - 1
z_final[z_final<0] <- 0 # clip negative values produced by the back-transformation

proto4426/ValUSunSSN documentation built on May 26, 2019, 10:31 a.m.