gapfillSSA: Fill gaps in a vector (time-series) with SSA


Description

gapfillSSA applies the iterative gap filling procedure proposed by Kondrashov and Ghil (2006), using the fast and optimized SSA implementation developed by Korobeynikov (2010). Generally speaking, the major periodic components of the time series are determined and interpolated into the gap positions. An iterative cross validation scheme with artificial gaps is used to determine these periodic components.

Usage

gapfillSSA(amnt.artgaps = c(0.05, 0.05), DetBestIter = ".getBestIteration", 
    debugging = FALSE, amnt.iters = c(10, 10), amnt.iters.start = c(1, 
        1), fill.margins = FALSE, first.guess = c(), GroupEigTrpls = "grouping.auto", 
    groupingMethod = "wcor", kind = c("auto", "1d-ssa", "2d-ssa")[1], 
    M = floor(length(series)/3), matrix.best.iter = "perf.all.gaps", 
    MeasPerf = "RMSE", n.comp = 2 * amnt.iters[1], open.plot = TRUE, 
    plot.results = FALSE, plot.progress = FALSE, pad.series = c(0, 
        0), print.stat = TRUE, remove.infinite = FALSE, scale.recstr = TRUE, 
    series, seed = integer(), size.biggap = 20, SSA.methods = c("nutrlan", 
        "propack", "eigen", "svd"), tresh.convergence = 0.01, 
    tresh.min.length = 5, z.trans.series = TRUE)

Arguments

amnt.artgaps

numeric vector: The relative ratio (amount gaps/series length) of artificial gaps to include to determine the iteration with the best prediction (c(ratio big gaps, ratio small gaps)). If this is set to c(0,0), the cross validation step is excluded and the iteration is run until amnt.iters.

DetBestIter

function: Function to determine the best outer and inner iteration to use for reconstruction. If no function is given, the standard way is used. (see ?.getBestIteration)

debugging

logical: If set to TRUE, workspaces to be used for debugging are saved in case of (some) errors or warnings.

amnt.iters

integer vector: Amount of iterations performed for the outer and inner loop (c(outer,inner)).

amnt.iters.start

integer vector: Index of the iteration to start with c(outer, inner). If this value is > 1, the reconstruction (!) part is started with this iteration. Currently it is only possible to set this to values > 1 if amnt.artgaps != 0 as this would cause a cross validation loop.

fill.margins

logical: Whether to fill gaps at the outer margins of the series, i.e. to extrapolate before the first and after the last valid value. Doing this most probably produces unreliable results (i.e. a strong build up of amplitude).

first.guess

numeric vector/matrix: First guess for the gap values. The mean/zero is used if no value is supplied. Has to have the same dimensions and lengths as series.

GroupEigTrpls

character string: Name of the function used to group the eigentriples. This function needs to take an ssa object as its first input and other inputs as its ... argument. It has to return a list whose length equals the desired number of SSA groups. Each of its elements has to be an integer vector indicating which SSA eigentriple(s) belong(s) to this group. The function 'grouping.auto' uses the methods supplied by the Rssa package (see argument groupingMethod to set the corresponding argument for the method). Another possibility is 'groupSSANearestNeighbour', which uses a rather ad-hoc method of detecting the nearest (Euclidean) neighbour of each eigentriple. 2D SSA automatically uses the nearest neighbour method, as grouping was not (yet) implemented for 2D SSA.

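groupingMethod

character string: Method used by 'grouping.auto' to group the eigentriples (see argument GroupEigTrpls and the help of grouping.auto() in package Rssa).

kind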

character string: Whether to calculate one or two dimensional SSA (see the help of ssa()). Default is to determine this automatically by determining the dimensions of series.

M

integer: Window length or embedding dimension [time steps]. If not given, a default value of floor(length(series)/3) is used. For 2d SSA a vector of length 2 has to be supplied. If only one number is given, it is used for both dimensions (see ?ssa, where this parameter is called L).

matrix.best.iter

character string: Which performance matrix to use (has to be one of recstr.perf.a, recstr.perf.s or recstr.perf.b (see ?.getBestIteration)).

MeasPerf

character string: Name of a function to determine the 'goodness of fit' between the reconstruction and the actual values in the artificial gaps. The respective function has to take two vectors as input and return a single value. Set to the "Root Mean Square Error" (RMSE) by default.

n.comp

integer: Amount of eigentriples to extract (default if no values are supplied is 2*amnt.iters[1]) (see ?ssa, here the parameter is called neig).

open.plot

logical: Whether to open a new layout of plots for the performance plots.

plot.results

logical: Whether to plot a visualization of the performance for the artificial gaps.

plot.progress

logical: whether to visualize the iterative estimation of the reconstruction process during the calculations.

pad.series

integer vector (length 2): Length of the part of the series to use for padding at the start (first value) and at the end of the series. Values of zero cause no padding. This feature has not yet been rigorously tested!

print.stat

logical: Whether to print status information during the calculations.

remove.infinite

logical: Whether to remove infinite values prior to the calculation.

scale.recstr

logical: whether to scale the reconstruction to sd = 1 at the end of each outer loop step.

series

numeric vector/matrix: equally spaced input time series or matrix with gaps (gap = NA)

seed

integer: Seed to be taken for the randomized determination of the positions of the artificial gaps and the nutrlan ssa algorithm. Per default, no seed is set.

size.biggap

integer: Length of the big artificial gaps (in time steps)

SSA.methods

character vector: Methods to use for the SSA computation. The first method is tried first; if it fails to converge, the second is used, and so on. See the help of ssa() in package Rssa for details on the methods. The last two methods are relatively slow!

tresh.convergence

numeric value: Threshold below which the last three sums of squared differences between inner iteration loops must fall for the whole process to be considered to have converged.

tresh.min.length

integer: minimum length the series has to have to do computations.

z.trans.series

logical: whether to perform z-transformation of the series prior to the calculation.
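
As an illustration of the interface described for MeasPerf above, the following minimal sketch defines an alternative performance measure in base R. The function names MAE and rmse.ref are hypothetical and not part of the package; the only requirements taken from the argument description are that the function accepts two vectors and returns a single value.

```r
# Hypothetical performance measure (not part of the package):
# mean absolute error between reconstruction and actual values.
MAE <- function(predicted, observed) {
  mean(abs(predicted - observed), na.rm = TRUE)
}

# Reference implementation of the root mean square error for comparison.
rmse.ref <- function(predicted, observed) {
  sqrt(mean((predicted - observed)^2, na.rm = TRUE))
}

obs  <- c(1, 2, 3, 4)
pred <- c(1, 2, 5, 4)
mae.value  <- MAE(pred, obs)
rmse.value <- rmse.ref(pred, obs)
```

Since MeasPerf expects a character string, such a function would presumably be selected by name (MeasPerf = "MAE"). Where the function has to be defined for gapfillSSA to find it is not stated in this help page.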

Details

Artificial Gaps: The amount of artificial gaps to be included is determined as follows: amnt.artgaps determines the total size of the artificial gaps to be included. Each value (0-1) is a relative ratio of the total amount of available data points. To switch off the inclusion of either small or big gaps, set the respective ratio to 0. In general, the ratios determine a maximum amount of gaps. size.biggap sets the size of the big gaps. The number of big gaps to be included is then the maximum possible number of gaps of this size that does not exceed the amount of big gaps set by amnt.artgaps[1]. The amount of small gaps is then set according to the ratio amnt.artgaps[1]/amnt.artgaps[2].
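
The arithmetic of the paragraph above can be sketched in base R. This is one reading of the description, not the internal code of the package: all variable names except amnt.artgaps and size.biggap are made up for the illustration, and the treatment of the small gap budget in particular is an assumption.

```r
# Toy series with 7 real gaps, i.e. 993 available data points.
series <- sin(2 * pi * (1:1000) / 100)
series[c(10:15, 500)] <- NA

amnt.artgaps <- c(0.05, 0.05)   # c(ratio big gaps, ratio small gaps)
size.biggap  <- 20

n.available <- sum(!is.na(series))

# Maximum number of data points that may end up in big artificial gaps.
max.big.points <- amnt.artgaps[1] * n.available

# Largest number of gaps of length size.biggap that stays below this maximum.
n.biggaps         <- floor(max.big.points / size.biggap)
points.in.biggaps <- n.biggaps * size.biggap

# Assumption: the small gaps get their own budget from the second ratio.
max.small.points <- floor(amnt.artgaps[2] * n.available)
```

With the defaults and roughly 1000 valid points, this reading yields two big gaps of 20 time steps each, which stays below the 5 percent maximum.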

Iteration performance measure: The DetBestIter function should take any of the RMSE matrices (small/big/all gaps) as an input and return i.best with best inner loops for each outer loop and h.best as the outer loop until which should be iterated. Use the default function as a reference.
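
A custom selection function along these lines might look like the following minimal sketch. The name pickMinRMSE is hypothetical, and two things are assumptions rather than documented behaviour: that rows of the performance matrix correspond to outer loops and columns to inner loops, and that a list with the elements i.best and h.best is an acceptable return value. The default function (?.getBestIteration) remains the authoritative reference.

```r
# Hypothetical selection function (not part of the package).
# Assumes: rows = outer loop steps, columns = inner loop iterations.
pickMinRMSE <- function(perf, ...) {
  # Best inner iteration for each outer loop (NA if a row has no values).
  i.best <- apply(perf, 1, function(x) {
    if (all(is.na(x))) NA_integer_ else which.min(x)
  })
  # RMSE reached at the best inner iteration of each outer loop.
  best.rmse <- perf[cbind(seq_len(nrow(perf)), i.best)]
  # Outer loop with the overall lowest RMSE.
  h.best <- which.min(best.rmse)
  list(i.best = i.best, h.best = h.best)
}

perf <- matrix(c(0.90, 0.70, 0.80,
                 0.60, 0.50, 0.55,
                 0.65, 0.70, 0.90), nrow = 3, byrow = TRUE)
res <- pickMinRMSE(perf)
```

In this toy matrix the second outer loop reaches the lowest RMSE at its second inner iteration, so the sketch returns h.best = 2.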

Visualize results: If plot.results == TRUE, an image plot is produced visualizing the RMSE between the artificial gaps and the reconstruction for each iteration. A red dot indicates the iteration chosen for the final reconstruction.

Padding: For padding, the series should start and end exactly at the start and end of a major oscillation (e.g. a yearly cycle), and the length to use for padding should be an integer multiple of this cycle length. Padding is done internally by adding the indicated part of the series at the start and at the end of the series. This padded series is only used internally, and only the part of the series with original data is returned in the results. Padding is not (yet) possible for two dimensional SSA.
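
The requirements on the padding length can be checked before calling gapfillSSA. The sketch below uses base R only. The variable names period, n.cycles.pad and padded are made up for the illustration, and the way the padded series is assembled (first part prepended, last part appended) is an assumption about what the internal padding amounts to, not the package's code.

```r
period <- 100                                  # length of the major cycle
series <- sin(2 * pi * (1:1000) / period)

# The series should cover a whole number of cycles.
stopifnot(length(series) %% period == 0)

# Padding lengths as integer multiples of the cycle length.
n.cycles.pad <- 2
pad.series   <- c(n.cycles.pad * period, n.cycles.pad * period)

# Assumed equivalent of the internal padding: the first pad.series[1]
# values are prepended and the last pad.series[2] values are appended.
padded <- c(series[seq_len(pad.series[1])], series, tail(series, pad.series[2]))
```

Because the padding covers whole cycles, the padded series continues the oscillation without a phase jump at either junction. The vector pad.series would then be passed on through the argument of the same name.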

Multidimensional SSA: 1d or 2d SSA is possible. If a vector is given, one dimensional SSA is computed. If a matrix is given as input, two dimensional SSA is performed. For the two dimensional case, two embedding dimensions should be given (one in the direction of each dimension). If 'big gaps' are set to be used for the cross validation, square blocks of gaps with the size 'size.biggap'*'size.biggap' are inserted.
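
To make the two dimensional case more concrete, the following base R sketch prepares a matrix input with scattered gaps, a window length for each dimension, and one square block of the kind described for the big artificial gaps. The field, the block position and all variable names except M and size.biggap are made up for the illustration; how the package actually places such blocks is not shown here.

```r
set.seed(1)
n.row <- 60
n.col <- 40

# Toy field with a periodic pattern along both dimensions.
field <- outer(sin(2 * pi * (1:n.row) / 20), cos(2 * pi * (1:n.col) / 10))

# Scattered real gaps (gap = NA), as expected for the series argument.
field[sample(length(field), 50)] <- NA

# One window length per dimension, following the 1d default of length/3.
M <- c(floor(n.row / 3), floor(n.col / 3))

# A square big gap of size.biggap * size.biggap cells at an arbitrary
# position, mimicking one artificial big gap in the cross validation.
size.biggap <- 5
row.start   <- 10
col.start   <- 15
field.artgap <- field
field.artgap[row.start:(row.start + size.biggap - 1),
             col.start:(col.start + size.biggap - 1)] <- NA
```

The matrix field would be passed as series and the vector M as the window length; with kind = "auto" the dimensions of the input decide that 2d SSA is used.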

Value

list with components

error.occoured

logical: whether an uncaught error occurred in one of the SSA calculations.

filled.series

numeric vector/matrix: filled series with the same length as series but without gaps. Gaps at the margins of the series cannot be filled and will remain in filled.series (and reconstr).

i.best

integer matrix: inner loop iteration for each outer loop step in which the process has finally converged (depending on the threshold determined by tresh.convergence). If the RMSE between two inner loop iterations has been monotonically decreasing (and hence the differences between SSA iterations can be expected to be rather small), this is set to amnt.iters[2]. If not, the process has most likely been building up, and this is set to 0. In both cases iloop.converged is set to FALSE.

iloop.converged

logical matrix: Whether each outer loop iteration has converged (see also i.best).

iter.chosen

integer vector: iterations finally chosen for the reconstruction.

perf.all.gaps

numeric matrix: performance (RMSE) for the filling of all artificial gaps.

perf.small.gaps

numeric matrix: performance (RMSE) for the filling of the small artificial gaps.

perf.big.gaps

numeric matrix: performance (RMSE) for the filling of the big artificial gaps.

process.converged

logical: Whether the whole process has converged. For simplicity, this only detects whether the last outer loop of the final filling process has converged.

reconstr

numeric vector/matrix: filtered series or reconstruction finally used to fill gaps.

recstr.diffsum

numeric matrix: RMSE between two consecutive inner loop iterations. This value is checked to be below tresh.convergence to determine whether the process has converged.

settings

list: settings used to perform the calculation.
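
The convergence rule behind recstr.diffsum, iloop.converged and tresh.convergence can be sketched in a few lines of base R. This illustrates the documented criterion (the last three differences between inner loop iterations all fall below the threshold) and is not the internal code of the package; the helper name hasConverged is hypothetical.

```r
# Hypothetical helper: TRUE if the last three differences between
# consecutive inner loop iterations are all below the threshold.
hasConverged <- function(diffsums, tresh.convergence = 0.01) {
  diffsums <- diffsums[!is.na(diffsums)]
  if (length(diffsums) < 3) return(FALSE)
  all(tail(diffsums, 3) < tresh.convergence)
}

converged.yes <- hasConverged(c(0.5, 0.1, 0.008, 0.004, 0.002))
converged.no  <- hasConverged(c(0.5, 0.1, 0.020, 0.004, 0.002))
```

Applied to one row of recstr.diffsum, such a check should reproduce the corresponding entry of iloop.converged if the reading of the criterion above is correct.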

Author(s)

Jannis v. Buttlar

References

Kondrashov, D. & Ghil, M. (2006), Spatio-temporal filling of missing points in geophysical data sets, Nonlinear Processes in Geophysics, Vol. 13(2), pp. 151-159.

Korobeynikov, A. (2010), Computation- and space-efficient implementation of SSA, Statistics and Its Interface, Vol. 3(3), pp. 257-268.

See Also

ssa

Examples

## create series with gaps
series.ex <- sin(2 * pi * 1:1000 / 100) +  0.7 * sin(2 * pi * 1:1000 / 10) +
  rnorm(n = 1000, sd = 0.4)
series.ex[sample(c(1:1000), 30)] <- NA
## start the two long gaps early enough that they stay within the series
series.ex[c(seq(from = sample(c(1:980), 1), length.out = 20),
            seq(from = sample(c(1:980), 1), length.out = 20))] <- NA
indices.gaps <- is.na(series.ex)

## prepare graphics
layout(matrix(c(1:5, 6, 6, 6, 6, 6, 7, 7, 7, 7, 7), ncol = 5, byrow = TRUE), 
       widths = c(1, 1, 1, 0.1, 0.1))
par(mar = c(2, 0, 0, 0.2), oma = c(0, 3, 2, 0.2), tcl = 0.2, mgp = c(0, 0, 100),
    las = 1)

## perform gap filling
data.filled <- gapfillSSA(series = series.ex, plot.results = TRUE, open.plot = FALSE)

## plot series and filled series
plot(series.ex, xlab = '', pch = 16)
plot(data.filled$filled.series, col = indices.gaps+1, xlab = '', pch = 16)
points(data.filled$reconstr, type = 'l', col = 'blue')
mtext(side = 1, 'Index', line = 2)
legend(x = 'topright', merge = TRUE, pch = c(16, 16, NA), lty = c(NA, NA, 1), 
       col = c('black', 'red', 'blue'),
       legend = c('original values', 'gap filled values', 'reconstruction'))

spectral.methods documentation built on May 29, 2017, 9:12 a.m.