fillMissing: Fill Missing Values

View source: R/smwrBase_fillMissing.R

fillMissingR Documentation

Fill Missing Values

Description

Replace missing values in time-series data by interpolation.

Usage

fillMissing(x, span = 10, Dates = NULL, max.fill = 10)

Arguments

x

the sequence of observations. Missing values are permitted and will be replaced.

span

the maximum number of observations on each side of each range of missing values to use in constructing the time-series model. See Details.

Dates

an optional vector of dates/times associated with each value in x. Useful if there are gaps in dates/times.

max.fill

the maximum gap to fill.

Details

Missing values at the beginning and end of x will not be replaced.

The argument span is used to help set the range of values used to construct the StructTS model. If span is set small, then the variance of epsilon dominates and the estimates are not smooth. If span is large, then the variance of level dominates and the estimates are linear interpolations. The variances of level and epsilon are components of the state-space model used to interpolate values, see StructTS for details. See Note for more information about the method.

If span is set larger than 99, then the entire time series is used to estimate all missing values. This approach may be useful if there are many periods of missing values. If span is set to any number less than 4, then simple linear interpolation will be used to replace missing values.

Added from smwrBase.

Value

The observations in x with missing values replaced by interpolation.

Note

The method used to interpolate missing values is based on tsSmooth constructed using StructTS on x with type set to "trend." The smoothing method basically uses the information (slope) from two values previous to missing values and the two values following missing values to smoothly interpolate values accounting for any change in slope. Beauchamp (1989) used time-series methods for synthesizing missing streamflow records. The group that is used to define the statistics that control the interpolation is very simply defined by span rather than the more in-depth measures described in Elshorbagy and others (2000).

If the data have gaps rather than missing values, then fillMissing will return a vector longer than x if Dates is given and the return data cannot be inserted into the original data frame. If Dates is not given, then the gap will be recognized and not be filled. The function insertMissing can be used to create a data frame with the complete sequence of dates.

References

Beauchamp, J.J., 1989, Comparison of regression and time-series methods for synthesizing missing streamflow records: Water Resources Bulletin, v. 25, no. 5, p. 961–975.

Elshorbagy, A.A., Panu, U.S., and Simonovic, S.P., 2000, Group-based estimation of missing hydrological data, I. Approach and general methodology: Hydrological Sciences Journal, v. 45, no. 6, p. 849–866.

See Also

tsSmooth, StructTS

Examples

## Not run: 
#library(smwrData)
data(Q05078470)
# Create missing values in flow, the first sequence is a peak and the second is a recession
Q05078470$FlowMiss <- Q05078470$FLOW
Q05078470$FlowMiss[c(109:111, 198:201)] <- NA
# Interpolate the missing values
Q05078470$FlowFill <- fillMissing(Q05078470$FlowMiss)
# How did we do (line is actual, points are filled values)?
par(mfrow=c(2,1), mar=c(5.1, 4.1, 1.1, 1.1))
with(Q05078470[100:120, ], plot(DATES, FLOW, type="l"))
with(Q05078470[109:111, ], points(DATES, FlowFill))
with(Q05078470[190:210, ], plot(DATES, FLOW, type="l"))
with(Q05078470[198:201, ], points(DATES, FlowFill))

## End(Not run)

leppott/baytrends documentation built on Nov. 2, 2024, 6:42 p.m.