na_approx: Replace NA by Interpolation


View source: R/na_approx.R

Description

Generic functions for replacing each NA with interpolated values.

Usage

na_approx(object, ...)

Arguments

object

object in which NAs are to be replaced

...

further arguments passed to methods. The n argument of approx is currently not supported.

x, xout

Variables to be used for interpolation as in approx.

na_rm

logical. If NAs remain after the (linear or spline) interpolation, should they be removed?

maxgap

maximum number of consecutive NAs to fill. Longer gaps are left unchanged. All methods accept maxgap, because it is ultimately passed on to the default method.

along

deprecated.
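The maxgap rule can be sketched in base R as follows. fill_with_maxgap() is a hypothetical helper written only for illustration, not a function of this package; it assumes a plain numeric vector and linear interpolation, and it may differ from the package's internal implementation.

```r
# Hypothetical helper (not part of the package): linear interpolation that
# leaves runs of more than `maxgap` consecutive NAs untouched.
fill_with_maxgap <- function(y, x = seq_along(y), maxgap = Inf) {
  na <- is.na(y)
  # interpolate every position from the non-NA observations
  filled <- approx(x = x[!na], y = y[!na], xout = x)$y
  # run-length encode the NA pattern to locate gaps longer than maxgap
  runs <- rle(na)
  too_long <- runs$values & runs$lengths > maxgap
  # expand the per-run flag back to one flag per observation
  restore <- rep(too_long, runs$lengths)
  filled[restore] <- NA
  filled
}

y <- c(1, NA, 3, NA, NA, NA, 7)
fill_with_maxgap(y, maxgap = 2)  # the run of three NAs is kept
fill_with_maxgap(y, maxgap = 3)  # every gap is filled
```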

Details

Missing values (NAs) are replaced by linear interpolation via approx (na_approx) or by cubic spline interpolation via spline (na_spline).
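The difference between the two interpolation schemes can be seen with base R alone. This sketch calls approx() and spline() directly on the non-NA points (spline() with its default "fmm" method); it does not use the package functions, and the data are made up for illustration.

```r
# Fill the same gaps by linear interpolation (approx) and by cubic spline
# interpolation (spline, default method "fmm").
x  <- 1:7
y  <- c(1, 4, NA, 16, NA, 36, 49)   # squares of 1:7 with two values missing
ok <- !is.na(y)

linear <- approx(x[ok], y[ok], xout = x)$y
cubic  <- spline(x[ok], y[ok], xout = x)$y

rbind(x = x, linear = linear, spline = round(cubic, 2))
```

Since the sample values lie on y = x^2, the spline should reproduce 9 and 25 at the gaps (up to rounding), whereas linear interpolation gives 10 and 26.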

It can also be used for series disaggregation by specifying xout.
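Disaggregation amounts to evaluating the interpolant on a denser xout grid than the observed time points. A minimal base-R sketch, assuming annual observations stored as plain numeric vectors (the values are illustrative):

```r
# Interpolate annual values onto a quarterly grid by passing a denser xout.
years  <- c(2000, 2001, 2002)
values <- c(10, 14, 12)

quarters  <- seq(2000, 2002, by = 0.25)
quarterly <- approx(years, values, xout = quarters)$y

data.frame(time = quarters, value = quarterly)
```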

By default, the index associated with object is used for interpolation. Note that if this dispatches to index.default, the result is the equidistant spacing 1:NROW(object). If object is a matrix or data.frame, the interpolation is done separately for each column.
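Whether the real time stamps or equidistant positions are used changes the filled value whenever the observations are irregularly spaced. This base-R sketch reuses the data of the first example (values 2, NA, 1, 4, 5, 2 at times 1, 3, 4, 6, 7, 8) and calls approx() directly rather than na_approx.

```r
# The same gap filled against the actual (irregular) time stamps and
# against equidistant positions 1:NROW(y).
times <- c(1, 3, 4, 6, 7, 8)
y     <- c(2, NA, 1, 4, 5, 2)
ok    <- !is.na(y)

by_time     <- approx(times[ok], y[ok], xout = times)$y
by_position <- approx(seq_along(y)[ok], y[ok], xout = seq_along(y))$y

rbind(by_time, by_position)
```

The gap at time 3 lies two thirds of the way from time 1 to time 4, so it is filled with about 1.33; on the equidistant scale it lies halfway between its neighbours and becomes 1.5.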

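If object is a plain vector, then na_approx(object, x, xout, ...) returns approx(x = x[!na], y = coredata(object)[!na], xout = xout, ...), where na flags the observations that are NA and xout defaults to x. Because approx needs at least two non-NA values, no NAs can be replaced in a series with fewer than two observed values.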
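The plain-vector rule can be written out directly. fill_vector() below is a hypothetical helper, not part of the package; it assumes xout = x and guards against the case where approx() stops with an error because fewer than two non-NA values are available.

```r
# Hypothetical helper (not part of the package): the plain-vector rule with
# xout = x, returning the input unchanged when approx() cannot be applied.
fill_vector <- function(y, x = seq_along(y)) {
  na <- is.na(y)
  if (sum(!na) < 2) {
    return(y)  # approx() needs at least two non-NA points
  }
  approx(x = x[!na], y = y[!na], xout = x)$y
}

fill_vector(c(1, NA, 3))   # gives c(1, 2, 3)
fill_vector(c(NA, 5, NA))  # only one non-NA value: returned unchanged
```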

If object is a zoo, zooreg or ts object, its coredata is processed as described above, and the time index of the result is xout if specified and index(object) otherwise. If object is two-dimensional, this is applied to each column separately; see the examples below.
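Column-wise processing can be imitated in base R by applying the vector rule to each column against a shared time index. This is an illustrative sketch using apply() on a plain matrix, not the package's method for two-dimensional objects; fill_col() is a hypothetical helper.

```r
# Fill each column independently against the same time index.
tt <- c(1, 2, 4, 5)
m  <- cbind(a = c(1, NA, 5, 7),
            b = c(NA, 2, NA, 8))

fill_col <- function(y, x) {
  ok <- !is.na(y)
  approx(x[ok], y[ok], xout = x)$y
}

filled <- apply(m, 2, fill_col, x = tt)
filled
```

The leading NA in column b stays NA, since approx() with its default rule = 1 does not extrapolate beyond the first observed time.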

Value

An object of similar structure as object with (internal) NAs replaced by interpolation. Leading or trailing NAs are omitted if na_rm = TRUE or not replaced if na_rm = FALSE.
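The two na_rm outcomes can be mimicked on a plain vector, assuming approx()'s default rule = 1, which returns NA outside the range of the observed values. The data match the zz example below; the code is a base-R analogue, not a call to na_approx.

```r
# Internal NAs are interpolated; leading and trailing NAs remain NA.
y  <- c(NA, 9, 3, NA, 3, 2, NA)
x  <- seq_along(y)
ok <- !is.na(y)

filled  <- approx(x[ok], y[ok], xout = x)$y
filled                           # analogue of na_rm = FALSE: end NAs kept
dropped <- filled[!is.na(filled)]
dropped                          # analogue of na_rm = TRUE: end NAs omitted
```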

See Also

zoo, approx, na_contiguous, na_locf, na_omit, na_trim, spline, stinterp

Examples

library(zoo)  # provides zoo(), as.yearmon() and the zoo methods used below

z <- zoo(c(2, NA, 1, 4, 5, 2), c(1, 3, 4, 6, 7, 8))

## use underlying time scale for interpolation
na_approx(z) 
## use equidistant spacing
na_approx(z, 1:6)

# with and without na_rm = FALSE
zz <- c(NA, 9, 3, NA, 3, 2)
na_approx(zz, na_rm = FALSE)
na_approx(zz)

d0 <- as.Date("2000-01-01")
z <- zoo(c(11, NA, 13, NA, 15, NA), d0 + 1:6)

# NA fill, drop or keep leading/trailing NAs
na_approx(z)
na_approx(z, na_rm = FALSE)

# extrapolate to point outside of range of time points
# (a) drop NA, (b) keep NA, (c) extrapolate using rule = 2 from approx()
na_approx(z, xout = d0 + 7)
na_approx(z, xout = d0 + 7, na_rm = FALSE)
na_approx(z, xout = d0 + 7, rule = 2)

# use splines - extrapolation handled differently
z <- zoo(c(11, NA, 13, NA, 15, NA), d0 + 1:6)
na_spline(z)
na_spline(z, na_rm = FALSE)
na_spline(z, xout = d0 + 1:6)
na_spline(z, xout = d0 + 2:5)
na_spline(z, xout = d0 + 7)
na_spline(z, xout = d0 + 7, na_rm = FALSE)

## using na_approx for disaggregation
zy <- zoo(1:2, 2000:2001)

# yearly to monthly series
zmo <- na_approx(zy, xout = as.yearmon(2000+0:13/12))
zmo

# monthly to daily series
sq <- seq(as.Date(start(zmo)), as.Date(end(zmo), frac = 1), by = "day")
zd <- na_approx(zmo, x = as.Date, xout = sq)
head(zd)

# weekly to daily series
zww <- zoo(1:3, as.Date("2001-01-01") + seq(0, length = 3, by = 7))
zww
zdd <- na_approx(zww, xout = seq(start(zww), end(zww), by = "day"))
zdd

# The lines do not show up because of the NAs
plot(cbind(z, z), type = "b", screen = 1)
# use na_approx to force lines to appear
plot(cbind(z, na_approx(z)), type = "b", screen = 1)

# Workaround for columns with fewer than two non-NA values,
# which approx() cannot interpolate
za <- zoo(cbind(1:5, NA, c(1:3, NA, 5), NA)); za

ix <- colSums(!is.na(za)) >= 2
za[, ix] <- na_approx(za[, ix]); za

# using na_approx to create regularly spaced series
# z has points at 10, 20 and 40 minutes while output also has a point at 30
if(require("chron")) {
  tt <- as.chron("2000-01-01 10:00:00") + c(1, 2, 4) * as.numeric(times("00:10:00"))
  z <- zoo(1:3, tt)
  tseq <- seq(start(z), end(z), by = times("00:10:00"))
  na_approx(z, xout = tseq)
}

decisionpatterns/na.actions documentation built on Aug. 25, 2020, 8:04 p.m.