sample_dat: Sample time series data


Description

Sample a time series by removing observations completely at random (MCAR) or at random (MAR)

Usage

sample_dat(datin, smps = "mcar", repetition = 10, b = 10, blck = 50,
  blckper = TRUE, plot = FALSE)

Arguments

datin

input numeric vector

smps

chr string of sampling type to use, options are "mcar" or "mar"

repetition

numeric for repetitions to be done for each missPercent value

b

numeric indicating the total amount of missing data as a percentage to remove from the complete time series

blck

numeric indicating block sizes as a proportion of the sample size for the missing data

blckper

logical indicating if the value passed to blck is a proportion of missper, i.e., blocks are to be sized as a percentage of the total size of the missing data

plot

logical indicating if a plot is returned showing the sampled data, plots only the first repetition

Value

Input data with NA values for the sampled observations if plot = FALSE, otherwise a plot showing the missing observations over the complete dataset.

If smps = 'mar', the missing data are based on random sampling by blocks. The start location of each block is random and overlapping blocks are not counted uniquely for the required sample size given by b. Final blocks are truncated to ensure the correct value of b is returned. Blocks are fixed at 1 if the proportion is too small, in which case "mcar" should be used. If blckper = FALSE and the input value of blck is too large, block sizes are also truncated to the required sample size, which is the same as setting blck = 1 and blckper = TRUE.
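The block sampling described above can be sketched roughly in base R. This is an illustration only, not the package's implementation: sketch_mar is a hypothetical helper, and it assumes that b and blck are percentages, that positions shared by overlapping blocks count only once toward the target, and that the final block is cut short once the target is reached.

```r
# Hypothetical helper, NOT part of imputeTestbench: a rough sketch of
# block-wise ("mar") removal under the assumptions stated above.
sketch_mar <- function(x, b = 10, blck = 50, blckper = TRUE) {
  n <- length(x)
  n_miss <- round(n * b / 100)            # total observations to remove
  stopifnot(n_miss <= n - 2)              # first and last are always kept

  # block size: a percentage of n_miss if blckper, otherwise a fixed count
  size <- if (blckper) round(n_miss * blck / 100) else blck
  size <- max(1, min(size, n_miss))       # at least 1, at most n_miss

  candidates <- 2:(n - 1)                 # never start on first or last
  miss <- integer(0)
  while (length(miss) < n_miss) {
    start <- sample(candidates, 1)        # random block start
    block <- start:min(start + size - 1, n - 1)
    miss <- union(miss, block)            # overlapping positions count once
  }
  miss <- miss[seq_len(n_miss)]           # truncate the final block

  x[miss] <- NA
  x
}

set.seed(42)
a <- rnorm(1000)
out <- sketch_mar(a, b = 10, blck = 50)
sum(is.na(out))                           # equals round(1000 * 10 / 100)
```

Because union() keeps elements in order of first appearance, truncating to the first n_miss positions only trims the most recently added block.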

For all cases, the first and last observation will never be removed to allow comparability of interpolation schemes. This is especially relevant for cases when b is large and smps = 'mar' is used. For example, method = na.approx will have rmse = 0 for a dataset where the removed block includes the last n observations. This result could provide misleading information in comparing methods.
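One plausible way to see the problem is that linear interpolation cannot fill values beyond the last observed point. The sketch below uses stats::approx from base R as a stand-in for na.approx (an assumption made for illustration; the two functions are not identical) on a small made-up vector.

```r
x <- as.numeric(1:10)

# Case 1: the removed block includes the last observations
x_tail <- x
x_tail[8:10] <- NA
obs <- which(!is.na(x_tail))
# with the default rule = 1, points outside the observed range stay NA
filled_tail <- approx(obs, x_tail[obs], xout = seq_along(x))$y
is.na(filled_tail[8:10])      # trailing values are not interpolated

# Case 2: the removed block is interior, so both neighbours are available
x_mid <- x
x_mid[4:6] <- NA
obs <- which(!is.na(x_mid))
filled_mid <- approx(obs, x_mid[obs], xout = seq_along(x))$y
anyNA(filled_mid)             # every gap is filled
```

In the first case no removed value receives an estimate, so an error summary that drops NA values has nothing to compare and can look perfect, which is why keeping the end points makes methods comparable.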

Examples

a <- rnorm(1000)

# default sampling
sample_dat(a)

# use mar sampling
sample_dat(a, smps = 'mar')

# show a plot of one repetition
sample_dat(a, plot = TRUE)

# show a plot of one repetition, mar sampling
sample_dat(a, smps = 'mar', plot = TRUE)

# change plot aesthetics
library(ggplot2)
p <- sample_dat(a, plot = TRUE)
p + scale_colour_manual(values = c('black', 'grey'))
p + theme_minimal()
p + ggtitle('Example of simulating missing data')

neerajdhanraj/imputeTestbench documentation built on May 23, 2019, 1:31 p.m.