simulate_dataset_ts: Simulate a data frame of time series data
In lhz1029/fakeR: Simulates Data from a Data Frame of Different Variable Types


View source: R/fakeR.R

This function simulates clustered numeric time series data from an ARIMA model fit.


simulate_dataset_ts(dataset, digits=2, n=NA, cluster=NA, time.variable=NA,
                    date.index=FALSE, complete.panel=FALSE,
                    stealth.level=2, level3.noise=FALSE, use.miss=TRUE, ignore=NA)

`dataset`	the data frame from which to generate a randomized version
`digits`	the number of digits after the decimal point to include in the new values
`n`	number of rows in the new data frame. If NA (the default), it equals the number of rows in the original.
`cluster`	the column names of the time series variables. Supply a list when there are multiple values.
`time.variable`	the column name(s) of the time variables corresponding to each time series variable. Should be the same length as cluster, even if that means repeating the same time variable.
`date.index`	whether the time variable is a date and should be treated as a Date object.
`complete.panel`	when set to TRUE, runs a preprocessing step that completes the time series columns: it inserts every missing date and assigns a value of zero at each inserted time point.
`stealth.level`	when set to 2 (the default), simulates each time series independently of the others. When set to 3, ignores any covariance between time points and instead samples each variable from a uniform distribution ranging from the variable's minimum to its maximum. There is no level 1 option.
`level3.noise`	when set to TRUE, adds Gaussian noise to the min and max parameters of the uniform distribution used at stealth.level 3. The noise term has a variance of one fourth of the range of the data for that variable.
`use.miss`	when set to TRUE, inserts missing values as they appear in the original data.
`ignore`	specifies which columns to ignore (i.e., to leave as is instead of simulating them). Takes a list of column names as input.
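To make the stealth.level 3 and level3.noise descriptions concrete, here is a minimal sketch of uniform min-to-max sampling. It is not fakeR's implementation: `sketch_stealth3` is a hypothetical helper, and drawing the min and max noise terms independently (then re-sorting the bounds) is an assumption about how the noise is applied.

```r
# Hypothetical sketch of stealth.level = 3 as described above; NOT fakeR's code.
# Each column is replaced by independent draws from Uniform(min, max), which
# ignores any covariance between time points. With noise = TRUE, Gaussian noise
# with variance = range / 4 is added to each bound (assumed independent draws).
sketch_stealth3 <- function(df, n = nrow(df), digits = 2, noise = FALSE) {
  out <- lapply(df, function(x) {
    lo <- min(x, na.rm = TRUE)
    hi <- max(x, na.rm = TRUE)
    if (noise) {
      sd_noise <- sqrt((hi - lo) / 4)      # variance of one fourth of the range
      bounds <- sort(c(lo + rnorm(1, sd = sd_noise),
                       hi + rnorm(1, sd = sd_noise)))
      lo <- bounds[1]                      # keep min <= max after perturbing
      hi <- bounds[2]
    }
    round(runif(n, min = lo, max = hi), digits)
  })
  as.data.frame(out)
}

set.seed(1)
toy  <- data.frame(y = c(0.5, 1.2, 0.9, 2.0, 1.4))
sim  <- sketch_stealth3(toy, n = 10)                # values stay in [0.5, 2.0]
sim2 <- sketch_stealth3(toy, n = 10, noise = TRUE)  # bounds are perturbed
```

Without noise, every simulated value lies between the observed minimum and maximum, which is why the noise option exists: it keeps the original extremes from being exposed exactly.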

This function handles only two types of numeric time series: stationary series and zero-inflated count series. It accepts only numeric observations. To model clustered numeric data under a multivariate normal distribution, use simulate_dataset instead.
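The complete.panel option relates to the zero-inflated count case: filling gaps in a count series with zeros tends to produce many zero values. Below is a rough sketch of that preprocessing step, assuming daily Date values. `sketch_complete_panel` is a hypothetical helper, not fakeR's code, and the daily step size is an assumption.

```r
# Hypothetical sketch of the complete.panel step; NOT fakeR's implementation.
# Inserts every missing date between the first and last observed date (daily
# step assumed) and sets the series value to zero at each inserted date.
# Values that were NA on observed dates are left as NA.
sketch_complete_panel <- function(df, time_col, value_col) {
  full_dates <- seq(min(df[[time_col]]), max(df[[time_col]]), by = "day")
  full <- setNames(data.frame(full_dates), time_col)
  out <- merge(full, df[, c(time_col, value_col)], by = time_col, all.x = TRUE)
  inserted <- !(out[[time_col]] %in% df[[time_col]])
  out[[value_col]][inserted] <- 0          # zero at each inserted time point
  out
}

counts <- data.frame(
  date  = as.Date(c("2020-01-01", "2020-01-02", "2020-01-05")),
  count = c(3, 1, 4)
)
filled <- sketch_complete_panel(counts, "date", "count")
```

Here `filled` has five rows, with zeros on January 3 and 4.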

The function assumes that each time series process is independent of the others, and it allows a different time variable to be associated with each series. For this reason, simulate_dataset_ts() has no stealth level 1: it does not simulate multivariate time series. Columns that are neither time series values nor time indices are ignored and not simulated.
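For the stationary case, the general idea of "simulate from an ARIMA model fit" can be sketched for a single series with base R's `stats::arima()` and `stats::arima.sim()`. This is an illustration under assumptions, not fakeR's procedure: the fixed AR(1) order and the helper name `sketch_arima_sim` are hypothetical, and fakeR's own model selection is not shown here.

```r
# Hypothetical sketch: fit an AR(1) model to one stationary series and simulate
# a new series from the fitted parameters. The AR(1) order is an assumption for
# illustration only; fakeR may choose the model differently.
sketch_arima_sim <- function(x, n = length(x), digits = 2) {
  fit <- arima(x, order = c(1, 0, 0))      # AR(1) with a mean term
  phi <- fit$coef[["ar1"]]
  mu  <- fit$coef[["intercept"]]           # for arima(), "intercept" is the mean
  sim <- arima.sim(model = list(ar = phi), n = n, sd = sqrt(fit$sigma2))
  round(as.numeric(sim) + mu, digits)      # shift back to the series mean
}

set.seed(42)
x <- as.numeric(arima.sim(model = list(ar = 0.6), n = 200)) + 1
y <- sketch_arima_sim(x)
```

Because each series is simulated on its own, this sketch reflects the independence assumption described above: no cross-series correlation is modeled.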

Value: A data frame whose columns alternate between a time variable and a cluster variable, with the time/cluster pairs in the order given in the function arguments.

Author(s): Lily Zhang, Dustin Tingley

# An example using the treering dataset from the R datasets package
library(fakeR)

tree_ring <- data.frame(treering)              # one column, named "treering"
tree_ring$year <- seq_len(nrow(tree_ring))     # integer time index
sim_tree_ring <- simulate_dataset_ts(tree_ring,
                                     cluster="treering",
                                     time.variable="year")

# Plot the original series above the simulated one
# (assumes the simulated data frame keeps the original column names)
par(mfrow = c(2, 1), mar = c(3, 3, 4, 2), mgp = 0.9 * 2:0)
plot(tree_ring$year, tree_ring$treering, type='l',
     main="Original normalized ring width",
     ylab="Ring width", xlab="Year index")
plot(sim_tree_ring$year, sim_tree_ring$treering, type='l',
     main="Simulated normalized ring width",
     ylab="Ring width", xlab="Year index")