assemble_ts: Simulate a time series
In edwardlavender/Tools4ETS: Tools for Ecological Time Series


Description

This function assembles a basic dataframe that defines, for each factor level (i.e., individual) inputted, a sequence of time steps, which may differ in duration and resolution across factor levels. Since this function was inspired by the simulation of depth time series, commonly assumed drivers of depth (namely, sex, length, sun angle, lunar phase and Julian day) can be included in the dataframe by supplying these to the covariates argument. In this case, sex is simulated for each individual from a discrete distribution with a user-defined parameter (the probability of sampling a female) and length is simulated for each individual from a Gamma distribution with user-specified parameters. Sun angle, lunar phase and Julian day are calculated for each time stamp using getSunlightPosition, lunar.phase and yday respectively. For other covariates or ecological time series, the user can add covariate values to this dataframe. In both cases, this information can then be used to simulate values of a response (see sim_ts).
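
The individual-level covariates described above can be illustrated with a minimal base R sketch. This is not the package's internal code: the object names (`individuals`, `length_cm`) are hypothetical, and the use of `sample` for sex is an assumption consistent with the `Pf` and `replace` parameters documented below.

```r
# Illustrative sketch only; assumed to resemble, not reproduce, assemble_ts internals.
set.seed(1)
n_individuals <- 5
Pf    <- 0.5   # probability that an individual is female
shape <- 25    # Gamma shape parameter
scale <- 4     # Gamma scale parameter (mean length = shape * scale = 100 cm)

# One sex per individual: "F" with probability Pf, "M" with probability 1 - Pf
sex <- factor(
  sample(c("F", "M"), size = n_individuals, replace = TRUE, prob = c(Pf, 1 - Pf)),
  levels = c("F", "M")
)

# One length (cm) per individual, drawn from a Gamma distribution
length_cm <- rgamma(n_individuals, shape = shape, scale = scale)

individuals <- data.frame(
  individual = seq_len(n_individuals),
  sex        = sex,
  length     = length_cm
)
print(individuals)
```

Each individual receives a single sex and a single length, which would then be repeated across all of that individual's time steps.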

Usage

assemble_ts(
  start_date,
  start_date_variable = TRUE,
  max_duration_days,
  duration_days_variable = FALSE,
  resolution_minutes,
  n_individuals,
  longitude,
  latitude,
  tz = "UTC",
  covariates = NULL,
  parameters = list(
    start_date = list(ndays = max_duration_days - 1, prob = NULL, replace = TRUE),
    duration_days = list(prob = NULL, replace = TRUE),
    resolution_mins = list(prob = NULL, replace = TRUE),
    sex = list(Pf = 0.5, replace = TRUE),
    length = list(shape = 25, scale = 4, plot_density_curve = TRUE)
  )
)

Arguments

`start_date`	A character specifying the date of the first observation, specified as "yyyy-mm-dd".
`start_date_variable`	A logical input specifying whether or not the start date for simulated time series should differ among individuals (if multiple individuals are specified). This is controlled via the `parameters` argument.
`max_duration_days`	A number specifying the maximum duration, in days, over which to simulate data.
`duration_days_variable`	A logical input specifying whether or not the duration of time series for each factor level should vary (if multiple levels have been specified).
`resolution_minutes`	A number or vector specifying the duration, in minutes, between consecutive simulated time stamps. If a single number is specified, the resolution is taken to be the same across all individuals. If more than one number is defined, supplied numbers are taken to be the resolutions at which individual time series will be sampled. The duration between consecutive simulated time stamps for each individual is sampled randomly from this vector, according to the specifications in the `parameters` list (see below).
`n_individuals`	A number specifying the number of individuals (factor levels) for which to simulate data.
`longitude`	A number specifying the longitude (decimal degrees) of the simulated location. This is required to calculate the covariate `sun_angle` (see below).
`latitude`	A number specifying the latitude (decimal degrees) of the simulated location. This is required to calculate the covariate `sun_angle` (see below).
`tz`	A character specifying the time zone. The default is `"UTC"`.
`covariates`	A character vector specifying the covariates to be included in the dataframe. Currently supported covariates are: (1) `"sex"`, a factor which distinguishes sexes (F, female; M, male); (2) `"length"` (cm); (3) `"sun_angle"`, the angle (degrees) of the sun above the horizon (see `getSunlightPosition`); (4) `"lunar_phase"`, the lunar phase (radians; see `lunar.phase`); (5) `"julian_day"` (the number of days since January 1st).
`parameters`	A nested list specifying additional parameters. The following elements are currently supported. (1) `start_date`, which adjusts the variation in start dates among individuals, if `start_date_variable = TRUE`. `ndays` is the maximum number of days after `start_date` at which an individual time series can begin. Start dates are sampled from the dates between `start_date` and `start_date + ndays` using `sample`. `prob` is a vector of probabilities which defines the probability of sampling each date between `start_date` and `start_date + ndays`; if specified, it should be a vector of length `ndays + 1`. By default, `prob = NULL`; i.e., start dates are sampled from a uniform distribution between `start_date` and `start_date + ndays`. `replace` defines whether the vector of possible start dates is sampled with or without replacement. By default, `replace = TRUE`. (2) `duration_days`, which defines the parameters for `sample` in order to simulate variation in the duration (days) of each time series, if `duration_days_variable = TRUE`. (3) `resolution_mins`, which defines the parameters for `sample` in order to simulate variation in the resolution (minutes) of each time series, if a vector of length > 1 is supplied to `resolution_minutes`. (4) `sex`, which defines the distribution from which sex is simulated. `Pf` is the probability that any given simulated individual is female; the probability of any given individual being male is therefore `1 - Pf`. (5) `length`, which adjusts the distribution from which individual lengths (cm) are simulated. Lengths are assumed to be drawn from a Gamma distribution, defined by parameters `shape` and `scale` (see `GammaDist`). `plot_density_curve` is a logical input which, if `TRUE`, causes the function to return a theoretical density curve of the distribution from which lengths are simulated.
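
The sampling controlled by the `start_date`, `duration_days` and `resolution_mins` elements can be sketched in base R as follows. This is a hedged illustration, not the package's code: the `design` dataframe is a hypothetical name, and the set of candidate durations (`1:max_duration_days`) is an assumption, since the documentation does not state which durations are sampled.

```r
# Illustrative sketch only; object names and the candidate durations are assumptions.
set.seed(1)
n_individuals      <- 3
start_date         <- as.Date("2018-01-01")
ndays              <- 100
max_duration_days  <- 21
resolution_minutes <- c(2, 30, 60)

# Candidate start dates: start_date, start_date + 1, ..., start_date + ndays
possible_starts <- seq(start_date, by = "day", length.out = ndays + 1)

# prob = NULL gives every one of the ndays + 1 candidate dates equal weight
starts <- sample(possible_starts, size = n_individuals, replace = TRUE, prob = NULL)

# Assumed candidate durations: 1, 2, ..., max_duration_days
durations <- sample(seq_len(max_duration_days), size = n_individuals,
                    replace = TRUE, prob = NULL)

# One resolution per individual, drawn from the supplied vector
resolutions <- sample(resolution_minutes, size = n_individuals,
                      replace = TRUE, prob = NULL)

design <- data.frame(
  individual         = seq_len(n_individuals),
  start_date         = starts,
  duration_days      = durations,
  resolution_minutes = resolutions
)
print(design)
```

Supplying a `prob` vector of length `ndays + 1` in place of `NULL` would weight some start dates more heavily than others.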

Details

A dataframe comprising a sequence of time stamps (and, optionally, covariate values) is assembled so that it can be used to simulate values of a response. sim_ts provides a starting framework to simulate the response.

Value

The function outputs a dataframe with the following columns: (1) 'individual', an integer which distinguishes each unique individual; (2) 'timestamp', a time in POSIXct format, which defines each unique observation/time step, at the specified resolution; (3) 'hourofday', an integer which defines the hour of day; and (4) columns for each of the inputted covariates (if applicable).
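
The column layout described above can be sketched in base R. This is an illustration under stated assumptions rather than the function's implementation: `make_series` and `design` are hypothetical names, the final time stamp is assumed to fall exactly `duration_days` after the start, and the Julian day is counted here so that 1 January is day 1.

```r
# Illustrative sketch only; make_series() and design are hypothetical.
tz <- "UTC"
design <- data.frame(
  individual         = 1:2,
  start_date         = c("2017-01-01", "2017-01-03"),
  duration_days      = c(2, 1),
  resolution_minutes = c(720, 360),
  stringsAsFactors   = FALSE
)

# Build the time steps for one individual
make_series <- function(individual, start_date, duration_days, resolution_minutes) {
  start <- as.POSIXct(start_date, tz = tz)
  end   <- start + duration_days * 24 * 60 * 60       # duration in seconds
  timestamp <- seq(start, end, by = resolution_minutes * 60)
  data.frame(
    individual = individual,
    timestamp  = timestamp,
    hourofday  = as.integer(format(timestamp, "%H")),
    julian_day = as.POSIXlt(timestamp)$yday + 1L      # assumed: 1 January = day 1
  )
}

# Stack the per-individual series into one dataframe
ts_frame <- do.call(
  rbind,
  Map(make_series, design$individual, design$start_date,
      design$duration_days, design$resolution_minutes)
)
print(ts_frame)
```

Individual-level covariates such as sex and length would be joined to this dataframe by `individual`, so that each value is repeated across that individual's time steps.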

Examples

# Simulate a dataframe for a single individual:
assemble_ts(start_date = "2017-01-01",
             start_date_variable = FALSE,
             max_duration_days = 10,
             duration_days_variable = FALSE,
             resolution_minutes = 720,
             n_individuals = 1,
             longitude = 5,
             latitude = 65,
             tz = "UTC",
             covariates = c("sex", "length", "sun_angle", "lunar_phase", "julian_day"),
             parameters = list(
                          sex = list(Pf = 0.5, replace = TRUE),
                          length = list(shape = 10, scale = 4, plot_density_curve = TRUE)
                          )
)

# Simulate data for multiple individuals with variable
# .. start dates, durations and resolutions
assemble_ts(start_date = "2018-01-01",
             start_date_variable = TRUE,
             max_duration_days = 21,
             duration_days_variable = TRUE,
             resolution_minutes = c(2, 30, 60),
             n_individuals = 3,
             longitude = 5,
             latitude = 65,
             tz = "UTC",
             covariates = c("sex", "length", "sun_angle", "lunar_phase", "julian_day"),
             parameters = list(start_date = list(ndays = 100, prob = NULL, replace = TRUE),
                               duration_days = list(prob = NULL, replace = TRUE),
                               resolution_mins = list(prob = NULL, replace = TRUE),
                               sex = list(Pf = 0.5, replace = TRUE),
                               length = list(shape = 10, scale = 4,
                                             plot_density_curve = TRUE)
                          )
)

edwardlavender/Tools4ETS documentation built on Nov. 29, 2022, 7:41 a.m.