assemble_ts: Simulate a time series

assemble_ts R Documentation

Simulate a time series

Description

This function assembles a basic dataframe that defines, for each factor level (i.e., individual) inputted, a sequence of time steps, possibly differing in duration and resolution across factor levels. Since this function was inspired by the simulation of depth time series, commonly assumed drivers of depth (namely, sex, length, sun angle, lunar phase and Julian day) can be included in the dataframe by supplying these to the covariates argument. In this case, sex is simulated for each individual from a discrete distribution with a user-defined parameter (the probability of sampling a female) and length is simulated for each individual from a Gamma distribution with user-specified parameters. Sun angle, lunar phase and Julian day are calculated for each time stamp using getSunlightPosition, lunar.phase and yday respectively. For other covariates/ecological time series, the user can add covariate values to this dataframe. In both cases, this information can then be used to simulate values of a response (see sim_ts).
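The individual-level covariates described above can be sketched in base R as follows. This is a minimal illustration, not the function's actual source: it assumes that sex is drawn with sample and length with rgamma, and the object names (n_individuals, Pf, shape, scale, covariates) are chosen here only to mirror the argument names in this help page.

```r
set.seed(1)

n_individuals <- 5
Pf    <- 0.5   # probability of sampling a female
shape <- 25    # Gamma shape parameter for length
scale <- 4     # Gamma scale parameter for length (mean = shape * scale = 100 cm)

# Sex: one draw per individual from a two-level discrete distribution
sex <- sample(c("F", "M"), size = n_individuals,
              replace = TRUE, prob = c(Pf, 1 - Pf))
sex <- factor(sex, levels = c("F", "M"))

# Length (cm): one draw per individual from a Gamma distribution
length_cm <- rgamma(n_individuals, shape = shape, scale = scale)

# One row per individual (assumed layout, for illustration only)
covariates <- data.frame(individual = seq_len(n_individuals),
                         sex = sex,
                         length = length_cm)
covariates
```

In a sketch like this, each individual's sex and length would then be repeated across all of that individual's time stamps, whereas sun angle, lunar phase and Julian day vary from one time stamp to the next.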

Usage

assemble_ts(
  start_date,
  start_date_variable = TRUE,
  max_duration_days,
  duration_days_variable = FALSE,
  resolution_minutes,
  n_individuals,
  longitude,
  latitude,
  tz = "UTC",
  covariates = NULL,
  parameters = list(
    start_date = list(ndays = max_duration_days - 1, prob = NULL, replace = TRUE),
    duration_days = list(prob = NULL, replace = TRUE),
    resolution_mins = list(prob = NULL, replace = TRUE),
    sex = list(Pf = 0.5, replace = TRUE),
    length = list(shape = 25, scale = 4, plot_density_curve = TRUE)
  )
)

Arguments

start_date

A character string specifying the date of the first observation, in the format "yyyy-mm-dd".

start_date_variable

A logical input specifying whether or not the start date for simulated time series should differ among individuals (if multiple individuals are specified). This is controlled via the parameters argument.

max_duration_days

A number specifying the maximum duration, in days, over which to simulate data.

duration_days_variable

A logical input specifying whether or not the duration of time series for each factor level should vary (if multiple levels have been specified).

resolution_minutes

A number or vector specifying the duration, in minutes, between consecutive simulated time stamps. If a single number is specified, the same resolution is used for all individuals. If a vector of more than one number is supplied, its elements are the candidate resolutions: the resolution for each individual is sampled randomly from this vector, according to the specifications in the parameters list (see below).

n_individuals

A number specifying the number of individuals (factor levels) for which to simulate data.

longitude

A number specifying the longitude (decimal degrees) of the simulated location. This is required to calculate the covariate sun_angle (see below).

latitude

A number specifying the latitude (decimal degrees) of the simulated location. This is required to calculate the covariate sun_angle (see below).

tz

A character specifying the time zone. The default is "UTC".

covariates

A character vector specifying the covariates to be included in the dataframe. Currently supported covariates are: (1) "sex", a factor which distinguishes sexes (F, female; M, male); (2) "length" (cm); (3) "sun_angle", the angle (degrees) of the sun above the horizon (see getSunlightPosition); (4) "lunar_phase", the lunar phase (radians; see lunar.phase); (5) "julian_day" (the number of days since January 1st).

parameters

A nested list specifying additional parameters. The following elements are currently supported.

(1) start_date: an element which adjusts the variation in start date among individuals, if start_date_variable = TRUE. ndays is the maximum number of days, from the start_date, at which an individual time series can begin. Start dates are simulated between the dates defined by start_date and start_date + ndays using sample. prob is a vector of probabilities which defines the probability of sampling any given date between start_date and start_date + ndays. By default, prob = NULL; i.e., start dates are sampled from a uniform distribution between start_date and start_date + ndays. If prob is specified, it should be a vector of length ndays + 1. replace defines whether to sample the vector of possible start dates with or without replacement. By default, replace = TRUE.

(2) duration_days: an element which defines the parameters for sample in order to simulate variation in the duration (days) of each time series, if duration_days_variable = TRUE.

(3) resolution_mins: an element which defines the parameters for sample in order to simulate variation in the resolution (minutes) of each time series, if a vector of length > 1 is supplied to resolution_minutes.

(4) sex: an element that defines the distribution from which sex is simulated. Pf defines the probability that any given simulated individual is female; the probability that an individual is male is therefore 1 - Pf.

(5) length: an element which adjusts the distribution from which individual lengths (cm) are simulated. Lengths are assumed to be drawn from a Gamma distribution, defined by the parameters shape and scale (see GammaDist). plot_density_curve is a logical input which, if TRUE, causes the function to return a theoretical density curve of the distribution from which lengths are simulated.
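The sampling controlled by the parameters list can be sketched with base R's sample. This is only an illustration under stated assumptions, not the package's implementation: the object names (candidates, start_dates, resolutions, durations) are hypothetical, and drawing durations from 1 to max_duration_days is an assumption, since this page does not say which set of durations is sampled.

```r
set.seed(1)

n_individuals     <- 3
start_date        <- as.Date("2018-01-01")
ndays             <- 100
max_duration_days <- 21

# Candidate start dates: start_date, ..., start_date + ndays (ndays + 1 values)
candidates <- seq(start_date, by = "day", length.out = ndays + 1)

# prob = NULL gives every candidate equal weight; otherwise supply a
# vector of length ndays + 1
prob <- NULL

# Sample positions rather than values; this also avoids sample()'s special
# handling of a length-one numeric vector
idx <- sample(length(candidates), size = n_individuals,
              replace = TRUE, prob = prob)
start_dates <- candidates[idx]

# Resolution (minutes): one value per individual from the supplied vector
resolution_minutes <- c(2, 30, 60)
resolutions <- resolution_minutes[
  sample(length(resolution_minutes), size = n_individuals, replace = TRUE)
]

# Duration (days): ASSUMPTION that durations are drawn from 1..max_duration_days
durations <- sample(max_duration_days, size = n_individuals, replace = TRUE)

data.frame(individual = seq_len(n_individuals),
           start_date = start_dates,
           duration_days = durations,
           resolution_mins = resolutions)
```

With replace = TRUE, two individuals can share the same start date, duration or resolution; with replace = FALSE, sample requires at least as many candidate values as individuals.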

Details

A dataframe comprising a sequence of time stamps (and, optionally, covariate values) is assembled as a starting point for simulating values of a response. sim_ts provides a starting framework for simulating the response.

Value

The function outputs a dataframe with the following columns: (1) 'individual', an integer which distinguishes each unique individual; (2) 'timestamp', a time in POSIXct format, which defines each unique observation/time step, at the specified resolution; (3) 'hourofday', an integer which defines the hour of day; and (4) columns for each of the inputted covariates (if applicable).
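For orientation, the first three columns can be reproduced for a single individual in base R. This is a hedged sketch rather than the function's own code: the name skeleton is hypothetical, and including the final time stamp at start + duration is an assumption about how the end of the series is handled.

```r
start              <- as.POSIXct("2017-01-01", tz = "UTC")
duration_days      <- 10
resolution_minutes <- 720

# Regular sequence of time stamps; a numeric 'by' is interpreted in seconds
end       <- start + duration_days * 24 * 60 * 60
timestamp <- seq(start, end, by = resolution_minutes * 60)

# Hour of day (0-23) extracted from each time stamp
hourofday <- as.integer(format(timestamp, "%H", tz = "UTC"))

skeleton <- data.frame(individual = 1L,
                       timestamp = timestamp,
                       hourofday = hourofday)
head(skeleton)
```

Covariate columns, if requested, would sit alongside these three columns, with one value per row.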

See Also

GammaDist, sample, sim_ts

Examples

# Simulate a dataframe for a single individual:
assemble_ts(start_date = "2017-01-01",
             start_date_variable = FALSE,
             max_duration_days = 10,
             duration_days_variable = FALSE,
             resolution_minutes = 720,
             n_individuals = 1,
             longitude = 5,
             latitude = 65,
             tz = "UTC",
             covariates = c("sex", "length", "sun_angle", "lunar_phase", "julian_day"),
             parameters = list(
                          sex = list(Pf = 0.5, replace = TRUE),
                          length = list(shape = 10, scale = 4, plot_density_curve = TRUE)
                          )
)

# Simulate data for multiple individuals with variable
# start dates, durations and resolutions:
assemble_ts(start_date = "2018-01-01",
             start_date_variable = TRUE,
             max_duration_days = 21,
             duration_days_variable = TRUE,
             resolution_minutes = c(2, 30, 60),
             n_individuals = 3,
             longitude = 5,
             latitude = 65,
             tz = "UTC",
             covariates = c("sex", "length", "sun_angle", "lunar_phase", "julian_day"),
             parameters = list(start_date = list(ndays = 100, prob = NULL, replace = TRUE),
                               duration_days = list(prob = NULL, replace = TRUE),
                               resolution_mins = list(prob = NULL, replace = TRUE),
                               sex = list(Pf = 0.5, replace = TRUE),
                               length = list(shape = 10, scale = 4,
                                             plot_density_curve = TRUE)
                          )
)


edwardlavender/Tools4ETS documentation built on Nov. 29, 2022, 7:41 a.m.