date.simulate: Simulate date distributions.


Source: R/date.simulate.R

Description

Simulates chronological distributions from a table of entities with defined date ranges. By default, each entity is assumed to be equally likely to fall anywhere between its Start and End dates (the beta parameters a and b can change this). The function can then optionally simulate a dummy set of the same size, drawn from a specified distribution.
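The base-R sketch below gives a rough picture of that procedure. It draws one date per entity uniformly between Start and End, assigns the dates to fixed 100-year bins, tallies them, and repeats this 100 times. It is not the archSeries implementation. The bin edges, object names and the use of cut() are assumptions made for the example, and it leaves out weights, grouping and the dummy set.

```r
# Illustrative sketch only, not the archSeries source.
set.seed(1)
ranges <- data.frame(Start = c(450, 450, 600), End = c(700, 800, 650))
bin.width <- 100
breaks <- seq(400, 800, by = bin.width)   # hypothetical fixed bin edges
reps <- 100

sim <- replicate(reps, {
  # one uniform draw per entity; runif() recycles the min/max vectors
  dates <- runif(nrow(ranges), min = ranges$Start, max = ranges$End)
  # assign each date to a bin; factor levels keep empty bins as zero
  bins <- cut(dates, breaks = breaks, right = FALSE)
  as.vector(table(bins))
})
# sim is a matrix: one row per bin, one column per simulation run
dim(sim)
```

Each column sums to the number of entities. The spread of values within a row shows the uncertainty that the date ranges introduce for that bin.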

Usage

date.simulate(data, probs = 1, weight = 1, ds.fun = sum, real = TRUE,
  dummy = FALSE, comp.field = NULL, comp.values = NULL,
  context.fields = c("ID"), quant.list = c(0.025, 0.25, 0.5, 0.75, 0.975),
  start.date = NULL, end.date = NULL, a = 1, b = 1, bin.width = 100,
  reps = 100, RoC = NULL, summ = TRUE)

Arguments

data

Data table (or object that can be coerced to one) with, minimally, two numeric columns called Start and End.

probs

Numeric vector defining a null model from which to sample the dummy set. It is recycled up to nrow(data), so passing a single value gives a uniform null model. If its length is greater than 1, probs also sets the number of bins, overriding bin.width. Defaults to 1.
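One way to picture drawing a dummy set from a null model is to assign each dummy entity to a bin with probability proportional to probs. The sketch below does this for a uniform model. It assumes one weight per bin, which the text above implies for length > 1. The recycling and sampling scheme shown here is an assumption and is not taken from the archSeries source.

```r
# Hypothetical sketch of a dummy set drawn from a null model over bins.
set.seed(2)
n.entities <- 50                        # same size as the real data set
n.bins <- 4
probs <- 1                              # single value: uniform null model
p <- rep_len(probs, n.bins)             # assumed: one weight per bin
# sample() normalises prob internally, so p need not sum to 1
dummy.bins <- sample(seq_len(n.bins), n.entities, replace = TRUE, prob = p)
dummy.counts <- tabulate(dummy.bins, nbins = n.bins)
dummy.counts
```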

weight

Numeric vector: the weight to be applied to each row in 'data' (and to its counterpart in the dummy set), or a constant weight to be applied to all. Defaults to 1.

ds.fun

Function: the summary function applied to the entities in each bin during each simulation run. Defaults to sum, which produces frequency distributions. Set it to, for example, mean or median when working with metrical data.
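The sketch below contrasts sum and mean applied per bin within a single simulation run. With sum, the weights produce a weighted frequency. With mean, a per-entity measurement is averaged. The bins, weights and measurements are made up. How archSeries combines weights with a non-sum ds.fun is not shown here.

```r
# Illustrative only: bins, weights and measurements are hypothetical.
bin <- factor(c("400-500", "400-500", "600-700"),
              levels = c("400-500", "500-600", "600-700"))
weight <- c(3, 1, 5)          # e.g. fragment counts per entity
length.mm <- c(12, 14, 10)    # hypothetical measurement per entity

# ds.fun = sum: weighted frequency per bin (empty bins filled with 0)
freq <- tapply(weight, bin, sum, default = 0)
# ds.fun = mean: average measurement per bin (empty bins give NA)
avg <- tapply(length.mm, bin, mean)
freq
avg
```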

real

Logical: should the date distribution of the empirical data be simulated? Defaults to TRUE.

dummy

Logical: should a dummy set be simulated in addition to the empirical data? Defaults to FALSE.

comp.field

Character: optional name of a column used to subset 'data' for comparison. Defaults to NULL.

comp.values

Optional vector specifying the values of 'comp.field' to compare. Defaults to NULL, in which case all unique values (excluding blanks and NAs) are compared, provided comp.field is not NULL.
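The documented default for comp.values, "all unique values, excluding blanks and NAs", can be sketched in base R as follows. The column name Taxon and its values are hypothetical. Splitting the data into one subset per value is only an illustration of what "compare" might involve.

```r
# Hypothetical comp.field "Taxon"; data are made up for the example.
dt <- data.frame(Taxon = c("cattle", "sheep", NA, "", "cattle", "pig"),
                 stringsAsFactors = FALSE)
vals <- unique(dt$Taxon)
# documented default: all unique values, excluding blanks and NAs
comp.values <- vals[!is.na(vals) & vals != ""]
comp.values
# one subset of the data per value to be compared (which() drops NAs)
subsets <- lapply(comp.values, function(v) dt[which(dt$Taxon == v), , drop = FALSE])
```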

context.fields

Character vector specifying the column(s) in data which define the minimal stratigraphic entities to analyse. Add further column names to group the data by additional criteria before simulating. For example, when analysing bone remains from a table of contexts, adding a taxon column treats each taxon separately rather than lumping them together. Defaults to "ID".

quant.list

Numeric vector of quantiles to be calculated in a summary table. Defaults to c(0.025,0.25,0.5,0.75,0.975).
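The sketch below computes per-bin quantiles across simulation runs and reshapes them into a long table. It uses the column names given for the "summary" element under Value (bin, V1, quantile, id). The simulated counts are random placeholders, and the reshaping is an assumption about layout, not the package's code.

```r
# Sketch: per-bin quantiles across runs, in a long-format summary table.
set.seed(3)
quant.list <- c(0.025, 0.25, 0.5, 0.75, 0.975)
# placeholder simulated counts: rows = bins, columns = simulation runs
sim <- matrix(rpois(4 * 100, lambda = 5), nrow = 4,
              dimnames = list(c("400-500", "500-600", "600-700", "700-800"), NULL))

q <- apply(sim, 1, quantile, probs = quant.list)  # quantiles x bins
summary.tab <- data.frame(
  bin      = rep(colnames(q), each = length(quant.list)),
  V1       = as.vector(q),                        # column-major: bin by bin
  quantile = rep(quant.list, times = ncol(q)),
  id       = "count",
  stringsAsFactors = FALSE
)
head(summary.tab)
```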

start.date

Numeric: the start of the time period to be considered. Defaults to the lowest value in data$Start.

end.date

Numeric: the end of the time period to be considered. Defaults to the highest value in data$End.

a

Numeric vector: alpha parameter of beta distribution for each row in 'data', or a constant parameter to be used in each case. Must be positive (negative values will be converted). Defaults to 1, for uniformity.

b

Numeric vector: beta parameter of beta distribution for each row in 'data', or a constant parameter to be used in each case. Must be positive (negative values will be converted). Defaults to 1, for uniformity.
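The a and b parameters can be read as shaping a beta distribution that is stretched over each entity's date range. The sketch below draws one date per row this way. With a = b = 1 the draw is uniform, matching the default. The per-row values are hypothetical, and the handling of negative inputs mentioned above is not reproduced.

```r
# Sketch: one date per row from a beta distribution scaled to [Start, End].
set.seed(4)
Start <- c(450, 450, 600)
End <- c(700, 800, 650)
a <- c(1, 2, 5)   # hypothetical per-row alpha values
b <- c(1, 2, 1)   # hypothetical per-row beta values
# rbeta() returns values in [0, 1]; rescale them to each date range
dates <- Start + rbeta(length(Start), shape1 = a, shape2 = b) * (End - Start)
dates
```

Here row 1 is uniform, row 2 (a = b = 2) favours the middle of its range, and row 3 (a = 5, b = 1) favours dates near its End.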

bin.width

Numeric: the resolution of the analysis, in units of time. Defaults to 100.
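The sketch below shows one plausible way to derive bins from start.date, end.date and bin.width. It uses the documented defaults for the period limits and date-range labels of the kind described under Value. The exact edge handling and labelling used by archSeries are assumptions. In particular, this version lets the last bin run past end.date.

```r
# Sketch of bin construction; labelling and edges are assumptions.
Start <- c(450, 450, 600)
End <- c(700, 800, 650)
start.date <- min(Start)   # documented default: lowest value in data$Start
end.date <- max(End)       # documented default: highest value in data$End
bin.width <- 100

n.bins <- ceiling((end.date - start.date) / bin.width)
breaks <- start.date + bin.width * (0:n.bins)
labels <- paste(head(breaks, -1), tail(breaks, -1), sep = "-")
labels
```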

reps

Integer: the number of times the simulation will be run. Defaults to 100.

RoC

Rate of Change. Character: how should rates of change between adjacent bins be calculated alongside the raw counts? In absolute terms ("a") or relative to the current bin ("r"). Defaults to NULL, in which case rates of change are not calculated.
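The two RoC options can be read as follows. "a" is the difference between a bin and the next, and "r" is that difference divided by the current bin. The sketch below applies both readings to one run's counts. These formulas are an interpretation of the wording above and are not taken from the archSeries source.

```r
# Sketch of absolute ("a") and relative ("r") rate of change for one run.
count <- c(2, 5, 5, 1)          # hypothetical counts in consecutive bins
next.count <- c(count[-1], NA)  # value in the following bin (none for the last)

roc.a <- next.count - count             # absolute change to the next bin
roc.r <- (next.count - count) / count   # change relative to the current bin
roc.a
roc.r
```

The last bin has no successor, so both rates are NA there. A relative rate would also be undefined for any bin whose count is zero.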

summ

Logical: should a summary table be calculated (allowing plotting with poly.chron, for example)? Defaults to TRUE.

Value

A list with two named elements:

full

A long-format data table with at least five named columns: 'rep.no', integer specifying the simulation run; 'bin', character specifying the chronological bin as a date range; 'bin.no', integer numbering the bins from the earliest; 'count', numeric giving the number of entities (or total weight) assigned to the given bin in the given simulation run; 'dummy', giving the number of entities (or total weight) assigned to the bin in the dummy version of the given simulation run. If RoC is not NULL, there are two further columns, 'RoC.count' and 'RoC.dummy', giving the rate of change between this bin and the next for 'count' and 'dummy' respectively.

summary

A second long-format data table with four named columns: 'bin', as above; 'V1', the value for the given bin at the given quantile; 'quantile', the quantile at which V1 is calculated; 'id', character specifying which column of "full" V1 is based on, e.g. "count", "dummy", "RoC.count" or "RoC.dummy".

Examples

library(data.table)
library(archSeries)
date.ranges <- data.table(ID=c(1, 2, 3), Start=c(450, 450, 600), End=c(700, 800, 650), frag.count=c(3, 1, 5))
x <- date.simulate(date.ranges, weight=date.ranges$frag.count, context.fields=NULL)

davidcorton/archSeries documentation built on May 4, 2021, 10:09 p.m.