modelTest: Monte-Carlo simulation test for SPDs

View source: R/tests.R

modelTestR Documentation

Monte-Carlo simulation test for SPDs

Description

Comparison of an observed summed radiocarbon date distribution (aka SPD) with simulated outcomes from a theoretical model.

Usage

modelTest(
  x,
  errors,
  nsim,
  bins = NA,
  runm = NA,
  timeRange = NA,
  backsight = 50,
  changexpr = expression((t1/t0)^(1/d) - 1),
  gridclip = TRUE,
  raw = FALSE,
  model = c("exponential"),
  method = c("uncalsample"),
  predgrid = NA,
  normalised = NA,
  datenormalised = NA,
  spdnormalised = FALSE,
  ncores = 1,
  fitonly = FALSE,
  a = 0,
  b = 0,
  edgeSize = 500,
  verbose = TRUE
)

Arguments

x

A CalDates object containing calibrated radiocarbon ages

errors

A vector of errors corresponding to each radiocarbon age

nsim

Number of simulations

bins

A vector indicating which bin each radiocarbon date is assigned to.

runm

A number indicating the window size of the moving average to smooth both observed and simulated SPDs. If set to NA no moving average is applied.Default is NA.

timeRange

A vector of length 2 indicating the start and end date of the analysis in cal BP. The fitting process is applied considering the SPD within the interval defined by this parameter. If no values are supplied the earliest and latest median calibrated dates of the observed data will be used.

backsight

A single numeric value defining the distance in time between the focal year and the backsight year for computing the rate of change. Default is 10.

changexpr

An expression for calculating the rate of change in SPD between the focal year and a backsight year. Available input options are t1 (the SPD for the focal year), t0 (the SPD for the backsight year), d (the distance between t0 and t1), and any other standard constants and mathematical operators. A sensible default is provided.

gridclip

Whether the sampling of random dates is constrained to the observed range (TRUE) or not (FALSE). Default is TRUE.

raw

A logical variable indicating whether all permuted SPDs should be returned or not. Default is FALSE.

model

A vector indicating the model to be fitted. Currently the acceptable options are 'uniform', 'linear', 'exponential' and 'custom'. Default is 'exponential'.

method

Method for the creation of random dates from the fitted model. Either 'uncalsample' or 'calsample'. Default is 'uncalsample'. See below for details.

predgrid

A data.frame containing calendar years (column calBP) and associated summed probabilities (column PrDens). Required when model is set to 'custom'.

normalised

Whether the simulated dates should be normalised or not. Default based on whether x is normalised or not.

datenormalised

Argument kept for backward compatibility with previous versions.

spdnormalised

A logical variable indicating whether the total probability mass of the SPD is normalised to sum to unity for both observed and simulated data.

ncores

Number of cores used for for parallel execution. Default is 1.

fitonly

A logical variable. If set to TRUE, only the the model fitting is executed and returned. Default is FALSE.

a

Starter value for the exponential fit with the nls function using the formula y ~ exp(a + b * x) where y is the summed probability and x is the date. Default is 0.

b

Starter value for the exponential fit with the nls function using the formula y ~ exp(a + b * x) where y is the summed probability and x is the date. Default is 0.

edgeSize

Controls edge effect by expanding the fitted model beyond the range defined by timeRange.

verbose

A logical variable indicating whether extra information on progress should be reported. Default is TRUE.

Details

The function implements a Monte-Carlo test for comparing a theoretical or fitted statistical model to an observed summed radiocarbon date distribution (aka SPD) and associated rates of changes. A variety of theoretical expectations can be compared to the observed distribution by setting the model argument, for example to fit basic 'uniform' (the mean of the SPD), 'linear' (fitted using the lm function) or model='exponential' models (fitted using the nls function). Models are fitted to the period spanned by timeRange although x can contain dates outside this range to mitigate possible edge effects (see also bracket). Notice that estimating growth model parameters directly from SPD can be biased, resulting into a null hypothesis that is not necessarily representative (see Crema 2022). It is also possible for the user to provide a model of their own by setting model='custom' and then supplying a two-column data.frame to predgrid. The function generates nsim theoretical SPDs from the fitted model via Monte-Carlo simulation, this is then used to define a 95% critical envelope for each calendar year. The observed SPD is then compared against the simulation envelope; local departures from the model are defined as instances where the observed SPD is outside such an envelope, while an estimate of the global significance of the observed SPD is also computed by comparing the total areas of observed and simulated SPDs that fall outside the simulation envelope. The theoretical SPDs can be generated using two different sampling approaches defined by the parameter method. If method is set to 'uncalsample' each date is drawn after the fitted model is backcalibrated as a whole and adjusted for a baseline expectation; if it is set to 'calsample' samples are drawn from the fitted model in calendar year then individually back calibrated and recalibrated (the approach of Timpson et al. 2014). For each simulation, both approaches produces n samples, with n equal to the number of bins or number of dates (when bins are not defined). Differences between these two approaches are particularly evident at dates coincident with steeper portions of the calibration curve. If more than one type of calibration curve is associated with the observed dates, at each Monte-Carlo iteration, the function randomly assigns each bin to one of the calibration curves with probability based on the proportion of dates within the bin associated to the specific curves. For example, if a bin is composed of four dates and three are calibrated with 'intcal20' the probability of that particular bin being assigned to 'intcal20' is 0.75.

Value

An object of class SpdModelTest with the following elements

  • result A four column data.frame containing the observed probability density (column PrDens) and the lower and the upper values of the simulation envelope (columns lo and hi) for each calendar year column calBP

  • result.roc A four column data.frame containing the observed rates of change (column roc) and the lower and the upper values of the simulation envelope (columns lo.roc and hi.roc) for the mid point between two chronological blocks calBP

  • sim A matrix containing the simulation results of the summed probabilities. Available only when raw is set to TRUE

  • sim.roc A matrix containing the simulation results of the rate of change of summed probabilities. Available only when raw is set to TRUE

  • pval A numeric vector containing the p-value of the global significance test for the summed probabilities

  • pval.roc A numeric vector containing the p-value of the global significance test for the rates of change

  • fit A data.frame containing the probability densities of the fitted model for each calendar year within the time range of analysis

  • fitobject Fitted model. Not available when model is 'custom'

  • n Number of radiocarbon dates.

  • nbinsNumber of bins.

  • nsimNumber of Monte-Carlo simulations.

  • backsightBacksight size.

Note

  • Windows users might receive a memory allocation error with larger time span of analysis (defined by the parameter timeRange). This can be avoided by increasing the memory limit with the memory.limit function.

  • Users experiencing a Error: cannot allocate vector of size ... error message can increase the memory size using the Sys.setenv(), for example: Sys.setenv("R_MAX_VSIZE" = 16e9).

  • The function currently supports only dates calibrated with 'intcal20',intcal13','intcal13nhpine16','shcal20','shcal13','shcal13shkauri16', 'marine20', and 'marine13'.

References

Crema, E.R. (2022). Statistical inference of prehistoric demography from frequency distributions of radiocarbon dates: a review and a guide for the perplexed. Journal of Archaeological Method and Theory. doi:10.1007/s10816-022-09559-5

Timpson, A., Colledge, S., Crema, E., Edinborough, K., Kerig, T., Manning, K., Thomas, M.G., Shennan, S., (2014). Reconstructing regional population fluctuations in the European Neolithic using radiocarbon dates: a new case-study using an improved method. Journal of Archaeological Science, 52, 549-557. doi:10.1016/j.jas.2014.08.011

Examples

## Example with Younger Dryas period Near East, including site bins
## Not run: 
data(emedyd)
caldates <- calibrate(x=emedyd$CRA, errors=emedyd$Error, normalised=FALSE)
bins <- binPrep(sites=emedyd$SiteName, ages=emedyd$CRA, h=50)
nsim=5 #toy example
expnull <- modelTest(caldates, errors=emedyd$Error, bins=bins, nsim=nsim, runm=50,
timeRange=c(16000,9000), model="exponential", datenormalised=FALSE)
plot(expnull, xlim=c(16000,9000))
round(expnull$pval,4) #p-value
summary(expnull)

## End(Not run)

rcarbon documentation built on Aug. 24, 2023, 5:11 p.m.