fauxcurrence: Simulate species occurrence data

View source: R/fauxcurrence.R

fauxcurrence R Documentation

Simulate species occurrence data

Description

This function generates a set of randomised occurrences (longitude and latitude coordinates) for one or more species, with a spatial structure (within- and between-species distances) matching that of an observed set of coordinates.

Usage

fauxcurrence(
  coords,
  rast,
  distmat = NULL,
  use.distmat = FALSE,
  inter.spp = FALSE,
  sep.inter.spp = FALSE,
  fix.seed.pts = NULL,
  div.n.flat = 1000,
  allow.ident.conspec = FALSE,
  dist_meth = "distRcpp",
  dist_fun = "Haversine",
  trim.init.range = FALSE,
  init.range = 0.9,
  stg.1.only = FALSE,
  stg.2.only = FALSE,
  new.pt.meth.stg.2 = "dist.dens",
  new.pt.meth.stg.3 = "dist.dens",
  switch.n = 1,
  iter.max.stg1 = 10000,
  iter.max.stg2 = 2000,
  iter.max.stg3 = 1e+05,
  div.int = 10,
  break.num = 20,
  ret.seed.pts = FALSE,
  ret.all.iter = FALSE,
  logfile = NULL,
  verbose = TRUE
)

Arguments

coords

A dataframe of observed occurrences. There should be columns for longitude and latitude, named "x" and "y" respectively, and an optional third column named "species" with species identities - required only if there is more than one species.

rast

A raster object defining the extent of the study area. Also used for distance matrix calculation if use.distmat is TRUE but distmat is not provided (see ?make.distmat for more details).

distmat

A matrix of distances between all non-NA cells of the raster in order, as can be created with make.distmat(). Used if use.distmat == TRUE.

use.distmat

Logical indicating whether a precomputed distance matrix (produced with make.distmat()) should be used for inter-point distances. If TRUE and distmat is not provided, one is created with make.distmat(). This is usually faster than use.distmat == FALSE, but the RAM requirements for larger rasters can make it impractical.

inter.spp

Logical indicating whether interspecific distances should be used in addition to intraspecific distances.

sep.inter.spp

Logical indicating whether interspecific distances should be separated into pairwise species combinations. If FALSE, there is one interspecific distance measure for each species, which is the distance from that species' points to all heterospecific points. If TRUE, there is a separate distance measure for each pairwise combination of species. Ignored if inter.spp is FALSE.

fix.seed.pts

A dataframe of points that will be used as the first point for each species and will not be moved during stage 3. This should be in the same format as coords with one point for each species.

div.n.flat

Integer specifying the number of iterations for which divergence values should remain unchanged before the stage 3 iterative improvement procedure is stopped. Should be high enough for divergence to reach its minimum.

allow.ident.conspec

Logical indicating whether conspecific points in the same place should be allowed. This should match the input data: if the input is "presence only" data, it should be set to FALSE, but if it is raw occurrence data with multiple conspecific points in the same locations, it should be set to TRUE.

dist_meth

A string indicating the distance method to use. Should be either "distRcpp", "distm" or "costdist" ("costdist" is only implemented when use.distmat is TRUE).

dist_fun

A string indicating the distance function to be used for distance calculations by geosphere::distm (if dist_meth is "distm") or distRcpp::dist_mtom (if dist_meth is "distRcpp"). For dist_meth == "distRcpp", this should be either "Haversine" or "Vincenty". For dist_meth == "distm", it can be the name of any loaded function which takes the same input and produces the same output as geosphere::distHaversine.

trim.init.range

Logical indicating whether the initial points generated in stage 1 should be within the central init.range quantile of the observed distribution rather than just the total range. This can speed up stage 2, especially if many species are present.

init.range

Numeric between 0 and 1 defining the central range to constrain initial points to if trim.init.range is TRUE.

stg.1.only

Produce only one point per species and skip full initial point set generation and iterative improvement.

stg.2.only

Produce only the full initial set of points and skip iterative improvement.

new.pt.meth.stg.2

String indicating the method of sampling a new point in stage 2. Either "sample", which samples a random point from the raster; "dist.obs" which randomly chooses an existing point and places a new point of the same species D distance away, where D is sampled from the observed intraspecific distances for that species; or "dist.dens" (recommended) which is similar to dist.obs but D is sampled from a density object constructed from the observed intraspecific distances for that species.

new.pt.meth.stg.3

As with new.pt.meth.stg.2 but for stage 3.

switch.n

An integer defining the number of changes made in each iteration of the stage 3 iterative improvement procedure. It is recommended to leave this at 1.

iter.max.stg1

The maximum number of iterations with no improvement during stage 1 before initial point generation is restarted.

iter.max.stg2

The maximum number of iterations with no improvement during stage 2 before initial point generation is restarted.

iter.max.stg3

The maximum number of iterations for the stage 3 iterative improvement procedure before the algorithm is stopped if div.n.flat has not been reached.

div.int

Integer indicating the interval at which to record divergence values during the stage 3 iterative improvement procedure, so that the trace can be examined to determine if it has minimised.

break.num

Integer specifying the number of breaks (to delineate bins in a histogram) for the KL calculation. Defaults to 20, but this is arbitrary.

ret.seed.pts

Logical indicating whether to output the set of 'seed points' produced in stage 2. Useful for illustrating the advantage of the stage 3 iterative improvement procedure.

ret.all.iter

Logical indicating whether to retain the points from every iteration of stage 3. Can be used to examine the stage 3 iterative improvement procedure in detail.

logfile

A string with a filename to output progress information to. If NULL, it is printed to the console. Useful when running multiple runs in parallel, to prevent progress information from different runs being mixed.

verbose

Logical indicating whether to print progress information.
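As a minimal sketch of the input format described for coords, the data frame below holds made-up records for two hypothetical species. The species names and coordinate values are invented purely for illustration.

```r
# Hypothetical occurrence records for two made-up species.
# Columns follow the format described for coords: "x" (longitude),
# "y" (latitude) and "species" (needed only with more than one species).
my.coords <- data.frame(
  x = c(-3.2, -3.0, -2.8, 10.1, 10.4, 10.9),   # longitude
  y = c(51.5, 51.7, 51.4, 45.2, 45.0, 45.6),   # latitude
  species = rep(c("sp_a", "sp_b"), each = 3)
)
str(my.coords)
```

A fix.seed.pts data frame would use the same columns, with a single row per species.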

Details

Within-species distances are always used. If inter.spp is TRUE, it also uses interspecies distances, and if sep.inter.spp is TRUE it separates interspecific distances into all pairwise sets of species. Thus, occurrences for each species can be simulated independently (inter.spp=FALSE, sep.inter.spp=FALSE), general distance to heterospecifics can be taken into account (inter.spp=TRUE, sep.inter.spp=FALSE), or distance relationships between individual pairs of species can be preserved (inter.spp=TRUE, sep.inter.spp=TRUE).
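The three configurations above differ in how many distance measures are tracked. The sketch below enumerates them for a set of species names. list_measures() and its labels are hypothetical helpers written for this illustration, not part of the package.

```r
# Enumerate the distance measures implied by inter.spp / sep.inter.spp.
# list_measures() is a hypothetical helper, not a fauxcurrence function.
list_measures <- function(species, inter.spp = FALSE, sep.inter.spp = FALSE) {
  species <- sort(unique(as.character(species)))
  measures <- paste0("intra:", species)          # always used
  if (inter.spp && length(species) > 1) {
    if (sep.inter.spp) {
      pairs <- combn(species, 2)                 # one column per species pair
      measures <- c(measures, paste0("inter:", pairs[1, ], "-", pairs[2, ]))
    } else {
      # one measure per species: its points vs all heterospecific points
      measures <- c(measures, paste0("inter:", species, "-others"))
    }
  }
  measures
}

spp <- c("sp_a", "sp_b", "sp_c")
list_measures(spp)                                          # independent species
list_measures(spp, inter.spp = TRUE)                        # general heterospecific
list_measures(spp, inter.spp = TRUE, sep.inter.spp = TRUE)  # pairwise
```

With S species this gives S measures, 2S measures, and S + choose(S, 2) measures respectively.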

The algorithm starts by generating an initial set of randomised occurrences which fit within the ranges of observed interpoint distances. There are two stages to this process. In stage 1, it places one point for each species on the map provided with rast and, if inter.spp is TRUE, checks that the distances between species are within the observed range (or within the central init.range quantile of the observed distribution if trim.init.range is TRUE). If interspecific distances are outside the observed range (or init.range), it iteratively improves the initial points by randomly replacing a point and rechecking the distances until they are within the desired range. If fix.seed.pts is provided, stage 1 is skipped and these points are used instead. The points in fix.seed.pts are not replaced in later stages of the algorithm, so they can be used to define the rough centroids of the species distributions. In stage 2, more points are added (using the method set by new.pt.meth.stg.2) until the correct number of points for each species is reached. As in stage 1, after each point is added, distances are checked to ensure they are within the observed range of distances. To stop stages 1 and 2 becoming stuck with a set of points which makes step-wise iterative improvement impossible, iter.max.stg1 and iter.max.stg2 set an upper limit on the number of iterations without improvement. If these limits are reached, initial point generation is restarted.
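The following sketch illustrates two ideas from this paragraph: drawing a distance D in the style of "dist.obs" or "dist.dens", and computing the accepted distance range with or without trimming to the central init.range quantile. It is an illustration under assumptions, not the package's code. The helper names, the made-up distances, and the grid-based sampling from the density object are all choices made for this example.

```r
set.seed(42)
# Made-up observed intraspecific distances for one species (metres).
obs.dist <- rgamma(200, shape = 2, scale = 20000)

# "dist.obs"-style draw: resample one of the observed distances directly.
draw_dist_obs <- function(d) sample(d, 1)

# "dist.dens"-style draw: sample from a kernel density estimate of the
# observed distances (grid approximation; non-positive grid values dropped).
draw_dist_dens <- function(d) {
  dens <- density(d)
  keep <- dens$x > 0
  sample(dens$x[keep], 1, prob = dens$y[keep])
}

# Accepted range: the full observed range, or its central init.range quantile.
dist_range <- function(d, trim.init.range = FALSE, init.range = 0.9) {
  if (!trim.init.range) return(range(d))
  tail.prob <- (1 - init.range) / 2
  quantile(d, probs = c(tail.prob, 1 - tail.prob), names = FALSE)
}

D   <- draw_dist_dens(obs.dist)
rng <- dist_range(obs.dist, trim.init.range = TRUE)
D >= rng[1] && D <= rng[2]   # is the drawn distance inside the trimmed range?
```

Sampling from a density rather than from the raw distances allows values that were never observed exactly, which is one plausible reason "dist.dens" is the recommended option.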

The algorithm then implements an iterative improvement procedure (stage 3) to improve the similarity of the spatial structure between the observed and simulated points. In each iteration, one change (or several if switch.n > 1) is made to the points. Each change consists of randomly replacing a point (using the method set by new.pt.meth.stg.3). If the change improves the match of the spatial structure between the simulated and observed points, it is kept; otherwise it is discarded and the original point is retained. This procedure is repeated until either no improvement has been made for div.n.flat iterations or the total number of iterations has reached iter.max.stg3. The match between spatial structures is evaluated using the probability distribution of the interpoint distances. The comparison is made using the discrete version of the Kullback-Leibler divergence, where smaller values indicate a better match between the simulated and observed points. Since there are multiple interpoint distance distributions (i.e. intra- and interspecific distances for multiple species or pairs of species), a weighted mean of the divergence statistics across all distance distributions is used, weighted such that interspecific and intraspecific distances contribute equally. The divergence statistic should be plotted against the number of iterations to ensure that it has been minimised (i.e. the curve has gone flat). If verbose == TRUE, a basic ASCII plot of divergence against number of iterations is printed to the console (or to a log file if logfile is not NULL).
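The accept-or-discard logic and its two stopping rules can be sketched on a toy problem. This is a simplified illustration, not the package's implementation. It assumes one species, uses points in a unit square with Euclidean distances in place of a raster and great-circle distances, computes the divergence of the simulated from the observed distribution, and adds a small constant to empty bins. kl_div() and all values are hypothetical.

```r
set.seed(1)

# Discrete KL divergence between two sets of distances binned on shared breaks.
# eps avoids log(0) in empty bins (an assumption of this sketch).
kl_div <- function(obs, sim, breaks, eps = 1e-10) {
  bin <- function(d) {
    counts <- tabulate(findInterval(d, breaks, all.inside = TRUE),
                       nbins = length(breaks) - 1) + eps
    counts / sum(counts)
  }
  p <- bin(obs)
  q <- bin(sim)
  sum(p * log(p / q))
}

break.num  <- 20
div.n.flat <- 200    # stop after this many iterations without improvement
iter.max   <- 5000   # hard cap on the number of iterations
breaks <- seq(0, sqrt(2), length.out = break.num + 1)  # unit-square distances

# Toy "observed" points: a tight cluster in the unit square.
obs.pts <- cbind(x = runif(30, 0.4, 0.6), y = runif(30, 0.4, 0.6))
obs.d   <- as.vector(dist(obs.pts))

# Starting simulated points: uniform over the whole square.
sim.pts   <- cbind(x = runif(30), y = runif(30))
div       <- kl_div(obs.d, as.vector(dist(sim.pts)), breaks)
div.start <- div

n.flat <- 0
iter   <- 0
while (n.flat < div.n.flat && iter < iter.max) {
  iter <- iter + 1
  cand <- sim.pts
  cand[sample(nrow(cand), 1), ] <- runif(2)      # replace one point at random
  cand.div <- kl_div(obs.d, as.vector(dist(cand)), breaks)
  if (cand.div < div) {                          # keep only improvements
    sim.pts <- cand
    div     <- cand.div
    n.flat  <- 0
  } else {
    n.flat <- n.flat + 1                         # discard, keep the original
  }
}
c(start = div.start, end = div, iterations = iter)
```

Because only improving changes are accepted, the divergence can never increase, which is why a flat trace is the natural stopping signal.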

Value

Returns a list with the following elements:

  • points: the simulated points

  • div.vecs: a list containing vectors of the divergence metric sampled every div.int iterations. There is one vector for each distance measure (i.e. intraspecific distances for each species, as well as interspecific distances if inter.spp == TRUE) and one for the weighted mean divergence across all distance measures.

  • dist.obs: a list containing vectors of interpoint distances for each distance measure for the observed points

  • dist.sim: a list containing vectors of interpoint distances for each distance measure for the simulated points. These can be compared to dist.obs.

  • seed.pts: the seed points produced in stage 2. Only returned if ret.seed.pts == TRUE

  • pts.progress: a list containing every iteration of points from stage 3. Only returned if ret.all.iter == TRUE

  • Nflat.vec: a vector containing Nflat (the number of iterations since an improvement was made) for every iteration of stage 3. Only returned if ret.all.iter == TRUE
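One rough way to check whether a divergence trace has gone flat is to count how many of its final recorded values are identical. The sketch below uses a made-up numeric vector standing in for a single vector from div.vecs. trailing_flat() is a hypothetical helper, and the exact naming of the elements inside div.vecs is not assumed here.

```r
# Hypothetical divergence trace, standing in for one vector from div.vecs.
div.trace <- c(2.10, 1.40, 0.90, 0.61, 0.45, 0.45, 0.44, 0.44, 0.44, 0.44)

# Number of trailing recorded values equal to the final value.
trailing_flat <- function(v) {
  runs <- rle(rev(v) == v[length(v)])
  if (runs$values[1]) runs$lengths[1] else 0L
}

flat.records <- trailing_flat(div.trace)
flat.records
```

Since values are recorded every div.int iterations, a run of k identical trailing records spans roughly (k - 1) * div.int iterations. The same vector can be passed to plot(div.trace, type = "l") for a visual check.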

Examples

## Not run: 
# my.coords: data frame of observed occurrences; my.raster: raster of the study area
# intraspecific distances only
my.sim <- fauxcurrence(coords = my.coords, rast = my.raster)
# intraspecific and interspecific distances
my.sim <- fauxcurrence(coords = my.coords, rast = my.raster, inter.spp = TRUE)

## End(Not run)

ogosborne/fauxcurrence documentation built on April 15, 2022, 10:19 a.m.