optimSPAN: Optimization of sample configurations for variogram and...
In Laboratorio-de-Pedometria/spsann-package: Optimization of Spatial Samples via Simulated Annealing

optimSPAN

R Documentation

Optimization of sample configurations for variogram and spatial trend identification and estimation, and for spatial interpolation

Description

Optimize a sample configuration for variogram and spatial trend identification and estimation, and for spatial interpolation. A utility function U is defined so that the sample points cover, extend over, spread over, SPAN the feature, variogram and geographic spaces. The utility function is obtained by aggregating four objective functions: CORR, DIST, PPL, and MSSD.

Usage

optimSPAN(
  points,
  candi,
  covars,
  strata.type = "area",
  use.coords = FALSE,
  lags = 7,
  lags.type = "exponential",
  lags.base = 2,
  cutoff,
  criterion = "distribution",
  distri,
  pairs = FALSE,
  schedule,
  plotit = FALSE,
  track = FALSE,
  boundary,
  progress = "txt",
  verbose = FALSE,
  weights,
  nadir = list(sim = NULL, seeds = NULL, user = NULL, abs = NULL),
  utopia = list(user = NULL, abs = NULL)
)

objSPAN(
  points,
  candi,
  covars,
  strata.type = "area",
  use.coords = FALSE,
  lags = 7,
  lags.type = "exponential",
  lags.base = 2,
  cutoff,
  criterion = "distribution",
  distri,
  pairs = FALSE,
  x.max,
  x.min,
  y.max,
  y.min,
  weights,
  nadir = list(sim = NULL, seeds = NULL, user = NULL, abs = NULL),
  utopia = list(user = NULL, abs = NULL)
)

Arguments

`points`	Integer value, integer vector, data frame (or matrix), or list. The number of sampling points (sample size) or the starting sample configuration. Four options are available: Integer value. The required number of sampling points (sample size). The sample configuration used to start the optimization will consist of grid cell centres of `candi` selected using simple random sampling, i.e. `base::sample()` with `x = 1:nrow(candi)` and `size = points`. Integer vector. A set of row indexes between one (1) and `nrow(candi)`. These row indexes identify the grid cell centres of `candi` that will form the starting sample configuration for the optimization. The length of the integer vector, `length(points)`, is the sample size. Data frame (or matrix). The Cartesian x- and y-coordinates (in this order) of the starting sample configuration. List. An object with two named sub-arguments: `fixed` An integer vector or data frame (or matrix) specifying an existing sample configuration (see options above). This sample configuration is kept as-is (fixed) during the optimization and is used only to compute the objective function values. `free` An integer value, integer vector, data frame or matrix (see options above) specifying the (number of) sampling points to add to the existing sample configuration. These new sampling points are free to be moved around (jittered) during the optimization. Most users will want to set an integer value simply specifying the required sample size. Using an integer vector or data frame (or matrix) will generally be helpful to users willing to evaluate starting sample configurations, test strategies to speed up the optimization, and fine-tune or thin an existing sample configuration. Users interested in augmenting a possibly existing real-world sample configuration or fine-tuning only a subset of the existing sampling points will want to use a list.
`candi`	Data frame (or matrix). The Cartesian x- and y-coordinates (in this order) of the cell centres of a spatially exhaustive, rectangular grid covering the entire spatial sampling domain. The spatial sampling domain can be contiguous or composed of disjoint areas and contain holes and islands. `candi` provides the set of (finite) candidate locations inside the spatial sampling domain for a point jittered during the optimization. Usually, `candi` will match the geometry of the spatial grid containing the prediction locations, e.g. `newdata` in `gstat::krige()`, `object` in `raster::predict()`, and `locations` in `geoR::krige.conv()`.
`covars`	Data frame or matrix with the spatially exhaustive covariates in the columns.
`strata.type`	(Optional) Character value setting the type of stratification that should be used to create the marginal sampling strata (or factor levels) for the numerical covariates. Two options are available: `"area"` (Default) Equal-area marginal sampling strata. `"range"` Equal-range marginal sampling strata. The first option (`"area"`) is equivalent to drawing the frequency histogram of the numerical covariates with bins of variable width but equal area. The second, however, would result in a frequency histogram with bins of equal width but variable area such as when using `graphics::hist()` with its default options. Strata of equal area will include virtually the same number of individual covariate grid cells per stratum, while equal-range strata aim for the same number of unique covariate values in each stratum.
`use.coords`	(Optional) Logical value. Should the projected spatial x- and y-coordinates be used as spatially exhaustive covariates? Defaults to `use.coords = FALSE`.
`lags`	Integer value, the number of lag-distance classes. Alternatively, a vector of numeric values with the lower and upper bounds of each lag-distance class, the lowest value being larger than zero (e.g. 0.0001). Defaults to `lags = 7`.
`lags.type`	Character value, the type of lag-distance classes, with options `"equidistant"` and `"exponential"`. Defaults to `lags.type = "exponential"`.
`lags.base`	Numeric value, base of the exponential expression used to create exponentially spaced lag-distance classes. Used only when `lags.type = "exponential"`. Defaults to `lags.base = 2`.
`cutoff`	Numeric value, the maximum distance up to which lag-distance classes are created. Used only when `lags` is an integer value. If missing, it is set to be equal to the length of the diagonal of the rectangle with sides `x.max` and `y.max` as defined in `scheduleSPSANN()`.
`criterion`	Character value, the feature used to describe the energy state of the system configuration, with options `"minimum"` and `"distribution"`. Defaults to `criterion = "distribution"`.
`distri`	Numeric vector, the distribution of points or point-pairs per lag-distance class that should be attained at the end of the optimization. Used only when `criterion = "distribution"`. Defaults to a uniform distribution.
`pairs`	Logical value. Should the sample configuration be optimized regarding the number of point-pairs per lag-distance class? Defaults to `pairs = FALSE`.
`schedule`	List with named sub-arguments setting the control parameters of the annealing schedule. See `scheduleSPSANN()`.
`plotit`	(Optional) Logical for plotting the evolution of the optimization. The plot is updated every ten (10) spatial jitters. Defaults to `plotit = FALSE`. The plot includes two panels: The first panel depicts the changes in the objective function value (y-axis) with the annealing schedule (x-axis). The objective function values should be high and variable at the beginning of the optimization (panel's top left). As the optimization proceeds, the objective function values should gradually transition to a monotone decreasing behaviour until they become virtually constant. Constant objective function values suggest the end of the optimization (panel's bottom right). The second panel shows the starting (grey circles) and current spatial sample configuration (black dots). Black crosses indicate the fixed (existing) sampling points when a spatial sample configuration is augmented. The plot shows the starting sample configuration so that its effect on the optimized spatial sample configuration can be assessed: the latter should generally be independent of the former. The second panel also shows the maximum possible spatial jitter applied to a sampling point in the Cartesian x- (x-axis) and y-coordinates (y-axis).
`track`	(Optional) Logical value. Should the evolution of the energy state be recorded and returned along with the result? If `track = FALSE` (the default), only the starting and ending energy states are returned along with the results.
`boundary`	(Optional) An object of class SpatialPolygons (see sp::SpatialPolygons()) with the outer and inner limits of the spatial sampling domain (see `candi`). These SpatialPolygons help depict the spatial distribution of the (starting and current) sample configuration inside the spatial sampling domain. The outer limits of `candi` serve as a rough `boundary` when `plotit = TRUE` but the SpatialPolygons are missing.
`progress`	(Optional) Type of progress bar that should be used, with options `"txt"`, for a text progress bar in the R console, `"tk"`, to put up a Tk progress bar widget, and `NULL` to omit the progress bar. A Tk progress bar widget is useful when using parallel processors. Defaults to `progress = "txt"`.
`verbose`	(Optional) Logical for printing messages about the progress of the optimization. Defaults to `verbose = FALSE`.
`weights`	List with named sub-arguments. The weights assigned to each one of the objective functions that form the multi-objective combinatorial optimization problem. They must be named after the respective objective function to which they apply. The weights must be equal to or larger than 0 and sum to 1.
`nadir`	List with named sub-arguments. Three options are available: `sim`: the number of simulations that should be used to estimate the nadir point, and `seeds` vector defining the random seeds for each simulation; `user`: a list of user-defined nadir values named after the respective objective functions to which they apply; `abs`: logical for calculating the nadir point internally (experimental).
`utopia`	List with named sub-arguments. Two options are available: `user`: a list of user-defined values named after the respective objective functions to which they apply; `abs`: logical for calculating the utopia point internally (experimental).
`x.max, x.min, y.max, y.min`	Numeric value defining the minimum and maximum quantity of random noise to be added to the projected x- and y-coordinates. The minimum quantity should be equal to, at least, the minimum distance between two neighbouring candidate locations. The units are the same as of the projected x- and y-coordinates. If missing, they are estimated from `candi`.
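
The arguments `lags`, `lags.type`, `lags.base`, and `cutoff` together define the lag-distance classes. The following is a minimal sketch in base R of how such class bounds could be built. The helper `lag_bounds()` and its `zero` argument are hypothetical and not part of spsann, and the exponential rule used here (upper bounds at `cutoff / lags.base^k`) is an assumption about the spacing, not a copy of the package internals.

```r
# Hypothetical helper (not part of spsann): bounds of the lag-distance classes.
lag_bounds <- function(lags = 7, lags.type = "exponential", lags.base = 2,
                       cutoff, zero = 0.0001) {
  if (lags.type == "equidistant") {
    # Classes of equal width between (almost) zero and the cutoff
    seq(zero, cutoff, length.out = lags + 1)
  } else if (lags.type == "exponential") {
    # Assumed rule: upper bounds shrink geometrically from the cutoff
    upper <- cutoff / lags.base^(seq_len(lags) - 1)
    c(zero, rev(upper))
  } else {
    stop("unknown lags.type: ", lags.type)
  }
}

bounds_exp <- lag_bounds(lags = 7, cutoff = 1000)
bounds_equi <- lag_bounds(lags = 7, lags.type = "equidistant", cutoff = 1000)

# Count point-pairs per exponential lag-distance class for a random sample
set.seed(1)
xy <- cbind(runif(20, 0, 700), runif(20, 0, 700))
d <- as.vector(dist(xy))
pairs_per_lag <- table(cut(d, breaks = bounds_exp))
```

With exponential spacing the classes are narrow at short distances and wide near the cutoff, which concentrates resolution on the short lags.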

Details

The help page of minmaxPareto() contains details on how spsann solves the multi-objective combinatorial optimization problem of finding a globally optimum sample configuration that meets multiple, possibly conflicting, sampling objectives.
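
As a rough illustration of how `weights`, `nadir`, and `utopia` could interact, the sketch below scales each objective function value between its utopia and nadir values and then takes the weighted sum. This scaling rule is an assumption made for illustration; the help page of minmaxPareto() remains the reference for what spsann actually does. The helper `aggregate_utility()` and all numeric values are hypothetical.

```r
# Hypothetical helper (not part of spsann): weighted sum of scaled objectives.
aggregate_utility <- function(values, nadir, utopia, weights) {
  w <- unlist(weights)
  # Weights must be non-negative and sum to 1
  stopifnot(all(w >= 0), isTRUE(all.equal(sum(w), 1)))
  nm <- names(w)
  f <- unlist(values)[nm]
  n <- unlist(nadir)[nm]
  u <- unlist(utopia)[nm]
  # Assumed scaling: 0 at the utopia point, 1 at the nadir point
  scaled <- (f - u) / (n - u)
  sum(w * scaled)
}

# Illustrative values only
values <- list(CORR = 0.5, DIST = 2, PPL = 10, MSSD = 400)
nadir <- list(CORR = 1, DIST = 4, PPL = 20, MSSD = 1600)
utopia <- list(DIST = 0, CORR = 0, PPL = 0, MSSD = 0)
weights <- list(CORR = 1/6, DIST = 1/6, PPL = 1/3, MSSD = 1/3)

U <- aggregate_utility(values, nadir, utopia, weights)
```

Scaling first puts objective functions with very different units on a comparable footing before the weights are applied.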

Generating mechanism

There are multiple mechanisms to generate a new sample configuration out of an existing one. The main step consists of randomly perturbing the coordinates of a single sample, a process known as ‘jittering’. These mechanisms can be classified based on how the set of candidate locations for the samples is defined. For example, one could use an infinite set of candidate locations, that is, any location in the spatial domain can be selected as a new sample location after a sample is jittered. All that is needed is a polygon indicating the boundary of the spatial domain. This method is more computationally demanding because every time an existing sample is jittered, it is necessary to check if the new sample location falls in the spatial domain.
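
The infinite-set approach described above can be sketched in base R as follows. This is not how spsann jitters points; it only illustrates why a containment check is needed after every jitter. The functions `point_in_polygon()` and `jitter_infinite()`, the `max_tries` argument, and the L-shaped polygon are all hypothetical.

```r
# Hypothetical helper: even-odd (ray casting) test for a point in a polygon.
# `poly` is a two-column matrix of vertex coordinates in order.
point_in_polygon <- function(x, y, poly) {
  n <- nrow(poly)
  inside <- FALSE
  j <- n
  for (i in seq_len(n)) {
    xi <- poly[i, 1]; yi <- poly[i, 2]
    xj <- poly[j, 1]; yj <- poly[j, 2]
    # Does a horizontal ray from (x, y) cross the edge between vertices i and j?
    crosses <- ((yi > y) != (yj > y)) &&
      (x < (xj - xi) * (y - yi) / (yj - yi) + xi)
    if (crosses) inside <- !inside
    j <- i
  }
  inside
}

# Hypothetical helper: jitter one point, rejecting locations outside the domain.
jitter_infinite <- function(xy, poly, x.max, y.max, max_tries = 100) {
  for (k in seq_len(max_tries)) {
    new_xy <- xy + c(runif(1, -x.max, x.max), runif(1, -y.max, y.max))
    if (point_in_polygon(new_xy[1], new_xy[2], poly)) return(new_xy)
  }
  xy  # no valid location found: keep the original one
}

# L-shaped spatial domain
poly <- rbind(c(0, 0), c(100, 0), c(100, 50), c(50, 50), c(50, 100), c(0, 100))
set.seed(42)
new_xy <- jitter_infinite(c(25, 25), poly, x.max = 60, y.max = 60)
```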

Another approach consists of using a finite set of candidate locations for the samples. A finite set of candidate locations is created by discretising the spatial domain, that is, creating a fine (regular) grid of points that serve as candidate locations for the jittered sample. This is a less computationally demanding jittering method because, by definition, the new sample location will always fall in the spatial domain.

Using a finite set of candidate locations has two important inconveniences. First, not all locations in the spatial domain can be selected as the new location for a jittered sample. Second, when a sample is jittered, the new location may already be occupied by another sample. If this happens, another location has to be sought iteratively, say, as many times as the size of the sample configuration. In general, the larger the size of the sample configuration, the more likely it is that the new location is already occupied by another sample. If a solution is not found in a reasonable time, the sample selected to be jittered is kept in its original location. Such a procedure clearly is suboptimal.
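
The retry logic described above can be sketched as follows, assuming the sample configuration is stored as row indexes of `candi`. The function `jitter_finite()` is hypothetical, and for brevity it draws the new location from all candidates rather than from a neighbourhood of the jittered point.

```r
# Hypothetical helper: move one sample to an unoccupied candidate location.
jitter_finite <- function(sample_idx, which_point, n_candi,
                          max_tries = length(sample_idx)) {
  for (k in seq_len(max_tries)) {
    new_idx <- sample.int(n_candi, 1)
    # Accept the new location only if no other sample occupies it
    if (!new_idx %in% sample_idx) {
      sample_idx[which_point] <- new_idx
      return(sample_idx)
    }
  }
  sample_idx  # all tries failed: the sample stays where it was
}

set.seed(7)
out <- jitter_finite(c(3, 8, 15), which_point = 2, n_candi = 20)

# Every candidate is occupied, so the configuration cannot change
stuck <- jitter_finite(1:5, which_point = 1, n_candi = 5)
```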

spsann uses a more elegant method which is based on using a finite set of candidate locations coupled with a form of two-stage random sampling as implemented in spcosa::spsample(). Because the candidate locations are placed on a finite regular grid, they can be taken as the centre nodes of a finite set of grid cells (or pixels of a raster image). In the first stage, one of the “grid cells” is selected with replacement, i.e. independently of already being occupied by another sample. The new location for the sample chosen to be jittered is selected within that “grid cell” by simple random sampling. This method guarantees that virtually any location in the spatial domain can be selected. It also discards the need to check if the new location already is occupied by another sample, speeding up the computations when compared to the first two approaches.
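
A minimal sketch of the two-stage idea, assuming square grid cells of side `cellsize` centred on the rows of `candi`. The function `jitter_two_stage()` is hypothetical and only mimics the logic described above; it is not the spsann or spcosa implementation.

```r
# Hypothetical helper: two-stage random selection of a new sample location.
jitter_two_stage <- function(candi, cellsize) {
  # Stage 1: select a grid cell, regardless of whether it is already occupied
  i <- sample.int(nrow(candi), 1)
  # Stage 2: simple random sampling of a location inside the selected cell
  half <- cellsize / 2
  c(x = unname(candi[i, 1]) + runif(1, -half, half),
    y = unname(candi[i, 2]) + runif(1, -half, half))
}

# Cell centres of a 5 x 3 grid with 40 m cells
candi <- as.matrix(expand.grid(x = seq(20, 180, by = 40),
                               y = seq(20, 100, by = 40)))
set.seed(123)
new_xy <- jitter_two_stage(candi, cellsize = 40)
```

Because the second stage draws continuous coordinates, two samples landing in the same cell will almost surely not coincide, which is why no occupancy check is needed.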

Visit the help pages of optimCORR, optimDIST, optimPPL, and optimMSSD to see the details of the objective functions that compose SPAN.

Value

optimSPAN returns an object of class OptimizedSampleConfiguration: the optimized sample configuration with details about the optimization.

objSPAN returns a numeric value: the energy state of the sample configuration – the objective function value.

Note

Distance between two points

spsann always computes the distance between two locations (points) as the Euclidean distance between them. This computation requires the optimization to operate in the two-dimensional Euclidean space, i.e. the coordinates of the sample, candidate and evaluation locations must be Cartesian coordinates, generally in metres or kilometres. spsann has no mechanism to check if the coordinates are Cartesian: you are solely responsible for ensuring that this requirement is met.
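
For reference, the Euclidean distance between points with Cartesian coordinates can be computed in base R with `stats::dist()`. The helper `looks_geographic()` below is hypothetical and is only a crude heuristic for spotting coordinates that may be longitude and latitude in degrees; it cannot prove that coordinates are Cartesian.

```r
# Euclidean distances between three points with Cartesian coordinates
xy <- rbind(a = c(0, 0), b = c(3, 4), c = c(6, 8))
d <- as.matrix(dist(xy))  # method = "euclidean" is the default

# Hypothetical heuristic: do all coordinates fall within longitude/latitude ranges?
looks_geographic <- function(coords) {
  all(abs(coords[, 1]) <= 180) && all(abs(coords[, 2]) <= 90)
}

lonlat <- cbind(c(-47.9, -47.8), c(-15.8, -15.7))
projected <- cbind(c(178605, 181390), c(329714, 333611))
flag_lonlat <- looks_geographic(lonlat)
flag_projected <- looks_geographic(projected)
```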

Author(s)

Alessandro Samuel-Rosa alessandrosamuelrosa@gmail.com

Examples

#####################################################################
# NOTE: The settings below are unlikely to meet your needs.         #
#####################################################################
## Not run: 
# This example takes more than 5 seconds to run!
library(sp)
library(spsann)
data(meuse.grid)
# Candidate locations: x- and y-coordinates of the grid cell centres
candi <- meuse.grid[, 1:2]
# Nadir point estimated from 10 simulations; utopia point set by the user
nadir <- list(sim = 10, seeds = 1:10)
utopia <- list(user = list(DIST = 0, CORR = 0, PPL = 0, MSSD = 0))
covars <- meuse.grid[, 5]
schedule <- scheduleSPSANN(
  chains = 1, initial.temperature = 1,
  x.max = 1540, y.max = 2060, x.min = 0, y.min = 0, cellsize = 40)
# Weights of the four objective functions (must sum to 1)
weights <- list(CORR = 1/6, DIST = 1/6, PPL = 1/3, MSSD = 1/3)
set.seed(2001)
res <- optimSPAN(
  points = 10, candi = candi, covars = covars, nadir = nadir, weights = weights,
  use.coords = TRUE, utopia = utopia, schedule = schedule)
# Compare the stored energy state with the value recomputed by objSPAN()
objSPSANN(res) -
  objSPAN(points = res, candi = candi, covars = covars, nadir = nadir,
          use.coords = TRUE, utopia = utopia, weights = weights)

## End(Not run)

Laboratorio-de-Pedometria/spsann-package documentation built on Nov. 2, 2023, 3:14 p.m.
