optimPPL: Optimization of sample configurations for variogram...
In spsann: Optimization of Sample Configurations using Spatial Simulated Annealing

Optimize a sample configuration for variogram identification and estimation. A criterion is defined so that the optimized sample configuration has a given number of points or point-pairs contributing to each lag-distance class (PPL).

optimPPL(points, candi, lags = 7, lags.type = "exponential",
  lags.base = 2, cutoff, criterion = "distribution", distri,
  pairs = FALSE, schedule = scheduleSPSANN(), plotit = FALSE,
  track = FALSE, boundary, progress = "txt", verbose = FALSE)

objPPL(points, candi, lags = 7, lags.type = "exponential",
  lags.base = 2, cutoff, distri, criterion = "distribution",
  pairs = FALSE, x.max, x.min, y.max, y.min)

countPPL(points, candi, lags = 7, lags.type = "exponential",
  lags.base = 2, cutoff, pairs = FALSE, x.max, x.min, y.max, y.min)

`points`	Integer value, integer vector, data frame or matrix, or list. Integer value: the number of points, which will be randomly sampled from `candi` to form the starting sample configuration. Integer vector: the row indexes of `candi` that correspond to the points that form the starting sample configuration; the length of the vector defines the number of points. Data frame or matrix: an object with three columns in the following order: `[, "id"]`, the row indexes of `candi` that correspond to each point, `[, "x"]`, the projected x-coordinates, and `[, "y"]`, the projected y-coordinates. List: an object with two named sub-arguments: `fixed`, a data frame or matrix with the projected x- and y-coordinates of the existing sample configuration (kept fixed during the optimization), and `free`, an integer value defining the number of points that should be added to the existing sample configuration (free to move during the optimization).
`candi`	Data frame or matrix with the candidate locations for the jittered points. `candi` must have two columns in the following order: `[, "x"]`, the projected x-coordinates, and `[, "y"]`, the projected y-coordinates.
`lags`	Integer value, the number of lag-distance classes. Alternatively, a vector of numeric values with the lower and upper bounds of each lag-distance class, the lowest value being larger than zero (e.g. 0.0001). Defaults to `lags = 7`.
`lags.type`	Character value, the type of lag-distance classes, with options `"equidistant"` and `"exponential"`. Defaults to `lags.type = "exponential"`.
`lags.base`	Numeric value, base of the exponential expression used to create exponentially spaced lag-distance classes. Used only when `lags.type = "exponential"`. Defaults to `lags.base = 2`.
`cutoff`	Numeric value, the maximum distance up to which lag-distance classes are created. Used only when `lags` is an integer value. If missing, it is set to be equal to the length of the diagonal of the rectangle with sides `x.max` and `y.max` as defined in `scheduleSPSANN`.
`criterion`	Character value, the feature used to describe the energy state of the system configuration, with options `"minimum"` and `"distribution"`. Defaults to `criterion = "distribution"`.
`distri`	Numeric vector, the distribution of points or point-pairs per lag-distance class that should be attained at the end of the optimization. Used only when `criterion = "distribution"`. Defaults to a uniform distribution.
`pairs`	Logical value. Should the sample configuration be optimized regarding the number of point-pairs per lag-distance class? Defaults to `pairs = FALSE`.
`schedule`	List with 11 named sub-arguments defining the control parameters of the cooling schedule. See `scheduleSPSANN`.
`plotit`	(Optional) Logical for plotting the optimization results, including a) the progress of the objective function, and b) the starting (gray circles) and current sample configuration (black dots), and the maximum jitter in the x- and y-coordinates. The plots are updated every 10 jitters. When adding points to an existing sample configuration, fixed points are indicated using black crosses. Defaults to `plotit = FALSE`.
`track`	(Optional) Logical value. Should the evolution of the energy state be recorded and returned along with the result? If `track = FALSE` (the default), only the starting and ending energy states are returned along with the results.
`boundary`	(Optional) SpatialPolygon defining the boundary of the spatial domain. If missing and `plotit = TRUE`, `boundary` is estimated from `candi`.
`progress`	(Optional) Type of progress bar that should be used, with options `"txt"`, for a text progress bar in the R console, `"tk"`, to put up a Tk progress bar widget, and `NULL` to omit the progress bar. A Tk progress bar widget is useful when using parallel processors. Defaults to `progress = "txt"`.
`verbose`	(Optional) Logical for printing messages about the progress of the optimization. Defaults to `verbose = FALSE`.
`x.max`, `x.min`, `y.max`, `y.min`	Numeric values defining the minimum and maximum quantity of random noise to be added to the projected x- and y-coordinates. The minimum quantity should be at least equal to the minimum distance between two neighbouring candidate locations. The units are the same as those of the projected x- and y-coordinates. If missing, they are estimated from `candi`.
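
The four accepted forms of `points` described above can be sketched in base R as follows. This is only an illustration of the object shapes: the candidate grid and the object names (`points_n`, `points_idx`, `points_df`, `points_list`) are hypothetical, and nothing here calls spsann.

```r
# Hypothetical candidate grid standing in for real candidate locations
candi <- expand.grid(x = seq(0, 100, by = 10), y = seq(0, 100, by = 10))

# 1. Integer value: number of points to sample at random from candi
points_n <- 10

# 2. Integer vector: row indexes of candi forming the starting configuration
points_idx <- c(1, 25, 60, 121)

# 3. Data frame with the columns id, x and y, in this order
points_df <- data.frame(id = points_idx, candi[points_idx, c("x", "y")])

# 4. List: existing points kept fixed, plus the number of free points to add
points_list <- list(fixed = candi[c(1, 25), c("x", "y")], free = 5)
```

Any one of these objects would then be passed as the `points` argument.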

Details about the mechanism used to generate a new sample configuration out of the current sample configuration by randomly perturbing the coordinates of a sample point are available in the help page of spJitter.

Lag-distance classes

Two types of lag-distance classes can be created by default. The first are evenly spaced lags (lags.type = "equidistant"). They are created by simply dividing the distance interval from 0.0001 to cutoff by the required number of lags. The minimum value of 0.0001 guarantees that a point does not form a pair with itself. The second type of lags is defined by exponential spacings (lags.type = "exponential"). The spacings are defined by the base b of the exponential expression b^n, where n is the required number of lags. The base is defined using the argument lags.base. See vgmLags for other details.
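
As a rough illustration of the two spacings, the sketch below builds both sets of class boundaries in base R. The exponential construction (each upper bound obtained by dividing `cutoff` by successive powers of the base) is an assumption made for this example; `vgmLags` is the authoritative source for the exact boundaries. The variable names and the cutoff of 1000 are hypothetical.

```r
cutoff <- 1000   # hypothetical maximum distance
n_lags <- 7      # number of lag-distance classes
base   <- 2      # plays the role of lags.base
zero   <- 0.0001 # lower bound that excludes zero distances

# Equidistant: n_lags classes of equal width between zero and cutoff
equi <- seq(zero, cutoff, length.out = n_lags + 1)

# Exponential (assumed construction): bounds shrink by a factor of `base`
expo <- c(zero, rev(cutoff / base^(0:(n_lags - 1))))

round(equi, 2)
round(expo, 2)
```

With these settings the exponential classes are much narrower at short distances, which is where the variogram usually changes fastest.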

Using the default uniform distribution means that the number of point-pairs per lag-distance class (pairs = TRUE) is equal to n × (n - 1) / (2 × lag), where n is the total number of points and lag is the number of lags. If pairs = FALSE, the number of points per lag is equal to the total number of points, which is the same as expecting each point to contribute to every lag. Distributions other than the default can be implemented by changing the arguments lags and distri.
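
A small worked example of the uniform target, using 10 points and 7 lags. The variable names are hypothetical and the code is plain arithmetic, not a call into spsann.

```r
n      <- 10  # total number of points
n_lags <- 7   # number of lag-distance classes

# pairs = TRUE: all n * (n - 1) / 2 point-pairs spread evenly over the lags
pairs_per_lag <- n * (n - 1) / (2 * n_lags)
distri_pairs  <- rep(pairs_per_lag, n_lags)

# pairs = FALSE: every point is expected to contribute to every lag
distri_points <- rep(n, n_lags)
```

Here the 45 possible point-pairs give a target of 45 / 7 pairs per class, which is not an integer and therefore cannot be matched exactly.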

There are two optimizing criteria implemented. The first is called using criterion = "distribution" and is used to minimize the sum of the absolute differences between a pre-specified distribution and the observed distribution of points or point-pairs per lag-distance class. The second criterion is called using criterion = "minimum". It corresponds to maximizing the minimum number of points or point-pairs observed over all lag-distance classes.
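
The two criteria can be sketched on a made-up vector of observed counts. The counts are hypothetical, and expressing the "minimum" criterion as a negated minimum is an assumption chosen only so that both quantities are minimized; the package may scale or transform it differently.

```r
observed <- c(1, 5, 4)  # hypothetical point-pair counts for three lags

# Uniform target distribution over the three lag-distance classes
target <- rep(sum(observed) / length(observed), length(observed))

# criterion = "distribution": sum of absolute differences to the target
energy_distribution <- sum(abs(target - observed))

# criterion = "minimum": maximizing the smallest count is equivalent to
# minimizing its negative (assumed sign convention)
energy_minimum <- -min(observed)
```

Moving one pair from the second class to the first would lower both values, so both criteria would favour that change.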

optimPPL returns an object of class OptimizedSampleConfiguration: the optimized sample configuration with details about the optimization.

objPPL returns a numeric value: the energy state of the sample configuration – the objective function value.

countPPL returns a data.frame with three columns: a) the lower and b) upper limits of each lag-distance class, and c) the number of points or point-pairs per lag-distance class.
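
To make the three-column layout concrete, the base R sketch below counts points per lag-distance class for five made-up points. It does not reproduce the internals of countPPL; the column names (`lower`, `upper`, `ppl`) and the class boundaries are assumptions for illustration.

```r
# Five hypothetical sample points with projected coordinates
pts <- cbind(x = c(0, 10, 20, 50, 90), y = c(0, 10, 40, 50, 90))
breaks <- c(0.0001, 20, 60, 130)  # hypothetical lag boundaries
n_lags <- length(breaks) - 1

d <- as.matrix(dist(pts))  # Euclidean distance matrix, zeros on the diagonal

# Points per lag: points with at least one partner inside the class
points_per_lag <- sapply(seq_len(n_lags), function(i) {
  in_lag <- d > breaks[i] & d <= breaks[i + 1]
  sum(rowSums(in_lag) > 0)
})

# Point-pairs per lag: count each pair once using the upper triangle
du <- d[upper.tri(d)]
pairs_per_lag <- sapply(seq_len(n_lags), function(i) {
  sum(du > breaks[i] & du <= breaks[i + 1])
})

counts <- data.frame(lower = breaks[-length(breaks)],
                     upper = breaks[-1],
                     ppl   = points_per_lag)
counts
```

Because the lowest boundary is above zero, the zero distances on the diagonal fall outside every class, so no point is paired with itself.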

The distance between two points is computed as the Euclidean distance between them. This computation assumes that the optimization is operating in the two-dimensional Euclidean space, i.e. the coordinates of the sample points and candidate locations should not be provided as latitude/longitude. spsann has no mechanism to check if the coordinates are projected: the user is responsible for making sure that this requirement is met.
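
Since the check is left to the user, a simple heuristic can be run on the coordinates beforehand. The helper `looks_geographic` below is hypothetical and not part of spsann; it only flags coordinates that all fall within the longitude/latitude value range, so projected coordinates with very small values would be flagged incorrectly.

```r
# Hypothetical helper: TRUE when every coordinate fits the lon/lat range
looks_geographic <- function(xy) {
  all(abs(xy[, 1]) <= 180) && all(abs(xy[, 2]) <= 90)
}

lonlat    <- cbind(x = c(5.72, 5.76), y = c(50.95, 50.99))
projected <- cbind(x = c(178605, 181390), y = c(329714, 333611))

looks_geographic(lonlat)
looks_geographic(projected)
```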

Alessandro Samuel-Rosa alessandrosamuelrosa@gmail.com

## Not run: 
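# This example takes more than 5 seconds
library(sp)
library(spsann)
# Candidate locations: x- and y-coordinates of the meuse.grid data set
data(meuse.grid)
candi <- meuse.grid[, 1:2]
# Cooling schedule with a single chain and jitter limits in map units
schedule <- scheduleSPSANN(chains = 1, initial.temperature = 30,
                           x.max = 1540, y.max = 2060, x.min = 0,
                           y.min = 0, cellsize = 40)
set.seed(2001)
# Optimize a sample configuration of 10 points
res <- optimPPL(points = 10, candi = candi, schedule = schedule)
# Compare the stored energy state with the one recomputed by objPPL
objSPSANN(res) - objPPL(points = res, candi = candi)
# Points per lag-distance class in the optimized configuration
countPPL(points = res, candi = candi)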

## End(Not run)