OptimizeNetwork: Optimize Observation Network
In jfisher-usgs/ObsNetwork: Optimization of Observation Networks

Description Usage Arguments Details Value Author(s) References See Also Examples

Determine sites to exclude from an existing observation network because they provide little or no beneficial added information. A kriging-based genetic algorithm (GA) is used to solve the multi-objective optimization problem.

OptimizeNetwork(pts, grd, ply, network.nm, nsites, model, formula, nmax = Inf,
                xlim = bbox(grd)[1, ], ylim = bbox(grd)[2, ], grd.fact = 1,
                obj.weights = c(1, 1, 1, 1), penalty.constant = 1E6,
                maxabort = 10, popSize = 50, pcrossover = 0.8, pmutation = 0.1,
                elitism = base::max(1, round(popSize * 0.05)),
                maxiter = 100, run = maxiter, suggestions = NULL, ...)

`pts`	`SpatialPointsDataFrame`; data at observation sites. Required data frame variables include: `site.no`, a unique site number; `var1`, the dependent variable (such as, the water-level elevation); `var1.acy`, the mean measurement error of the dependent variable; `var1.sd`, the standard deviation of the dependent variable. An optional `network.nm` variable may be included to identify a sites observation network(s). Sites belonging to multiple networks are specified using comma separation. Note that duplicate site numbers and (or) spatial coordinates are not permitted.
`grd`	`SpatialGridDataFrame`; interpolation grid. For kriging with external drift (KED) a data frame variable `var2`, the independent variable (such as, land-surface elevation), is required.
`ply`	`SpatialPolygonsDataFrame`; a polygon used to crop the raster grid (optional).
`network.nm`	`character`; vector of observation network names. Only sites belonging to this network will be included in the analysis; this argument is optional, in its absence all sites are assumed to belong to a single network (that is, all sites are used).
`nsites`	`integer`; number of sites to remove from the observation network.
`model`	`variogramModel`; variogram model of dependent variable defined by a call to `vgm`.
`formula`	`formula`; defines the dependent variable as a linear model of the independent variables.
`nmax`	`numeric`; for local kriging, the number of nearest sites that should be used for a kriging prediction or simulation, where nearest is defined in terms of the space of the spatial locations. By default, all sites are used.
`xlim`	`numeric`; vector of length 2 giving left and right limits for the x-axis, used to crop the interpolation grid.
`ylim`	`numeric`; vector of length 2 giving lower and upper limits for the y-axis, used to crop the interpolation grid.
`grd.fact`	`integer`; aggregation factor for the interpolation grid, `grd`.
`obj.weights`	`numeric`; vector of length 4 giving the weights for each objective in the multi-objective optimization problem, see ‘details’ section below.
`penalty.constant`	`numeric`; constant in the penalty function, its value needs to be greater than the largest possible fitness value.
`maxabort`	`integer`; maximum number of times an invalid child chromosome can be aborted during crossover.
`popSize`	`integer`; population size.
`pcrossover`	`numeric`; probability of crossover between pairs of chromosomes.
`pmutation`	`numeric`; probability of mutation in a parent chromosome.
`elitism`	`integer`; number of chromosomes to survive into the next generation. By default is about 5 percent of the population size.
`maxiter`	`integer`; maximum number of iterations to run before the GA search is halted.
`run`	`integer`; number of consecutive generations without any improvement in the “best” fitness value before the GA is stopped.
`suggestions`	`matrix`; initial population.
`...`	additional arguments to be passed to `ga`.

A solution to the multi-objective optimization problem is found by minimizing the aggregate objective function, the weighted linear sum of 4 objectives. The objectives are given as:

Mean standard error at points in interpolation grid.
Root-mean-square error, difference between predicted and measured values, at removed sites.
Mean standard deviation, variability of measurement over time, at removed sites.
Mean measurement error, at remaining sites.

The “best” solution found will depend on the relative values of the weights specified in obj.weights. For example, if a higher weight is specified for the mean standard error, the solution will be one that favors a smaller mean standard error over a small root-mean squared error, mean standard deviation, and mean measurement error. Setting a weight equal to zero will remove an objective from the multi-objective function.

Spatial data is transformed to the map projection and datum of the raster data set in grd.

The optimization problem is solved using a GA with integer chromosomes; site identifiers are represented as binary strings using Gray encoding. The initial population is randomly generated with valid chromosomes; that is, sites are not repeated within a single chromosome. The GA uses linear-rank selection, single-point crossover, and uniform random mutation.

Returns a list with components:

`call`	`character`; function call with all specified arguments.
`pts.rm`	`SpatialPointsDataFrame`; a subset of `pts` with row length equal to `nsites`. Includes data records for those sites identified by the GA for removal from the observation network(s).
`is.net`	`logical`; vector of length equal to the number of rows in `pts` specifying sites belonging to the reduced network.
`is.rm`	`logical`; vector of length equal to the number of rows in `pts` specifying sites to remove from the network.
`obj.values`	`matrix`; objective values at each iteration of the GA. This matrix has `maxiter` rows and 5 columns (that is, the 4 objective values and their sum).
`niter`	`integer`; number of completed iterations.
`nrep.ans`	`integer`; number of iterations the “best” solution was repeated.
`proc.time`	`proc_time`; CPU time for running the GA, in seconds.
`ncalls.penalty`	`integer`; vector giving the number of calls to the penalty function at each iteration of the GA.
`kr`	`SpatialGridDataFrame`; a data frame containing the coordinates of `grd` cropped to the axis limits and polygon. Data attributes based on block kriging of the reduced network include: predictions, `var1.pred`; prediction variances, `var1.var`; and prediction standard errors, `var1.se`. Differences between the original and reduced network predictions are specified in the `var1.diff` attribute.
`rmsd`	`numeric`; root-mean-square-deviation between the kriged surfaces using the original and reduced networks.
`local.error`	`numeric`; percent local error between the kriged surfaces using the original and reduced networks.
`obj.space`	`matrix`; range of objective values in solution space.
`ga.ans`	`ga`; returned value of `ga`.
`start.time`	`POSIXct`; system time at start of GA run.
`diff.time`	`numeric`; vector of time differences since start of GA run, specified for each iteration in the GA, in hours.

The status of the objective values after each iteration of the GA is plotted.

J.C. Fisher

Fisher, J.C., 2013, Optimization of water-level monitoring networks in the eastern Snake River Plain aquifer using a kriging-based genetic algorithm method: U.S. Geological Survey Scientific Investigations Report 2013-5120 (DOE/ID-22224), 74 p., https://pubs.usgs.gov/sir/2013/5120/.

Scrucca, Luca, 2013, GA: a package for genetic algorithms in R, Journal of Statistical Software, v. 53, no. 4, 37 p., https://www.jstatsoft.org/article/view/v053i04.

WriteGAResults, krige, ga

data(ESRP_NED)
data(ESRP_NWIS)
data(ESRP_Boundary)

# Formula and variogram
fo <- var1 ~ x + y
model <- vgm(psill = 1948.533, model = "Sph", nugget = 0, range = 153891.038)

# Optimize combined "State" and "INL" networks
xlim <- c(10000, 328000)
ylim <- c(81200, 335700)
ans <- OptimizeNetwork(ESRP_NWIS, ESRP_NED, ESRP_Boundary,
                       network.nm = c("State", "INL"), nsites = 20,
                       model = model, formula = fo, grd.fact = 5,
                       obj.weights = c(100, 1, 1, 1), popSize = 20,
                       maxiter = 3)
PlotRaster(ans$kr, "var1.pred", ESRP_NWIS,
           pal = colorRampPalette(c("#F02311", "#F7FDFA", "#107FC9")),
           main = "Predictions", net.idxs = which(ans$is.net),
           rm.idxs = which(ans$is.rm), xlim = xlim, ylim = ylim)
PlotRaster(ans$kr, "var1.se", ESRP_NWIS,
           pal = terrain.colors, main = "Standard errors",
           net.idxs = which(ans$is.net), rm.idxs = which(ans$is.rm),
           xlim = xlim, ylim = ylim)

# Optimize "INL" network
xlim <- c(178000, 257500)
ylim <- c(202000, 272000)
ans <- OptimizeNetwork(ESRP_NWIS, ESRP_NED, ESRP_Boundary,
                       network.nm = "INL", nsites = 20, model = model,
                       formula = fo, xlim = xlim, ylim = ylim, grd.fact = 5,
                       obj.weights = c(100, 1, 1, 1),  maxiter = 3)
PlotRaster(ans$kr, "var1.diff", ESRP_NWIS, pal = GA::jet.colors,
           main = "Prediction Differences", net.idxs = which(ans$is.net),
           rm.idxs = which(ans$is.rm), xlim = xlim, ylim = ylim)

# Restart GA using previous "best" solution
ans <- OptimizeNetwork(ESRP_NWIS, ESRP_NED, ESRP_Boundary,
                       network.nm = "INL", nsites = 20, model = model,
                       formula = fo, xlim = xlim, ylim = ylim,
                       grd.fact = 5, obj.weights = c(100, 1, 1, 1),
                       maxiter = 3, suggestions = ans$ga.ans@population)