CompareOneToOne-function: Compare a pair of spatial histograms
In SpatialKWD: Spatial KWD for Large Spatial Maps

CompareOneToOne-function

R Documentation

Compare a pair of spatial histograms

Description

This function computes the Kantorovich-Wasserstein between a pair of spatial histograms defined over the same grid map.

The grid map is described by the two lists of N coordinates Xs and Ys, which specify the coordinates of the centroid of each tile of the map. For each tile i with coordinates Xs[i], Ys[i], we have the two lists of weights, one for the first histograms and the other for the second histogram.

The two lists of coordinates are passed to compareOneToOne as a matrix with N rows and two columns. The two lists of weights are passed as a matrix with N rows and two columns, a column for each histogram.

Usage

compareOneToOne(Coordinates, Weights, L = 3, recode = TRUE,
           method = "approx",    algorithm = "colgen",
           model="mincostflow",  verbosity = "silent",
           timelimit = 14400,    opt_tolerance = 1e-06,
           unbalanced = FALSE, unbal_cost = 1e+09, convex = TRUE)

Arguments

`Coordinates`	A `Matrix` with `N` rows and two columns: `Coordinates[,1]`: (First Column) Vector of horizontal coordinates of the centroids of each tile of the map (`Xs`). Data type: vector of positive integers. `Coordinates[,2]`: (Second Column) Vector of vertical coordinates of the centroids of each tile of the map (`Ys`). Data type: vector of positive integers.
`Weights`	A `Matrix` of positive weights of the tiles specified by `Coordinates`. `Weights[,1]`: (First Column) Weights of the first spatial histogram, a weight for each tile located at position `Xs[i], Ys[i]` for `i=1,...N`. Data type: vector of positive doubles. `Weights[,2]`: (Second Column) Weights of the second spatial histogram, a weight for each tile located at position `Xs[i], Ys[i]` for `i=1,...N`. Data type: vector of positive doubles.
`L`	Approximation parameter. Higher values of L give a more accurate solution, but they require a longer running time. Data type: positive integer.
`recode`	If equal to `True`, recode the input coordinates as consecutive integers.
`method`	Method for computing the KW distances: `exact` or `approx`.
`algorithm`	Algorithm for computing the KW distances: `fullmodel` or `colgen`.
`model`	Model for building the underlying network: `bipartite` or `mincostflow`.
`verbosity`	Level of verbosity of the log: `silent`, `info`, or `debug`.
`timelimit`	Time limit in second for running the solver.
`opt_tolerance`	Numerical tolerance on the negative reduced cost for the optimal solution.
`unbalanced`	If equal to `True`, solve the problem with unbalanced masses.
`unbal_cost`	Cost for the arcs going from each point to the extra artificial bin.
`convex`	If equal to `True`, compute the convex hull of the input points.

Details

The function compareOneToOne(Coordinates, Weights, ...) computes the distance between the two histograms specified by the weights given in the two columns of matrix Weights. The support points (i.e., centroids of each tile of the map) are defined by the coordinates given in Xs and Ys in the two columns of matrix Coordinates. The algorithm used to compute such distance depends on the parameters specified as optional arguments of the function.

The most important is the parameter L, which by default is equal to 3. The following table shows the worst-case approximation ratio as a function of the value assigned to L. The table also reports the number of arcs in the network flow model as a function of the number of bins n contained in the convex hull of the support points of the histograms given in input with matrix Coordinates.

L	1	2	3	5	10	15
`Worst-case error`	7.61%	2.68%	1.29%	0.49%	0.12%	0.06%
`Number of arcs`	O(8n)	O(16n)	O(32n)	O(80n)	O(256n)	O(576n)

The following two figures show the network build on a grid with 8x8 nodes and using L=2 and L=3.

L=2

L=3

Value

Return an R List with the following named attributes:

distance: The value of the KW-distance between the two input histograms.
status: Status of the solver used to compute the distances.
runtime: Overall runtime in seconds to compute all the distances.
iterations: Overall number of iterations of the Network Simplex algorithm.
nodes: Number of nodes in the network model used to compute the distances.
arcs: Number of arcs in the network model used to compute the distances.

Examples

# Define a simple example
library(SpatialKWD)

# Random coordinates
N = 90
Xs <- as.integer(runif(N, 0, 31))
Ys <- as.integer(runif(N, 0, 31))
coordinates <- matrix(c(Xs, Ys), ncol=2, nrow=N)

# Random weights
test1 <- matrix(runif(2*N, 0, 1), ncol=2, nrow=N)

# Compute distance
print("Compare one-to-one with exact algorithm:")
d <- compareOneToOne(coordinates, Weights=test1, method="exact",
                     recode=TRUE, verbosity = "info")
cat("runtime:", d$runtime, " distance:", d$distance,
    " nodes:", d$nodes, " arcs:", d$arcs, "\n")

print("Compare one-to-one with approximate algorithm:")
d <- compareOneToOne(coordinates, Weights=test1, L=2, recode=TRUE)
cat("L: 2, runtime:", d$runtime, " distance:", d$distance,
    " nodes:", d$nodes, " arcs:", d$arcs, "\n")

d <- compareOneToOne(coordinates, Weights=test1, L=3)
cat("L: 3 runtime:", d$runtime, " distance:", d$distance, "\n")

d <- compareOneToOne(coordinates, Weights=test1, L=10)
cat("L: 10, runtime:", d$runtime, " distance:", d$distance, "\n")

SpatialKWD documentation built on Dec. 9, 2022, 5:08 p.m.