In edzer/mss: Meaningful Spatial Statistics

Meaningful Spatial Statistics: The `mss` Package

This vignette introduces an R package for meaningful spatial statistics,

library(mss)

The package introduces three classes for spatial data: SField for representing geostatistical data (spatial fields), SObjects for point patterns (spatial objects), and SLattice for lattice (areal) data (spatial lattices). The abbreviations follow this paper (which is under review).

`SExtent`: extent (window) of field or point pattern

SField and SObjects objects contain an SExtent, representing the domain or extent of the field, or the window of observation for a point pattern.

The window class defines an area:

showClass("SExtent")

The area can be represented either by a grid, as in

library(sp)
demo(meuse, ask = FALSE, echo = FALSE)
w2 = SExtent(meuse.grid) # note the grid is passed, not the polygons

or by a polygon or set of polygons, as in

w1 = SExtent(meuse.area)

`SField`: geostatistical data

SFields represent continuous functions, which over a domain $D$ have a value at every point $s \in D$.

showClass("SField")

Fields from points

Observations on fields variables are often done on irregular point locations. Such a field variable is created from a SpatialPointsDataFrame object:

sf = SField(meuse, meuse.grid)

where, here, meuse.grid defines the domain for the field. Alternatively, point observations can be gridded, and defined as a field by

sf = SField(meuse.grid, meuse.grid, cellsArePoints = TRUE)

Note that the first argument defines the point observations, and the second argument the domain: the first argument is treated as a (gridded) set of points (grid cell centres), the second as a gridded set of areas (cells).

Fields from grids, grid cells representing constant areas

We can alternatively assume that each grid cell represents an area with constant point values. Such a field is defined by:

sf = SField(meuse.grid, meuse.grid, cellsArePoints = FALSE)

This field is completely known, for every point location. Note that grid cell values are constant throughout the cell, and are not conceived as cell aggregate (e.g. average, or otherwise convoluted) values. For that case, SpatialAggregates are used.

Fields from lines and polygons

Sets of points can not only be defined by grid cells, but also through lines, or polygons. For instance, points on a contour line could form a sample of a field:

library(maptools)
library(rgeos)
cl = ContourLines2SLDF(contourLines(as.image.SpatialGridDataFrame(
  meuse.grid["dist"])))
proj4string(cl) = CRS(proj4string(meuse.grid))
sf.lines = SField(cl, meuse.grid)

or, alternatively, as a set of polygons; in this example, a single polygon with a constant value:

pol = addAttrToGeom(meuse.area, data.frame(value = 3), FALSE)
sf.pol = SField(pol, meuse.area)

`SObjects`: point patterns or objects

The SObjects class can be used to represent objects in a space, where the window of observation reflects the area for which all points or objects are available:

showClass("SObjects")

We can create a point pattern e.g. for the sample locations of the meuse data set:

pts = geometry(meuse)
se.pts = SObjects(pts, meuse.area)

but also areas can indicate entities:

se.area = SObjects(meuse.area, meuse.area)

`SLattice`: areal data

Aggregations refer to values, measured or computed as the aggregate over a set of (point) values, e.g. the mean, maximum or minimum value. As such, aggregations need to be defined for sets of points, i.e. for lines, polygons, or grid cells:

showClass("SLattice")

Sets of points can be represented by anything but points: lines, grids, and polygons:

sa.1 = SLattice(cl)
sa.3 = SLattice(meuse.grid)
sa.2 = SLattice(pol)

Three interpretations of a spatial grid

Spatial grid data can be conceived as having values for 1. points, at grid cell centres 2. constant values throughout each grid cell 3. aggregated values over the grid cell region The difference is illustrated by querying each at two points, one coinciding with a grid cell centre, one not coinciding. The two points are indicated by red circles:

xy = rbind(coordinates(meuse)[1,], coordinates(meuse.grid)[10,])
pts = SpatialPoints(xy, CRS(proj4string(meuse.grid)))
plot(as(meuse.grid[1:21,], "SpatialPolygons"))
plot(pts, col = 'red', pch = 16, add = TRUE)
plot(meuse.grid[1:21,], add = TRUE)
text(coordinates(pts), c("1", "2"), pos = 2)

Querying a spatial field where cells are points gives non-missing values only when the point coincides with the grid cell centre:

f1 = SField(meuse.grid, meuse.area, cellsArePoints = TRUE)
pts = SField(pts, meuse.grid)
over(pts, f1)

Querying a spatial field where cells are constant valued areas gives the grid cell values of the cells where the point is in:

f2 = SField(meuse.grid, meuse.area, cellsArePoints = FALSE)
over(pts, f2) # matches pts to the complete grid cell

Querying a grid where cell values are aggregations does not answer and generates a warning:

a = SLattice(meuse.grid)
over(pts, a)

Methods for fields: interpolation

point-to-point interpolation

We can predict to points, laid out regularly, inside the polygon of the field domain:

m = SField(meuse, meuse.area)
v = gstat::vgm(.6, "Sph", 900, .05) # variogram model
i1 = interpolate(log(zinc)~1, m, model = v)
spplot(i1, "var1.pred")

To show that these really concern points, plotting for a smaller region gives actual point symbols:

spplot(i1[1:300,], "var1.pred")

Alternatively, we can predict values on a pre-defined grid, taken from the domain:

m = SField(meuse, meuse.grid)
i2 = interpolate(log(zinc)~1, m, model = v)
print(as(i2[1:3,], "data.frame"), digits = 10) # point support

point-to-area interpolation (block kriging)

From point values, we can predict the polygon area mean value:

library(rgeos)
m3 = SLattice(meuse.area)
i3 = interpolate(log(zinc)~1, m, m3, model = v) # interpolate from point TO polygon support
i3$var1.pred # a single value, estimate of mean for the whole area
mean(log(m$zinc)) # sample mean
mean(i1$var1.pred) # mean of predictions, similar to i2$var1.pred

or predict values for grid cell averages (block kriging):

m4 = SLattice(meuse.grid)
i4 = interpolate(log(zinc)~1, m, m4, model = v) # interpolate from point TO feature (cell) support
print(as(i4[1:3,], "data.frame"), digits = 10) # grid cells support
spplot(i4, "var1.pred")

Support: area-to-point and area-to-area kriging

Suppose we have grid data on a 3 x 3 grid, and want to estimate on a new location, indicated by a red circle:

set.seed(131)
gt = GridTopology(cellcentre.offset = c(0,0), cellsize = c(1,1), cells.dim = c(3,3))
sgdf = SpatialGridDataFrame(SpatialGrid(gt), 
    data.frame(r = round(runif(9, max = 10), 1)))
plot(as(sgdf, "SpatialPolygons"), axes = TRUE)
text(coordinates(sgdf), labels = sgdf[[1]])
pt.red = SpatialPoints(cbind(0.25, 0.25))
points(pt.red, col = 'red', pch = 1)

The first case is when the grid cell data represent point support values at the grid cell centre, which is identical to case i1 above:

sf = SField(as(sgdf, "SpatialPointsDataFrame"), sgdf)
vm = gstat::vgm(1, "Exp", 1)
i5 = interpolate(r~1, sf, SField(pt.red, sgdf), model = vm)
as(i5[1,], "data.frame")

The second case, grid cells represent constant area with point support value, meaning that every point location within a grid cell has the value of the grid cell assigned:

sf = SField(sgdf, sgdf, cellsArePoints = FALSE)
i6 = interpolate(r~1, sf, SField(pt.red, sgdf), model = vm)
as(i6[1,], "data.frame") # does not interpolate, but queries grid cell

as case i4 above, from point values we can predict a mean value for a grid cell by

sf = SField(sgdf, sgdf, cellsArePoints = TRUE)
gt = GridTopology(cellcentre.offset = c(.25,.25), cellsize = c(1,1), cells.dim = c(1,1))
grd.red = SpatialGrid(gt)
i7 = interpolate(r~1, sf, SLattice(grd.red), model = vm)
i7@observations@data

Area-to-point kriging

sp = as(sgdf, "SpatialPolygonsDataFrame")
gf = SLattice(sp)
kr = interpolate(r~1, gf, SField(pt.red, sp), model = vm)
kr@observations[[1]]
library(gstat)
krige0(r ~ 1, sp, pt.red, vgmArea, vgm = vm) # check

Area-to-area (grid-to-grid) kriging

sp = as(sgdf, "SpatialPolygonsDataFrame")
gr = as(grd.red, "SpatialPolygons")
grd = addAttrToGeom(gr, data.frame(r=0), FALSE)
plot(as(sgdf, "SpatialPolygons"), axes = TRUE)
text(coordinates(sgdf), labels = sgdf[[1]])
plot(grd, add = TRUE, border = 'red')
kr = interpolate(r~1, gf, SLattice(grd), model = vm)
kr@observations[[1]]
krige0(r ~ 1, sp, grd, vgmArea, vgm = vm) # check

Estimating density from entitities

For SpatialEntity objects, we can estimate the density of objects; this method reuses MASS:kde2d:

e = SObjects(meuse, meuse.area)
d = density(e, newdata = meuse.grid)
lt = list(list("sp.polygons", meuse.area, border = 'black',
    first=FALSE), list("sp.points", meuse, col = grey(.5), pch = 16))
class(d)
spplot(d, sp.layout = lt)

The reason why this returns a SLattice is that the values computed (densities) do not correspond to quantities that are measurable at point locations where they are registered (grid cells). In fact, they are aggregated values for areas even much larger than the grid cells.

Spatial Aggregation

for `SField` data

m = SField(meuse["zinc"], meuse.area)
a = aggregate(m, SLattice(meuse.area), mean) # no warning
a[[1]]
a = aggregate(m, SLattice(meuse.area), sum) # warns:
a[[1]]

for `SpatialEntity` data

m = SObjects(meuse["zinc"], meuse.area)
a = aggregate(m, SLattice(meuse.area), mean) # warns:
a[[1]]
a = aggregate(m, SLattice(meuse.area), sum) # no warning
a[[1]]

Meaningful warnings

Pebesma et al. (2014) make assertions A1-A7.

A1.

A prediction is meaningful, if it provides an estimate for a potential observation Prediction on a field variable generates no warning.

Where prediction of a SField varialbe generates no warning,

m = SField(meuse, meuse.area) # data are point support
res = interpolate(log(zinc)~1, m) # interpolate from point TO point support

prediction of a point pattern variable generates a warning:

m = SObjects(meuse, meuse.area)
sa = SLattice(meuse.grid)
res = interpolate(log(zinc)~1, m, sa) # interpolate point pattern variable: warns

A2.

Summing up values over an area is meaningful, if the observed window corresponds to the target geometry (grouping predicate) of the aggregation.

For SpatialEntity data, spatial aggregation over an area larger than the observation window generates a warning:

x = bbox(meuse)[1,]
y = bbox(meuse)[2,]
bb = Polygon(cbind(c(x[1],x[1],x[2],x[2],x[1]), c(y[1],y[2],y[2],y[1],y[1])))
bbx = SpatialPolygons(list(Polygons(list(bb),"bbox_meuse")),
    proj4string = CRS(proj4string(meuse)))
m = SObjects(meuse["zinc"], meuse.area)
a = aggregate(m, SLattice(bbx), sum) # warns:

Stasch et al. 2014 prove A1 and A2 more formally.

Further assertions in the poster:

A3.

Polygon data may reflect constant values over areas (geology) or means (population density), this matters for (further) aggregation or downsampling.

Package mss distinguishes between SField and SLattice to represent constant values, and aggregations, respectively. Subsampling SField variabls does not not warn,

area = addAttrToGeom(meuse.area, data.frame(v=1), match.ID = FALSE)
f = SField(area, meuse.area)
pts = SField(meuse[1:5,], meuse.area)
over(pts, f) # retrieve f at the points of pts:

but subsampling SLattice variables triggers a warning, and no useful value

a = SLattice(area)
over(pts, a)

as it should really disaggregate first, from area to point.

A4.

The same is true for grid cells.

SField and SLattice both generalize polygon and grid data.

f1 = SField(meuse.grid, meuse.area, cellsArePoints = TRUE)
pts = SField(meuse[1:5,], meuse.area)
over(pts, f1) # matches pts to grid cell center points:
f2 = SField(meuse.grid, meuse.area, cellsArePoints = FALSE)
over(pts, f2) # matches pts to the complete grid cell
a = SLattice(meuse.grid)
over(pts, a)

A5.

Point pattern data often come without observation window, over which aggregation is meaningful

Attempting to create a SObjects object without an observation window specified gives a warning message:

pts <- SObjects(meuse)

Providing a window that does not contain all the points gives an error:

tr = try(pts <- SObjects(meuse, meuse[1:10,]))
attr(tr, "condition")

A6.

Field measurements often come without a notion for which locations interpolations based on them make sense.

m = SField(meuse["zinc"], meuse.area)
a = interpolate(log(zinc)~1, m, SLattice(bbx)) # warns:

A7.

File types do not inform on meaningfulness

SpatialPoints, SpatialPixels, SpatialGrid, SpatialLines, SpatialPolygons and their *DataFrame counterparts do not inform whether data concern fields (geostatistical data), entities (point patterns), or aggregations. The classes in mss: SField, SObjects, and SLattice, do.

References

E. Pebesma, C. Stasch, B. Graeler and S. Scheider, 2014. Meaningfully Integrating Big Earth Science Data; AGU fall meeting, poster IN33A-3757 pdf
Stasch, C., S. Scheider, E. Pebesma, W. Kuhn, 2014. Meaningful Spatial Prediction and Aggregation. Environmental Modelling \& Software, 51, 149-165 open access.

edzer/mss documentation built on May 15, 2019, 11:08 p.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

edzer/mss
Meaningful Spatial Statistics

In edzer/mss: Meaningful Spatial Statistics

Meaningful Spatial Statistics: The `mss` Package

`SExtent`: extent (window) of field or point pattern

`SField`: geostatistical data

Fields from points

Fields from grids, grid cells representing constant areas

Fields from lines and polygons

`SObjects`: point patterns or objects

`SLattice`: areal data

Three interpretations of a spatial grid

Methods for fields: interpolation

point-to-point interpolation

point-to-area interpolation (block kriging)

Support: area-to-point and area-to-area kriging

Area-to-point kriging

Area-to-area (grid-to-grid) kriging

Estimating density from entitities

Spatial Aggregation

for `SField` data

for `SpatialEntity` data

Meaningful warnings

A1.

A2.

A3.

A4.

A5.

A6.

A7.

References

R Package Documentation

Browse R Packages

We want your feedback!

edzer/mss Meaningful Spatial Statistics

In edzer/mss: Meaningful Spatial Statistics

Meaningful Spatial Statistics: The mss Package

SExtent: extent (window) of field or point pattern

SField: geostatistical data

Fields from points

Fields from grids, grid cells representing constant areas

Fields from lines and polygons

SObjects: point patterns or objects

SLattice: areal data

Three interpretations of a spatial grid

Methods for fields: interpolation

point-to-point interpolation

point-to-area interpolation (block kriging)

Support: area-to-point and area-to-area kriging

Area-to-point kriging

Area-to-area (grid-to-grid) kriging

Estimating density from entitities

Spatial Aggregation

for SField data

for SpatialEntity data

Meaningful warnings

A1.

A2.

A3.

A4.

A5.

A6.

A7.

References

R Package Documentation

Browse R Packages

We want your feedback!

edzer/mss
Meaningful Spatial Statistics

Meaningful Spatial Statistics: The `mss` Package

`SExtent`: extent (window) of field or point pattern

`SField`: geostatistical data

`SObjects`: point patterns or objects

`SLattice`: areal data

for `SField` data

for `SpatialEntity` data