Introduction to Wicket

wicket is a little package that makes certain kinds of geospatial data manipulation easier in R - specifically, validating and generating Well-Known Text (WKT) data, including from sp objects. At the moment the functionality consists of:

  1. Generating bounding boxes from WKT data and normal, R data
  2. Validating WKT data, and
  3. Converting sp objects into WKT data

Let's step through each in turn

Bounding boxes

A bounding box is a very simple concept: a representation of the smallest area in which all the points in a dataset lie. In WKT, bounding boxes look like:

POLYGON((10 14,10 16,12 16,12 14,10 14))

Sometimes you've got WKT data like this - a Polygon, a LineString, whatever - and you want a bounding box in a format R can understand. The answer is wkt_bounding, which takes a vector of valid WKT objects and produces a data.frame or matrix of R representations, whichever you'd prefer:

wkt <- c("POLYGON ((30 10, 40 40, 20 40, 10 20, 30 10))",
         "LINESTRING (30 10, 10 90, 40 40)")
wkt_bounding(wkt)
#   min_x min_y max_x max_y
# 1    10    10    40    40
# 2    10    10    40    90

Alternately you might want to go in the other direction and turn R bounding boxes into WKT objects. You can do that with, appropriately, bounding_wkt:

bounding_wkt(min_x = 10, min_y = 10, max_x = 40, max_y = 40)
# [1] "POLYGON((10 10,10 40,40 40,40 10,10 10))"

This accepts either a series of vectors, one for each min or max value, or a list of length-4 vectors. Either way, it produces a nice WKT representation of the R data you give it.

WKT validation

The two greatest challenges in computer science are naming things, cache invalidation, and off-by-one errors. The two greatest challenges in data science are naming things and other peoples' data. And off-by-one-errors.

wicket contains a validator for WKT, validate_wkt, which takes a vector of WKT objects and spits out a data.frame containing whether each object is valid, and any comments the parser has in the case that it isn't:

wkt <- c("POLYGON ((30 10, 40 40, 20 40, 10 20, 30 10))",
         "ARGHLEFLARFDFG",
         "LINESTRING (30 10, 10 90, 40 out of cheese error redo universe from start)")
validate_wkt(wkt)
#   is_valid comments
# 1    FALSE The WKT object has a different orientation from the default
# 2    FALSE Object could not be recognised as a supported WKT type
# 3    FALSE bad lexical cast: source type value could not be interpreted as target at 'out' in 'linestring (30 10, 10 90, 40 out of cheese error redo universe from start)'

With this you can check and clean your data before you rely on it and watch all your code fall down in a heap.

WKT generation from sp objects

sp objects - particularly SpatialPolygons and SpatialPolygonDataFrames - are the standard way of representing geodata in R. They're also entirely unique to R and really difficult to use elsewhere. Enter sp_convert, which takes a list of SP/SPDF objects (or a single one) and turns the coordinate sets within them into WKT. In the case that there are multiple coordinate sets in an object and the group argument is set to TRUE, a MultiPolygon will be generated for that entry: if it's FALSE, a vector of Polygons:

library(sp)
Sr1 <- Polygon(cbind(c(2,4,4,1,2),c(2,3,5,4,2)))
Sr2 <- Polygon(cbind(c(5,4,2,5),c(2,3,2,2)))
Sr3 <- Polygon(cbind(c(4,4,5,10,4),c(5,3,2,5,5)))
Sr4 <- Polygon(cbind(c(5,6,6,5,5),c(4,4,3,3,4)), hole = TRUE)

Srs1 <- Polygons(list(Sr1), "s1")
Srs2 <- Polygons(list(Sr2), "s2")
Srs3 <- Polygons(list(Sr3, Sr4), "s3/4")
sp_object <- SpatialPolygons(list(Srs1,Srs2,Srs3), 1:3)

# With grouping
sp_convert(x = sp_object, group = TRUE)
# [1] "MULTIPOLYGON(((2 2,1 4,4 5,4 3,2 2)),((5 2,2 2,4 3,5 2)),((4 5,10 5,5 2,4 3,4 5)),((5 4,5 3,6 3,6 4,5 4)))"

# Without grouping
sp_convert(x = sp_object, group = FALSE)
# [[1]]
# [1] "POLYGON((2 2,1 4,4 5,4 3,2 2))"  "POLYGON((5 2,2 2,4 3,5 2))"      "POLYGON((4 5,10 5,5 2,4 3,4 5))"
# [4] "POLYGON((5 4,5 3,6 3,6 4,5 4))"

Coordinate and centroid extraction

WKT POLYGONs are often used to store latitude and longitude coordinates - and you can use wkt_coords to get them:

wkt_coords(("POLYGON ((30 10, 40 40, 20 40, 10 20, 30 10))"))
#   object  ring lng lat
# 1      1 outer  30  10
# 2      1 outer  40  40
# 3      1 outer  20  40
# 4      1 outer  10  20
# 5      1 outer  30  10

The result of a wkt_coords call is a data.frame of four columns - object, identifying which of the input WKT objects the row refers to, ring referring to the layer in that object, and then lat and lng.

Extracting centroids is also useful, and can be performed with wkt_centroid. Again, it's entirely vectorised and produces a data.frame:

wkt_centroid(("POLYGON ((30 10, 40 40, 20 40, 10 20, 30 10))"))
#        lng     lat
# 1 25.45455 26.9697

New features and bugs

If you've got ideas for other features - or have found something in the existing featureset that is broken - throw them on the GitHub issues page!



Try the wicket package in your browser

Any scripts or data that you put into this service are public.

wicket documentation built on May 2, 2019, 11:11 a.m.