knitr::opts_chunk$set(echo = TRUE)

Topology

Objects in the sp package are complex: the R object is composed of a set of geometric objects, organized as a table of attributes belonging to those. Even the terminology is complex, we need a fairly sophisticated nomenclature to talk about the objects and their geometry and other attributes. You only have one type of topology in each, and these are points, lines, or polygons. Each geometric object in turn may be simple or quite complex, a simple case is a filled shape, such as as a property boundary. Much more complex is a continent, composed of many land masses separated by water, and on those polygon land masses may be a lake, with an island inside that lake, and a lake on that island. A similar complexity exists for lines, a single geometric object may be set of long connected "line strings" that aren't spatially connected. Points are slightly less complex, but see {...} for how Spatial[Multi]Points are defined in a fundamentally different way. We term these geometric objects "polygons", "lines" and "points" but beware of the slippery concept between simple and complex multi-part geometries. A single polygon in the data frame may be composed of many simple polygon rings.

Draw an analogy to multiple continents and a simpler map of islands in a peninsula, and a cadastre. Use these maps to explain the relation of layer, object, part, coordinate.

This complexity is reflected in the structure of the sp classes SpatialPolygonsDataFrame and SpatialLinesDataFrame, where the actual matrices of coordinates of each "part" are stored in nested lists. A single-row SpatialPolygonsDataFrame contains a SpatialPolygons which is composed of a Polygons which is composed of a list of Polygon objects - it is each Polygon that stores the matrix of the ring. Similarly this holds for SpatialLines, Lines, and Line.

Nesting of lists of matrices sounds like the complex "lake in an island in a lake in an island . . ." story from above, but the relationship does not hold. A Polygons object contains a flat list of all the simple polygon rings, and there is no pattern to how these are stored that records the relationship of holes and islands. There are only two levels to Polygons and Lines, one is the i-th object, and the next is the k-th polygon, or line, matrix. Lines are no hierarchical like polygons can be, so it's simpler to use them as an example first.

The relationship of islands and holes is recorded as an obscure attribute, essentially each individual polygon can be identified to its parent - but only some formats and systems provide this information, and it's not applied or checked consistently. An example of the inconsistency is that there are two definitions for calculating whether you are inside or outside, these are the non-zero (or winding) rule and the even-odd rule. when you plot a polygon object, you can specify which rule is used (it is passed down to base::polypath, or to choose grid::polyGrob or grid::pathGrob), but when checking analytically via sp::point.in.poly this is not provided as a choice. The nominated hole/island status of the ring/s you are inside is what determines the answer.

I prefer ggplot2! geom_polygon uses grid::polyGrob and so you canot have more than one hole, and even then it is done via sleight of hand.

Why is sp so complicated, but ggplot2 so much easier to understand?

Ggplot requires a single table of all geometric and attribute information, it then applies aesthetics (the x and y coordinates), grouping (the sets of coordinates that are connected, as above), and other properties like colour, line size etc. that correspond to the objects in the original data frame. (Each country say).

Explain the difference between the fortify table and the sp objects. Show examples with continents, islands/lakes, and boundary-sharing polygons.

Use the cadastre to mesh onto a DEM and plot in 3D.

Nesting what we had in the fortify table tidies this up, there's no longer an issue of "many-to-one", with its ambiguity of the "object in table ID" vs. the value of a particular attribute.

But what about all the stuff in each individual table? There's a mess of parts, hole status and order. We can continue to nest. Isn't this perfect? We have recreated the two-tier nesting of sp objects, and we no longer have a complex S4 object to deal with.

Each row in the top table is a single geometric object. Each row in the parts table is a part, we can easily see how many there are, we only store "I'm a hole" once, and we automatically have a place to store the topological membership of which parent this part is a child of.

Nesting - the good parts

Notes for future Mike:

Interesting things about nesting is that is must happen from the inside out, we cannot double nest first by object and then by branch.

If we nest first by Object (246 rows in wrld_simpl), we get all the coordinates for each object in one nested table. This is basically the fortify data frame.

nest1 <- function(data, ...) { 
  sptab <-  sptable(data) %>%  
  group_by_("object_") %>% nest_(key_col = "Object") 

  attrd <- as_data_frame(as.data.frame(data)) 
  y <- bind_cols(attrd, sptab) 
  attr(y, "crs") <- proj4string(data) 
  class(y) <- c("sp1nest", class(y)) 
  y 
}

library(maptools) 
data(wrld_simpl) 
library(tidyr) 
library(spbabel)
sp1 <- nest1(wrld_simpl) 
x1 <- sp1 %>% 
  filter(NAME %in% c("Australia", "New Zealand")) %>% 
  select(Object, ISO3)  
x1
x1$Object[[1]]
unnest(x1)

Note that there are several layers of redundancy here. The branch attribute is repeated for each coordinate, as is island status, and order is there so we could restore the arrangement (databases don't guarantee row order, though R does).

Tell this story later as duplicated vertices

We can remove the need to repeat the values for branch and island by nesting twice.

nest2 <- function(data, ...) { 
  sptab <-  sptable(data) %>%  
    group_by_("branch_", "object_", "island_") %>%  
    nest_(key_col = "Branch_") %>%   
    group_by_("object_") %>% nest_(key_col = "Object") 

  attrd <- as_data_frame(as.data.frame(data)) 
  y <- bind_cols(attrd, sptab) 
  attr(y, "crs") <- proj4string(data) 
  class(y) <- c("sp2nest", class(y)) 
  y 
}
sp2 <- nest2(wrld_simpl)
## this time we have a sub-dataframe for each individual polygon ring
x2 <- sp2 %>% 
  filter(NAME %in% c("Australia", "New Zealand")) %>% 
  select(Object, ISO3) 
x2$Object[[1]]
x2$Object[[1]]$Branch_[[1]]

This structure is exactly analogous to the Spatial*DataFrame classes, with two levels of nesting.

There is another level of redundancy in the coordinate values.

sptab <- sptable(wrld_simpl)
nrow(sptab)
nrow(distinct(sptab[, c("x_", "y_")]))

This is not only because of the extra coordinate on every ring.

nrow(sptab) - length(unique(sptab$branch_))

The same holds for this data set as lines or as points, or multipoints.

How can we use this data structure?

Filter, etc. all essentially need to unnest to do most tests. We save effort by filter on object attributes first, then drill down.

I don't think this is useful, you constantly need to unnest to do tests, but it does tidy up a lot for having a single table.

nest3 <- function(data, ...) { 
  sptab <-  sptable(data)
  sptab$vertex_ <- as.integer(factor(paste(sptab$x_, sptab$y_, sep = "-")))
  coords <- sptab %>% select_("x_", "y_", "vertex_") %>% distinct(vertex_) 

  ## batch duplicates

 sptab <- sptab %>%  select(-x_, -y_) %>% 
    group_by_("branch_", "object_", "island_") %>%  
    nest_(key_col = "Branch_") %>%   
    group_by_("object_") %>% nest_(key_col = "Object") 

  attrd <- as_data_frame(as.data.frame(data)) 
   attr(coords, "crs") <- proj4string(data) 
   ## more natural to now use a database
env <- environment()
env$object <- bind_cols(attrd, sptab)
env$vertices <- coords
env
#  y <- list(object = bind_cols(attrd, sptab) , vertices = coords)
  #class(y) <- c("sp3nest", class(y)) 
  #y 
}
sp3 <- nest3(wrld_simpl)
with(sp3, 
     dat <- filter(object, NAME == "Australia"),  
     vert < dat %>% select(Object) %>% unnest() %>% unnest() %>% select(vertex_)) %>% inner_join(vertices)
vert
)





## this time we have a sub-dataframe of only the vertex index 
x3 <- sp3 %>% 
  filter(NAME %in% c("Australia", "New Zealand")) %>% 
  select(Object, ISO3) 
x3$Object[[1]]
x3$Object[[1]]$Branch_[[1]]

summary

To escape the polygon hegemony we must

So 3 models

These can be combined, or used independently. Do all this without classes, to show we can have a table-based engine without restrictions.

The crux between line and polygons is path, polygons composed of lines are paths. Polygons composed of edges are just PSLG.



r-gris/ggrasp documentation built on May 26, 2019, 1:33 p.m.