knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
library(sfdep)
library(dplyr)

sfdep introduces a new S3 class to represent spatio-temporal data. The spacetime class links a flat data set containing spatio-temporal information with the related geometry. The class is informed by the {spacetime} package by Edzer Pebesma (2012), and its interface is inspired by the design of {tidygraph}.

Spatio-temporal data

Traditionally, "spatio-temporal data often come in the form of single tables" that can typically be categorized as "time-wide", "space-wide", or "long formats." In long formats, often referred to as "tidy", each row is a single observation of one location at one time, with one column dedicated to time and another to location. This is the typical presentation of panel data.

Space-wide data have one row per time period and one column per location. A time-wide representation is the reverse: locations run down the rows and each time period is a new column.
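To make the three layouts concrete, here is a minimal base R sketch. The toy data frame `long` and its column names `loc`, `time`, and `value` are made up for illustration and are not part of sfdep.

```r
# Hypothetical toy data: 3 locations observed over 2 time periods (long format)
long <- data.frame(
  loc   = rep(c("a", "b", "c"), times = 2),
  time  = rep(1:2, each = 3),
  value = c(10, 20, 30, 11, 21, 31)
)

# Time-wide: one row per location, one column per time period.
# Each cell holds a single value, so sum() simply returns it.
time_wide <- tapply(long$value, list(long$loc, long$time), sum)

# Space-wide: one row per time period, one column per location
space_wide <- t(time_wide)

dim(time_wide)   # locations x time periods
dim(space_wide)  # time periods x locations
```

None of the three tables carries any geometry; each only names the locations.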

These flat formats are not linked to the geographies that they represent in any meaningful way. They typically contain only an identifier of the location, but not its spatial representation.

The spacetime class is developed with a particular focus on lattice data. That is, it represents spatio-temporal data for a set of regions over a number of time periods, e.g. population densities in census tracts for each year.

To represent spatial data in a temporal context, Pebesma (2012) identifies a number of spatio-temporal layouts, two of which are of particular interest: the spatio-temporal full grid and the spatio-temporal sparse grid.

Given $n$ spatial features and $m$ time periods, a spatio-temporal full grid contains $n \times m$ rows. Each location has a recorded observation for each of the $m$ time periods. For example, if there are 10 locations and 20 time periods, there are 20 observations per location, meaning there are $10 \times 20 = 200$ observations. This is efficient only when there is a complete time-series for each location.

When there are missing observations for some locations or time periods and they are entirely omitted from the data set, the result is a spatio-temporal sparse grid. In this case the number of rows $N$ satisfies $N < n \times m$.
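The row counts of the two layouts can be checked with a short base R sketch. The objects `full_grid` and `sparse_grid` are illustrative only, and the dropped row indices are arbitrary.

```r
n <- 10  # number of locations
m <- 20  # number of time periods

# A full grid has one row for every location-time combination
full_grid <- expand.grid(loc = seq_len(n), time = seq_len(m))
nrow(full_grid) == n * m

# Omitting some rows entirely (rather than storing NA) gives a sparse grid
sparse_grid <- full_grid[-c(3, 57, 120), ]
nrow(sparse_grid) < n * m
```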

spacetime S3 class in sfdep

Inspired by the design of the tidygraph package, the spacetime class links a data frame and an sf object based on a shared location identifier column. These are referred to as the data context and the geometry context. The spacetime class allows you to switch between the contexts and work with them individually as you see fit.

Typically, to represent location data over multiple time periods together with the geography, one uses an sf object that duplicates the geometry of each location for each time period, which can be computationally expensive. By linking an sf object to a data frame based on the location ID, we are able to avoid this problem.
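The idea of storing each geometry once and linking it by ID can be sketched in base R. This is a conceptual illustration only, not how sfdep stores its objects; the `geom` strings are placeholders standing in for real sf geometries, and all names are hypothetical.

```r
# Hypothetical geometry lookup: one row per location
geometry <- data.frame(
  loc  = c("a", "b"),
  geom = c("POLYGON A", "POLYGON B")  # placeholders for real geometries
)

# Hypothetical long table: one row per location and time period
data <- data.frame(
  loc   = rep(c("a", "b"), times = 3),
  time  = rep(1:3, each = 2),
  value = c(1, 2, 3, 4, 5, 6)
)

# The geometry is stored once per location (2 rows), not once per
# observation (6 rows), and is joined on only when it is needed
joined <- merge(data, geometry, by = "loc")
nrow(geometry)
nrow(joined)
```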

There are four important aspects to the spacetime class: the data, the geometry, the location identifier column, and the time column.

Creating a spacetime object

There are two ways to create spacetime objects: 1) with as_spacetime() and 2) with the constructors spacetime() or new_spacetime(). The former takes an sf object that contains the location IDs, times, and geometry and converts it into a spacetime object, whereas the constructor functions require a data frame and a separate sf object containing the geometry.

Let's create a sample data set using the guerry dataset.

# replicate the guerry dataset 10 times
x <- purrr::map_dfr(1:10, ~guerry) |> 
  select(code_dept, crime_pers) |> 
  mutate(
    # create an indicator for time period
    time_period = sort(rep(1:10, 85)), 
    # add some noise 
    crime_pers = crime_pers * runif(850, max = 2)
  )

x

This representation, where there are duplicate geometries for each location, should be cast into a spacetime object using as_spacetime().

spt <- as_spacetime(x, "code_dept", "time_period")

Alternatively, we may have the geometry and the data as two separate objects. In this case we can use the spacetime() constructor. Its required arguments are .data, .geometry, .loc_col, and .time_col. .data must be a data frame and .geometry must be an sf object.

Here we create a data frame df which contains columns for the location identifier, the time period, and any other variables of interest, in this case crime_pers.

df <- sf::st_drop_geometry(x)
geo <- select(guerry, code_dept)

head(df)

Note that the location identifier column is the same in both objects. This is a requirement.

spt <- spacetime(
  .data = df, 
  .geometry = geo, 
  .loc_col = "code_dept", 
  .time_col = "time_period"
  ) 

spt

As an aside, I'd note that as_spacetime() uses the sf method for distinct(), which can be a bit computationally intense depending on your geometries. As such, I'd recommend always using the spacetime() constructor.

Spacetime objects can also be cast back into sf objects using as_sf(x).

Spacetime Contexts

Spacetime objects have two contexts: the data and geometry contexts.

The data context consists of a data frame object. It can be manipulated just like any other data frame. You switch between contexts using activate(). To switch to the data context, activate "data".

activate(spt, "data")

The geometry context is an sf object that can likewise be used like any other sf object. It is activated with activate(x, "geometry").

spt |> 
  activate("geometry") 

Spatio-temporal grids and spacetime

Unlike {spacetime}, sfdep does not make explicit distinctions between spatio-temporal full and sparse grids. Rather, the approach is more laissez-faire. The spacetime interface is deliberately flexible so that users can clean their data with whatever tools are familiar and to their own specification.

The distinction between sparse and full grids is important when it comes to analyzing data. For example, emerging hot spot analysis requires a spatio-temporal full grid. sfdep uses the phrase "spacetime cube", as popularized by ESRI, to refer to a spatio-temporal full grid.

Spacetime Cubes

A spacetime object is a spacetime cube if every location has a value for every time index. Another way of saying this is that each location contains a regular time-series.
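The definition above can be expressed as a small base R check. The helper `is_full_grid()` below is hypothetical and written only for illustration; it is not the sfdep implementation of is_spacetime_cube().

```r
# Hypothetical helper: TRUE when every location has exactly one
# observation for every time index
is_full_grid <- function(loc, time) {
  counts <- table(loc, time)  # observations per location-time combination
  all(counts == 1)
}

# Every location observed at every time index
full <- is_full_grid(c("a", "a", "b", "b"), c(1, 2, 1, 2))

# Location "b" is missing time index 2
missing_bin <- is_full_grid(c("a", "a", "b"), c(1, 2, 1))

# Location "a" has two observations at time index 1
repeated_bin <- is_full_grid(c("a", "a", "a", "b", "b"), c(1, 1, 2, 1, 2))
```

A combination that is absent (count 0) or repeated (count greater than 1) both cause the check to fail, which mirrors the two ways a time-series can fail to be regular.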

In ESRI terminology, the basic unit of a spacetime cube is a bin. A bin is the unique combination of a location and time index. For each time index, the collection of every location is called a time slice. For each location, the collection of its bins across every time index is referred to as a bin time-series.

knitr::include_graphics("https://pro.arcgis.com/en/pro-app/2.8/tool-reference/space-time-pattern-mining/GUID-0FEECE1A-6B54-44B4-AE49-05E7EA849A8B-web.png")

We can test if an object is a spacetime cube with is_spacetime_cube().

is_spacetime_cube(spt)

Here we take a sample of 800 of the 850 rows of spt, which makes it a sparse grid.

sparse_spt <- dplyr::slice_sample(spt, n = 800)

is_spacetime_cube(sparse_spt)

If an object is a sparse spatio-temporal grid, we can make it a full one using complete_spacetime_cube(). This works similarly to tidyr::complete(). complete_spacetime_cube() ensures that there is a row for each combination of location and time. New rows will contain missing values.

spt_complete <- complete_spacetime_cube(sparse_spt)

is_spacetime_cube(spt_complete)
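The idea behind completing a sparse grid can be sketched in base R. This is an illustration of the concept under made-up data, not the code that complete_spacetime_cube() runs; the names `sparse`, `all_bins`, and `completed` are hypothetical.

```r
# Hypothetical sparse grid: location "b" has no row for time index 2
sparse <- data.frame(
  loc   = c("a", "a", "b"),
  time  = c(1, 2, 1),
  value = c(5, 6, 7)
)

# Every combination of location and time index
all_bins <- expand.grid(
  loc  = unique(sparse$loc),
  time = unique(sparse$time),
  stringsAsFactors = FALSE
)

# Left join onto the full set of combinations; bins with no
# observation get NA in the value column
completed <- merge(all_bins, sparse, by = c("loc", "time"), all.x = TRUE)
completed
```

The completed table has one row per location-time combination, and the row that had to be added carries a missing value.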

One of the conditions of being a spacetime cube is that the time-series must be regular (only one observation for each time index). Here we sample our data with replacement to create an irregular time-series at multiple locations.

set.seed(0)
sparse_spt <- dplyr::slice_sample(spt, n = 800, replace = TRUE)

complete_spacetime_cube(sparse_spt)

This error is informative. We do not have unique bins in our spacetime data. We can check this.

dplyr::count(sparse_spt, time_period, code_dept)

Spacetime cubes are used for emerging hot spot analysis as below.

emerging_hotspot_analysis(spt, "crime_pers", threshold = 0.05)


JosiahParry/sfdep documentation built on Sept. 7, 2024, 6:15 a.m.