GIS Features and R

In order to represent geographic features in a chart we need to have highly structured data, particularly for the following types of features:

These structures are complex because their representation depends on a number of parameters:

There are several approaches to working with these features in R, but the ecosystem is currently more of a Wild West of options than it is for rectangular data files. Just as the tidyverse is becoming the de facto standard for rectangular data manipulation, the fairly new sf library provides a framework for working with geographic features that can easily be used alongside the tidyverse.

The sf library has been funded by the R Consortium and is a significant improvement over previous offerings for working with geographic features. You may previously have worked with the sp library, which created Spatial*DataFrame objects. With sf, all features are stored within an "sf data.frame", which can be operated on directly with the tidyverse.
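
As a minimal sketch of what "operated on directly with the tidyverse" can look like, the example below uses the nc.shp dataset that ships with sf (chosen here only because it needs no download; it is not part of oidnChaRts) and assumes the dplyr package is installed.

```r
library("sf")
library("dplyr")

# nc.shp is bundled with sf, so this illustration needs no download
nc <- read_sf(system.file("shape/nc.shp", package = "sf"))

# Ordinary dplyr verbs work on the sf data.frame; the geometry column
# is "sticky" and is kept even though select() does not name it
large_counties <- nc %>%
  filter(AREA > 0.2) %>%
  select(NAME, AREA)

class(large_counties)   # includes "sf" alongside the usual data.frame classes
names(large_counties)   # the selected columns plus the retained geometry column
```

The result is still an sf object, so it can be passed straight on to plotting or further spatial operations.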

While the sf vignettes are designed for readers with a good knowledge of GIS vocabulary, they are still recommended reading if you are doing a significant amount of GIS work in R:

Geographic regions (shapefiles)

The representation of geographic regions requires a data structure that defines a polygon (with or without holes in it); real regions are generally not convex. There are essentially two different data structures used to store, communicate and share this data:

ESRI Shapefiles

The ESRI (Environmental Systems Research Institute) shapefile standard is a collection of data files that, when combined, constitute an excellent data structure for representing geographic regions. You'll know you're looking at a set of ESRI shapefiles if you find a .zip/folder containing (at least) files with the .shp, .shx and .dbf extensions.

For our purposes, we're not interested in what these files are for. If you're interested, start at Wikipedia: Shapefile.

Are these good shapefiles?

Obtaining good shapefiles is a challenge in itself. Here are some of the concerns you should have in mind when looking for ESRI shapefiles:

You may also require additional information for each region, e.g. population. Often this information is contained within the shapefiles and can be manipulated directly within R.
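
The following is a rough sketch of inspecting and manipulating per-region attributes, again using sf's bundled nc.shp dataset as a stand-in for your own shapefiles. The column names SID74 and BIR74 belong to that dataset, and the derived column name is purely illustrative.

```r
library("sf")
library("dplyr")

nc <- read_sf(system.file("shape/nc.shp", package = "sf"))

# The attribute table on its own, with the geometry column dropped
nc_attributes <- st_drop_geometry(nc)
names(nc_attributes)

# Derive a new per-region variable from attributes stored in the shapefile
# (sids_per_1000_births is an illustrative name, not a standard column)
nc_rates <- nc %>%
  mutate(sids_per_1000_births = 1000 * SID74 / BIR74)
```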

However, it's also extremely important to note that shapefiles may or may not contain projection data. You're strongly encouraged to obtain unprojected shapefiles that use the WGS84 datum (see https://en.wikipedia.org/wiki/World_Geodetic_System). Most example data in this library is obtained from unprojected ESRI shapefiles. You can check whether your shapefiles are similarly formatted using this code:

library("sf")
library("oidnChaRts")
summary(data_world_shapefiles$geometry)
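
A more explicit check is also possible with sf's coordinate reference system helpers. The sketch below assumes sf's bundled nc.shp dataset rather than data_world_shapefiles, so that it runs without oidnChaRts installed; substitute your own object as needed.

```r
library("sf")

nc <- read_sf(system.file("shape/nc.shp", package = "sf"))

st_crs(nc)          # prints the coordinate reference system stored with the data
st_is_longlat(nc)   # TRUE when coordinates are unprojected longitude/latitude

# If the data uses a different datum or projection, convert it to
# WGS84 longitude/latitude (EPSG code 4326)
nc_wgs84 <- st_transform(nc, 4326)
st_crs(nc_wgs84)$epsg
```

If st_crs() reports NA, the shapefiles were supplied without projection information (typically a missing .prj file) and the correct CRS has to be found from the data provider.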

Importing shapefiles into R

Many of the example datasets in this library are imported ESRI shapefiles, so it's important that you understand how to import your own shapefiles into R. You need to follow these steps:

  1. Download and unzip shapefiles

ESRI shapefiles are typically provided as .zip files. The data_world_shapefiles dataset was obtained from http://www.naturalearthdata.com/downloads/50m-cultural-vectors/ as follows:

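# Create a folder for the raw data (no warning if it already exists)
dir.create(file.path("data-raw"), showWarnings = FALSE)
# Download the zipped shapefiles from Natural Earth
download.file(
  url = "http://www.naturalearthdata.com/http//www.naturalearthdata.com/download/50m/cultural/ne_50m_admin_0_countries.zip",
  destfile = "data-raw/world-shape-files.zip"
)
# Extract the archive into its own folder
unzip("data-raw/world-shape-files.zip",
      exdir = "data-raw/world-shape-files")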

  2. Import with sf

The sf library provides a single super function for importing GIS data, which infers the type of data from file extensions and a number of other heuristics. To import ESRI shapefiles you only need to specify two arguments: dsn, the path to the folder containing the shapefiles, and layer, the file name shared by the shapefiles (without the extension).

Newcomers are often confused by the layer argument, so here are the filenames in the shapefiles folder. They all share the same name, meaning they refer to the same "layer":

list.files("data-raw/world-shape-files")

The shapefiles are therefore imported as follows:

library("sf")
world_shapefiles <- read_sf(dsn = "data-raw/world-shape-files/", layer = "ne_50m_admin_0_countries")
summary(world_shapefiles)
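
If you are unsure which layer names a folder offers, sf can list them for you. The sketch below is only an illustration: it points at the folder of example shapefiles bundled with sf rather than the Natural Earth download, and the checks after the import are suggestions, not a required workflow.

```r
library("sf")

# Folder of example shapefiles that ships with sf (stand-in for data-raw/...)
shape_dir <- system.file("shape", package = "sf")

# Layer names correspond to the shared file name, minus the extension
available_layers <- st_layers(shape_dir)$name
available_layers

# Import one layer exactly as above, using dsn and layer
nc <- read_sf(dsn = shape_dir, layer = "nc")

# Quick sanity checks on the imported object
nrow(nc)                       # number of regions (features)
unique(st_geometry_type(nc))   # geometry type(s) present
all(st_is_valid(nc))           # whether every geometry is valid
```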


martinjhnhadley/oidnChaRts documentation built on May 21, 2019, 12:38 p.m.