The MazamaSpatialUtils package was created by MazamaScience to
regularize our work with spatial data. The sp, rgdal and maptools packages have
made it much easier to work with spatial data found in shapefiles. Many sources of shapefile
data are available and can be used to make beautiful maps in R. Unfortunately, the data attached
to these datasets, even when fairly complete, often lacks standardized identifiers such as the
ISO 3166-1 alpha-2 encodings for countries. Maddeningly, even when these ISO codes are used,
the dataframe column in which they are stored does not have a standardized name. It may be
COUNTRY or any of a dozen other names we have seen.
While many mapping packages provide 'natural' naming of countries, those who wish to develop operational, GIS-like systems need something that is both standardized and language-independent. The ISO 3166-1 alpha-2 encodings have emerged as the de facto standard for this sort of work. In similar fashion, ISO 3166-2 alpha-2 encodings are available for the next administrative level down -- state/province/oblast, etc. For timezones, the de facto standard is the set of Olson timezones used in all UNIX systems.
The main goal of this package is to create an internally standardized set of spatial data
that we can use in various projects. Along with three built-in datasets, this package provides
convert~() functions for other spatial datasets that we currently use. These convert functions
all follow the same recipe:
Other datasets can be added following the same procedure.
The 'package internal standards' are very simple.
1) Every spatial dataset must contain the following columns:
2) Spatial datasets with timezone data must contain the following column:
3) Spatial datasets at scales smaller than the nation-state should contain the following column:
If other columns contain these data, those columns must be renamed or duplicated with the internally standardized name. This simple level of consistency makes it possible to generate maps for any data that is ISO encoded. It also makes it possible to create functions that return the country, state or timezone associated with a set of locations.
The core functionality for which this package was developed is determining spatial information associated with a set of locations.
Current functionality includes the following:
getCountry~(lon,lat,...)-- returns names, ISO codes and other country-level data specified lo cations
getState~(lon,lat,...)-- returns names, ISO codes and other state-level at specified locations
getUSCounty~(lon,lat,...)-- returns names and other county-level data at specified locations
getTimezones(lon,lat,...)-- returns Olson timezones and other data at specified locations
getHUC~(lon,lat,...)-- returns USGS Hydrologic Unit Codes and other data at specified locations
getSpatialData(lon,lat,...) returns a dataframe whose rows are associated with specified locations.
This function can be used with newly converted SpatialPolygonsDataFrames.
For those working with geo-located data, the ability to enhance location metadata with this information can be extremely helpful.
When using MazamaSpatialUtils, always run
setSpatialDataDir(<spatial_data_directory>) first. This
sets the directory where spatial data will be installed and loaded from.
MazamaSpatialUtils has three built-in datasets:
SimpleCountries-- country outlines
SimpleCountriesEEZ-- country outlines including Exclusive Economic Zones over water
Version 0.5.x of the package is built around the three built-in datasets and several other core datasets that may be installed including:
2 Mb EEZCountries.RData-- country boundaries including Exclusive Economic Zones
15 Mb NaturalEarthAdm1.RData-- state/province/oblast level boundaries
61 Mb OSMTimezones.RData-- Open Street Map high resolution time zones
4 Mb TMWorldBorders.RData-- high resolution country level boundaries
48 Mb TerrestrialEcoregions.RData-- terrestrial eco-regions
7 Mb USCensus115thCongress.RData-- US congressional districts
2 Mb USCensusCounties.RData-- US county level boundaries
3 Mb USCensusStates.RData-- US state level boundaries
3 Mb USIndianLands.RData-- US tribal boundaries
16 Mb WorldTimezones.RData-- high resolution timezones
Install these with:
This may take some time to download and untar the 170 Mb file.
Once datasets have been installed,
loadSpatialData() can be used load datasets found in the
SpatialDataDir that match a particular pattern, e.g:
Additional watershed data from the Watershed Boundary Dataset are quite large and can be downloaded separately from http://mazamascience.com/RData/Spatial.
38 Mb WBDHU2.RData-- USGS level 2 watersheds
107 Mb WBDHU4.RData-- USGS level 4 watersheds
136 Mb WBDHU6.RData-- USGS level 6 watersheds
294 Mb WBDHU8.RData-- USGS level 8 watersheds
768 Mb WBDHU10.RData-- USGS level 10 watersheds
1.4 Gb WBDHU12.RData-- USGS level 12 watersheds
These two functions are used for assigning countries to one or many locations.
getCountry() returns English country names and
getCountryCode() returns the ISO-3166 two
character country code. Both functions can be passed
allData = TRUE which returns a dataframe
with more information on the countries. You can also specify
countryCodes = c(<codes>) to speed
up searching by restricting the search to polygons associated within those countries.
These functions use the package-internal
SimpleCountries dataset which can be used without
loading any additional datasets.
In this example we'll find which countries a vector of points fall in.
library(MazamaSpatialUtils) lon <- c(-122.3, -73.5, 21.1, 2.5) lat <- c(47.5, 40.75, 52.1, 48.5) # Get countries/codes associated with locations getCountry(lon, lat) getCountryCode(lon, lat) # Review all available data getCountry(lon, lat, allData=TRUE)
Similar to above, these functions return state names and ISO 3166 code. They also take
the same arguments. Adding the
countryCodes argument is more important for
getStateCode() because the
NaturalEarthAdm1 dataset is fairly large. Lets use the
lon variables as above and assign states to those locations.
These functions require installation of the large
NaturalEarthAdm1 dataset which is not
distributed with the package.
(The next block of code is not evaluated in the vignette.)
# Load states dataset if you haven't already loadSpatialData('NaturalEarthAdm1') # Get country codes associated with locations countryCodes <- getCountryCode(lon, lat) # Pass the countryCodes as an argument to speed everything up getState(lon, lat, countryCodes = countryCodes) getStateCode(lon, lat, countryCodes = countryCodes) # This is a very detailed dataset so we'll grab a few important columns states <- getState(lon, lat, allData=TRUE, countryCodes = countryCodes) states[c('countryCode', 'stateCode', 'stateName')]
Returns the Olsen Timezone where the given points are located. Arguments are the same as
the previous functions.
allData=TRUE will return other useful information such as the UTC Offset.
These functions use the package-internal
SimpleTimezones dataset which can be used
without loading any additional datasets.
# Find the timezones the points are in getTimezone(lon, lat) # Get country codes associated with locations countryCodes <- getCountryCode(lon, lat) # Pass the countryCodes as an argument to potentially speed things up getTimezone(lon, lat, countryCodes = countryCodes) # Review all available data getTimezone(lon, lat, allData=TRUE, countryCodes = countryCodes)
Returns the US County which name pairs of coordinates fall in. The arguments are similar
as above except that
stateCodes=c() is used instead of
countryCodes=c() since this
dataset is US specific.
(The next block of code is not evaluated in the vignette.)
# Load counties dataset if you haven't already loadSpatialData("USCensusCounties") # New dataset of points only in the US stateCodes <- getStateCode(lon,lat) # Optionally pass the stateCodes as an argument to speed everything up getUSCounty(lon, lat, stateCodes=stateCodes) getUSCounty(lon, lat, allData=TRUE, stateCodes=stateCodes)
While identifying the states, countries and timezones associated with a set of locations is important, we can also generate some quick eye candy with these datasets. Let's color the timezones by the data variable 'UTC_offset'
library(sp) # For spatial plotting # Assign timezones polygons an index based on UTC_offset colorIndices <- .bincode(SimpleTimezones@data$UTC_offset, breaks=seq(-12.5,12.5,1)) # Color our timezones by UTC_offset plot(SimpleTimezones, col=rainbow(25)[colorIndices]) title(line=0,'Timezone Offsets from UTC')
On of the main reasons for ensuring that our spatial datasets use ISO encoding is that it makes it easy to generate plots with any datasets that use that encoding. Here is a slightly more involved example using Energy data from the British Petroleum Statistical Review that has been ISO-encoded.
library(sp) # For spatial plotting # Read in ISO-encoded oil production and consumption data prod <- read.csv(url('http://mazamascience.com/OilExport/BP_2016_oil_production_bbl.csv'), skip=6, stringsAsFactors=FALSE, na.strings='na') cons <- read.csv(url('http://mazamascience.com/OilExport/BP_2016_oil_consumption_bbl.csv'), skip=6, stringsAsFactors=FALSE, na.strings='na') # Only work with ISO-encoded columns of data prodCountryCodes <- names(prod)[ stringr::str_length(names(prod)) == 2 ] consCountryCodes <- names(cons)[ stringr::str_length(names(cons)) == 2 ] # Use the last row (most recent data) lastRow <- nrow(prod) year <- prod$YEAR[lastRow] # Neither dataframe contains all countries so create four categories based on the # amount of information we have: netExporters, netImporters, exportOnly, importOnly sharedCountryCodes <- intersect(prodCountryCodes,consCountryCodes) net <- prod[lastRow, sharedCountryCodes] - cons[lastRow, sharedCountryCodes] # Find codes associated with each category netExportCodes <- sharedCountryCodes[net > 0] netImportCodes <- sharedCountryCodes[net <= 0] exportOnlyCodes <- setdiff(prodCountryCodes,consCountryCodes) importOnlyCodes <- setdiff(consCountryCodes,prodCountryCodes) # Create a logical 'mask' associated with each category netExportMask <- SimpleCountries@data$countryCode %in% netExportCodes netImportMask <- SimpleCountries@data$countryCode %in% netImportCodes onlyExportMask <- SimpleCountries@data$countryCode %in% exportOnlyCodes onlyImportMask <- SimpleCountries@data$countryCode %in% importOnlyCodes color_export = '#40CC90' color_import = '#EE5555' color_missing = 'gray90' # Base plot (without Antarctica) notAQ <- SimpleCountries@data$countryCode != 'AQ' plot(SimpleCountries[notAQ,],col=color_missing) plot(SimpleCountries[netExportMask,],col=color_export,add=TRUE) plot(SimpleCountries[onlyExportMask,],col=color_export,add=TRUE) plot(SimpleCountries[netImportMask,],col=color_import,add=TRUE) plot(SimpleCountries[onlyImportMask,],col=color_import,add=TRUE) legend('bottomleft',legend=c('Net Exporters','Net Importers'),fill=c(color_export,color_import)) title(line=0,paste('World Crude Oil in',year))
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.