# Don't run this vignette on CRAN
NOT_CRAN <- identical(tolower(Sys.getenv("NOT_CRAN")), "true")
knitr::opts_chunk$set(purl = NOT_CRAN, eval = NOT_CRAN)
knitr::opts_chunk$set(warning=FALSE, message=FALSE, collapse=TRUE, comment = "#>")

The postGIStools package extends the standard R / PostgreSQL interface (as implemented in RPostgreSQL) to provide support for two popular PostgreSQL extensions: PostGIS (spatial data) and hstore (key/value pairs).

Reading PostGIS and hstore data into R

We demonstrate the postGIStools functions using a test database hosted on Heroku. It contains a single table country with the following fields:

| name | type | comments | |-----------|--------|-------------------------------------| | name | text | country name | | iso2 | text | ISO two-letter code (primary key) | | capital | text | | | population| integer | | | translations| hstore | key/value pairs where key is language (e.g. "es", "fr") | | geom | geometry | country polygons |

The data originates from the REST Countries API, whereas the country geometries are from the wrld_simpl map included in maptools R package.

Note: Currently, the test database cannot be accessed from a Windows or Mac OS X environment, due the lack of SSL support (which is required by Heroku) in those versions of RPostgreSQL.

To read data from PostgreSQL into R, postGIStools provides the get_postgis_query function. Like the dbGetQuery function in PostgreSQL, it requires a connection object and a SQL statement, which in this case must be a SELECT statement. In addition, the user may identify a geometry and/or hstore field by name.

library(RPostgreSQL)
library(postGIStools)

con <- dbConnect(PostgreSQL(), dbname = "d2u06to89nuqei", user = "mzcwtmyzmgalae",
                 host = "ec2-107-22-246-250.compute-1.amazonaws.com",
                 password = "UTv2BuwJUPuruhDqJthcngyyvO")

countries <- get_postgis_query(con, "SELECT * FROM country 
                               WHERE population > 1000000",
                               geom_name = "geom", hstore_name = "translations")

class(countries)

When a geometry column is specified, the query output is a spatial data frame type from the sp package. The hstore column is converted to a list-column in R, where each "cell" is a named list.

str(countries@data[1:2,])

The query statement can include the output of PostGIS spatial functions. In that case, it is simpler to set an alias for the output column and pass that alias as the geom_name. For example, the following query returns a SpatialPointsDataFrame of the country centroids.

centroids <- get_postgis_query(con, 
                        "SELECT name, ST_Centroid(geom) centr FROM country",
                        geom_name = "centr")
head(centroids)

Working with hstore columns

To interact with hstore columns imported into R, postGIStools defines the %->% operator, which is analogous to -> in PostgreSQL. Specifically, hstore %->% "key" extracts the value in each cell of the hstore corresponding to the given key, or NA when the key is absent from a given cell.

head(countries$translations %->% "es")

The operator is also compatible with single bracket subsetting of the hstore.

countries$translations[5:7] %->% "fr"

The assignment version of %->% operates similarly, with the option of deleting keys by assigning them to NULL.

countries$translations[2] %->% "nl" <- "Albanië"
countries$translations[3] %->% "fr" <- NULL
countries$translations[2:3]

The new_hstore function creates a blank hstore of a given length, which is just an empty list of lists. It is most useful when assigned to a data frame column (e.g. df$hs <- new_hstore(3)) that can be then populated with %->% and written back to a PostgreSQL database.

Inserting and updating PostgreSQL tables from R

The two write methods postgis_insert and postgis_update wrap around their namesake SQL commands, while also converting R spatial objects and list-columns back into the geometry and hstore data types, respectively.

To demonstrate these functions, we create a new temporary table in the database.

dbSendQuery(con, paste("CREATE TEMP TABLE cty_tmp (name text,", 
                       "iso2 text PRIMARY KEY, capital text,",
                       "translations hstore, geom geometry)"))

Calls to postgis_insert must specify the connection, data frame and table name. By default, all data frame columns are inserted, but a subset of columns can be specified as write_cols. In both cases, the names of inserted columns must have a match in the target table.

postgis_insert(con, countries[1:10,], "cty_tmp",
               write_cols = c("name", "iso2", "translations"),
               geom_name = "geom", hstore_name = "translations")

# Reimport to check
cty_tmp <- get_postgis_query(con, paste("SELECT name, iso2, capital,",
                                        "geom, translations FROM cty_tmp"),
                             geom_name = "geom", hstore_name = "translations")
head(cty_tmp@data)

We next update the records in cty_tmp to include the capital field. The syntax of postgis_update is similar to postgis_insert, except that we must specify both id_cols, the column(s) identifying the records to update, as well as update_cols, the column(s) to be updated. (The underlying PostgreSQL operation is of the format UPDATE... SET ... FROM....) Neither the geometry nor the hstore can be used as id_cols. Note that since the input data frame countries[1:10,] includes spatial and list-column data, we need to specify geom_name and hstore_name, even if those columns are not needed for the update operation.

postgis_update(con, countries[1:10,], "cty_tmp", id_cols = "iso2", 
               update_cols = "capital", geom_name = "geom", 
               hstore_name = "translations")

cty_tmp <- get_postgis_query(con, paste("SELECT name, iso2, capital,",
                                        "geom, translations FROM cty_tmp"),
                             geom_name = "geom", hstore_name = "translations")
head(cty_tmp@data)

By default, hstore columns are updated by concatenation: keys present in the input data frame but not the original table are added to the hstore, keys present in both the data frame and table have their associated values updated, but keys absent from the input data frame are not deleted from the table. This can be changed by setting hstore_concat = FALSE, in which case whole hstore cells are replaced with corresponding ones in the input data frame.

countries$translations[2] %->% "nl" <- NULL
countries$translations[3] %->% "fr" <- "Algérie"

postgis_update(con, countries[1:10,], "cty_tmp", id_cols = "iso2", 
               update_cols = "translations", geom_name = "geom", 
               hstore_name = "translations")

cty_tmp <- get_postgis_query(con, paste("SELECT name, iso2, capital,",
                                        "geom, translations FROM cty_tmp"),
                             geom_name = "geom", hstore_name = "translations")
cty_tmp@data[cty_tmp$iso2 %in% c("AL", "DZ"), ]

# Key deletion not reflected in database unless hstore_concat = FALSE
postgis_update(con, countries[1:10,], "cty_tmp", id_cols = "iso2", 
               update_cols = "translations", geom_name = "geom", 
               hstore_name = "translations", hstore_concat = FALSE)

cty_tmp <- get_postgis_query(con, paste("SELECT name, iso2, capital,",
                                        "geom, translations FROM cty_tmp"),
                             geom_name = "geom", hstore_name = "translations")
cty_tmp@data[cty_tmp$iso2 %in% c("AL", "DZ"), ]


SESYNC-ci/postGIStools documentation built on May 9, 2019, 11:09 a.m.