RGDALDB

The goal of RGDALDB is to provide a DBI wrapper for GDAL's ExecuteSQL in sf.

What's here?

A patchy work-through of building a DBI backend for GDAL using sf.

The core is sf::st_read(, query = ''), available in an unsupported branch of sf. It relies on OGR SQL, which is not a real database engine.

https://www.gdal.org/ogr_sql.html
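As a rough illustration of that core mechanism, the sketch below passes an OGR SQL statement straight to sf::st_read(). It assumes an installed sf whose st_read() accepts the query argument, and it uses the nc shapefile that ships with sf; the particular SELECT statement is only an example and is not taken from this package.

```r
library(sf)

# Path to the North Carolina shapefile bundled with sf
dsn <- system.file("shape/nc.shp", package = "sf")

# GDAL's ExecuteSQL evaluates the statement, so only the matching
# rows and the requested columns come back into R.
# Assumption: the layer name for this shapefile is "nc".
nc_subset <- st_read(
  dsn,
  query = "SELECT AREA, FIPS FROM nc WHERE AREA > 0.1",
  quiet = TRUE
)

names(nc_subset)
```

The result is an ordinary sf object, which is why a DBI layer on top needs to decide how to treat the geometry column.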

But we should be able to make it usable through the DBI abstractions and the backend support in dbplyr.

The official guides for writing such backends have been partly implemented in this package:

https://cran.r-project.org/web/packages/DBI/vignettes/backend.html

https://cran.r-project.org/web/packages/dbplyr/vignettes/new-backend.html
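For orientation, the first step in the DBI backend guide is to define S4 driver and connection classes plus a dbConnect() method. The sketch below shows that shape only; the class names SketchDriver and SketchConnection and the dsn slot are hypothetical and are not the classes RGDALDB actually defines.

```r
library(DBI)
library(methods)

# Hypothetical driver class: a DBI backend starts from a DBIDriver subclass
setClass("SketchDriver", contains = "DBIDriver")

# Hypothetical connection class that just remembers the data source name
setClass(
  "SketchConnection",
  contains = "DBIConnection",
  slots = c(dsn = "character")
)

# dbConnect() turns a driver object into a connection object
setMethod("dbConnect", "SketchDriver", function(drv, dsn = "", ...) {
  new("SketchConnection", dsn = dsn)
})

sketch_con <- dbConnect(new("SketchDriver"), dsn = "path/to/file.gpkg")
is(sketch_con, "DBIConnection")
```

A working backend would go on to add methods such as dbSendQuery() and dbFetch(), which is where the call to GDAL would sit.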

There is significant overlap with the existing support in sf for reading from PostGIS, so that relationship needs clarifying.

Installation

remotes::install_cran("sf")
devtools::install_github("mdsumner/RGDALDB")

Example

A reasonably complete example. Ideally, no data is read until collect() is called.

library(dplyr)
library(DBI)
library(RGDALDB)

con <- DBI::dbConnect(RGDALDB::GDALDB(), dsn = system.file("shape/nc.shp", package = "sf"))
#con <- DBI::dbConnect(RGDALDB::GDALDB(), dsn = system.file("gpkg/nc.gpkg", package = "sf"))

(nclazy <- tbl(con, "nc"))

library(sf)
nclazy %>% 
  dplyr::filter(AREA > .1) %>% 
  dplyr::select(AREA, PERIMETER, BIR74, FIPS) %>% 
  collect() %>%  ## consider making plot and collect methods for the tbl_GDALDBConnection
  st_as_sf() %>% 
  plot()
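The laziness mentioned above can be seen without any data source at all. The sketch below builds the same filter and select pipeline on a dbplyr lazy_frame() with a simulated generic connection and renders the SQL that would be sent. This is an assumption-laden stand-in: it shows dbplyr's generic translation, not the OGR SQL that RGDALDB generates, and the column values are placeholders.

```r
library(dplyr)
library(dbplyr)

# A lazy table with no backing data; column values are placeholders
nc_sketch <- lazy_frame(
  AREA = 1, PERIMETER = 1, BIR74 = 1, FIPS = "a",
  con = simulate_dbi()
)

# Building the pipeline only records the operations
lazy_query <- nc_sketch %>%
  filter(AREA > 0.1) %>%
  select(AREA, PERIMETER, BIR74, FIPS)

# Render the SQL without executing anything
sql_text <- as.character(sql_render(lazy_query))
cat(sql_text, "\n")
```

With a real connection, nothing is executed until collect() is called on the pipeline.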

Rough and ready examples

library(dplyr)
library(DBI)
con <- DBI::dbConnect(RGDALDB::GDALDB(), dsn = system.file("extdata/nc.gpkg", package = "RGDALDB"))
#DBI::dbWriteTable(con, "mtcars", mtcars)

tbl(con, "nc.gpkg")

tbl(con, "nc.gpkg") %>% filter(between(AREA, 0.05, 1))

# * summarise(), mutate(), filter() etc: powered by sql_select()
# * left_join(), inner_join(): powered by sql_join()
# * semi_join(), anti_join(): powered by sql_semi_join()
# * union(), intersect(), setdiff(): powered by sql_set_op()

## Does not work, because we expect an sf object back, not a plain data frame
##  tbl(con, "nc.gpkg") %>% filter(between(AREA, 0.05, 0.1)) %>% summarize(min(AREA))


tbl(con, "nc.gpkg") %>% filter(between(AREA, 0.05, 0.1)) %>% mutate(AREA = AREA * 2)

## this works, but makes little sense until we have either multiple layers or control over the row_number
tbl(con, "nc.gpkg") %>% left_join(tbl(con, "nc.gpkg"), "CRESS_ID")

## FID is not working; it needs to be wrapped so that row_number() can use it
#tbl(con, "nc.gpkg") %>% select(geom, AREA) %>% mutate(ID = row_number())

Please note that the 'RGDALDB' project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.



mdsumner/RGDALDB documentation built on Oct. 14, 2019, 6:16 p.m.