knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/REAsDME-",
  out.width = "100%"
)

sfarrow: Read/Write Simple Feature Objects (sf) with 'Apache' 'Arrow'

sfarrow is a package for reading and writing Parquet and Feather files with sf objects using arrow in R.

Simple features are a popular format for representing spatial vector data using data.frames and a list-like geometry column, implemented in the R package sf. Apache Parquet files are an open-source, column-oriented data storage format (https://parquet.apache.org/) which enable efficient read/writing for large files. Parquet files are becoming popular across programming languages and can be used in R using the package arrow.

The sfarrow implementation translates simple feature data objects using well-known binary (WKB) format for geometries and reads/writes Parquet/Feather files. A key goal of the package is for interoperability of the files (particularly with Python GeoPandas), so coordinate reference system information is maintained in a standard metadata format (https://github.com/geopandas/geo-arrow-spec). Note to users: this metadata format is not yet stable for production uses and may change in the future.

Installation

sfarrow is available through CRAN with:

install.packages('sfarrow')

or it can be installed from Github with:

devtools::install_github("wcjochem/sfarrow@main")

Load the library to begin using it.

library(sfarrow)

arrow package

The installation requires the Arrow library which should be installed with the R package arrow dependency. However, some systems may need to follow additional steps to enable full support of that library. Please refer to the arrow documentation.

Basic usage

Reading Parquet data of spatial files created with Python GeoPandas.

# load Natural Earth low-res dataset. 
# Created in Python with geopandas.to_parquet()
path <- system.file("extdata", "world.parquet", package = "sfarrow")

world <- st_read_parquet(path)

world
plot(sf::st_geometry(world))

Writing sf objects to Parquet format files. These Parquet files created with sfarrow can be read within Python using GeoPandas.

nc <- sf::st_read(system.file("shape/nc.shp", package="sf"), quiet=TRUE)

st_write_parquet(obj=nc, dsn=file.path(tempdir(), "nc.parquet"))

# read back into R
nc_p <- st_read_parquet(file.path(tempdir(), "nc.parquet"))

nc_p
plot(sf::st_geometry(nc_p))

For additional examples please see the vignettes.

Contributions

Contributions, questions, ideas, and issue reports are welcome. Please raise an issue to discuss or submit a pull request.

Acknowledgements

This work benefited from the work by developers in the GeoPandas, Arrow, and r-spatial teams. Thank you to the teams for their excellent, open-source work.



wcjochem/sfarrow documentation built on Nov. 2, 2021, 8:03 a.m.