In r-spark/sparklyr.sedona: Sparklyr Extension for Apache Sedona

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

What are `SpatialRDD`s?

SpatialRDDs are the basic building blocks of distributed spatial data in Apache Sedona. A SpatialRDD can be partitioned and indexed using well-known spatial data structures to facilitate range queries, KNN queries, and other low-level operations. One can also export records from SpatialRDDs into regular Spark dataframes, making them accessible through Spark SQL and through the dplyr interface of sparklyr.

Creating a SpatialRDD

NOTE: this section is largely based on https://sedona.apache.org/tutorial/rdd/#create-a-spatialrdd, except that the examples have been written in R instead of Scala to reflect the usage of sparklyr.sedona.

Currently SpatialRDDs can be created in sparklyr.sedona by reading a file in a supported geospatial format, or by extracting data from a Spark SQL query.

For example, the following code will import data from arealm-small.csv into a SpatialRDD:

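library(sparklyr)
library(sparklyr.sedona)

# Connect to a local Spark instance
sc <- spark_connect(master = "local")

# Read each CSV record as a point, keeping the non-spatial columns as attributes
pt_rdd <- sedona_read_dsv_to_typed_rdd(
  sc,
  location = "arealm-small.csv",
  delimiter = ",",
  type = "point",
  first_spatial_col_index = 1,
  has_non_spatial_attrs = TRUE
)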

Records from the example arealm-small.csv file look like the following:

testattribute0,-88.331492,32.324142,testattribute1,testattribute2
testattribute0,-88.175933,32.360763,testattribute1,testattribute2
testattribute0,-88.388954,32.357073,testattribute1,testattribute2

As one can see from the above, each record is comma-separated and consists of a 2-dimensional coordinate starting at the 2nd column and ending at the 3rd column. All other columns contain non-spatial attributes. Because column indexes are 0-based, we need to specify first_spatial_col_index = 1 in the example above to ensure each record is parsed correctly.
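The 0-based column convention can be illustrated with a short base R sketch. It does not call Sedona or mimic its parser; it only splits one of the sample records by hand to show which fields `first_spatial_col_index = 1` refers to. The variable names (`record`, `fields`, `r_index`, `coords`, `non_spatial_attrs`) are made up for this illustration.

```r
# One record copied from the arealm-small.csv sample shown above.
record <- "testattribute0,-88.331492,32.324142,testattribute1,testattribute2"
fields <- strsplit(record, ",", fixed = TRUE)[[1]]

# Sedona counts columns from 0, whereas R vectors are indexed from 1,
# so 0-based column 1 corresponds to the 2nd element of `fields`.
first_spatial_col_index <- 1
r_index <- first_spatial_col_index + 1

# A 2-dimensional point occupies two consecutive columns.
coords <- as.numeric(fields[r_index:(r_index + 1)])

# Every remaining column is treated as a non-spatial attribute.
non_spatial_attrs <- fields[-(r_index:(r_index + 1))]

print(coords)
print(non_spatial_attrs)
```

With `first_spatial_col_index = 0` the same arithmetic would select `testattribute0` as the first coordinate, which cannot be read as a number; that is why the index has to be set explicitly for this file.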

In addition to formats such as CSV and TSV, currently sparklyr.sedona also supports reading files in WKT (Well-Known Text), WKB (Well-Known Binary), and GeoJSON formats. See ?sparklyr.sedona::sedona_read_wkt, ?sparklyr.sedona::sedona_read_wkb, and ?sparklyr.sedona::sedona_read_geojson for details.

One can also run to_spatial_rdd() to extract a SpatialRDD from a Spark SQL query, for example:

library(sparklyr)
library(sparklyr.sedona)
library(dplyr)

sc <- spark_connect(master = "local")

sdf <- tbl(
  sc,
  sql("SELECT ST_GeomFromText('POINT(-71.064544 42.28787)') AS `geom`, \"point\" AS `type`")
)

spatial_rdd <- sdf %>% to_spatial_rdd(spatial_col = "geom")
print(spatial_rdd)

## $.jobj
## <jobj[70]>
##   org.apache.sedona.core.spatialRDD.SpatialRDD
##   org.apache.sedona.core.spatialRDD.SpatialRDD@422afc5a
##
## ...

The code above extracts a spatial column named "geom" from the Sedona spatial SQL query and stores it in a SpatialRDD object.
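As noted at the start of this document, records in a SpatialRDD can also be exported back into a regular Spark dataframe. The following is a minimal sketch of that round trip. It assumes that sparklyr.sedona provides an `sdf_register()` method for SpatialRDD objects and that a local Spark installation is available; the names `points_sdf` and `n_points` are made up for this illustration.

```r
library(sparklyr)
library(sparklyr.sedona)
library(dplyr)

sc <- spark_connect(master = "local")

# Build a SpatialRDD from a one-row spatial SQL query, as in the example above.
spatial_rdd <- tbl(
  sc,
  sql("SELECT ST_GeomFromText('POINT(-71.064544 42.28787)') AS `geom`, \"point\" AS `type`")
) %>%
  to_spatial_rdd(spatial_col = "geom")

# Assumption: sdf_register() accepts a SpatialRDD and returns a Spark dataframe.
points_sdf <- spatial_rdd %>% sdf_register()

# The resulting Spark dataframe can be used with ordinary dplyr verbs.
n_points <- points_sdf %>% count() %>% pull(n)
print(n_points)
```

Converting to a Spark dataframe is useful once low-level operations on the SpatialRDD are finished and the results need to be queried through Spark SQL or dplyr.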

r-spark/sparklyr.sedona documentation built on Dec. 22, 2021, 11:56 a.m.
