knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
SpatialRDD
s?SpatialRDDs are basic building blocks of
distributed spatial data in Apache Sedona. A SpatialRDD
can be partitioned and indexed using well-known spatial data structures to
facilitate range queries, KNN queries, and other low-level operations. One can also export records from SpatailRDD
s into regular Spark
dataframes, making them accessible through Spark SQL and through the dplyr
interface of sparklyr
.
NOTE: this section is largely based on https://sedona.apache.org/tutorial/rdd/#create-a-spatialrdd, except for examples have been
written in R instead of Scala to reflect usages of sparklyr.sedona
.
Currently SpatialRDD
s can be created in sparklyr.sedona
by reading a file in a supported geospatial format, or by extracting data from a
Spark SQL query.
For example, the following code will import data from arealm-small.csv into a SpatialRDD
:
pt_rdd <- sedona_read_dsv_to_typed_rdd( sc, location = "arealm-small.csv", delimiter = ",", type = "point", first_spatial_col_index = 1, has_non_spatial_attrs = TRUE )
.
Records from the example arealm-small.csv file look like the following:
testattribute0,-88.331492,32.324142,testattribute1,testattribute2 testattribute0,-88.175933,32.360763,testattribute1,testattribute2 testattribute0,-88.388954,32.357073,testattribute1,testattribute2
As one can see from the above, each record is comma-separated and consists of a 2-dimensional coordinate starting at the 2nd column and ending at the 3rd column.
All other columns contain non-spatial attributes. Because column indexes are 0-based, we need to specify first_spatial_col_index = 1
in the example above to
ensure each record is parsed correctly.
In addition to formats such as CSV and TSV, currently sparklyr.sedona
also supports reading files in WKT (Well-Known Text), WKB (Well-Known Binary), and GeoJSON formats.
See ?sparklyr.sedona::sedona_read_wkt
, ?sparklyr.sedona::sedona_read_wkb
, and ?sparklyr.sedona::sedona_read_geojson
for details.
One can also run to_spatial_rdd()
to extract a SpatailRDD from a Spark SQL query, e.g.,
library(sparklyr) library(sparklyr.sedona) library(dplyr) sc <- spark_connect(master = "local") sdf <- tbl( sc, sql("SELECT ST_GeomFromText('POINT(-71.064544 42.28787)') AS `geom`, \"point\" AS `type`") ) spatial_rdd <- sdf %>% to_spatial_rdd(spatial_col = "geom") print(spatial_rdd)
## $.jobj ## <jobj[70]> ## org.apache.sedona.core.spatialRDD.SpatialRDD ## org.apache.sedona.core.spatialRDD.SpatialRDD@422afc5a ## ## ...
will extract a spatial column named "geom"
from the Sedona spatial SQL query above and store it in a SpatialRDD
object.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.