knitr::opts_chunk$set( collapse = TRUE, comment = "#>", eval = RPresto::presto_has_default() )
In this vignette, we demonstrate how complex structural types in Presto can be translated into R types (e.g., vectors, list, and tibbles).
library(RPresto)
You can check your RPresto
version by running the packageVersion()
function.
You need version 1.3.9 or later to have a more comprehensive and robust
complex types support.
packageVersion("RPresto")
Complex types refer to structural types including ARRAY
, MAP
and ROW
.
Those data types are basically containers to hold other data types (hence
complex).
We summarize the similarities and differences between the complex types below.
| Type | Is atomic? | Has names/keys? | |------|:----------:|:---------------:| | ARRAY | Yes | No | | MAP | Yes | Yes | | ROW | No | Yes |
Atomic here means all elements in the container share the same type (usually
primitive types, but can be complex types too). For example, an ARRAY
of
integer can only hold integer type elements whereas a ROW
can have elements of
different types (e.g., one element is integer and the other boolean).
We assume that the user already have a Presto server with a memory connector set up. If you don't have such a server set up, refer to the Presto documentation for instructions if you want to follow along.
We first create a PrestoConnection
which will serve as the bridge between the
Presto server and R.
con <- DBI::dbConnect( drv = RPresto::Presto(), host = "http://localhost", port = 8080, user = Sys.getenv("USER"), catalog = "memory", schema = "default" )
We first issue a simple query to see if the Presto connection is working properly.
DBI::dbGetQuery(con, "SELECT 1+1 AS res")
| Presto type | R type | |-------------|:------:| | ARRAY | unnamed typed vector | | MAP | named typed vector |
ARRAY
sGiven the atomic and unnamed nature of ARRAY
s, we map them to unnamed typed
vectors in R.
ARRAY
s of primitive typesWe first create a table with ARRAY
s of all supported primitive Presto data
types using the create_primitive_arrays_table()
function included
in the RPresto
package.
RPresto:::create_primitive_arrays_table( con, table_name = "presto_primitive_arrays", verbose = FALSE )
We can check if the table now exists in Presto.
DBI::dbExistsTable(con, "presto_primitive_arrays")
We can list the fields in the table. They are named after the Presto types they represent.
DBI::dbListFields(con, "presto_primitive_arrays")
ARRAY
s to R vectors( df.array_of_primitive_types <- dbGetQuery( con, "SELECT * FROM presto_primitive_arrays", bigint = "integer64" ) )
We can verify the R types of each column.
tibble::enframe(purrr::map_chr(df.array_of_primitive_types, ~class(.[[1]])[1]))
All vectors are unnamed.
purrr::every(df.array_of_primitive_types, ~is.null(names(.[[1]])))
We can also call functions such as length()
on each of the columns to get the
ARRAY
cardinality. It shows that all ARRAY
s have 3 elements in them.
tibble::enframe(purrr::map_int(df.array_of_primitive_types, ~length(.[[1]])))
MAP
sA MAP
in Presto can be thought as a combination of two same-length ARRAY
s.
The first ARRAY
contains the keys of the MAP
and the second contains the
values of the MAP
. In fact, that's exactly how MAP
literals are created in
Presto (e.g., MAP(ARRAY[1, 2], ARRAY['a', 'b'])
creates a 2-element MAP
).
Following the logic, we translate MAP
s to named typed vectors in R.
MAP
S of primitive typesWe first create a table with MAPS
s of all supported primitive Presto data
types using the create_primitive_maps_table()
function included
in the RPresto
package.
RPresto:::create_primitive_maps_table( con, table_name = "presto_primitive_maps", verbose = FALSE )
We can check if the table now exists in Presto.
DBI::dbExistsTable(con, "presto_primitive_maps")
We can list the fields in the table. They are named after the Presto types they represent.
DBI::dbListFields(con, "presto_primitive_maps")
MAP
s to R vectors( df.map_of_primitive_types <- dbGetQuery( con, "SELECT * FROM presto_primitive_maps", bigint = "integer64" ) )
We can verify the R types of each column.
tibble::enframe(purrr::map_chr(df.map_of_primitive_types, ~class(.[[1]])[1]))
All vectors are named.
purrr::none(df.map_of_primitive_types, ~is.null(names(.[[1]])))
ARRAY
s and MAP
sIt's possible to have repeated ARRAY
s and MAP
s in Presto in the form of
ARRAY
s of ARRAY
s and ARRAY
s of MAP
s.
| Repeated Presto type | R type | |----------------------|--------| | ARRAY of ARRAY | not supported | | ARRAY of MAP | unnamed list of named typed vectors |
We are not supporting nested ARRAY
s at the moment although it's technically
possible in Presto.
For ARRAY
s of MAP
s, we translate the ARRAY
container into an unnamed list
and each of the MAP
element into a named typed vector.
We first create an array-of-maps table by using the
create_array_of_maps_table()
function.
RPresto:::create_array_of_maps_table( con, table_name = "presto_array_of_maps", verbose = FALSE )
We can check if the table now exists in Presto.
DBI::dbExistsTable(con, "presto_array_of_maps")
We can list the fields in the table.
DBI::dbListFields(con, "presto_array_of_maps")
Let's import all the data into R.
( df.array_of_maps <- dbGetQuery( con, "SELECT * FROM presto_array_of_maps", bigint = "integer64" ) )
We need to pry open the wrapping unnamed list to reveal the types of the vectors underneath.
tibble::enframe(purrr::map_chr(df.array_of_maps, ~class(.[[1]][[1]])[1]))
ROW
typeThe easiest way to think about the ROW
type in Presto is to think of it
literally as a row of a table. Just as a table can have multiple columns of
different data types, a ROW
can have multiple elements of different types. And
just like a table having a name for each column, every element of a ROW
has a
name associated with the value.
Depending on whether the ROW
type is repeated (i.e., wrapped in an ARRAY
),
the translation into R is different.
We translate single ROW
value to a named list in R.
Rather than interpret repeated ROW
s (i.e., ARRAY
of ROW
s) as a list of
named lists, we translate the collection of ROW
s into a tibble.
| Presto type | R type | |-------------|--------| | Single ROW | named list | | Repeated ROWs | tibble |
ROW
translationTo demonstrate how ROW
types are translated into R types, we first create a
table using an auxiliary create_primitive_rows_table()
function
included in the package. The resulting table has only 1 column named
row_primitive_types
which is a ROW
that includes 18 sub-columns representing
all supported primitive types.
RPresto:::create_primitive_rows_table( con, table_name = "presto_primitive_rows", verbose = FALSE )
We can check if the table now exists in Presto.
DBI::dbExistsTable(con, "presto_primitive_rows")
We can list the fields in the table.
DBI::dbListFields(con, "presto_primitive_rows")
We can then retrieve all the data from the table.
( df.row_of_primitive <- dbGetQuery( con, "SELECT row_primitive_types FROM presto_primitive_rows", bigint = "integer64" ) )
We can check the R types of each element in the named list.
tibble::enframe( purrr::map_chr(df.row_of_primitive$row_primitive_types[[1]], ~class(.)[1]) )
ROW
s translationTo construct a repeated ROW
column, we use the auxiliary
create_array_of_rows_table()
function.
RPresto:::create_array_of_rows_table( con, table_name = "presto_array_of_rows", verbose = FALSE )
We can check if the table now exists in Presto and the field name.
DBI::dbExistsTable(con, "presto_array_of_rows")
DBI::dbListFields(con, "presto_array_of_rows")
We can import the whole data into R.
( df.array_of_rows <- dbGetQuery( con, "SELECT array_of_rows FROM presto_array_of_rows", bigint = "integer64" ) )
We can verify the tibble's column types.
tibble::enframe( purrr::map_chr(df.array_of_rows$array_of_rows[[1]], ~class(.)[1]) )
DBI::dbDisconnect(con)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.