#| include: false knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) library(prqlr)
PRQL (Pipelined Relational Query Language, pronounced "Prequel") is a modern language for transforming data, can be compiled to SQL.
This package provides a simple function to convert a PRQL query string to a SQL string.
For example, this is a PRQL query.
#| label: sample #| eval: false from mtcars filter cyl > 6 select {cyl, mpg} derive {mpg_int = math.round 0 mpg}
And, this is the SQL query that is compiled from the PRQL query.
#| label: sample #| echo: false #| engine-opts: #| signature_comment: false
To compile a PRQL string, just pass the query string to the prql_compile()
function, like this.
library(prqlr) " <<sample>> " |> prql_compile() |> cat()
This output SQL query string can be used with already existing great packages that manipulate data with SQL.
Using it with the {DBI}
package, we can execute PRQL queries against the database.
library(DBI) # Create an ephemeral in-memory SQLite database con <- dbConnect(RSQLite::SQLite(), ":memory:") # Create a table inclueds `mtcars` data dbWriteTable(con, "mtcars", mtcars) # Execute a PRQL query " <<sample>> take 3 " |> prql_compile("sql.sqlite") |> dbGetQuery(con, statement = _)
We can also use the sqldf::sqldf()
function to automatically register Data Frames to the database.
" <<sample>> take 3 " |> prql_compile("sql.sqlite") |> sqldf::sqldf()
Since SQLite is used here via {RSQLite}
, the target
option of prql_compile()
is set to "sql.sqlite"
.
Available target names can be found with the prql_get_targets()
function.
Using {prqlr}
with the {tidyquery}
package, we can execute PRQL queries against R Data Frames via {dplyr}
.
{dplyr}
is a very popular R package for manipulating Data Frames,
and the PRQL syntax is very similar to the {dplyr}
syntax.
Let's run a query that aggregates a Data Frame flights
, contained in the {nycflights13}
package.
library(tidyquery) library(nycflights13) " from flights filter (distance | in 200..300) filter air_time != null group {origin, dest} ( aggregate { num_flts = count this, avg_delay = (average arr_delay | math.round 0) } ) sort {-origin, avg_delay} take 2 " |> prql_compile() |> query()
This query can be written with {dplyr}
's functions as follows.
library(dplyr, warn.conflicts = FALSE) library(nycflights13) flights |> filter( distance |> between(200, 300), !is.na(air_time) ) |> group_by(origin, dest) |> summarise( num_flts = n(), avg_delay = mean(arr_delay, na.rm = TRUE) |> round(0), .groups = "drop" ) |> arrange(desc(origin), avg_delay) |> head(2)
Note that {dplyr}
queries can be generated by the tidyquery::show_dplyr()
function!
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.