quak provides convenient utilities for using DuckDB with datasets stored in
Azure Data Lake Storage Gen2 (abfss://). It opens connections configured for
Azure-backed Delta Lake data, registers Azure credentials as DuckDB secrets, and
works with both SQL via DBI and lazy table queries via dplyr and dbplyr.
quak is not yet on CRAN. Install the development version from GitHub:
# install.packages("remotes")
remotes::install_github("pedrobtz/quak")
Start by opening an Azure-ready DuckDB connection. Then register one Azure secret on that connection.
library(quak)
conn <- az_conn()
Use a credential chain when you want DuckDB to resolve credentials itself, for example from the Azure CLI:
library(quak)
conn <- az_conn()
az_set_chain_secret(conn, chain = "cli")
Use an access token when another package obtains the token for you:
library(quak)
az_creds <- azr::DefaultCredential$new(
scope = az_default_scope()
)
conn <- az_conn()
az_set_token_secret(conn, token = az_creds$get_token()$access_token)
Use a service principal when you want to pass the application credentials directly:
library(quak)
conn <- az_conn()
az_set_sp_secret(
conn,
tenant_id = Sys.getenv("AZURE_TENANT_ID"),
client_id = Sys.getenv("AZURE_CLIENT_ID"),
client_secret = Sys.getenv("AZURE_CLIENT_SECRET")
)
All secret helpers also accept account = "storageaccount" to scope the secret
to one storage account.
Use load_delta() when you want to register a Delta table in DuckDB and query
it with SQL.
library(quak)
conn <- az_conn()
az_set_chain_secret(conn, chain = "cli")
load_delta(
conn = conn,
url = "abfss://container@account/path/sales",
name = "sales",
method = "view"
)
DBI::dbGetQuery(
conn,
"SELECT COUNT(*) AS n, AVG(amount) AS avg_amount
FROM sales
WHERE amount > 100"
)
DBI::dbDisconnect(conn, shutdown = TRUE)
Use tbl_delta() when you want to work with a Delta table through dplyr/dbplyr.
library(quak)
conn <- az_conn()
az_set_chain_secret(conn, chain = "cli")
sales <- tbl_delta(conn, "abfss://container@account/path/sales")
sales |>
dplyr::filter(amount > 100) |>
dplyr::summarise(avg_amount = mean(amount, na.rm = TRUE)) |>
dplyr::collect()
DBI::dbDisconnect(conn, shutdown = TRUE)
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.