README.md

dbtools

About

A package that is used to run SQL queries configured for the Analytical Platform. This package is a reticulated wrapper around pydbtools which uses AWS Wrangler's Athena module but adds additional functionality (like Jinja templating, creating temporary tables) and alters some configuration to our specification.

Alternatively you probably want to use Rdbtools, which has the advantages of supporting dbplyr and being R-native, so there's no messing with reticulate and Python which cause endless problems.

Installation

Run the following commands in the R console.

# Set up the project to use renv, if not already done
renv::init()
# Tell renv that Python will be used
renv::use_python()
# Install the reticulate library to interface with Python
renv::install("reticulate")
# Install the Python library pydbtools
reticulate::py_install("pydbtools")
# Install dbtools
renv::install("moj-analytical-services/dbtools")

Quickstart guide

There is a vignette with more details but the following describes the basics of the package.

Read an SQL Athena query into an R dataframe

library(dbtools)

df <- read_sql_query("SELECT * from a_database.table LIMIT 10")

Run a query in Athena

response <- dbtools::start_query_execution_and_wait(
  "CREATE DATABASE IF NOT EXISTS my_test_database"
)

Create temporary tables to do further separate SQL queries on later

dbtools::create_temp_table(
  "SELECT a_col, count(*) as n FROM a_database.table GROUP BY a_col", 
  table_name="temp_table_1"
)
df <- dbtools::read_sql_query("SELECT * FROM __temp__.temp_table_1 WHERE n < 10")

Delete databases, tables and partitions together with the data on S3

dbtools::delete_partitions_and_data(
  database='my_database', 
  table='my_table', 
  expression='year = 2020 or year = 2021'
)
dbtools::delete_table_and_data(database='my_database', table='my_table')
dbtools::delete_database('my_database')

# These can be used for temporary databases and tables.
dbtools::delete_table_and_data(database='__temp__', table='my_temp_table')

Use Jinja templating to inject arguments into your SQL

sql_template <- "SELECT * FROM {{ db_name }}.{{ table }}"
sql <- dbtools::render_sql_template(sql_template, {"db_name": db_name, "table": "department"})
df <- dbtools::read_sql_query(sql)

cat("SELECT * FROM {{ db_name }}.{{ table_name }}", file="tempfile.sql")
sql <- dbtools::get_sql_from_file("tempfile.sql", jinja_args={"db_name": db_name, "table_name": "department"})
dbtools::read_sql_query(sql)

Changelog:

3.0.0 - 2022-02-03

2.0.3 - 2020-04-29

2.0.2 - 2019-06-14

2.0.1 - 2019-04-23

v2.0.0 - 2019-02-08

v1.0.0 - 2019-01-14

v0.0.2 - 2018-10-12

v0.0.1 - First Release



moj-analytical-services/dbtools documentation built on Sept. 29, 2024, 2:06 a.m.