Getting Started

The noctua package aims to make it easier to work with data stored in AWS Athena. It provides three levels of interaction with AWS Athena: a low-level API, the DBI interface, and dplyr.

Installing noctua:

Because noctua uses paws, the R SDK for AWS, installing it is straightforward:

# CRAN version
install.packages("noctua")

# development version
remotes::install_github("dyfanjones/noctua")

Docker Example:

To help users who want to run noctua in Docker, a simple Dockerfile has been created. For demo purposes, we will build that example Dockerfile and run it locally (Docker must already be set up):

# build docker image
docker build . -t noctua

# start container with AWS credentials passed in from the local AWS CLI configuration
docker run \
      -e AWS_ACCESS_KEY_ID="$(aws configure get aws_access_key_id)" \
      -e AWS_SECRET_ACCESS_KEY="$(aws configure get aws_secret_access_key)" \
      -e AWS_SESSION_TOKEN="$(aws configure get aws_session_token)" \
      -e AWS_DEFAULT_REGION="$(aws configure get region)" \
      -it noctua

NOTE: readr isn't required for noctua; however, it has been included in the Dockerfile to improve performance when querying AWS Athena.

Usage:

Low-Level API:

library(DBI)
library(noctua)

con <- dbConnect(athena())

# list all current work groups in AWS Athena
list_work_groups(con)

# create a new work group
create_work_group(con, "demo_work_group",
                  description = "This is a demo work group",
                  tags = tag_options(key = "demo_work_group", value = "demo_01"))
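The work group created above can also be inspected and removed through the same low-level API. The following is a minimal sketch, assuming the `demo_work_group` from the previous example exists and that AWS credentials are available to `dbConnect()`; it needs a live AWS account, so no output is shown.

```r
library(DBI)
library(noctua)

# connect using the default credentials and settings, as in the example above
con <- dbConnect(athena())

# inspect the configuration of the demo work group
get_work_group(con, "demo_work_group")

# remove the demo work group once it is no longer needed
delete_work_group(con, "demo_work_group")
```

Deleting the demo work group keeps the account tidy after experimenting; skip that last call if the work group is still in use.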

DBI:

library(DBI)

con <- dbConnect(noctua::athena())

# get connection metadata
dbGetInfo(con)

# $profile_name
# [1] "default"
# 
# $s3_staging
# [1] ######## NOTE: Please don't share your S3 bucket to the public
# 
# $dbms.name
# [1] "default"
# 
# $work_group
# [1] "primary"
# 
# $poll_interval
# NULL
# 
# $encryption_option
# NULL
# 
# $kms_key
# NULL
# 
# $expiration
# NULL
# 
# $region_name
# [1] "eu-west-1"
# 
# $paws
# [1] "0.1.6"
# 
# $noctua
# [1] "1.5.1"

# write the iris data frame to a table in AWS Athena
dbWriteTable(con, "iris", iris)

dbGetQuery(con, "select * from iris limit 10")
# Info: (Data scanned: 860 Bytes)
#     sepal_length sepal_width petal_length petal_width species
#  1:          5.1         3.5          1.4         0.2  setosa
#  2:          4.9         3.0          1.4         0.2  setosa
#  3:          4.7         3.2          1.3         0.2  setosa
#  4:          4.6         3.1          1.5         0.2  setosa
#  5:          5.0         3.6          1.4         0.2  setosa
#  6:          5.4         3.9          1.7         0.4  setosa
#  7:          4.6         3.4          1.4         0.3  setosa
#  8:          5.0         3.4          1.5         0.2  setosa
#  9:          4.4         2.9          1.4         0.2  setosa
# 10:          4.9         3.1          1.5         0.1  setosa
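Before querying, it can be useful to confirm that the table was actually written. The following is a small sketch using standard DBI generics, assuming the `iris` table from the example above has been created in the connection's default schema; results depend on your Athena account, so none are shown.

```r
library(DBI)

# connect using the default credentials and settings, as in the example above
con <- dbConnect(noctua::athena())

# check that the table written by dbWriteTable() exists
dbExistsTable(con, "iris")

# list the column names of the table as stored in Athena
dbListFields(con, "iris")
```

Note that the column names come back in lower case with underscores (for example `sepal_length`), as the query output above shows.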

dplyr:

library(dplyr)

athena_iris <- tbl(con, "iris")

athena_iris %>%
  select(species, sepal_length, sepal_width) %>% 
  head(10) %>%
  collect()

# Info: (Data scanned: 860 Bytes)
# # A tibble: 10 x 3
#    species sepal_length sepal_width
#    <chr>          <dbl>       <dbl>
#  1 setosa           5.1         3.5
#  2 setosa           4.9         3
#  3 setosa           4.7         3.2
#  4 setosa           4.6         3.1
#  5 setosa           5           3.6
#  6 setosa           5.4         3.9
#  7 setosa           4.6         3.4
#  8 setosa           5           3.4
#  9 setosa           4.4         2.9
# 10 setosa           4.9         3.1
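The same lazy table can be used for aggregations, which dplyr translates to SQL and runs in Athena rather than in R. The following is a minimal sketch, assuming the `iris` table from the DBI example exists; the name `species_summary` and the chosen summary are illustrative only, and no output is shown because it depends on a live Athena connection.

```r
library(DBI)
library(dplyr)

# connect using the default credentials and settings, as in the examples above
con <- dbConnect(noctua::athena())
athena_iris <- tbl(con, "iris")

# build a lazy query: nothing is sent to Athena yet
species_summary <- athena_iris %>%
  group_by(species) %>%
  summarise(mean_sepal_length = mean(sepal_length, na.rm = TRUE))

# inspect the SQL that dplyr generated for Athena
show_query(species_summary)

# run the query in Athena and pull the result into R
collect(species_summary)

# close the connection when finished
dbDisconnect(con)
```

Calling `show_query()` before `collect()` is a cheap way to review the generated SQL, which helps when trying to limit the amount of data Athena scans.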


noctua documentation built on Aug. 9, 2023, 1:07 a.m.