The noctua package aims to make it easier to work with data stored in AWS Athena. It attempts to provide three levels of interaction with AWS Athena:

Low-level API: access to the AWS Athena backend utilising the AWS SDK paws. This ranges from configuring AWS Athena work groups to assuming different roles within AWS when connecting to AWS Athena.

DBI: noctua provides a DBI interface to AWS Athena, so users can interact with AWS Athena using the familiar functions and methods they have used for other databases from R.

dplyr: as dplyr is becoming more popular, noctua aims to give dplyr a seamless interface into AWS Athena.
Because noctua uses the R AWS SDK paws, installing noctua is straightforward:
    # CRAN version
    install.packages("noctua")

    # Dev version
    remotes::install_github("dyfanjones/noctua")
To help users who wish to run noctua in a Docker container, a simple Dockerfile has been created here. To set up Docker, please refer to link. For demo purposes we will use the example Dockerfile and run it locally:
    # build docker image
    docker build . -t noctua

    # start container with aws credentials passed from local
    docker run \
      -e AWS_ACCESS_KEY_ID="$(aws configure get aws_access_key_id)" \
      -e AWS_SECRET_ACCESS_KEY="$(aws configure get aws_secret_access_key)" \
      -e AWS_SESSION_TOKEN="$(aws configure get aws_session_token)" \
      -e AWS_DEFAULT_REGION="$(aws configure get region)" \
      -it noctua
NOTE: readr isn't required for noctua; however, it has been included in the Dockerfile to improve performance when querying AWS Athena.
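Before connecting from inside the container, it can help to confirm that the credentials passed with `docker run -e ...` actually arrived. The following is a minimal sketch in base R; `aws_env_status()` is a hypothetical helper written for this illustration and is not part of noctua or paws.

```r
# Hypothetical helper (not part of noctua): report which of the AWS
# environment variables passed via `docker run -e ...` are non-empty.
aws_env_status <- function(vars = c("AWS_ACCESS_KEY_ID",
                                    "AWS_SECRET_ACCESS_KEY",
                                    "AWS_SESSION_TOKEN",
                                    "AWS_DEFAULT_REGION")) {
  vals <- Sys.getenv(vars, unset = "")
  # TRUE where the variable is set to a non-empty string
  stats::setNames(nzchar(vals), vars)
}

status <- aws_env_status()
unset_vars <- names(status)[!status]
if (length(unset_vars) > 0) {
  message("Unset AWS variables: ", paste(unset_vars, collapse = ", "))
}
```

An empty `AWS_SESSION_TOKEN` is not necessarily a problem: it is only populated when temporary credentials are in use.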
    library(DBI)
    library(noctua)

    con <- dbConnect(athena())

    # list all current work groups in AWS Athena
    list_work_groups(con)

    # Create a new work group
    create_work_group(
      con,
      "demo_work_group",
      description = "This is a demo work group",
      tags = tag_options(key = "demo_work_group", value = "demo_01")
    )
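Once a work group exists, a connection can be pointed at it, optionally while assuming a different role. The sketch below assumes that noctua's `dbConnect()` method accepts the `work_group`, `role_arn` and `s3_staging_dir` arguments; check `?noctua::dbConnect` for your installed version. `connect_demo()` is a hypothetical wrapper, and the bucket and role ARN in the usage comment are placeholders.

```r
# Hypothetical wrapper: connect to AWS Athena using a specific work group
# and, optionally, an assumed IAM role. Argument names are assumed to match
# noctua's dbConnect() method.
connect_demo <- function(s3_staging_dir,
                         work_group = "demo_work_group",
                         role_arn = NULL) {
  DBI::dbConnect(
    noctua::athena(),
    work_group = work_group,
    role_arn = role_arn,
    s3_staging_dir = s3_staging_dir
  )
}

# Usage (placeholder values, requires valid AWS credentials):
# con <- connect_demo(
#   s3_staging_dir = "s3://my-bucket/athena-results/",
#   role_arn = "arn:aws:iam::123456789012:role/demo-role"
# )
```

Wrapping the call in a function keeps the connection settings in one place, so the same work group and role are used consistently across scripts.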
    library(DBI)

    con <- dbConnect(noctua::athena())

    # Get metadata
    dbGetInfo(con)

    # $profile_name
    # [1] "default"
    #
    # $s3_staging
    # [1] ######## NOTE: Please don't share your S3 bucket to the public
    #
    # $dbms.name
    # [1] "default"
    #
    # $work_group
    # [1] "primary"
    #
    # $poll_interval
    # NULL
    #
    # $encryption_option
    # NULL
    #
    # $kms_key
    # NULL
    #
    # $expiration
    # NULL
    #
    # $region_name
    # [1] "eu-west-1"
    #
    # $paws
    # [1] "0.1.6"
    #
    # $noctua
    # [1] "1.5.1"

    # create table to AWS Athena
    dbWriteTable(con, "iris", iris)

    dbGetQuery(con, "select * from iris limit 10")

    # Info: (Data scanned: 860 Bytes)
    #     sepal_length sepal_width petal_length petal_width species
    #  1:          5.1         3.5          1.4         0.2  setosa
    #  2:          4.9         3.0          1.4         0.2  setosa
    #  3:          4.7         3.2          1.3         0.2  setosa
    #  4:          4.6         3.1          1.5         0.2  setosa
    #  5:          5.0         3.6          1.4         0.2  setosa
    #  6:          5.4         3.9          1.7         0.4  setosa
    #  7:          4.6         3.4          1.4         0.3  setosa
    #  8:          5.0         3.4          1.5         0.2  setosa
    #  9:          4.4         2.9          1.4         0.2  setosa
    # 10:          4.9         3.1          1.5         0.1  setosa
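Because the connection is a regular DBI connection, the other standard DBI generics should work in the same way. The following is a minimal sketch using `dbExistsTable()` and `dbListFields()`; `inspect_table()` is a hypothetical helper, not a noctua function.

```r
# Hypothetical helper: check that a table exists on a DBI connection
# and return its column names.
inspect_table <- function(con, name = "iris") {
  if (!DBI::dbExistsTable(con, name)) {
    stop("Table '", name, "' was not found")
  }
  DBI::dbListFields(con, name)
}

# Usage (requires valid AWS credentials and the table written above):
# con <- DBI::dbConnect(noctua::athena())
# inspect_table(con, "iris")
# DBI::dbDisconnect(con)
```

Calling `dbDisconnect()` when finished is the usual DBI convention for releasing the connection.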
    library(dplyr)

    athena_iris <- tbl(con, "iris")

    athena_iris %>%
      select(species, sepal_length, sepal_width) %>%
      head(10) %>%
      collect()

    # Info: (Data scanned: 860 Bytes)
    # # A tibble: 10 x 3
    #    species sepal_length sepal_width
    #    <chr>          <dbl>       <dbl>
    #  1 setosa           5.1         3.5
    #  2 setosa           4.9         3
    #  3 setosa           4.7         3.2
    #  4 setosa           4.6         3.1
    #  5 setosa           5           3.6
    #  6 setosa           5.4         3.9
    #  7 setosa           4.6         3.4
    #  8 setosa           5           3.4
    #  9 setosa           4.4         2.9
    # 10 setosa           4.9         3.1
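A remote table reference is lazy: dplyr verbs build up a query, and nothing is sent to AWS Athena until the result is requested with `collect()`. The sketch below wraps an aggregation in a hypothetical helper, `summarise_species()`, which works on a local data frame as well as on a remote table, so the logic can be tried locally first. The usage comments assume `dbplyr` is installed, which `tbl()` on a database connection already requires.

```r
library(dplyr)

# Hypothetical helper: mean sepal length per species. Works on a local
# data frame and on a lazy remote table such as tbl(con, "iris").
summarise_species <- function(tbl_ref) {
  tbl_ref %>%
    group_by(species) %>%
    summarise(mean_sepal_length = mean(sepal_length, na.rm = TRUE))
}

# Usage against AWS Athena (requires the connection and table from above):
# athena_iris <- tbl(con, "iris")
# summarise_species(athena_iris) %>% show_query()  # inspect the generated SQL
# summarise_species(athena_iris) %>% collect()     # run it and fetch the result
```

Inspecting the SQL with `show_query()` before calling `collect()` is a cheap way to see what will be executed, which matters on a service that bills by data scanned.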