The noctua
package aims to make it easier to work with data stored in AWS Athena
. noctua
package attempts to provide three levels of interacting with AWS Athena:
AWS Athena
backend utilising the AWS SDK paws
. This includes configuring AWS Athena Work Groups
to assuming different roles within AWS
when connecting to AWS Athena
.noctua
, by providing a DBI
interface to AWS Athena
. Users are able to interact with AWS Athena
utilising familiar functions and methods they have used for other Databases from R.dplyr
is coming more popular, noctua
aims to give dplyr
a seamless interface into AWS Athena
.noctua
:As noctua
utilising the R AWS SDK paws
the installation of noctua
is pretty straight forward:
# cran version install.packages("noctua") # Dev version remotes::install_github("dyfanjones/noctua")
To help with users wishing to run noctua
in a docker, a simple docker file has been created here. To set up the docker please refer to link. For demo purposes we will use the example docker and run it locally:
# build docker image docker build . -t noctua # start container with aws credentials passed from local docker run \ -e AWS_ACCESS_KEY_ID="$(aws configure get aws_access_key_id)" \ -e AWS_SECRET_ACCESS_KEY="$(aws configure get aws_secret_access_key)" \ -e AWS_SESSION_TOKEN="$(aws configure get aws_session_token)" \ -e AWS_DEFAULT_REGION="$(aws configure get region)" \ -it noctua
NOTE: readr
isn't required for noctua
, however it has been included in the docker file to improve performance when querying AWS Athena.
library(DBI) library(noctua) con <- dbConnect(athena()) # list all current work groups in AWS Athena list_work_groups(con) # Create a new work group create_work_group(con, "demo_work_group", description = "This is a demo work group", tags = tag_options(key= "demo_work_group", value = "demo_01"))
library(DBI) con <- dbConnect(noctua::athena()) # Get metadata dbGetInfo(con) # $profile_name # [1] "default" # # $s3_staging # [1] ######## NOTE: Please don't share your S3 bucket to the public # # $dbms.name # [1] "default" # # $work_group # [1] "primary" # # $poll_interval # NULL # # $encryption_option # NULL # # $kms_key # NULL # # $expiration # NULL # # $region_name # [1] "eu-west-1" # # $paws # [1] "0.1.6" # # $noctua # [1] "1.5.1" # create table to AWS Athena dbWriteTable(con, "iris", iris) dbGetQuery(con, "select * from iris limit 10") # Info: (Data scanned: 860 Bytes) # sepal_length sepal_width petal_length petal_width species # 1: 5.1 3.5 1.4 0.2 setosa # 2: 4.9 3.0 1.4 0.2 setosa # 3: 4.7 3.2 1.3 0.2 setosa # 4: 4.6 3.1 1.5 0.2 setosa # 5: 5.0 3.6 1.4 0.2 setosa # 6: 5.4 3.9 1.7 0.4 setosa # 7: 4.6 3.4 1.4 0.3 setosa # 8: 5.0 3.4 1.5 0.2 setosa # 9: 4.4 2.9 1.4 0.2 setosa # 10: 4.9 3.1 1.5 0.1 setosa
library(dplyr) athena_iris <- tbl(con, "iris") athena_iris %>% select(species, sepal_length, sepal_width) %>% head(10) %>% collect() # Info: (Data scanned: 860 Bytes) # # A tibble: 10 x 3 # species sepal_length sepal_width # <chr> <dbl> <dbl> # 1 setosa 5.1 3.5 # 2 setosa 4.9 3 # 3 setosa 4.7 3.2 # 4 setosa 4.6 3.1 # 5 setosa 5 3.6 # 6 setosa 5.4 3.9 # 7 setosa 4.6 3.4 # 8 setosa 5 3.4 # 9 setosa 4.4 2.9 # 10 setosa 4.9 3.1
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.