db_compute: S3 implementation of 'db_compute' for Athena

db_computeR Documentation

S3 implementation of db_compute for Athena

Description

This is a backend function for dplyr's compute function. Users won't be required to access and run this function.

Usage

db_compute.AthenaConnection(
  con,
  table,
  sql,
  ...,
  overwrite = FALSE,
  temporary = FALSE,
  unique_indexes = list(),
  indexes = list(),
  analyze = TRUE,
  in_transaction = FALSE,
  partition = NULL,
  s3_location = NULL,
  file_type = c("csv", "tsv", "parquet"),
  compress = FALSE
)

sql_query_save.AthenaConnection(con, sql, name, temporary = TRUE, with, ...)

Arguments

con

A dbConnect object, as returned by dbConnect()

table

Table name, if left default noctua will use the default from dplyr's compute function.

sql

SQL code to be sent to the data

...

passes noctua table creation parameters: [file_type,s3_location,partition]

overwrite

Allows overwriting the destination table. Cannot be TRUE if append is also TRUE.

temporary

if TRUE, will create a temporary table that is local to this connection and will be automatically deleted when the connection expires

unique_indexes

a list of character vectors. Each element of the list will create a new unique index over the specified column(s). Duplicate rows will result in failure.

indexes

a list of character vectors. Each element of the list will create a new index.

analyze

if TRUE (the default), will automatically ANALYZE the new table so that the query optimiser has useful information.

in_transaction

Should the table creation be wrapped in a transaction? This typically makes things faster, but you may want to suppress if the database doesn't support transactions, or you're wrapping in a transaction higher up (and your database doesn't support nested transactions.)

partition

Partition Athena table (needs to be a named list or vector) for example: c(var1 = "2019-20-13")

s3_location

s3 bucket to store Athena table, must be set as a s3 uri for example ("s3://mybucket/data/")

file_type

What file type to store data.frame on s3, noctua currently supports ["tsv", "csv", "parquet"]. Default delimited file type is "tsv", in previous versions of noctua (=< 1.4.0) file type "csv" was used as default. The reason for the change is that columns containing Array/JSON format cannot be written to Athena due to the separating value ",". This would cause issues with AWS Athena. Note: "parquet" format is supported by the arrow package and it will need to be installed to utilise the "parquet" format.

compress

FALSE | TRUE To determine if to compress file.type. If file type is ["csv", "tsv"] then "gzip" compression is used, for file type "parquet" "snappy" compression is used.

name

Table name, if left default noctua will use the default from dplyr's compute function.

with

An optional WITH clause for the CREATE TABLE statement.

  • file_type: What file type to store data.frame on s3, noctua currently supports ["NULL","csv", "parquet", "json"]. "NULL" will let Athena set the file_type for you.

  • s3_location: s3 bucket to store Athena table, must be set as a s3 uri for example ("s3://mybucket/data/")

  • partition: Partition Athena table, requires to be a partitioned variable from previous table.

Value

db_compute returns table name

See Also

AthenaWriteTables backend_dbplyr_v2 backend_dbplyr_v1

Examples

## Not run: 
# Note:
# - Require AWS Account to run below example.
# - Different connection methods can be used please see `noctua::dbConnect` documentation

library(DBI)
library(dplyr)

# Demo connection to Athena using profile name
con <- dbConnect(noctua::athena())

# Write data.frame to Athena table
copy_to(con, mtcars,
  s3_location = "s3://mybucket/data/"
)

# Write Athena table from tbl_sql
athena_mtcars <- tbl(con, "mtcars")
mtcars_filter <- athena_mtcars %>% filter(gear >= 4)

# create athena with unique table name
mtcars_filer %>%
  compute()

# create athena with specified name and s3 location
mtcars_filer %>%
  compute("mtcars_filer",
    s3_location = "s3://mybucket/mtcars_filer/"
  )

# Disconnect from Athena
dbDisconnect(con)

## End(Not run)

noctua documentation built on Aug. 9, 2023, 1:07 a.m.