db_copy_to: S3 implementation of 'db_copy_to' for Athena
In noctua: Connect to 'AWS Athena' using R 'AWS SDK' 'paws' ('DBI' Interface)

db_copy_to

R Documentation

S3 implementation of `db_copy_to` for Athena

Description

This is an Athena method for the dbplyr function `db_copy_to`, used to create an Athena table from a data.frame.

Usage

db_copy_to.AthenaConnection(
  con,
  table,
  values,
  ...,
  partition = NULL,
  s3_location = NULL,
  file_type = c("csv", "tsv", "parquet"),
  compress = FALSE,
  max_batch = Inf,
  overwrite = FALSE,
  append = FALSE,
  types = NULL,
  temporary = TRUE,
  unique_indexes = NULL,
  indexes = NULL,
  analyze = TRUE,
  in_transaction = FALSE
)

Arguments

`con`	A `dbConnect` object, as returned by `dbConnect()`
`table`	A character string specifying a table name. Names will be automatically quoted so you can use any sequence of characters, not just any valid bare table name.
`values`	A data.frame to write to the database.
`...`	Other parameters; currently not supported in noctua.
`partition`	Partition the Athena table; must be a named list or vector, for example: `c(var1 = "2019-20-13")`.
`s3_location`	S3 location in which to store the Athena table; must be given as an S3 URI, for example "s3://mybucket/data/".
`file_type`	The file type used to store the data.frame on S3; noctua currently supports ["tsv", "csv", "parquet"]. The default delimited file type is "tsv"; in earlier versions of noctua (<= 1.4.0) "csv" was the default. The default changed because columns containing Array/JSON values cannot be written to Athena as "csv": the commas inside those values clash with the "," separator. Note: the "parquet" format is written with the `arrow` package, which must be installed to use it.
`compress`	`FALSE | TRUE`. Whether to compress the file. If `file_type` is "csv" or "tsv", "gzip" compression is used; for "parquet", "snappy" compression is used.
`max_batch`	Split the data.frame into chunks of at most this many rows (e.g. 100,000) so that multiple files can be uploaded to AWS S3. By default, when `compress` is `TRUE` and `file_type` is "csv" or "tsv", `max_batch` splits the data.frame into 20 batches; this helps AWS Athena performance when working with "gzip"-compressed files. `max_batch` does not split the data.frame when the file type is "parquet". For more information, please go to link.
`overwrite`	Allows overwriting the destination table. Cannot be `TRUE` if `append` is also `TRUE`.
`append`	Allow appending to the destination table. Cannot be `TRUE` if `overwrite` is also `TRUE`. The file type of the existing Athena DDL is retained and used when uploading data to AWS Athena. If `file_type` doesn't match the Athena DDL file type, a warning is issued to notify the user and noctua uses the DDL's file type.
`types`	Additional field types used to override derived types.
`temporary`	If `TRUE`, creates a temporary table that is local to this connection and is automatically deleted when the connection expires.
`unique_indexes`	A list of character vectors. Each element of the list will create a new unique index over the specified column(s). Duplicate rows will result in failure.
`indexes`	A list of character vectors. Each element of the list will create a new index.
`analyze`	If `TRUE` (the default), automatically runs ANALYZE on the new table so that the query optimiser has useful information.
`in_transaction`	Should the table creation be wrapped in a transaction? This typically makes things faster, but you may want to suppress if the database doesn't support transactions, or you're wrapping in a transaction higher up (and your database doesn't support nested transactions.)
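The `partition`, `s3_location` and `max_batch` arguments above decide where and how the data.frame is written to S3. The sketch below illustrates those three rules in base R using hypothetical helpers (`partition_prefix()`, `parse_s3_uri()`, `split_batches()`) that are not part of noctua. The Hive-style `column=value/` prefix is the usual layout for partitioned Athena tables, but the exact S3 keys noctua writes are not shown here and may differ.

```r
# Hypothetical helpers for illustration only; none of these are noctua functions.

# `partition`: a named vector or list, one entry per partition column.
# Returns the Hive-style S3 key prefix ("column=value/") for those partitions.
partition_prefix <- function(partition) {
  vals <- unlist(partition)
  if (is.null(names(vals)) || any(!nzchar(names(vals)))) {
    stop("`partition` must be a named list or vector")
  }
  paste0(names(vals), "=", vals, "/", collapse = "")
}

# `s3_location`: must be an S3 URI; split it into bucket and key prefix.
parse_s3_uri <- function(uri) {
  if (!grepl("^s3://[^/]+", uri)) {
    stop("`s3_location` must be an S3 URI, e.g. \"s3://mybucket/data/\"")
  }
  path <- sub("^s3://", "", uri)
  list(
    bucket = sub("/.*$", "", path),     # text before the first "/"
    key    = sub("^[^/]*/?", "", path)  # everything after the bucket name
  )
}

# `max_batch`: split the rows into chunks of at most `max_batch` rows,
# each of which would be uploaded to S3 as a separate file.
split_batches <- function(df, max_batch = Inf) {
  if (!is.finite(max_batch)) return(list(df))  # Inf: a single chunk in this sketch
  split(df, ceiling(seq_len(nrow(df)) / max_batch))
}

partition_prefix(c(var1 = "2019-20-13"))             # "var1=2019-20-13/"
partition_prefix(list(year = "2019", month = "12"))  # "year=2019/month=12/"
parse_s3_uri("s3://mybucket/data/")$key              # "data/"
vapply(split_batches(mtcars, 10), nrow, integer(1))  # 10 10 10 2 (mtcars has 32 rows)
```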
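The reasoning behind the "tsv" default for `file_type`, and what `compress = TRUE` means for a delimited file, can be checked with base R alone. This is an illustration rather than noctua's code: it assumes a delimited-file reader that splits rows on the separator without honouring quotes, and it writes a gzip-compressed "tsv" file with `gzfile()`. The `arrow::write_parquet()` line is left as a comment because it needs the `arrow` package.

```r
# Illustration only; this is not noctua's internal code.

# A column holding an array/JSON literal contains commas. Splitting a row on
# the separator, without any quote handling, tears the CSV row apart but
# leaves the TSV row intact.
fields  <- c(id = "1", tags = "[1,2,3]")
csv_row <- paste(fields, collapse = ",")
tsv_row <- paste(fields, collapse = "\t")
length(strsplit(csv_row, ",")[[1]])   # 4 fields instead of 2
length(strsplit(tsv_row, "\t")[[1]])  # 2 fields, as intended

# gzip-compress a "tsv" file: write.table() opens and closes the gzfile() connection.
df <- data.frame(id = 1:3, tags = c("[1,2]", "[3]", "[4,5,6]"),
                 stringsAsFactors = FALSE)
path <- tempfile(fileext = ".tsv.gz")
write.table(df, gzfile(path), sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE)

readBin(path, "raw", 2L)  # 1f 8b: the gzip magic number

# read.delim() decompresses gzip input transparently when reading
back <- read.delim(path, header = FALSE, col.names = c("id", "tags"),
                   stringsAsFactors = FALSE)
all.equal(back, df)  # TRUE

# For "parquet", the arrow package would be used instead, e.g.
# arrow::write_parquet(df, "file.parquet", compression = "snappy")
```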
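The documented rules for `overwrite`, `append` and `types` amount to a few small checks. The helpers below (`write_mode()`, `append_file_type()`, `override_types()`) are hypothetical and only mirror the behaviour described in the argument list; in particular, treating `types` as a named character vector of Athena column types is an assumption, and the "derived" types in the example are made up for illustration.

```r
# Hypothetical helpers for illustration only; none of these are noctua functions.

# `overwrite` and `append` are mutually exclusive.
write_mode <- function(overwrite = FALSE, append = FALSE) {
  if (overwrite && append) stop("`overwrite` and `append` cannot both be TRUE")
  if (overwrite) "overwrite" else if (append) "append" else "create"
}

# When appending, the file type in the existing table's DDL wins over
# `file_type`, and a mismatch is reported with a warning.
append_file_type <- function(file_type, ddl_file_type) {
  if (!identical(file_type, ddl_file_type)) {
    warning(sprintf("file_type \"%s\" does not match the table DDL; using \"%s\"",
                    file_type, ddl_file_type))
  }
  ddl_file_type
}

# `types` (assumed: named character vector) replaces the derived type
# of each named column and leaves the other columns unchanged.
override_types <- function(derived, types = NULL) {
  if (is.null(types)) return(derived)
  derived[names(types)] <- types
  derived
}

write_mode(append = TRUE)                         # "append"
suppressWarnings(append_file_type("csv", "tsv"))  # "tsv" (a warning is raised)
override_types(c(mpg = "DOUBLE", cyl = "DOUBLE"), c(cyl = "INT"))
# mpg = "DOUBLE", cyl = "INT"
```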

Value

`db_copy_to` returns the table name.

Examples

## Not run: 
# Note:
# - Require AWS Account to run below example.
# - Different connection methods can be used; please see the `noctua::dbConnect` documentation

library(DBI)
library(dplyr)

# Demo connection to Athena using profile name
con <- dbConnect(noctua::athena())

# List existing tables in Athena
dbListTables(con)

# Write data.frame to Athena table
copy_to(con, mtcars,
  s3_location = "s3://mybucket/data/"
)

# Checking if uploaded table exists in Athena
dbExistsTable(con, "mtcars")

# Write Athena table from tbl_sql
athena_mtcars <- tbl(con, "mtcars")
mtcars_filter <- athena_mtcars %>% filter(gear >= 4)

copy_to(con, mtcars_filter)

# Checking if uploaded table exists in Athena
dbExistsTable(con, "mtcars_filter")

# Disconnect from Athena
dbDisconnect(con)

## End(Not run)

noctua documentation built on Aug. 9, 2023, 1:07 a.m.