AthenaWriteTables: Convenience functions for reading/writing DBMS tables
In RAthena: Connect to 'AWS Athena' using 'Boto3' ('DBI' Interface)

Description

Convenience functions for reading/writing DBMS tables

Usage

## S4 method for signature 'AthenaConnection,character,data.frame'
dbWriteTable(
  conn,
  name,
  value,
  overwrite = FALSE,
  append = FALSE,
  row.names = NA,
  field.types = NULL,
  partition = NULL,
  s3.location = NULL,
  file.type = c("tsv", "csv", "parquet", "json"),
  compress = FALSE,
  max.batch = Inf,
  ...
)

## S4 method for signature 'AthenaConnection,Id,data.frame'
dbWriteTable(
  conn,
  name,
  value,
  overwrite = FALSE,
  append = FALSE,
  row.names = NA,
  field.types = NULL,
  partition = NULL,
  s3.location = NULL,
  file.type = c("tsv", "csv", "parquet", "json"),
  compress = FALSE,
  max.batch = Inf,
  ...
)

## S4 method for signature 'AthenaConnection,SQL,data.frame'
dbWriteTable(
  conn,
  name,
  value,
  overwrite = FALSE,
  append = FALSE,
  row.names = NA,
  field.types = NULL,
  partition = NULL,
  s3.location = NULL,
  file.type = c("tsv", "csv", "parquet", "json"),
  compress = FALSE,
  max.batch = Inf,
  ...
)

Arguments

`conn`	An `AthenaConnection` object, produced by `DBI::dbConnect()`.
`name`	A character string specifying a table name. Names will be automatically quoted so you can use any sequence of characters, not just any valid bare table name.
`value`	A data.frame to write to the database.
`overwrite`	Allows overwriting the destination table. Cannot be `TRUE` if `append` is also `TRUE`.
`append`	Allows appending to the destination table. Cannot be `TRUE` if `overwrite` is also `TRUE`. The file type of the existing Athena DDL is retained and used when uploading data to AWS Athena. If the `file.type` parameter doesn't match the file type of the Athena DDL, a warning is raised and `RAthena` uses the file type from the Athena DDL. When appending to an Athena DDL that was created outside of `RAthena`, `RAthena` supports the following SerDes and data formats: csv/tsv (LazySimpleSerDe), parquet (Parquet SerDe), json (JSON SerDe Libraries).
`row.names`	Either `TRUE`, `FALSE`, `NA` or a string. If `TRUE`, always translate row names to a column called "row_names". If `FALSE`, never translate row names. If `NA`, translate rownames only if they're a character vector. A string is equivalent to `TRUE`, but allows you to override the default name. For backward compatibility, `NULL` is equivalent to `FALSE`.
`field.types`	Additional field types used to override derived types.
`partition`	Partition for the Athena table. Must be a named list or vector, for example: `c(var1 = "2019-20-13")`.
`s3.location`	S3 bucket in which to store the Athena table. Must be given as an S3 URI, for example `"s3://mybucket/data/"`. By default, `s3.location` is set to the S3 staging directory of the `AthenaConnection` object. Note: when a table is created for the first time, `s3.location` is reformatted from `"s3://mybucket/data/"` to `"s3://{mybucket/data}/{schema}/{table}/{partition}/"`. This supports tables that share a name but live in different schemas. If no schema is specified in the `name` parameter, the schema from `dbConnect` is used instead.
`file.type`	File type used to store the data.frame on S3. `RAthena` currently supports "tsv", "csv", "parquet" and "json". The default delimited file type is "tsv"; previous versions of `RAthena` (<= 1.6.0) used "csv" as the default. The default was changed because columns containing `Array/JSON` formats cannot be written to Athena with "," as the separating value, which causes issues with AWS Athena. Note: the "parquet" format is supported through the `arrow` package, which must be installed to use "parquet". The "json" format is supported through the `jsonlite` package, which must be installed to use "json".
`compress`	`FALSE | TRUE` Whether to compress the uploaded files. For file types "csv" and "tsv", "gzip" compression is used; for file type "parquet", "snappy" compression is used. Currently `RAthena` doesn't support compression for the "json" file type.
`max.batch`	Split the data frame by a maximum number of rows (e.g. 100,000) so that multiple files can be uploaded to AWS S3. By default, when `compress` is `TRUE` and `file.type` is "csv" or "tsv", `max.batch` splits the data.frame into 20 batches. This helps the performance of AWS Athena when working with files compressed in "gzip" format. `max.batch` will not split the data.frame when loading files in parquet format.
`...`	Other arguments used by individual methods.
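
The batching rules described for `max.batch` can be sketched in plain R. This is only an illustration of the documented behaviour, not `RAthena`'s implementation: the helper `split_batches()` is hypothetical, and the way the 20 default batches are derived (rows divided by 20, rounded up) is an assumption.

```r
# Hypothetical helper illustrating the documented max.batch rules.
# It is NOT part of RAthena; it only mirrors the argument description.
split_batches <- function(df, max.batch = Inf, compress = FALSE,
                          file.type = "tsv") {
  n <- nrow(df)
  # Documented rule: parquet data is not split.
  if (file.type == "parquet" || n == 0) return(list(df))
  # Documented default: compressed csv/tsv is split into 20 batches
  # (assumption: batch size = ceiling(rows / 20)).
  if (is.infinite(max.batch) && compress && file.type %in% c("csv", "tsv")) {
    max.batch <- ceiling(n / 20)
  }
  # No limit requested: keep the data.frame whole.
  if (is.infinite(max.batch)) return(list(df))
  # Assign each row to a batch of at most max.batch rows.
  batch_id <- ceiling(seq_len(n) / max.batch)
  unname(split(df, batch_id))
}

df <- data.frame(id = seq_len(100))

length(split_batches(df))                          # one file, no splitting
length(split_batches(df, max.batch = 30))          # batches of 30, 30, 30, 10
length(split_batches(df, compress = TRUE))         # default for gzip'd tsv
length(split_batches(df, file.type = "parquet"))   # parquet is not split
```

In this sketch each batch would correspond to one file uploaded to S3.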

Value

dbWriteTable() returns TRUE, invisibly. If the table exists, and both append and overwrite arguments are unset, or append = TRUE and the data frame with the new data has different column names, an error is raised; the remote table remains unchanged.
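
The conditions above, together with the rule that `overwrite` and `append` cannot both be `TRUE`, can be sketched as a small validation function. The helper `check_write_mode()` and its arguments are hypothetical and only restate the documented rules; the real method may check them differently. In particular, treating "different column names" as a difference in either names or order is an assumption.

```r
# Hypothetical validator restating the documented rules; not RAthena code.
check_write_mode <- function(table_exists, overwrite = FALSE, append = FALSE,
                             existing_cols = NULL, new_cols = NULL) {
  if (overwrite && append) {
    stop("`overwrite` and `append` cannot both be TRUE")
  }
  if (table_exists && !overwrite && !append) {
    stop("table exists; set `overwrite = TRUE` or `append = TRUE`")
  }
  # Assumption: column names must match exactly, including order.
  if (table_exists && append && !identical(existing_cols, new_cols)) {
    stop("column names of the new data differ from the existing table")
  }
  invisible(TRUE)
}

# A new table can always be written.
check_write_mode(table_exists = FALSE)

# Writing to an existing table without overwrite/append is an error.
res <- tryCatch(
  check_write_mode(table_exists = TRUE),
  error = function(e) conditionMessage(e)
)
print(res)
```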

Examples

## Not run: 
# Note:
# - An AWS account is required to run the examples below.
# - Other connection methods are available; see the `RAthena::dbConnect` documentation.

library(DBI)

# Demo connection to Athena using the default connection settings
con <- dbConnect(RAthena::athena())

# List existing tables in Athena
dbListTables(con)

# Write data.frame to Athena table
dbWriteTable(con, "mtcars", mtcars,
             partition = c("TIMESTAMP" = format(Sys.Date(), "%Y%m%d")),
             s3.location = "s3://mybucket/data/")

# Read entire table from Athena
dbReadTable(con, "mtcars")

# List all tables in Athena after uploading new table to Athena
dbListTables(con)

# Checking if uploaded table exists in Athena
dbExistsTable(con, "mtcars")

# using default s3.location
dbWriteTable(con, "iris", iris)

# Read entire table from Athena
dbReadTable(con, "iris")

# List all tables in Athena after uploading new table to Athena
dbListTables(con)

# Checking if uploaded table exists in Athena
dbExistsTable(con, "iris")

# Disconnect from Athena
dbDisconnect(con)

## End(Not run)
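
The first example passes `s3.location = "s3://mybucket/data/"` together with a partition. As a rough illustration of the `"s3://{mybucket/data}/{schema}/{table}/{partition}/"` layout described under `s3.location`, the sketch below builds such a path in plain R without touching AWS. The helper `format_s3_location()` is hypothetical, the schema name `"default"` is an assumed value, and rendering the partition as `name=value` is an assumption based on the common Hive-style convention rather than something stated in this page.

```r
# Hypothetical helper; shows one possible reading of the documented layout.
format_s3_location <- function(s3.location, schema, table, partition = NULL) {
  # Drop trailing slashes so the pieces can be joined with a single "/".
  base <- sub("/+$", "", s3.location)
  parts <- c(base, schema, table)
  if (!is.null(partition)) {
    # Assumption: each partition is rendered as "name=value".
    parts <- c(parts, paste0(names(partition), "=", unlist(partition)))
  }
  paste0(paste(parts, collapse = "/"), "/")
}

path <- format_s3_location(
  "s3://mybucket/data/",
  schema = "default",                 # assumed schema name
  table = "mtcars",
  partition = c(var1 = "20240101")    # illustrative partition value
)
print(path)
# → "s3://mybucket/data/default/mtcars/var1=20240101/"
```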

RAthena documentation built on Dec. 28, 2022, 1:19 a.m.