
dbWriteTable,MoJAthenaConnection,character,data.frame-method    R Documentation

dbWriteTable

Description

See noctua::dbWriteTable(). Note that you must have write permission to the S3 directory where the data is stored. In general you will not have this permission for the directory automatically generated by connect_athena(), so you must specify an S3 directory where you do have write permission. You can do this either as an argument to connect_athena() (which will affect all your Athena transactions), or specifically in the dbWriteTable() call using the s3.location argument. This function calls noctua::dbWriteTable() after replacing any references to __temp__ in the table name with your temporary database in Athena. Your temporary database will be created if you do not already have one.
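A minimal sketch of the round trip (the bucket path, table name and data are placeholders; you need write access to whichever S3 location you choose):

# Write a data.frame to your temporary database, query it back, then disconnect
con <- connect_athena()
dbWriteTable(con, "__temp__.mtcars_copy", mtcars,
             s3.location = "s3://alpha-my-bucket/mtcars_copy/")
dbGetQuery(con, "SELECT cyl, COUNT(*) AS n FROM __temp__.mtcars_copy GROUP BY cyl")
dbDisconnect(con)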

Usage

## S4 method for signature 'MoJAthenaConnection,character,data.frame'
dbWriteTable(
  conn,
  name,
  value,
  overwrite = FALSE,
  append = FALSE,
  row.names = NA,
  field.types = NULL,
  partition = NULL,
  s3.location = NULL,
  file.type = c("tsv", "csv", "parquet", "json"),
  compress = FALSE,
  max.batch = Inf,
  ...
)

Arguments

conn

A DBIConnection object, as returned by connect_athena()

name

A character string specifying a table name. Names will be automatically quoted, so you can use any sequence of characters, not just syntactically valid table names.

value

A data.frame to write to the database.

overwrite

Allows overwriting the destination table. Cannot be TRUE if append is also TRUE.

append

Allow appending to the destination table. Cannot be TRUE if overwrite is also TRUE. The existing Athena DDL file type will be retained and used when uploading data to AWS Athena. If the file.type parameter doesn't match the Athena DDL file type, a warning message will notify the user and noctua will use the file type from the Athena DDL. When appending to an Athena DDL created outside of noctua, noctua supports the SerDes and data formats listed in the noctua documentation.
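A sketch of an append within the same session (january_counts and february_counts are hypothetical data frames, and the bucket path is a placeholder):

# Create the table on the first write, then append further rows to it
dbWriteTable(con, "__temp__.daily_counts", january_counts,
             s3.location = "s3://alpha-my-bucket/daily_counts/")
dbWriteTable(con, "__temp__.daily_counts", february_counts, append = TRUE)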

row.names

Either TRUE, FALSE, NA or a string.

If TRUE, always translate row names to a column called "row_names". If FALSE, never translate row names. If NA, translate row names only if they're a character vector.

A string is equivalent to TRUE, but allows you to override the default name.

For backward compatibility, NULL is equivalent to FALSE.

field.types

Additional field types used to override derived types.

partition

Partition the Athena table; needs to be a named list or vector, for example: c(var1 = "2019-20-13")
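For example (a sketch; sales_df, the partition columns and the bucket path are placeholders):

# Write a table partitioned by year and month
dbWriteTable(con, "__temp__.sales", sales_df,
             partition = c(year = "2024", month = "01"),
             s3.location = "s3://alpha-my-bucket/sales/")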

s3.location

S3 bucket to store the Athena table in; must be set as an S3 URI, for example "s3://mybucket/data/". By default, s3.location is set to the S3 staging directory from the AthenaConnection object. Note: when creating a table for the first time, s3.location will be formatted from "s3://mybucket/data/" to the syntax "s3://{mybucket/data}/{schema}/{table}/{partition}/"; this is to support tables with the same name existing in different schemas. If the schema isn't specified in the name parameter then the schema from dbConnect is used instead.

file.type

What file type to store the data.frame on S3; noctua currently supports ["tsv", "csv", "parquet", "json"]. The default delimited file type is "tsv"; in previous versions of noctua (<= 1.4.0) "csv" was the default. The reason for the change is that columns containing Array/JSON format cannot be written to Athena when "," is the separating value, which would cause issues with AWS Athena. Note: the "parquet" format is supported by the arrow package, which needs to be installed to use it. The "json" format is supported by the jsonlite package, which needs to be installed to use it.

compress

TRUE or FALSE, determining whether to compress the file.type. If the file type is ["csv", "tsv"] then "gzip" compression is used; for the "parquet" file type, "snappy" compression is used. Currently noctua doesn't support compression for the "json" file type.

max.batch

Split the data frame by a maximum number of rows, e.g. 100,000, so that multiple files can be uploaded into AWS S3. By default, when compression is set to TRUE and file.type is "csv" or "tsv", max.batch will split the data.frame into 20 batches. This is to help the performance of AWS Athena when working with files compressed in "gzip" format. max.batch will not split the data.frame when loading the file in parquet format. For more information please see the noctua documentation.
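A sketch combining these options (big_df and the bucket path are placeholders; the arrow package must be installed for parquet):

# Upload a large data.frame as snappy-compressed parquet, batching the upload
dbWriteTable(con, "__temp__.big_table", big_df,
             file.type = "parquet",
             compress = TRUE,
             max.batch = 100000,
             s3.location = "s3://alpha-my-bucket/big_table/")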

...

Other arguments used by individual methods.

Examples

# Either specify the location in the dbWriteTable call itself
con <- connect_athena()
dbWriteTable(con, "__temp__.table_name", dataframe,
             s3.location = "s3://bucket_you_have_write_permission/dir")

# Or specify it on the connection object
con <- connect_athena(staging_dir = "s3://bucket_you_have_write_permission/dir")
dbWriteTable(con, "__temp__.table_name", dataframe)
