
dbWriteTable,MoJAthenaConnection,character,data.frame-method    R Documentation

dbWriteTable

Description

See noctua::dbWriteTable(). Note that you must have write permission to the S3 directory where the data is stored. In general you will not have this permission for the directory automatically generated by connect_athena(), so you must specify an S3 directory where you do have write permission. You can do this either as an argument to connect_athena() (which will affect all your Athena transactions), or specifically in the dbWriteTable() call using the s3.location argument. This function calls noctua::dbWriteTable() after replacing any references to __temp__ in the table name with your temporary database in Athena. Your temporary database will be created if you do not already have one.
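A minimal sketch of the round trip (the bucket path, table name and data are placeholders; you need write access to whichever S3 location you choose):

# Write a data.frame to your temporary database, query it back, then disconnect
con <- connect_athena()
dbWriteTable(con, "__temp__.mtcars_copy", mtcars,
             s3.location = "s3://alpha-my-bucket/mtcars_copy/")
dbGetQuery(con, "SELECT cyl, COUNT(*) AS n FROM __temp__.mtcars_copy GROUP BY cyl")
dbDisconnect(con)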

Usage

## S4 method for signature 'MoJAthenaConnection,character,data.frame'
dbWriteTable(
  conn,
  name,
  value,
  overwrite = FALSE,
  append = FALSE,
  row.names = NA,
  field.types = NULL,
  partition = NULL,
  s3.location = NULL,
  file.type = c("tsv", "csv", "parquet", "json"),
  compress = FALSE,
  max.batch = Inf,
  ...
)

Arguments

conn

A DBIConnection object, as returned by connect_athena()

name

A character string specifying a table name. Names will be automatically quoted, so you can use any sequence of characters, not just syntactically valid table names.

value

A data.frame to write to the database.

overwrite

Allows overwriting the destination table. Cannot be TRUE if append is also TRUE.

append

Allow appending to the destination table. Cannot be TRUE if overwrite is also TRUE. The existing Athena DDL file type will be retained and used when uploading data to AWS Athena. If the file.type parameter doesn't match the Athena DDL file type, a warning message will notify the user and noctua will use the file type from the Athena DDL. When appending to an Athena DDL created outside of noctua, noctua supports the SerDes and data formats listed in the noctua documentation.
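A sketch of an append within the same session (january_counts and february_counts are hypothetical data frames, and the bucket path is a placeholder):

# Create the table on the first write, then append further rows to it
dbWriteTable(con, "__temp__.daily_counts", january_counts,
             s3.location = "s3://alpha-my-bucket/daily_counts/")
dbWriteTable(con, "__temp__.daily_counts", february_counts, append = TRUE)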

row.names

Either TRUE, FALSE, NA or a string.

If TRUE, always translate row names to a column called "row_names". If FALSE, never translate row names. If NA, translate row names only if they're a character vector.

A string is equivalent to TRUE, but allows you to override the default name.

For backward compatibility, NULL is equivalent to FALSE.

field.types

Additional field types used to override derived types.

partition

Partition the Athena table; needs to be a named list or vector, for example: c(var1 = "2019-20-13")
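For example (a sketch; sales_df, the partition columns and the bucket path are placeholders):

# Write a table partitioned by year and month
dbWriteTable(con, "__temp__.sales", sales_df,
             partition = c(year = "2024", month = "01"),
             s3.location = "s3://alpha-my-bucket/sales/")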

s3.location

S3 bucket to store the Athena table in; must be set as an S3 URI, for example "s3://mybucket/data/". By default, s3.location is set to the S3 staging directory from the AthenaConnection object. Note: when creating a table for the first time, s3.location will be formatted from "s3://mybucket/data/" to the syntax "s3://{mybucket/data}/{schema}/{table}/{partition}/"; this is to support tables with the same name existing in different schemas. If the schema isn't specified in the name parameter then the schema from dbConnect is used instead.

file.type

What file type to store the data.frame on S3; noctua currently supports ["tsv", "csv", "parquet", "json"]. The default delimited file type is "tsv"; in previous versions of noctua (<= 1.4.0) "csv" was the default. The reason for the change is that columns containing Array/JSON format cannot be written to Athena when "," is the separating value, which would cause issues with AWS Athena. Note: the "parquet" format is supported by the arrow package, which needs to be installed to use it. The "json" format is supported by the jsonlite package, which needs to be installed to use it.

compress

TRUE or FALSE, determining whether to compress the file.type. If the file type is ["csv", "tsv"] then "gzip" compression is used; for the "parquet" file type, "snappy" compression is used. Currently noctua doesn't support compression for the "json" file type.

max.batch

Split the data frame by a maximum number of rows, e.g. 100,000, so that multiple files can be uploaded into AWS S3. By default, when compression is set to TRUE and file.type is "csv" or "tsv", max.batch will split the data.frame into 20 batches. This is to help the performance of AWS Athena when working with files compressed in "gzip" format. max.batch will not split the data.frame when loading the file in parquet format. For more information please see the noctua documentation.
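A sketch combining these options (big_df and the bucket path are placeholders; the arrow package must be installed for parquet):

# Upload a large data.frame as snappy-compressed parquet, batching the upload
dbWriteTable(con, "__temp__.big_table", big_df,
             file.type = "parquet",
             compress = TRUE,
             max.batch = 100000,
             s3.location = "s3://alpha-my-bucket/big_table/")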

...

Other arguments used by individual methods.

Examples

# Either specify the location in the dbWriteTable call itself
con <- connect_athena()
dbWriteTable(con, "__temp__.table_name", dataframe,
             s3.location = "s3://bucket_you_have_write_permission/dir")

# Or specify it on the connection object
con <- connect_athena(staging_dir = "s3://bucket_you_have_write_permission/dir")
dbWriteTable(con, "__temp__.table_name", dataframe)
