dbConvertTable: Simple wrapper to convert Athena backend file types

dbConvertTableR Documentation

Simple wrapper to convert Athena backend file types

Description

Utilises AWS Athena to convert AWS S3 backend file types. It also also to create more efficient file types i.e. "parquet" and "orc" from SQL queries.

Usage

dbConvertTable(conn, obj, name, ...)

## S4 method for signature 'AthenaConnection'
dbConvertTable(
  conn,
  obj,
  name,
  partition = NULL,
  s3.location = NULL,
  file.type = c("NULL", "csv", "tsv", "parquet", "json", "orc"),
  compress = TRUE,
  data = TRUE,
  ...
)

Arguments

conn

An AthenaConnection object, produced by [DBI::dbConnect()]

obj

Athena table or SQL DML query to be converted. For SQL, the query need to be wrapped with DBI::SQL() and follow AWS Athena DML format link

name

Name of destination table

...

Extra parameters, currently not used

partition

Partition Athena table

s3.location

location to store output file, must be in s3 uri format for example ("s3://mybucket/data/").

file.type

File type for name, currently support ["NULL","csv", "tsv", "parquet", "json", "orc"]. "NULL" will let Athena set the file type for you.

compress

Compress name, currently can only compress ["parquet", "orc"] (AWS Athena CTAS)

data

If name should be created with data or not.

Value

dbConvertTable() returns TRUE but invisible.

Examples

## Not run: 
# Note:
# - Require AWS Account to run below example.
# - Different connection methods can be used please see `RAthena::dbConnect` documnentation

library(DBI)
library(noctua)

# Demo connection to Athena using profile name
con <- dbConnect(athena())

# write iris table to Athena in defualt delimited format
dbWriteTable(con, "iris", iris)

# convert delimited table to parquet
dbConvertTable(con,
  obj = "iris",
  name = "iris_parquet",
  file.type = "parquet"
)

# Create partitioned table from non-partitioned
# iris table using SQL DML query
dbConvertTable(con,
  obj = SQL("select
                            iris.*,
                            date_format(current_date, '%Y%m%d') as time_stamp
                          from iris"),
  name = "iris_orc_partitioned",
  file.type = "orc",
  partition = "time_stamp"
)

# disconnect from Athena
dbDisconnect(con)

## End(Not run)

noctua documentation built on Aug. 9, 2023, 1:07 a.m.