civis_to_multifile_csv: Split a Redshift table into multiple CSV parts on S3

View source: R/io.R

civis_to_multifile_csvR Documentation

Split a Redshift table into multiple CSV parts on S3

Description

Split a Redshift table into multiple CSV parts on S3 and return their locations as file IDs and presigned S3 urls.

Usage

civis_to_multifile_csv(
  sql,
  database,
  job_name = NULL,
  hidden = TRUE,
  include_header = TRUE,
  compression = "gzip",
  delimiter = ",",
  unquoted = FALSE,
  prefix = NULL
)

Arguments

sql

string, The SQL select string to be executed.

database

string, Name of database where query is run.

job_name

string, optional. Name to identify scripted sql job.

hidden

logical, Whether to hide the query in platform.

include_header

logical, optional. Whether to include headers as an element in the returned list.

compression

string, optional, Type of compression to use, if any. One of 'none', 'zip', or 'gzip'.

delimiter

string, optional. Which delimiter to use. One of ',', '\t' or '|'.

unquoted

logical, optional. Whether or not to quote fields.

prefix

string, optional. A user specified filename prefix for the output files to have.

Details

When tables are large, unloading by splitting them first is faster. When we split a table, each Redshift compute node can dump its data into S3 in parallel with the others. By doing so, we avoid having all compute nodes sending the data through the leader node, which is slow.

This function returns a list that contains the location of the CSV parts as file IDs and presigned S3 urls. The user can use either the file IDs or the presigned S3 urls to download the CSV parts. The content of the list returned by this function is similar to that of the manifest file returned by Amazon S3 UNLOAD statements.

Value

A list with the items:

  • header: column headers if 'include_header' is TRUE

  • query: the executed query

  • entries: a list containing presigned urls for each csv part

  • compression: the type of compression on each csv part

  • delimiter: the delimiter used to separate fields

  • unquoted: whether fields are quoted

Examples

## Not run: 
# Download a table into multiple csv parts
sql <- "SELECT * FROM schema.table"
database <- "important_database"
manifest <- civis_to_multifile_csv(sql=sql, database=database)
files <- lapply(manifest[["entries"]], function(x) {
  download_civis(x$id, x$name)
  x$name
})

## End(Not run)

civis documentation built on April 1, 2023, 12:01 a.m.