redshift_query_n: redshift_query_n
In OLICHIPPERFIELD/rednshift: rednshift

Description Usage Arguments Examples

View source: R/redshift_query_n.R

Retrieves unloaded redshift query components from an AWS library. The process is parallelised to speed it up. This means much larger datasets can be queried and compiled. If parallel = FALSE parameter is not needed.

1 2	redshift_query_n(sql.string, conn, bucket, transform.function = NULL, parallel = FALSE, cores = NULL, package.list = NULL)

`sql.string`	SQL Query string
`conn`	Redshift database connection object
`bucket`	AWS bucket to store unload files
`transform.function`	function object detailing additional transformations to perform on the data
`parallel`	TRUE or FALSE, do you want to do parallel processing. Defaults to FALSE
`cores`	Cores to utilise, defaults to NULL, if NULL will detect cores on machine and utilise all of them. If parallel = FALSE parameter is not needed.
`package.list`	packages packages needed in the transformation function. If parallel = FALSE parameter is not needed. NB: If parallel = FALSE the package must be loaded in the local environment.

# Firstly you'll need to set up AWS permissions

Sys.setenv("AWS_ACCESS_KEY_ID" = "key",
           "AWS_SECRET_ACCESS_KEY" = "key",
           "AWS_DEFAULT_REGION" = "key")

# Define SQL string for database query
# e.g.

sql.string <- "SELECT * FROM some.table"

# Setup a database connection object using redshift.connect

conn <- redshift.connect("connection string")

# You'll also need an existing bucket where you can deposit interim files
# e.g.

bucket <- "an-aws-bucket"

# A transformation function used to make additional alterations on the data

transform.function <- function(x) {data.table(x)}

# Decide if you want to do parallel processing, if no

parallel <- FALSE

# if yes

parallel <- TRUE

# If parallel = TRUE
# Define number of threads, if undefined it'll detect the number of cores on the system and use these.

cores <- 10

# If parallel = TRUE
# A list of packages that must be used in the transformation function
# e.g.

package.list <- "data.table"

# Execute the function

data <- redshift_query_n(sql.string, conn, bucket, cores, transform.function, package.list)