redshift_query_n: redshift_query_n

Description Usage Arguments Examples

View source: R/redshift_query_n.R

Description

Retrieves unloaded redshift query components from an AWS library. The process is parallelised to speed it up. This means much larger datasets can be queried and compiled. If parallel = FALSE parameter is not needed.

Usage

1
2
redshift_query_n(sql.string, conn, bucket, transform.function = NULL,
  parallel = FALSE, cores = NULL, package.list = NULL)

Arguments

sql.string

SQL Query string

conn

Redshift database connection object

bucket

AWS bucket to store unload files

transform.function

function object detailing additional transformations to perform on the data

parallel

TRUE or FALSE, do you want to do parallel processing. Defaults to FALSE

cores

Cores to utilise, defaults to NULL, if NULL will detect cores on machine and utilise all of them. If parallel = FALSE parameter is not needed.

package.list

packages packages needed in the transformation function. If parallel = FALSE parameter is not needed. NB: If parallel = FALSE the package must be loaded in the local environment.

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
# Firstly you'll need to set up AWS permissions

Sys.setenv("AWS_ACCESS_KEY_ID" = "key",
           "AWS_SECRET_ACCESS_KEY" = "key",
           "AWS_DEFAULT_REGION" = "key")

# Define SQL string for database query
# e.g.

sql.string <- "SELECT * FROM some.table"

# Setup a database connection object using redshift.connect

conn <- redshift.connect("connection string")

# You'll also need an existing bucket where you can deposit interim files
# e.g.

bucket <- "an-aws-bucket"

# A transformation function used to make additional alterations on the data

transform.function <- function(x) {data.table(x)}

# Decide if you want to do parallel processing, if no

parallel <- FALSE

# if yes

parallel <- TRUE

# If parallel = TRUE
# Define number of threads, if undefined it'll detect the number of cores on the system and use these.

cores <- 10

# If parallel = TRUE
# A list of packages that must be used in the transformation function
# e.g.

package.list <- "data.table"

# Execute the function

data <- redshift_query_n(sql.string, conn, bucket, cores, transform.function, package.list)

OLICHIPPERFIELD/rednshift documentation built on Feb. 28, 2020, 4:21 p.m.