rs_upsert_table: Upsert redshift table


View source: R/upsert.R

Description

Upload a table to S3 and then load it into Redshift, replacing rows with matching keys and inserting rows with new keys. The table in Redshift must have the same structure and column ordering for this to work correctly.
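Conceptually, the upsert removes rows in the target table whose keys appear in the incoming data, then appends all incoming rows. A minimal local sketch of those semantics with plain data frames (the function and variable names here are illustrative, not part of the package):

```r
# Sketch of upsert semantics: rows of `new` replace rows of `existing`
# that share the same key combination; the rest are appended.
upsert_sketch <- function(existing, new, keys) {
  # Build a comparable key string per row for each data frame
  ex_keys  <- do.call(paste, existing[keys])
  new_keys <- do.call(paste, new[keys])
  # Keep only existing rows whose keys are NOT in the incoming data,
  # then append every incoming row
  rbind(existing[!ex_keys %in% new_keys, ], new)
}

existing <- data.frame(id = 1:3, val = c("a", "b", "c"))
incoming <- data.frame(id = c(2, 4), val = c("B", "d"))
result <- upsert_sketch(existing, incoming, keys = "id")
# result holds ids 1 and 3 (kept), 2 (replaced by "B"), 4 (inserted)
```

If `keys` is empty, no existing rows match, so the operation reduces to an append, mirroring the behavior of the `keys` argument below.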

Usage

rs_upsert_table(df, dbcon, table_name, keys, split_files,
  bucket = Sys.getenv("AWS_BUCKET_NAME"),
  region = Sys.getenv("AWS_DEFAULT_REGION"),
  access_key = Sys.getenv("AWS_ACCESS_KEY_ID"),
  secret_key = Sys.getenv("AWS_SECRET_ACCESS_KEY"),
  session_token = Sys.getenv("AWS_SESSION_TOKEN"),
  iam_role_arn = Sys.getenv("AWS_IAM_ROLE_ARN"), wlm_slots = 1,
  additional_params = "")

Arguments

df

a data frame

dbcon

an RPostgres/RJDBC connection to the Redshift server

table_name

the name of the table to update/insert

keys

an optional character vector naming the columns to match rows on for the upsert. If not specified, the upsert becomes a plain append.

split_files

optional parameter to specify the number of files to split the data into. If not specified, the number of slices in the Redshift cluster is used to determine an optimal number.

bucket

the name of the temporary bucket used to stage the data. Will look for AWS_BUCKET_NAME in the environment if not specified.

region

the region of the bucket. Will look for AWS_DEFAULT_REGION in the environment if not specified.

access_key

the access key with permissions for the bucket. Will look for AWS_ACCESS_KEY_ID in the environment if not specified.

secret_key

the secret key with permissions for the bucket. Will look for AWS_SECRET_ACCESS_KEY in the environment if not specified.

session_token

the session token with permissions for the bucket. If specified, it is used instead of the access/secret keys. Will look for AWS_SESSION_TOKEN in the environment if not specified.

iam_role_arn

an IAM role ARN with permissions for the bucket. Will look for AWS_IAM_ROLE_ARN in the environment if not specified. If set, access_key and secret_key are ignored.

wlm_slots

number of WLM slots to use for this bulk load. See http://docs.aws.amazon.com/redshift/latest/dg/tutorial-configuring-workload-management.html

additional_params

additional parameters appended to the COPY statement issued in Redshift
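For instance, Redshift COPY options for NULL handling could be passed through this argument. A hedged sketch of such a call (it assumes a live connection `con` and a staging bucket, so it is not run here; EMPTYASNULL and BLANKSASNULL are standard Redshift COPY parameters):

```r
## Not run:
# Treat empty and blank fields in the staged files as NULL on load
rs_upsert_table(df = nx, dbcon = con, table_name = "testTable",
  bucket = "my-bucket", keys = c("a"),
  additional_params = "EMPTYASNULL BLANKSASNULL")
## End(Not run)
```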

Examples

library(DBI)

a <- data.frame(a = seq(1, 10000), b = seq(10000, 1))
n <- head(a, n = 5000)
n$b <- n$a
nx <- rbind(n, data.frame(a = seq(99999, 104000), b = seq(104000, 99999)))

## Not run: 
con <- dbConnect(RPostgres::Postgres(), dbname = "dbname",
  host = "my-redshift-url.amazon.com", port = "5439",
  user = "myuser", password = "mypassword", sslmode = "require")

rs_upsert_table(df = nx, dbcon = con, table_name = "testTable",
  bucket = "my-bucket", split_files = 4, keys = c("a"))


## End(Not run)

redshiftTools documentation built on May 21, 2019, 5:03 p.m.