rs_upsert_table: Upsert redshift table
In sicarul/redshiftTools: Amazon Redshift Tools

Description Usage Arguments Examples

View source: R/upsert.R

Upload a table to S3 and then load it with redshift, replacing rows with the same keys, and inserting rows with new keys. The table on redshift has to have the same structure and column ordering to work correctly.

rs_upsert_table(df, dbcon, table_name, keys, split_files,
  bucket = Sys.getenv("AWS_BUCKET_NAME"),
  region = Sys.getenv("AWS_DEFAULT_REGION"),
  access_key = Sys.getenv("AWS_ACCESS_KEY_ID"),
  secret_key = Sys.getenv("AWS_SECRET_ACCESS_KEY"),
  session_token = Sys.getenv("AWS_SESSION_TOKEN"),
  iam_role_arn = Sys.getenv("AWS_IAM_ROLE_ARN"), wlm_slots = 1,
  additional_params = "")

`df`	a data frame
`dbcon`	an RPostgres/RJDBC connection to the redshift server
`table_name`	the name of the table to update/insert
`keys`	this optional vector contains the variables by which to upsert. If not defined, the upsert becomes an append.
`split_files`	optional parameter to specify amount of files to split into. If not specified will look at amount of slices in Redshift to determine an optimal amount.
`bucket`	the name of the temporary bucket to load the data. Will look for AWS_BUCKET_NAME on environment if not specified.
`region`	the region of the bucket. Will look for AWS_DEFAULT_REGION on environment if not specified.
`access_key`	the access key with permissions for the bucket. Will look for AWS_ACCESS_KEY_ID on environment if not specified.
`secret_key`	the secret key with permissions for the bucket. Will look for AWS_SECRET_ACCESS_KEY on environment if not specified.
`session_token`	the session key with permissions for the bucket, this will be used instead of the access/secret keys if specified. Will look for AWS_SESSION_TOKEN on environment if not specified.
`iam_role_arn`	an iam role arn with permissions fot the bucket. Will look for AWS_IAM_ROLE_ARN on environment if not specified. This is ignoring access_key and secret_key if set.
`wlm_slots`	amount of WLM slots to use for this bulk load http://docs.aws.amazon.com/redshift/latest/dg/tutorial-configuring-workload-management.html
`additional_params`	Additional params to send to the COPY statement in Redshift

library(DBI)

a=data.frame(a=seq(1,10000), b=seq(10000,1))
n=head(a,n=5000)
n$b=n$a
nx=rbind(n, data.frame(a=seq(99999:104000), b=seq(104000:99999)))

## Not run: 
con <- dbConnect(RPostgres::Postgres(), dbname="dbname",
host='my-redshift-url.amazon.com', port='5439',
user='myuser', password='mypassword',sslmode='require')

rs_upsert_table(df=nx, dbcon=con, table_name='testTable',
bucket="my-bucket", split_files=4, keys=c('a'))


## End(Not run)