rs_cols_upsert_table: Upsert Redshift table by specifying a column list
In sicarul/redshiftTools: Amazon Redshift Tools

Description Usage Arguments See Also Examples

View source: R/upsert2.R

Upload a table to S3 and then load to Redshift, replacing the target value in existing rows that have the same keys, and inserting rows with new keys. New rows must match structure and column ordering of existing Redshift table.

rs_cols_upsert_table(dat, dbcon, table_name, values, keys, split_files,
  bucket = Sys.getenv("AWS_BUCKET_NAME"),
  region = Sys.getenv("AWS_DEFAULT_REGION"),
  access_key = Sys.getenv("AWS_ACCESS_KEY_ID"),
  secret_key = Sys.getenv("AWS_SECRET_ACCESS_KEY"),
  session_token = Sys.getenv("AWS_SESSION_TOKEN"),
  iam_role_arn = Sys.getenv("AWS_IAM_ROLE_ARN"), wlm_slots = 1,
  additional_params = "")

`dat`	a data frame
`dbcon`	an RPostgres/RJDBC connection to the redshift server
`table_name`	the name of the table to update/insert
`values`	the columns that will be updated
`keys`	this optional vector contains the variables by which to upsert. If not defined, the upsert becomes an append.
`split_files`	optional parameter to specify amount of files to split into. If not specified will look at amount of slices in Redshift to determine an optimal amount.
`bucket`	the name of the temporary bucket to load the data. Will look for AWS_BUCKET_NAME on environment if not specified.
`region`	the region of the bucket. Will look for AWS_DEFAULT_REGION on environment if not specified.
`access_key`	the access key with permissions for the bucket. Will look for AWS_ACCESS_KEY_ID on environment if not specified.
`secret_key`	the secret key with permissions for the bucket. Will look for AWS_SECRET_ACCESS_KEY on environment if not specified.
`session_token`	the session key with permissions for the bucket, this will be used instead of the access/secret keys if specified. Will look for AWS_SESSION_TOKEN on environment if not specified.
`iam_role_arn`	an iam role arn with permissions fot the bucket. Will look for AWS_IAM_ROLE_ARN on environment if not specified. This is ignoring access_key and secret_key if set.
`wlm_slots`	amount of WLM slots to use for this bulk load http://docs.aws.amazon.com/redshift/latest/dg/tutorial-configuring-workload-management.html
`additional_params`	Additional params to send to the COPY statement in Redshift

https://docs.aws.amazon.com/redshift/latest/dg/merge-specify-a-column-list.html

http://docs.aws.amazon.com/redshift/latest/dg/tutorial-configuring-workload-management.html

library(DBI)

a=data.frame(a=seq(1,10000), b=seq(10000,1))
n=head(a,n=5000)
n$b=n$a
nx=rbind(n, data.frame(a=seq(99999:104000), b=seq(104000:99999)))

## Not run: 
con <- dbConnect(RPostgres::Postgres(), dbname="dbname",
host='my-redshift-url.amazon.com', port='5439',
user='myuser', password='mypassword',sslmode='require')

rs_cols_upsert_table(df=nx, dbcon=con, table_name='testTable',
bucket="my-bucket", split_files=4, values=c('b'), keys=c('a'))

## End(Not run)