tsql_chunk_loader: Loads large data sets to Microsoft SQL Server (TSQL) in...

tsql_chunk_loaderR Documentation

Loads large data sets to Microsoft SQL Server (TSQL) in 'chunks'

Description

tsql_chunk_loader divides a data.frame/data.table into smaller tables so it can be easily loaded into SQL. Experience has shown that loading large tables in 'chunks' is less likely to cause errors. It is not needed for small tables which load quickly. For extremely large datasets, you will likely want to use another method, perhaps something like bcp.

Usage

tsql_chunk_loader(
  ph.data = NULL,
  db_conn = NULL,
  chunk_size = 5000,
  schema_name = NULL,
  table_name = NULL,
  overwrite = FALSE,
  append = TRUE,
  field_types = NULL,
  validate_field_types = TRUE,
  validate_upload = TRUE
)

Arguments

ph.data

The name of a single data.table/data.frame to be loaded to SQL Server

db_conn

The name of the relevant open database connection to SQL Server

chunk_size

The number of rows that you desire to have per upload 'chunk'

schema_name

The name of the schema where you want to write the data

table_name

The name of the table where you want to write the data

overwrite

Do you want to overwrite an existing table? Logical (T|F). Default overwrite = FALSE.

append

Do you want to append to an existing table? Logical (T|F). Default append = TRUE.

field_types

Optional! A named character vector with the desired TSQL datatypes for your upload. For example, c(col1 = 'int', col2 = 'float', col3 = 'date')

validate_field_types

Do you want to validate TSQL field types using rads::tsql_validate_field_types? Logical (T|F). Default validate_field_types = TRUE.

validate_upload

Do you want to validate that all rows have been uploaded? Logical (T|F). Default validate_upload = TRUE.

Details

overwrite & append are intentionally redundant in order to reduce the risk of accidentally overwriting a table. Note that it is illogical for overwrite & append to have the same value.

The names in field_types must be the same as the names in ph.data. Also note that field_types is only processed when append = FALSE. This prevents conflicts with data types in pre-existing SQL tables.

validate_field_types = TRUE is ignored if the field_types argument is not provided.

Examples


 library(data.table)
 mydt = data.table(col1 = 1:10000L,  # create integer
                   col2 = 1:10000/3) # create float
 mydt[, col3 := as.Date(Sys.Date()) - col1] # create date
 mydt[, col4 := as.character(col3)] # create string
 myfieldtypes <- c(col1 = 'int', col2 = 'float', col3 = 'date', col4 = 'nvarchar(255)')

 tsql_chunk_loader(
   ph.data = mydt,
   db_conn = rads::validate_hhsaw_key(), # connect to Azure 16
   chunk_size = 3333,
   schema_name = Sys.getenv("USERNAME"),
   table_name = 'JustTesting',
   overwrite = TRUE,
   append = FALSE,
   field_types = myfieldtypes,
   validate_field_types = TRUE,
   validate_upload = TRUE
 )



PHSKC-APDE/rads documentation built on April 14, 2025, 10:47 a.m.