tsql_chunk_loader: Loads large data sets to Microsoft SQL Server (TSQL) in...
In PHSKC-APDE/rads: Assisted computation of King County public health data

tsql_chunk_loader

R Documentation

Loads large data sets to Microsoft SQL Server (TSQL) in 'chunks'

Description

tsql_chunk_loader divides a data.frame/data.table into smaller tables so it can be easily loaded into SQL. Experience has shown that loading large tables in 'chunks' is less likely to cause errors. It is not needed for small tables which load quickly. For extremely large datasets, you will likely want to use another method, perhaps something like bcp.

Usage

tsql_chunk_loader(
  ph.data = NULL,
  db_conn = NULL,
  chunk_size = 5000,
  schema_name = NULL,
  table_name = NULL,
  overwrite = FALSE,
  append = TRUE,
  field_types = NULL,
  validate_field_types = TRUE,
  validate_upload = TRUE
)

Arguments

`ph.data`	The name of a single data.table/data.frame to be loaded to SQL Server
`db_conn`	The name of the relevant open database connection to SQL Server
`chunk_size`	The number of rows that you desire to have per upload 'chunk'
`schema_name`	The name of the schema where you want to write the data
`table_name`	The name of the table where you want to write the data
`overwrite`	Do you want to overwrite an existing table? Logical (T\|F). Default `overwrite = FALSE`.
`append`	Do you want to append to an existing table? Logical (T\|F). Default `append = TRUE`.
`field_types`	Optional! A named character vector with the desired TSQL datatypes for your upload. For example, `c(col1 = 'int', col2 = 'float', col3 = 'date')`
`validate_field_types`	Do you want to validate TSQL field types using `rads::tsql_validate_field_types`? Logical (T\|F). Default `validate_field_types = TRUE`.
`validate_upload`	Do you want to validate that all rows have been uploaded? Logical (T\|F). Default `validate_upload = TRUE`.

Details

overwrite & append are intentionally redundant in order to reduce the risk of accidentally overwriting a table. Note that it is illogical for overwrite & append to have the same value.

The names in field_types must be the same as the names in ph.data. Also note that field_types is only processed when append = FALSE. This prevents conflicts with data types in pre-existing SQL tables.

validate_field_types = TRUE is ignored if the field_types argument is not provided.

Examples


 library(data.table)
 mydt = data.table(col1 = 1:10000L,  # create integer
                   col2 = 1:10000/3) # create float
 mydt[, col3 := as.Date(Sys.Date()) - col1] # create date
 mydt[, col4 := as.character(col3)] # create string
 myfieldtypes <- c(col1 = 'int', col2 = 'float', col3 = 'date', col4 = 'nvarchar(255)')

 tsql_chunk_loader(
   ph.data = mydt,
   db_conn = rads::validate_hhsaw_key(), # connect to Azure 16
   chunk_size = 3333,
   schema_name = Sys.getenv("USERNAME"),
   table_name = 'JustTesting',
   overwrite = TRUE,
   append = FALSE,
   field_types = myfieldtypes,
   validate_field_types = TRUE,
   validate_upload = TRUE
 )

PHSKC-APDE/rads documentation built on April 14, 2025, 10:47 a.m.