add_chunk: Add a chunk field to a data frame

View source: R/chunk.R

add_chunk    R Documentation

Add a chunk field to a data frame

Description

This auxiliary function adds a field, if necessary, to a data frame so that each partition of the data frame corresponding to a unique combination of the chunk fields has a size below a given threshold. The resulting data frame can then be safely used in dbAppendTable(), because Presto imposes a size limit on each individual INSERT INTO statement.
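The idea of size-based chunking can be illustrated with a minimal base R sketch. This is not RPresto's implementation: sketch_add_chunk() is a hypothetical helper, the row size is only approximated by the byte length of the comma-joined values (the real VALUES encoding may differ), and base_chunk_fields is ignored for brevity.

```r
# Hypothetical helper, NOT part of RPresto: a rough illustration of
# assigning a chunk index so that each chunk stays under a byte budget.
sketch_add_chunk <- function(df, chunk_size = 1e6,
                             new_field = "aux_chunk_idx") {
  if (nrow(df) == 0L) {
    return(df)
  }
  # Assumption: approximate each row's encoded size by the number of
  # bytes in its comma-joined character representation.
  cols <- unname(lapply(df, as.character))
  row_text <- do.call(paste, c(cols, sep = ","))
  row_bytes <- nchar(row_text, type = "bytes")
  # If the whole data frame fits in one chunk, return it unchanged.
  if (sum(row_bytes) <= chunk_size) {
    return(df)
  }
  # Greedily start a new chunk whenever adding the next row would
  # push the running total over the limit.
  idx <- integer(nrow(df))
  current <- 1L
  running <- 0
  for (i in seq_len(nrow(df))) {
    if (running > 0 && running + row_bytes[i] > chunk_size) {
      current <- current + 1L
      running <- 0
    }
    running <- running + row_bytes[i]
    idx[i] <- current
  }
  df[[new_field]] <- idx
  df
}

# With the default threshold iris is returned as is; with a small
# threshold an index column is added.
small <- sketch_add_chunk(iris)
chunked <- sketch_add_chunk(iris, chunk_size = 2000)
```

Rows that share a chunk index could then be written in a single INSERT INTO statement, which is the purpose the new field serves.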

Usage

add_chunk(
  value,
  base_chunk_fields = NULL,
  chunk_size = 1e+06,
  new_chunk_field_name = "aux_chunk_idx"
)

Arguments

value

The original data frame.

base_chunk_fields

A character vector of existing field names that are used to split the data frame before checking the chunk size.

chunk_size

Maximum size (in bytes) of the VALUES statement encoding each unique chunk. Defaults to 1,000,000 bytes (i.e. 1 MB).

new_chunk_field_name

A string indicating the new chunk field name. Defaults to "aux_chunk_idx".

Examples

## Not run: 
# returns the original data frame because it is within the size limit
add_chunk(iris)
# adds a new aux_chunk_idx field
add_chunk(iris, chunk_size = 2000)
# the new aux_chunk_idx field is added on top of Species
add_chunk(iris, chunk_size = 2000, base_chunk_fields = c("Species"))

## End(Not run)

RPresto documentation built on Nov. 2, 2023, 5:58 p.m.