fread_chunked | R Documentation |
data.table::fread()
fread_chunked is a helper function that wraps fread to allow chunk-wise operations on data as it loads. By itself, fread can load delimited files extremely fast; however, it offers no extensive or easy-to-use way to perform operations while the data streams into R.
fread_chunked(
file_location,
filter_col,
filter_v,
chunk_function = NULL,
chunk_size = 1000000L,
...
)
file_location |
Location of target file to load (any file compatible with |
filter_col |
Target column to perform filtering operation. |
filter_v |
Vector of values to perform filtering on (categorical by default via 'in' operator). |
chunk_function |
A custom function to apply to each chunk instead of the default behaviour of filtering on a single column. |
chunk_size |
Size of each chunk to perform operations (default: 1e6L). |
... |
Additional parameters to pass to |
By default, this function filters data based on a provided column ID and filtering vector. However, a custom function can also be provided to perform more flexible operations on each chunk. The common use case is working with extremely large data, where the entire dataset would not fit into available memory. When a dataset contains much more information than a particular analysis needs, chunk-wise filtering reduces the loaded data to the rows that meet the filtering criteria, ideally without hitting RAM limits.
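The chunk-wise filtering idea can be illustrated with a minimal sketch built directly on data.table::fread(), using its skip and nrows arguments. This is not the implementation of fread_chunked; the helper name read_filtered_in_chunks and its arguments are hypothetical and exist only for illustration.

```r
library(data.table)

# Hypothetical helper (not the package's code): read `path` in blocks of
# `chunk_size` rows and keep only rows whose `filter_col` is in `filter_v`.
read_filtered_in_chunks <- function(path, filter_col, filter_v,
                                    chunk_size = 1000000L) {
  # Read the header once so every chunk gets the same column names.
  col_names <- names(fread(path, nrows = 0L))
  kept <- list()
  skip <- 1L  # number of lines to skip; starts at 1 to skip the header
  repeat {
    # Reading past the end of the file raises an error, treated here as
    # "no more data".
    chunk <- tryCatch(
      fread(path, skip = skip, nrows = chunk_size,
            header = FALSE, col.names = col_names),
      error = function(e) NULL
    )
    if (is.null(chunk) || nrow(chunk) == 0L) break
    # Filter the chunk before storing it, so only matching rows stay in memory.
    kept[[length(kept) + 1L]] <- chunk[chunk[[filter_col]] %in% filter_v]
    if (nrow(chunk) < chunk_size) break  # short chunk means end of file
    skip <- skip + nrow(chunk)
  }
  rbindlist(kept)
}
```

Only the filtered rows of each chunk are retained, so peak memory use is roughly one chunk plus the accumulated matches rather than the whole file.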
There are several options to perform chunk reading in R. In addition to this function, you could also explore the package chunked and readr::read_csv_chunked(). However, at some point, it may be more suitable to simply have the data stored in a database for more efficient operations outside of R.
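For comparison, here is a hedged sketch of the same kind of filter using readr::read_csv_chunked() with a DataFrameCallback. The demo file, the recordID column, and the helper keep_ids are assumptions made up for this example.

```r
library(readr)

# Small demo file standing in for a large CSV (illustrative data only).
demo_path <- tempfile(fileext = ".csv")
write.csv(data.frame(recordID = 1:10, value = letters[1:10]),
          demo_path, row.names = FALSE)

ids_of_interest <- c(1, 2, 3)

# Callback applied to each chunk; `pos` is the row position of the chunk start.
keep_ids <- function(chunk, pos) {
  chunk[chunk$recordID %in% ids_of_interest, ]
}

# DataFrameCallback row-binds the results returned for each chunk.
filtered <- read_csv_chunked(
  demo_path,
  callback = DataFrameCallback$new(keep_ids),
  chunk_size = 4
)
```

In practice chunk_size would be far larger; a small value is used here only so the demo file spans several chunks.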
A data.table (passed through the chunk-wise function)
read_chunked
## Not run:
file_of_interest <- '/path/to/file/myfile.csv'
# Filter based upon an ID column or similar
ids_of_interest <- c(1, 2, 3)
chunk_loaded_file <- fread_chunked(file_of_interest, filter_col = recordID, filter_v = ids_of_interest)
# Example of custom provided function
# ... perform chunked load and filter if ID is in any of several columns
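# The function refers to ids_of_interest (defined above), since filter_v
# is not a variable visible where this function is defined
custom_chunk_f <- function(chunk) {
  chunk[chunk[, Reduce(`|`, lapply(.SD, `%in%`, ids_of_interest)),
              .SDcols = c('recordID1', 'recordID2', 'recordID3', 'recordID4', 'recordID5')]]
}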
chunk_loaded_file <- fread_chunked(file_of_interest, filter_v = ids_of_interest, chunk_function = custom_chunk_f)
## End(Not run)