This function attempts to subsample a data.table without making copies. Compared to direct subsampling, this can yield up to 1.1X memory efficiency. In most cases, however, you get a NEGATIVE memory efficiency, even with frequent garbage collects. Use this only if you are working with very large datasets that fill up your RAM.
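For reference, the "direct subsampling" mentioned above is plain `DT[kept]`. The sketch below contrasts it with a column-by-column approach that materializes one subsampled column at a time. The helper `subsample_by_column` and the meaning given to `collect` here (a garbage collect every `collect` columns) are assumptions made for illustration only; this is not the package's implementation, and whether it saves any memory depends on the data.

```r
library(data.table)

# Hypothetical helper for illustration only; not DTsubsample's actual code.
# Builds the result one column at a time and optionally calls gc() every
# `collect` columns (an assumed meaning of `collect`).
subsample_by_column <- function(DT, kept, collect = 0) {
  out <- vector("list", ncol(DT))
  names(out) <- names(DT)
  for (i in seq_along(out)) {
    out[[i]] <- DT[[i]][kept]              # subsample a single column
    if (collect > 0 && i %% collect == 0) {
      invisible(gc(verbose = FALSE))       # free temporaries between columns
    }
  }
  setDT(out)                               # convert the list by reference
  out
}

DT_small <- data.table(a = 1:10, b = letters[1:10])
kept <- c(2L, 4L, 6L)

direct <- DT_small[kept]                        # direct subsampling
by_col <- subsample_by_column(DT_small, kept)   # column-wise sketch
```

Both objects hold the same rows; the difference is only in how many temporaries exist at once while the result is being built.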
DTsubsample(DT, kept, remove = FALSE, low_mem = FALSE, collect = 0,
  silent = TRUE)
DT
Type: data.table. The data.table to subsample.
kept
Type: vector of integers. The rows to select for subsampling.
remove
Type: boolean. Whether the argument
low_mem
Type: boolean. Avoids holding DT (up to) twice in memory by deleting
collect
Type: integer. Forces a garbage collect every
silent
Type: boolean. Force silence during garbage collection iterations at no speed cost. Defaults to
Warning: DT is handled as a pointer only, not a copy, even though you pass the object itself to this function. This is how memory efficiency is achieved.
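The pointer behaviour described above is general data.table reference semantics. A minimal sketch, assuming a hypothetical helper `add_flag` that is not part of the package, shows how a function can change the caller's object without returning it, and how `copy()` protects against that:

```r
library(data.table)

# Hypothetical helper for illustration: adds a column by reference.
add_flag <- function(dt) {
  dt[, flag := TRUE]   # `:=` modifies the caller's object in place
  invisible(dt)
}

DT <- data.table(x = 1:5)
DT_copy <- copy(DT)    # an explicit, independent copy

add_flag(DT)

names(DT)       # includes "flag": the original was modified
names(DT_copy)  # still only "x": the copy is unaffected
```

If the original DT is still needed after a call that works by reference, taking a `copy()` first is the safe option, at the cost of the extra memory this function is trying to avoid.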
The subsampled data.table.
library(data.table)
# Build a 5,000,000 x 10 data.table filled with 1s
DT <- data.frame(matrix(nrow = 5000000, ncol = 10))
DT <- setDT(DT)
DT[is.na(DT)] <- 1
colnames(DT) <- paste(colnames(DT), "xx", sep = "")
kept <- 1:4000000
# Subsample 4e6 random rows, forcing a garbage collect via collect = 5
DT_sub <- DTsubsample(DT, sample(5e6, 4e6, FALSE), collect = 5, silent = TRUE)
# DT_sub <- DT[sample(5e6, 4e6, FALSE), ] # direct subsampling also works
# Same idea with low_mem = TRUE
DT_sub <- DTsubsample(DT, sample(4e6, 3e6, FALSE), low_mem = TRUE, collect = 5, silent = TRUE)