DTcolsample: data.table colsampling (nearly without) copy


Description

This function attempts to subsample the columns of a data.table without making copies. You could instead just use DT[, (mycols) := NULL] to remove columns, or DT <- DT[, (mycols), with = FALSE] to select them.
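For comparison, the two plain data.table idioms mentioned above can be sketched as follows. The table and the names in `mycols` are made up for illustration, and the selection is written as `mycols` rather than `(mycols)`.

```r
library(data.table)

# Hypothetical example table with four columns
DT <- data.table(a = 1:3, b = 4:6, c = 7:9, d = 10:12)
mycols <- c("b", "d")

# Selection: returns a new data.table holding only the chosen columns
DT_sel <- DT[, mycols, with = FALSE]

# Removal: deletes the chosen columns from DT by reference
DT[, (mycols) := NULL]

names(DT_sel)  # "b" "d"
names(DT)      # "a" "c"
```

Removal with `:=` modifies DT in place, while selection allocates a second table that must then be assigned.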

Usage

DTcolsample(DT, kept, remove = FALSE, low_mem = FALSE, collect = 0,
  silent = TRUE)

Arguments

DT

Type: data.table. The data.table to subsample columns from.

kept

Type: vector of integers or vector of characters. The columns to keep (or to remove, when remove = TRUE).

remove

Type: boolean. Whether kept lists the columns to remove instead (all columns which are not in kept are kept). Defaults to FALSE.

low_mem

Type: boolean. When set to TRUE, prevents DT from residing (up to) twice in memory by deleting DT (WARNING: this empties your DT), which saves memory. When set to FALSE, DT may reside (up to) twice in memory, so memory usage increases. Defaults to FALSE.

collect

Type: integer. Forces a garbage collection every collect iterations to free memory. Setting this to 1 along with low_mem = TRUE leads to the lowest possible memory usage for subsampling a data.table. It also prints verbose information about the process every time it garbage collects. Setting this to 0 disables garbage collection. Lower nonzero values increase the time required to subsample the data.table. Defaults to 0.

silent

Type: boolean. Whether to suppress output during garbage collection iterations, at no speed cost. Defaults to TRUE.
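To illustrate how collect and silent could interact, here is a minimal sketch. It is not the package's implementation: `drop_cols_gc` is a hypothetical helper that removes columns one at a time and calls gc() every `collect` removals.

```r
library(data.table)

# Hypothetical helper, NOT the code behind DTcolsample: drops columns one
# at a time and garbage collects every `collect` removals (never if 0)
drop_cols_gc <- function(DT, cols, collect = 0, silent = TRUE) {
  for (i in seq_along(cols)) {
    set(DT, j = cols[i], value = NULL)  # delete one column by reference
    if (collect > 0 && i %% collect == 0) {
      gc(verbose = FALSE)
      if (!silent) message("Garbage collected after ", i, " column(s)")
    }
  }
  invisible(DT)
}

DT <- as.data.table(matrix(0, nrow = 50, ncol = 10))  # columns V1 to V10
drop_cols_gc(DT, cols = c("V9", "V10"), collect = 1, silent = FALSE)
ncol(DT)  # 8
```

Columns are addressed by name here so that earlier removals do not shift the positions of later ones.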

Details

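Warning: DT is passed by reference, not copied, even though you pass the object to this function. This is how memory efficiency is achieved, and it means that changes made inside the function (for example with low_mem = TRUE) affect your original DT.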
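The reference behaviour described above is a general property of data.table and can be seen in a small sketch. `drop_col_a` is a hypothetical helper written only for this illustration.

```r
library(data.table)

# Hypothetical helper: removes column "a" from its argument by reference
drop_col_a <- function(x) {
  set(x, j = "a", value = NULL)  # modifies x in place, no copy is made
  invisible(x)
}

DT <- data.table(a = 1:3, b = 4:6)
drop_col_a(DT)  # note: the result is not assigned back to DT

names(DT)  # "b": the caller's DT changed anyway
```

If the original table is still needed, take an explicit copy with copy(DT) before passing it to a function that modifies by reference.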

Value

The subsampled data.table.

Examples

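library(data.table)
library(Laurae)  # provides DTcolsample
# Build a 50 x 10 table filled with NA, then convert it by reference
DT <- data.frame(matrix(nrow = 50, ncol = 10))
setDT(DT)
# Append "xx" to every column name, in place
setnames(DT, paste0(names(DT), "xx"))
# Keep columns 1 to 8
DT <- DTcolsample(DT, kept = 1:8, remove = FALSE, low_mem = TRUE)
# Remove columns 1 to 6 of what remains
DT <- DTcolsample(DT, kept = 1:6, remove = TRUE, low_mem = TRUE)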

Laurae2/Laurae documentation built on May 8, 2019, 7:59 p.m.