rsapply: Regular Expression Apply Function for data.table


View source: R/featureEngineering.R

Description

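This function applies a function, lapply-style, to the columns of a data.table whose names match a regular expression or a fixed string. Results can be returned, written back to new or existing columns, or computed per group.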
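The core idea can be illustrated with plain data.table: collect the column names that match a pattern with grep(), pass them to .SDcols, and lapply over .SD. This is a minimal sketch written under the assumption that rsapply works roughly this way, not its actual source. regex_lapply is a hypothetical name (not part of RQuant), and it ignores the tilde syntax and the assign and by arguments.

```r
library(data.table)

# Hypothetical helper (not part of RQuant): apply FUN to every column of X
# whose name matches `pattern`, forwarding extra arguments to FUN.
regex_lapply <- function(X, pattern, FUN, ...) {
  X <- as.data.table(X)
  cols <- grep(pattern, names(X), value = TRUE)  # column names matching the regex
  f <- function(x) FUN(x, ...)                   # bind the extra arguments to FUN
  X[, lapply(.SD, f), .SDcols = cols]            # one result column per match
}

iris.dt <- as.data.table(iris)
sepal_quartiles <- regex_lapply(iris.dt, "^Sepal", quantile, probs = 1:3/4)
print(sepal_quartiles)
```

Because quantile() returns three values per column here, the result has three rows and one column per matched name.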

Usage

rsapply(X, M, FUN, ..., assign = NULL, by)

Arguments

X

a data.table or data.frame object.

M

a character vector of column selectors. Entries prefixed with a tilde (~) are treated as regular expressions; entries without the tilde are matched as fixed strings.

FUN

the function to apply to each matched column of X. When assign is supplied, FUN can instead be a constant value to assign, or NULL to remove the matched columns.

...

optional arguments to FUN.

assign

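a character string appended to each matched column name to name the new columns that receive the results (the examples select them afterwards with a pattern ending in ch$). Use an empty string "" to transform the matched columns in place.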

by

a character vector of column names to group the operation by.
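The tilde convention for M can be mimicked in base R. The helper below is a hypothetical sketch, not RQuant's implementation: it assumes that an entry starting with ~ is a regular expression (with the tilde stripped before matching) and that an entry without a tilde must equal a column name exactly.

```r
# Hypothetical helper (not part of RQuant): resolve a selector vector M
# against a set of column names.
resolve_cols <- function(col_names, M) {
  is_regex <- startsWith(M, "~")
  patterns <- sub("^~", "", M[is_regex])            # strip the leading tilde
  regex_hits <- unlist(lapply(patterns, function(p) grep(p, col_names, value = TRUE)))
  fixed_hits <- intersect(M[!is_regex], col_names)  # exact name matches
  # keep the data's column order and drop duplicates
  col_names[col_names %in% c(regex_hits, fixed_hits)]
}

print(resolve_cols(names(iris), c("~^Sepal", "Species")))
print(resolve_cols(names(iris), "~Length$"))
```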
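The assign modes map naturally onto data.table's := operator, which modifies a table by reference. The sketch below assumes that behaviour, that a non-empty assign string is appended to each matched name, and that assigning NULL removes columns; assign_by_ref is a hypothetical helper, not part of RQuant.

```r
library(data.table)

# Hypothetical helper (not part of RQuant): apply FUN to `cols` and write the
# results back into X by reference.
#   suffix = ""   -> overwrite the original columns in place
#   suffix = "ch" -> add new columns named paste0(col, "ch")
assign_by_ref <- function(X, cols, FUN, suffix = "") {
  targets <- paste0(cols, suffix)
  X[, (targets) := lapply(.SD, FUN), .SDcols = cols]
  invisible(X)
}

iris.dt <- as.data.table(iris)
assign_by_ref(iris.dt, c("Sepal.Length", "Sepal.Width"), as.character, suffix = "ch")
added_cols <- grep("ch$", names(iris.dt), value = TRUE)
print(added_cols)

# Assigning NULL drops the columns again
iris.dt[, (added_cols) := NULL]
print(names(iris.dt))
```

Because := works by reference, iris.dt is modified without reassigning the function's return value.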
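Grouping presumably corresponds to data.table's own by argument. Assuming so, a grouped computation over regex-selected columns can be written directly with .SDcols and by; this shows the underlying data.table idiom rather than rsapply itself.

```r
library(data.table)

iris.dt <- as.data.table(iris)
# Select the Sepal and Petal columns by regex, then summarise per Species
cols <- grep("^(Sepal|Petal)", names(iris.dt), value = TRUE)
group_means <- iris.dt[, lapply(.SD, mean), by = "Species", .SDcols = cols]
print(group_means)
```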

Examples

library(data.table)
library(RQuant)  # provides rsapply
data(iris)
iris.dt <- data.table(iris)
rsapply(iris.dt, "~*Sepal", as.character, assign = "ch") # add character copies of the Sepal columns as new columns whose names end in 'ch'
rsapply(iris.dt, "~*ch$", 1, assign = "") # different type (numeric into character columns): data.table warns about the coercion
rsapply(iris.dt, "~*ch$", "2", assign = "") # same type (character): no data.table warning
rsapply(iris.dt, "~*ch$", NULL, assign = "") # remove all the columns whose names end in 'ch'
str(rsapply(iris.dt, "~*Sepal", as.character))
rsapply(iris.dt, c("~Sepal","~Petal"), quantile, probs = 1:3/4) # compute the quartiles (25%, 50%, 75%) of every column matching Sepal or Petal
rsapply(iris.dt, c("~Sepal","~Petal"), quantile, probs = 1:3/4, by = "Species") # the same quartiles, computed separately for each Species
# Mean interquartile range (Q3 - Q1) across species for the Length columns
rsapply(
  rsapply(
    rsapply(iris.dt, c("~Sepal","~Petal"), quantile, probs = c(1,3)/4, by = "Species"),  # Q1 and Q3 per species
    c("~Sepal","~Petal"), function(x) max(x) - min(x), by = "Species"),                # Q3 - Q1 per species
  c("~Length"), mean                                                                  # average over species
)
rsapply(iris.dt, c("~Sepal","~Petal"), mean, by = "Species")[, .(Species, ratio = Sepal.Length / Sepal.Width)] # chain a data.table query: ratio of mean Sepal Length to mean Sepal Width per species
melt(rsapply(iris.dt, c("~Sepal","~Petal"), mean, by = "Species"), id.vars = "Species") # the result is a data.table, so melt() and dcast() can be used for pivoting
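# `dt` is assumed to be a data.table with columns whose names contain SEGMENT
rsapply(rsapply(dt, "~*SEGMENT*", function(x) ifelse(is.na(x), -1, x), assign = "_NEW"), "~*SEGMENT", print) # impute NA as -1 into new '_NEW' columns, then print
rsapply(rsapply(dt, "~*SEGMENT*", function(x) ifelse(is.na(x), -1, x), assign = ""), "~*SEGMENT", print) # in-place imputation of NA as -1, then print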

# `dd` is assumed to be an existing data.table
num.col <- names(dd)[vapply(dd, function(x) class(x)[1] == "numeric", logical(1))] # names of the numeric (double) columns
rsapply(dd, num.col, print) # print the numeric columns
rsapply(dd, num.col, function(x) ifelse(is.na(x), -1, x), assign = "") # in-place imputation for the numeric columns only
rsapply(dd, rsClass(dd, "numeric"), rsPrint) # select the numeric columns with rsClass instead
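For comparison, the nested example that averages the interquartile range of the Length columns across species can be written as a chained plain data.table query. This is an equivalent formulation for checking results, not a description of how rsapply evaluates the nested calls.

```r
library(data.table)

iris.dt <- as.data.table(iris)
length_cols <- c("Sepal.Length", "Petal.Length")

# Step 1: Q3 - Q1 of each Length column within each species
iqr_by_species <- iris.dt[, lapply(.SD, function(x) unname(diff(quantile(x, c(1, 3) / 4)))),
                          by = "Species", .SDcols = length_cols]
# Step 2: average those per-species IQRs
mean_iqr <- iqr_by_species[, lapply(.SD, mean), .SDcols = length_cols]
print(mean_iqr)
```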

ivanliu1989/RQuant documentation built on Sept. 13, 2019, 11:53 a.m.