View source: R/fsubset_ftransform_fmutate.R
ftransform | R Documentation |
ftransform is a much faster version of transform for data frames. It returns the data frame with new columns computed and/or existing columns modified or deleted. settransform does all of that by reference. fcompute computes and returns new columns. These functions evaluate all arguments simultaneously, allow list-input (nested pipelines) and disregard grouped data.
Catering to the tidyverse user, v1.7.0 introduced fmutate, providing familiar functionality, i.e. arguments are evaluated sequentially, computation on grouped data is done by groups, and functions can be applied to multiple columns using across. See also the Details.
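The difference between simultaneous and sequential evaluation can be sketched as follows. This is a minimal illustration, not part of the original examples; the column names mpg2 and mpg2p1 are made up, and it assumes that no object called mpg2 exists in the calling environment.

```r
library(collapse)

# fmutate() evaluates its arguments one after another, so the second
# expression can refer to the column created by the first one.
seq_res <- fmutate(mtcars, mpg2 = mpg * 2, mpg2p1 = mpg2 + 1)

# ftransform() evaluates all arguments at once against the original columns,
# so mpg2 is not yet available when mpg2p1 is computed. The call is wrapped
# in tryCatch() to capture the resulting error instead of stopping.
sim_res <- tryCatch(
  ftransform(mtcars, mpg2 = mpg * 2, mpg2p1 = mpg2 + 1),
  error = function(e) e
)

head(seq_res[c("mpg", "mpg2", "mpg2p1")])
inherits(sim_res, "error")
```

With ftransform the same result needs either two calls or an expression written entirely in terms of existing columns, e.g. mpg2p1 = mpg * 2 + 1.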
# dplyr-style mutate (sequential evaluation + across() feature)
fmutate(.data, ..., .keep = "all", .cols = NULL)
mtt(.data, ..., .keep = "all", .cols = NULL) # Shorthand for fmutate
# Modify and return data frame
ftransform(.data, ...)
ftransformv(.data, vars, FUN, ..., apply = TRUE)
tfm(.data, ...) # Shorthand for ftransform
tfmv(.data, vars, FUN, ..., apply = TRUE)
# Modify data frame by reference
settransform(.data, ...)
settransformv(.data, ...) # Same arguments as ftransformv
settfm(.data, ...) # Shorthand for settransform
settfmv(.data, ...)
# Replace/add modified columns in/to a data frame
ftransform(.data) <- value
tfm(.data) <- value # Shorthand for ftransform<-
# Compute columns, returned as a new data frame
fcompute(.data, ..., keep = NULL)
fcomputev(.data, vars, FUN, ..., apply = TRUE, keep = NULL)
.data |
a data frame or named list of columns. |
... |
further arguments of the form |
vars |
variables to be transformed by applying |
FUN |
a single function yielding a result of length |
apply |
logical. |
value |
a named list of replacements, it will be treated like an evaluated list of |
keep |
select columns to preserve using column names, indices or a function (e.g. |
.keep |
either one of |
.cols |
for expressions involving |
The ... arguments to ftransform are tagged vector expressions, which are evaluated in the data frame .data. The tags are matched against names(.data), and for those that match, the values replace the corresponding variable in .data, whereas the others are appended to .data. It is also possible to delete columns by assigning NULL to them, i.e. ftransform(data, colk = NULL) removes colk from the data. Note that names(.data) and the names of the ... arguments are checked for uniqueness beforehand, yielding an error if this is not the case.
Since collapse v1.3.0, it is also possible to pass a single named list to ..., i.e. ftransform(data, newdata). This list will be treated like a list of tagged vector expressions. Note the different behavior: ftransform(data, list(newcol = col1)) is the same as ftransform(data, newcol = col1), whereas ftransform(data, newcol = as.list(col1)) creates a list column. Something like ftransform(data, as.list(col1)) gives an error because the list is not named. See Examples.
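The three cases described above can be checked with a short sketch on the airquality data. The column names neg_ozone and ozone_list are illustrative only.

```r
library(collapse)

# A single named list is treated like a set of tagged expressions ...
a <- ftransform(airquality, list(neg_ozone = -Ozone))
# ... so it should give the same result as the tagged form.
b <- ftransform(airquality, neg_ozone = -Ozone)
all.equal(a, b)

# A tagged list, in contrast, is stored as a single list column.
lc <- ftransform(airquality, ozone_list = as.list(Ozone))
is.list(lc$ozone_list)

# An unnamed list has no tags to match against, which is an error.
# tryCatch() captures it so the script keeps running.
unnamed <- tryCatch(ftransform(airquality, as.list(Ozone)),
                    error = function(e) e)
inherits(unnamed, "error")
```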
The function ftransformv, added in v1.3.2, provides a fast replacement for the functions dplyr::mutate_at and dplyr::mutate_if (without the grouping feature), facilitating mutations of groups of columns (dplyr::mutate_all is already accounted for by dapply). See Examples.
The function settransform does all of that by reference, but uses base R's copy-on-modify semantics, which is equivalent to replacing the data with <- (thus it is still memory efficient but the data will have a different memory address afterwards).
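As a minimal sketch of this equivalence, the following compares settransform with an explicit reassignment. It works on copies named aq and aq2 (hypothetical names) so that the built-in airquality dataset stays untouched.

```r
library(collapse)

aq  <- airquality   # copy modified with settransform()
aq2 <- airquality   # copy modified with ftransform() and <-

# Modify aq in place: no assignment is needed.
settransform(aq, Ratio = Ozone / Temp, Temp = NULL)

# The explicit form that settransform() is described as being equivalent to.
aq2 <- ftransform(aq2, Ratio = Ozone / Temp, Temp = NULL)

names(aq)
all.equal(aq, aq2)
```

The in-place form is mainly a convenience: it saves repeating the name of the data frame on both sides of the assignment.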
The function fcompute(v) works just like ftransform(v), but returns only the changed / computed columns without modifying or appending the data in .data. See Examples.
The function fmutate, added in v1.7.0, provides functionality familiar from dplyr 1.0.0 and higher. It evaluates tagged vector expressions sequentially and does operations by groups on a grouped frame (thus it is slower than ftransform if you have many tagged expressions or a grouped data frame). Note however that collapse does not depend on rlang, so things like lambda expressions are not available.
Note also that fmutate operates differently on grouped data depending on whether you use .FAST_FUN or base R functions / functions from other packages. With .FAST_FUN (including .OPERATOR_FUN, excluding fhdbetween / fhdwithin / HDW / HDB), fmutate performs an efficient vectorized execution, i.e. the grouping object from the grouped data frame is passed to the g argument of these functions, and for .FAST_STAT_FUN also TRA = "replace_fill" is set (if not overwritten by the user), yielding internal grouped computation by these functions without the need for splitting the data by groups. For base R and other functions, fmutate performs classical split-apply-combine computing, i.e. the relevant columns of the data are selected and split into groups, the expression is evaluated for each group, and the result is recombined and suitably expanded to match the original data frame. Note that it is not possible to mix vectorized and standard execution in the same expression! Vectorized execution is performed if any .FAST_FUN or .OPERATOR_FUN is part of the expression, thus code like mtcars |> gby(cyl) |> fmutate(new = fmin(mpg) / min(mpg)) will be expanded to something like mtcars |> gby(cyl) |> ftransform(new = fmin(mpg, g = GRP(.), TRA = "replace_fill") / min(mpg)) and then executed, i.e. fmin(mpg) will be executed in a vectorized way, and min(mpg) will not be executed by groups at all.
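The consequence of mixing the two execution modes can be made visible with a small sketch based on the expression discussed above. The column names vec, base and mixed are illustrative; the comments restate the behavior described in this section rather than verified output.

```r
library(collapse)

res <- mtcars |>
  fgroup_by(cyl) |>
  fmutate(vec   = fmin(mpg),              # vectorized: minimum within each cyl group
          base  = min(mpg),               # split-apply-combine: minimum within each cyl group
          mixed = fmin(mpg) / min(mpg)) |>  # vectorized: min(mpg) sees the whole column
  fungroup()

# vec and base should agree, since both are computed by groups.
all.equal(as.numeric(res$vec), as.numeric(res$base))

# mixed divides the group minimum by the overall minimum, so it is not 1
# everywhere, unlike what a fully grouped computation would give.
all.equal(as.numeric(res$mixed), as.numeric(res$vec) / min(mtcars$mpg))
```

To obtain a fully grouped ratio, use either fast functions throughout (fmin(mpg) / fmin(mpg)) or base functions throughout (min(mpg) / min(mpg)).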
The modified data frame .data, or, for fcompute, a new data frame with the columns computed on .data. All attributes of .data are preserved.
ftransform ignores grouped data. This is on purpose as it allows non-grouped transformation inside a pipeline on grouped data, and affords greater flexibility and performance in programming with the .FAST_FUN. In particular, you can run a nested pipeline inside ftransform, and decide which expressions should be grouped, and you can use the ad-hoc grouping functionality of the .FAST_FUN, allowing operations where different groupings are applied simultaneously in an expression. See Examples or the answer provided here.
fmutate on the other hand supports grouped operations just like dplyr::mutate, but works in two different ways depending on whether you use .FAST_FUN in an expression or other functions. See the Examples.
across, fsummarise, Data Frame Manipulation, Collapse Overview
## fmutate() examples ---------------------------------------------------------------
# Please note that expressions are vectorized whenever they contain 'ANY' fast function
mtcars |>
  fgroup_by(cyl, vs, am) |>
  fmutate(mean_mpg = fmean(mpg),                     # Vectorized
          mean_mpg_base = mean(mpg),                 # Non-vectorized
          mpg_cumpr = fcumsum(mpg) / fsum(mpg),      # Vectorized
          mpg_cumpr_base = cumsum(mpg) / sum(mpg),   # Non-vectorized
          mpg_cumpr_mixed = fcumsum(mpg) / sum(mpg)) # Vectorized: division by overall sum
# Using across: here fmean() gets vectorized across both groups and columns (requiring a single
# call to fmean.data.frame which goes to C), whereas weighted.mean needs to be called many times.
mtcars |> fgroup_by(cyl, vs, am) |>
  fmutate(across(disp:qsec, list(mu = fmean, mu2 = weighted.mean), w = wt, .names = "flip"))
# Can do more complex things...
mtcars |> fgroup_by(cyl) |>
  fmutate(res = resid(lm(mpg ~ carb + hp, weights = wt)))
# Since v1.9.0: supports arbitrary expressions returning suitable lists
## Not run:
mtcars |> fgroup_by(cyl) |>
  fmutate(broom::augment(lm(mpg ~ carb + hp, weights = wt)))
# Same thing using across() (supported before 1.9.0)
modelfun <- function(data) broom::augment(lm(mpg ~ carb + hp, data, weights = wt))
mtcars |> fgroup_by(cyl) |>
  fmutate(across(c(mpg, carb, hp, wt), modelfun, .apply = FALSE))
## End(Not run)
## ftransform() / fcompute() examples: ----------------------------------------------
## ftransform modifies and returns a data.frame
head(ftransform(airquality, Ozone = -Ozone))
head(ftransform(airquality, new = -Ozone, Temp = (Temp-32)/1.8))
head(ftransform(airquality, new = -Ozone, new2 = 1, Temp = NULL)) # Deleting Temp
head(ftransform(airquality, Ozone = NULL, Temp = NULL)) # Deleting columns
# With collapse's grouped and weighted functions, complex operations are done on the fly
head(ftransform(airquality, # Grouped operations by month:
                Ozone_Month_median = fmedian(Ozone, Month, TRA = "fill"),
                Ozone_Month_sd = fsd(Ozone, Month, TRA = "replace"),
                Ozone_Month_centered = fwithin(Ozone, Month)))
# Grouping by month and above/below average temperature in each month
head(ftransform(airquality, Ozone_Month_high_median =
                  fmedian(Ozone, list(Month, Temp > fbetween(Temp, Month)), TRA = "fill")))
## ftransformv can be used to modify multiple columns using a function
head(ftransformv(airquality, 1:3, log))
head(`[<-`(airquality, 1:3, value = lapply(airquality[1:3], log))) # Same thing in base R
head(ftransformv(airquality, 1:3, log, apply = FALSE))
head(`[<-`(airquality, 1:3, value = log(airquality[1:3]))) # Same thing in base R
# Using apply = FALSE yields meaningful performance gains with collapse functions
# This calls fwithin.default, and repeats the grouping by month 3 times:
head(ftransformv(airquality, 1:3, fwithin, Month))
# This calls fwithin.data.frame, and only groups one time -> 5x faster!
head(ftransformv(airquality, 1:3, fwithin, Month, apply = FALSE))
# This also works for grouped and panel data frames (calling fwithin.grouped_df)
airquality |> fgroup_by(Month) |>
  ftransformv(1:3, fwithin, apply = FALSE) |> head()
# But this gives the WRONG result (calling fwithin.default). Need option apply = FALSE!!
airquality |> fgroup_by(Month) |>
  ftransformv(1:3, fwithin) |> head()
# For grouped modification of single columns in a grouped dataset, we can use GRP():
library(magrittr)
airquality |> fgroup_by(Month) %>%
  ftransform(W_Ozone = fwithin(Ozone, GRP(.)),                 # Grouped centering
             sd_Ozone_m = fsd(Ozone, GRP(.), TRA = "replace"), # In-Month standard deviation
             sd_Ozone = fsd(Ozone, TRA = "replace"),           # Overall standard deviation
             sd_Ozone2 = fsd(Ozone, TRA = "fill"),             # Same, overwriting NA's
             sd_Ozone3 = fsd(Ozone)) |> head()                 # Same thing (calling alloc())
## For more complex mutations we can use ftransform with compound pipes
airquality |> fgroup_by(Month) %>%
  ftransform(get_vars(., 1:3) |> fwithin() |> flag(0:2)) |> head()
airquality %>% ftransform(STD(., cols = 1:3) |> replace_na(0)) |> head()
# The list argument feature also allows flexible operations creating multiple new columns
airquality |> # The variance of Wind and Ozone, by month, weighted by temperature:
  ftransform(fvar(list(Wind_var = Wind, Ozone_var = Ozone), Month, Temp, "replace")) |> head()
# Same as above using a grouped data frame (a bit more complex)
airquality |> fgroup_by(Month) %>%
  ftransform(fselect(., Wind, Ozone) |> fvar(Temp, "replace") |> add_stub("_var", FALSE)) |>
  fungroup() |> head()
# This performs 2 different multi-column grouped operations (need c() to make it one list)
ftransform(airquality, c(fmedian(list(Wind_Day_median = Wind,
                                      Ozone_Day_median = Ozone), Day, TRA = "replace"),
                         fsd(list(Wind_Month_sd = Wind,
                                  Ozone_Month_sd = Ozone), Month, TRA = "replace"))) |> head()
## settransform(v) works like ftransform(v) but modifies a data frame in the global environment
settransform(airquality, Ratio = Ozone / Temp, Ozone = NULL, Temp = NULL)
head(airquality)
rm(airquality)
# Grouped and weighted centering
settransformv(airquality, 1:3, fwithin, Month, Temp, apply = FALSE)
head(airquality)
rm(airquality)
# Suitably lagged first-differences
settransform(airquality, get_vars(airquality, 1:3) |> fdiff() |> flag(0:2))
head(airquality)
rm(airquality)
# Same as above using magrittr::`%<>%`
airquality %<>% ftransform(get_vars(., 1:3) |> fdiff() |> flag(0:2))
head(airquality)
rm(airquality)
# It is also possible to achieve the same thing via a replacement method (if needed)
ftransform(airquality) <- get_vars(airquality, 1:3) |> fdiff() |> flag(0:2)
head(airquality)
rm(airquality)
## fcompute only returns the modified / computed columns
head(fcompute(airquality, Ozone = -Ozone))
head(fcompute(airquality, new = -Ozone, Temp = (Temp-32)/1.8))
head(fcompute(airquality, new = -Ozone, new2 = 1))
# Can preserve existing columns, computed ones are added to the right if names are different
head(fcompute(airquality, new = -Ozone, new2 = 1, keep = 1:3))
# If given same name as preserved columns, preserved columns are replaced in order...
head(fcompute(airquality, Ozone = -Ozone, new = 1, keep = 1:3))
# Same holds for fcomputev
head(fcomputev(iris, is.numeric, log)) # Same as:
iris |> get_vars(is.numeric) |> dapply(log) |> head()
head(fcomputev(iris, is.numeric, log, keep = "Species")) # Adds in front
head(fcomputev(iris, is.numeric, log, keep = names(iris))) # Preserve order
# Keep a subset of the data, add standardized columns
head(fcomputev(iris, 3:4, STD, apply = FALSE, keep = names(iris)[3:5]))