BY | R Documentation |
BY is an S3 generic that efficiently applies functions over vectors, or over the columns of matrices and data frames, by groups. Similar to dapply, it seeks to retain the structure and attributes of the data, but can also output to various standard formats. Simple parallelism is also available.
BY(x, ...)
## Default S3 method:
BY(x, g, FUN, ..., use.g.names = TRUE, sort = .op[["sort"]], reorder = TRUE,
expand.wide = FALSE, parallel = FALSE, mc.cores = 1L,
return = c("same", "vector", "list"))
## S3 method for class 'matrix'
BY(x, g, FUN, ..., use.g.names = TRUE, sort = .op[["sort"]], reorder = TRUE,
expand.wide = FALSE, parallel = FALSE, mc.cores = 1L,
return = c("same", "matrix", "data.frame", "list"))
## S3 method for class 'data.frame'
BY(x, g, FUN, ..., use.g.names = TRUE, sort = .op[["sort"]], reorder = TRUE,
expand.wide = FALSE, parallel = FALSE, mc.cores = 1L,
return = c("same", "matrix", "data.frame", "list"))
## S3 method for class 'grouped_df'
BY(x, FUN, ..., reorder = TRUE, keep.group_vars = TRUE, use.g.names = FALSE)
x |
a vector, matrix, data frame or alike object. |
g |
a |
FUN |
a function, can be scalar- or vector-valued. For vector valued functions see also |
... |
further arguments to |
use.g.names |
logical. Generate group names and add them to the result as names (default method) or row-names (matrix and data frame methods). For vector-valued functions, (row-)names are only generated if the function itself creates names for the statistics e.g. |
sort |
logical. Sort the groups? Internally passed to |
reorder |
logical. If a vector-valued function is passed that preserves the data length, |
expand.wide |
logical. If |
parallel |
logical. |
mc.cores |
integer. Argument to |
return |
an integer or string indicating the type of object to return. The default |
keep.group_vars |
grouped_df method: Logical. |
BY is a re-implementation of the Split-Apply-Combine computing paradigm. It is faster than tapply, by, aggregate and (d)plyr, and preserves data attributes just like dapply.

It is principally a wrapper around lapply(gsplit(x, g), FUN, ...) that uses gsplit for optimized splitting and also strongly optimizes the internal code compared to base R functions. For more details, see the documentation for dapply, which works very similarly (apart from the splitting performed in BY). The function is intended for simple cases involving flexible computation of statistics across groups using a single function, e.g. iris |> gby(Species) |> BY(IQR) is simpler than iris |> gby(Species) |> smr(acr(.fns = IQR)) etc.
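The split-apply-combine pattern described above can be pictured with a minimal base R sketch. This is only an analogue under stated assumptions: base split() stands in for gsplit, and unlist() stands in for whatever combining step BY performs internally, so it shows the idea rather than the actual implementation.

```r
# Base R analogue of lapply(gsplit(x, g), FUN, ...).
# Assumption: split() is used here in place of collapse's gsplit().
v <- iris$Sepal.Length   # data vector
g <- iris$Species        # grouping factor

# Split: one numeric vector per species; Apply: IQR on each piece
pieces <- lapply(split(v, g), IQR)

# Combine: collapse the per-group scalars into a named numeric vector
res <- unlist(pieces)
res
```

The result holds one statistic per group, named by group, which is the shape a scalar-valued FUN produces in the default vector method.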
X where FUN was applied to every column split by g.
dapply, collap, Fast Statistical Functions, Data Transformations, Collapse Overview
v <- iris$Sepal.Length # A numeric vector
g <- GRP(iris$Species) # A grouping
## default vector method
BY(v, g, sum) # Sum by species
head(BY(v, g, scale)) # Scale by species (please use fscale instead)
BY(v, g, fquantile) # Species quantiles: by default stacked
BY(v, g, fquantile, expand.wide = TRUE) # Wide format
## matrix method
m <- qM(num_vars(iris))
BY(m, g, sum) # Also returns a matrix
BY(m, g, sum, return = "data.frame") # Return as data.frame. Also works for computations below
head(BY(m, g, scale))
BY(m, g, fquantile)
BY(m, g, fquantile, expand.wide = TRUE)
ml <- BY(m, g, fquantile, expand.wide = TRUE, # Return as list of matrices
return = "list")
ml
# Unlisting to Data Frame
unlist2d(ml, idcols = "Variable", row.names = "Species")
## data.frame method
BY(num_vars(iris), g, sum) # Also returns a data.frame
BY(num_vars(iris), g, sum, return = 2) # Return as matrix. Also works for computations below
head(BY(num_vars(iris), g, scale))
BY(num_vars(iris), g, fquantile)
BY(num_vars(iris), g, fquantile, expand.wide = TRUE)
BY(num_vars(iris), g, fquantile, # Return as list of matrices
expand.wide = TRUE, return = "list")
## grouped data frame method
giris <- fgroup_by(iris, Species)
giris |> BY(sum) # Compute sum
giris |> BY(sum, use.g.names = TRUE, # Use row.names and
keep.group_vars = FALSE) # remove 'Species' and groups attribute
giris |> BY(sum, return = "matrix") # Return matrix
giris |> BY(sum, return = "matrix", # Matrix with row.names
use.g.names = TRUE)
giris |> BY(.quantile) # Compute quantiles (output is stacked)
giris |> BY(.quantile, names = TRUE, # Wide output
expand.wide = TRUE)
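The difference between the "stacked" and "wide" quantile outputs mentioned in the examples above can be illustrated with a base R sketch. It uses only base functions (split, quantile, unlist, rbind) as stand-ins, so the exact names and attributes that BY produces may differ; it only shows the two output shapes.

```r
# Base R illustration of stacked vs. wide output for a vector-valued function.
# Assumption: this mimics the shapes only, not BY's actual naming or attributes.
v <- iris$Sepal.Length
g <- iris$Species

# One named vector of five quantiles per species
q <- lapply(split(v, g), quantile)

# Stacked: all group results concatenated into a single vector (3 groups x 5 quantiles)
stacked <- unlist(q)

# Wide: one row per species, one column per quantile
wide <- do.call(rbind, q)

length(stacked)
dim(wide)
```

The stacked form suits further long-format processing, while the wide form gives one row per group and is easier to read as a summary table.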