data-transformations | R Documentation |
collapse provides an ensemble of functions to perform common data transformations efficiently and in a user-friendly way:
dapply applies functions to rows or columns of matrices and data frames, preserving the data format.
BY is an S3 generic for efficient Split-Apply-Combine computing, similar to dapply.
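As a minimal sketch of the two functions above (the small matrix `m` is made up for illustration, and `iris` is the built-in dataset), column-wise and row-wise application with dapply and a grouped computation with BY might look like this:

```r
library(collapse)

# Small numeric matrix used for illustration only
m <- matrix(as.numeric(1:6), nrow = 2, dimnames = list(NULL, c("a", "b", "c")))

# Column-wise application (the default, MARGIN = 2): one sum per column
col_sums <- dapply(m, sum)

# Row-wise application: one sum per row
row_sums <- dapply(m, sum, MARGIN = 1)

# Split-Apply-Combine: mean of each numeric column of iris, by Species
by_species <- BY(iris[1:4], iris$Species, mean)
```

Base functions such as sum and mean are used here only to keep the example short; any function that works on a vector can be passed.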
A set of arithmetic operators facilitates row-wise %rr%, %r+%, %r-%, %r*%, %r/% and column-wise %cr%, %c+%, %c-%, %c*%, %c/% replacing and sweeping operations involving a vector and a matrix or data frame / list. Since v1.7, the operators %+=%, %-=%, %*=% and %/=% do column- and element-wise math by reference, and the function setop can also perform sweeping out rows by reference.
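The difference between the row-wise and column-wise operators can be illustrated with a small example. The matrix `X` and the vectors `v_row` and `v_col` are invented for this sketch; the by-reference operators are left out because they modify their input in place.

```r
library(collapse)

X <- matrix(as.numeric(1:6), nrow = 2)  # 2 rows, 3 columns
v_row <- c(1, 2, 3)                     # one value per column of X (same length as a row)
v_col <- c(10, 20)                      # one value per row of X (same length as a column)

# Row-wise: subtract v_row from every row of X
r_sub <- X %r-% v_row

# Column-wise: add v_col to every column of X
c_add <- X %c+% v_col

# Row-wise replacement: every row of X is replaced by v_row
r_rep <- X %rr% v_row
```

In each case the result has the same dimensions as `X`.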
(set)TRA is a more advanced S3 generic to efficiently perform (groupwise) replacing and sweeping out of statistics, either by creating a copy of the data or by reference.
Supported operations are:
Integer-id | String-id | Description
0 | "na" or "replace_na" | replace only missing values
1 | "fill" or "replace_fill" | replace everything
2 | "replace" | replace data but preserve missing values
3 | "-" | subtract
4 | "-+" | subtract group-statistics but add group-frequency weighted average of group statistics
5 | "/" | divide
6 | "%" | compute percentages
7 | "+" | add
8 | "*" | multiply
9 | "%%" | modulus
10 | "-%%" | subtract modulus
All of collapse's Fast Statistical Functions have a built-in TRA argument for faster access (i.e. you can compute (groupwise) statistics and use them to transform your data with a single function call).
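The following sketch contrasts the two-step approach (compute statistics, then sweep them out with TRA) with the single-call approach through the TRA argument of fmean. The vector `x` and grouping vector `g` are toy data made up for this example.

```r
library(collapse)

x <- c(1, 2, 3, 10, 20, 30)
g <- c("a", "a", "a", "b", "b", "b")

# Two steps: compute one mean per group, then subtract it with TRA
group_means <- fmean(x, g)
centered_1 <- TRA(x, group_means, "-", g)

# One step: the TRA argument of a Fast Statistical Function
centered_2 <- fmean(x, g, TRA = "-")

# The integer id from the table (3 for "-") can be used instead of the string id
centered_3 <- fmean(x, g, TRA = 3L)
```

All three results should be the group-centered values of `x`.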
fscale/STD is an S3 generic to perform (groupwise and / or weighted) scaling / standardizing of data and is orders of magnitude faster than scale.
fwithin/W is an S3 generic to efficiently perform (groupwise and / or weighted) within-transformations / demeaning / centering of data. Similarly fbetween/B computes (groupwise and / or weighted) between-transformations / averages (also a lot faster than ave).
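A short sketch of grouped scaling, centering and averaging on toy data (the vectors `x` and `g` are invented for illustration):

```r
library(collapse)

x <- c(1, 2, 3, 10, 20, 30)
g <- c("a", "a", "a", "b", "b", "b")

z <- fscale(x, g)    # standardize within groups: mean 0 and sd 1 per group
w <- fwithin(x, g)   # within-transformation: subtract each group's mean
b <- fbetween(x, g)  # between-transformation: replace values by their group mean
```

By construction, the within and between components add back up to the original data, i.e. `w + b` reproduces `x`.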
fhdwithin/HDW, shorthand for 'higher-dimensional within transform', is an S3 generic to efficiently center data on multiple groups and partial-out linear models (possibly involving many levels of fixed effects and interactions). In other words, fhdwithin/HDW efficiently computes residuals from linear models. Similarly fhdbetween/HDB, shorthand for 'higher-dimensional between transformation', computes the corresponding means or fitted values.
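The link to linear models can be checked on a simple regression. This sketch uses the built-in `mtcars` data and assumes that a single numeric covariate passed as the second argument is partialled out together with an intercept, so that the results can be compared against lm:

```r
library(collapse)

# Residuals and fitted values from regressing mpg on carb (with intercept)
res <- fhdwithin(mtcars$mpg, mtcars$carb)
fit <- fhdbetween(mtcars$mpg, mtcars$carb)

# Reference values from a standard linear model
model <- lm(mpg ~ carb, data = mtcars)
lm_res <- as.numeric(resid(model))
lm_fit <- as.numeric(fitted(model))
```

Under that assumption `res` should match `lm_res` and `fit` should match `lm_fit` up to numerical tolerance. Models with several factors may rely on additional packages for the fixed-effects computation.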
flag/L/F, fdiff/D/Dlog and fgrowth/G are S3 generics to compute sequences of lags / leads and suitably lagged and iterated (quasi-, log-) differences and growth rates on time series and panel data. fcumsum flexibly computes (grouped, ordered) cumulative sums. More in Time Series and Panel Series.
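A minimal sketch of these functions on a short toy series (the vectors `x` and `g` are made up; default arguments are assumed, in particular growth rates expressed in percent):

```r
library(collapse)

x <- c(1, 2, 4, 8)
g <- c("a", "a", "b", "b")   # hypothetical panel identifier

lag1   <- flag(x)        # lag by one period
lead1  <- flag(x, -1)    # a negative n gives a lead
d1     <- fdiff(x)       # first difference
gr     <- fgrowth(x)     # period-on-period growth rate, in percent
cs     <- fcumsum(x)     # cumulative sum
lag1_g <- flag(x, 1, g)  # lag computed separately within each group
```

Observations without a predecessor (or successor, for leads) are filled with NA, including the first observation of each group in the grouped case.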
STD, W, B, HDW, HDB, L, D, Dlog and G are parsimonious wrappers around the f- functions above representing the corresponding transformation 'operators'. They have additional capabilities when applied to data-frames (i.e. variable selection, formula input, auto-renaming and id-variable preservation), and are easier to employ in regression formulas, but are otherwise identical in functionality.
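The data-frame conveniences and the use inside regression formulas can be sketched with W on the built-in `mtcars` data. The exact output column names are an assumption based on the default renaming behaviour:

```r
library(collapse)

# Formula input: demean mpg within cyl groups; the id variable cyl is kept
# and the transformed column is renamed with a "W." prefix (assumed default)
out <- W(mtcars, mpg ~ cyl)

# Inside a regression formula: within-estimator of the hp slope
fe_fit <- lm(W(mpg, cyl) ~ W(hp, cyl), data = mtcars)
slope_within <- unname(coef(fe_fit)[2])

# Reference: the same slope from a model with cyl dummies
slope_dummy <- unname(coef(lm(mpg ~ hp + factor(cyl), data = mtcars))["hp"])
```

The two slopes should agree, since demeaning both variables within groups is equivalent to including group dummies in the regression.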
Function / S3 Generic | Methods | Description
dapply | No methods, works with matrices and data frames | Apply functions to rows or columns
BY | default, matrix, data.frame, grouped_df | Split-Apply-Combine computing
%(r/c)(r/+/-/*//)% | No methods, works with matrices and data frames / lists | Row- and column-arithmetic
(set)TRA | default, matrix, data.frame, grouped_df | Replace and sweep out statistics (by reference)
fscale/STD | default, matrix, data.frame, pseries, pdata.frame, grouped_df | Scale / standardize data
fwithin/W | default, matrix, data.frame, pseries, pdata.frame, grouped_df | Demean / center data
fbetween/B | default, matrix, data.frame, pseries, pdata.frame, grouped_df | Compute means / average data
fhdwithin/HDW | default, matrix, data.frame, pseries, pdata.frame | High-dimensional centering and lm residuals
fhdbetween/HDB | default, matrix, data.frame, pseries, pdata.frame | High-dimensional averages and lm fitted values
flag/L/F, fdiff/D/Dlog, fgrowth/G, fcumsum | default, matrix, data.frame, pseries, pdata.frame, grouped_df | (Sequences of) lags / leads, differences, growth rates and cumulative sums
Collapse Overview, Fast Statistical Functions, Time Series and Panel Series