qF-qG-finteraction | R Documentation |

`qF`, shorthand for 'quick-factor', implements very fast factor generation from atomic vectors using either radix ordering or index hashing followed by sorting.

`qG`, shorthand for 'quick-group', generates a lightweight factor-like object that has no levels attribute and instead carries an attribute giving the number of levels. Optionally the levels / groups can be attached, but without converting them to character (which can have large performance implications). Objects have class 'qG'.

`finteraction` generates a factor or 'qG' object by interacting multiple vectors or factors. In that process, missing values are always replaced with a level, and unused levels / combinations are always dropped.

*collapse* internally makes optimal use of factors and 'qG' objects when they are passed as grouping vectors to statistical functions (the `g/by` or `t` arguments), i.e. typically no further grouping or ordering is performed and the objects are used directly by statistical C/C++ code.

```
qF(x, ordered = FALSE, na.exclude = TRUE, sort = .op[["sort"]], drop = FALSE,
keep.attr = TRUE, method = "auto")
qG(x, ordered = FALSE, na.exclude = TRUE, sort = .op[["sort"]],
return.groups = FALSE, method = "auto")
is_qG(x)
as_factor_qG(x, ordered = FALSE, na.exclude = TRUE)
finteraction(..., factor = TRUE, ordered = FALSE, sort = factor && .op[["sort"]],
method = "auto", sep = ".")
itn(...) # Shorthand for finteraction
```

`x` |
an atomic vector, factor or quick-group. |

`ordered` |
logical. Adds a class 'ordered'. |

`na.exclude` |
logical. |

`sort` |
logical. |

`drop` |
logical. If |

`keep.attr` |
logical. If |

`method` |
an integer or character string specifying the method of computation:
Note that for |

`return.groups` |
logical. |

`factor` |
logical. |

`sep` |
character. The separator passed to |

`...` |
multiple atomic vectors or factors, or a single list of equal-length vectors or factors. See Details. |

Whenever a vector is passed as a grouping to a Fast Statistical Function, as in `fmean(mtcars, mtcars$cyl)`, it is grouped using `qF`, or `qG` if `use.g.names = FALSE`.
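As a minimal sketch of this behaviour (assuming *collapse* is installed and using the built-in `mtcars` data), a raw grouping vector and a factor pre-computed with `qF` should lead to the same group means:

```r
library(collapse)

# Passing the raw vector: the grouping is computed internally
m_raw <- fmean(mtcars$mpg, mtcars$cyl)

# Passing a pre-computed factor: it is used directly as the grouping
cyl_f <- qF(mtcars$cyl)
m_fac <- fmean(mtcars$mpg, cyl_f)

# Both calls should agree with each other
all.equal(unname(m_raw), unname(m_fac))
```

Pre-computing the factor is mainly worthwhile when the same grouping is reused across several calls.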

`qF` is a combination of `as.factor` and `factor`. Applying it to a vector, i.e. `qF(x)`, gives the same result as `as.factor(x)`. `qF(x, ordered = TRUE)` generates an ordered factor (same as `factor(x, ordered = TRUE)`), and `qF(x, na.exclude = FALSE)` generates a level for missing values (same as `factor(x, exclude = NULL)`). An important addition is that `qF(x, na.exclude = FALSE)` also adds a class 'na.included'. This prevents *collapse* functions from checking for missing values in the factor, and is thus computationally more efficient. Therefore factors used in grouped operations should preferably be generated using `qF(x, na.exclude = FALSE)`. Setting `sort = FALSE` gathers the levels in first-appearance order (unless `method = "radix"` and `x` is numeric, in which case the levels are always sorted). This often gives a noticeable speed improvement.
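The correspondences with base R described above can be checked on a small vector. This is only an illustrative sketch; the vectors `x` and `x_complete` are made up for the example:

```r
library(collapse)

x <- c("b", "a", "c", "a", NA)
x_complete <- x[!is.na(x)]

f_default <- qF(x)                      # counterpart of as.factor(x)
f_ordered <- qF(x, ordered = TRUE)      # counterpart of factor(x, ordered = TRUE)
f_na      <- qF(x, na.exclude = FALSE)  # counterpart of factor(x, exclude = NULL)
f_first   <- qF(x_complete, sort = FALSE)  # levels in first-appearance order

levels(f_default)
is.ordered(f_ordered)
class(f_na)       # should include 'na.included'
levels(f_first)
```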

There are three internal methods of computation: radix ordering, hashing, and Rcpp sugar hashing. Radix ordering is done by combining the functions `radixorder` and `groupid`. It is generally faster than hashing for large numeric data and pre-sorted data (although there are exceptions). Hashing uses `group`, followed by `radixorder` on the unique elements if `sort = TRUE`. It is generally fastest for character data. Rcpp hashing uses `Rcpp::sugar::sort_unique` and `Rcpp::sugar::match`. This is often less efficient than the former two on large data, but the sorting properties (relying on `std::sort`) may be superior in borderline cases where `radixorder` fails to deliver exact lexicographic ordering of factor levels.
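The choice of method affects speed, not the result. A small sketch (the simulated vector `x` is an assumption for illustration) comparing the `"radix"` and `"hash"` methods with the default `sort = TRUE`:

```r
library(collapse)

set.seed(1)
x <- sample(c(10.5, 2, 7, 3.25), 1e4, replace = TRUE)

f_radix <- qF(x, method = "radix")  # radix ordering
f_hash  <- qF(x, method = "hash")   # hashing, then ordering the unique values

# With sort = TRUE both methods should yield the same levels and codes
identical(levels(f_radix), levels(f_hash))
identical(as.integer(f_radix), as.integer(f_hash))
```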

Regarding speed: in general, `qF` is around 5x faster than `as.factor` on character data and about 30x faster on numeric data. Automatic method dispatch typically does a good job of delivering optimal performance.

`qG` is first and foremost a programmer's function. It generates a factor-'light' class 'qG' consisting of only an integer grouping vector and an attribute providing the number of groups. It is slightly faster and more memory efficient than `GRP` for grouping atomic vectors, and is also convenient because it can be stored in a data frame column. These are the main reasons for its existence.
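To illustrate the data frame use case, the following sketch stores a 'qG' object as a column and reuses it as a grouping. The data frame `df` and the column name `g` are chosen only for this example:

```r
library(collapse)

df <- mtcars[c("mpg", "cyl")]
df$g <- qG(df$cyl)     # keep the grouping next to the data

is_qG(df$g)            # the column is still a 'qG' object
means <- fmean(df$mpg, df$g)  # reuse it directly as the grouping
means
```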

`finteraction` is simply a wrapper around `as_factor_GRP(GRP.default(X))`, where X is replaced by the arguments in '...' combined in a list (so it is not really an interaction function but just a multivariate grouping converted to factor; see `GRP` for computational details). In general, all vectors, factors, or lists of vectors / factors passed can be interacted. Interactions always create a level for missing values and always drop unused levels.
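A short sketch of what this means in practice, using two `mtcars` columns: only combinations that actually occur become levels, and `factor = FALSE` gives a 'qG' object instead of a factor.

```r
library(collapse)

fi <- finteraction(mtcars$cyl, mtcars$vs)
levels(fi)

# Number of levels equals the number of distinct observed combinations
n_combos <- nrow(unique(mtcars[c("cyl", "vs")]))
nlevels(fi) == n_combos

# Returning a 'qG' object instead of a factor
gi <- finteraction(mtcars$cyl, mtcars$vs, factor = FALSE)
is_qG(gi)
```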

`qF` returns an (ordered) factor. `qG` returns an object of class 'qG': an integer grouping vector with an attribute `"N.groups"` indicating the number of groups, and, if `return.groups = TRUE`, an attribute `"groups"` containing the vector of unique groups / elements in `x` corresponding to the integer-id. `finteraction` can return either.
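The attributes of a 'qG' object can be inspected directly. A minimal sketch using `mtcars$cyl`:

```r
library(collapse)

g <- qG(mtcars$cyl)
class(g)
attr(g, "N.groups")    # number of groups

# Attaching the unique values, which are not converted to character
g2 <- qG(mtcars$cyl, return.groups = TRUE)
attr(g2, "groups")
```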

An efficient alternative for character vectors with multithreading support is provided by `kit::charToFact`.

`qG(x, sort = FALSE, na.exclude = FALSE, method = "hash")` internally calls `group(x)`, which can also be used directly and also supports multivariate groupings where `x` can be a list of vectors.
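A sketch of this equivalence, assuming a *collapse* version that exports `group`:

```r
library(collapse)

g_hash  <- qG(mtcars$cyl, sort = FALSE, na.exclude = FALSE, method = "hash")
g_group <- group(mtcars$cyl)

# Both should assign the same first-appearance group ids
identical(as.integer(g_hash), as.integer(g_group))

# group() also accepts a list / data frame for multivariate groupings
g_multi <- group(mtcars[c("cyl", "vs")])
attr(g_multi, "N.groups")
```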

Neither `qF` nor `qG` reorders groups / factor levels. An exception was added in v1.7: when calling `qF(f, sort = FALSE)` on a factor `f`, the levels are recast in first-appearance order. These objects can however be converted into one another using `qF/qG` or the direct method `as_factor_qG` (called inside `qF`). It is also possible to add a class 'ordered' (`ordered = TRUE`) and to create an extra level / integer for missing values (`na.exclude = FALSE`) if factors or 'qG' objects are passed to `qF` or `qG`.
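The conversions mentioned here might look as follows. This is a sketch only; `return.groups = TRUE` is used so that the factor levels can be taken from the stored groups:

```r
library(collapse)

g <- qG(mtcars$cyl, return.groups = TRUE)

f      <- as_factor_qG(g)         # 'qG' -> factor, direct method
g_back <- qG(f)                   # factor -> 'qG'
f_ord  <- qF(g, ordered = TRUE)   # 'qG' -> ordered factor

levels(f)
is_qG(g_back)
is.ordered(f_ord)
```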

`group`, `groupid`, `GRP`, Fast Grouping and Ordering, Collapse Overview

```
library(collapse)
cylF <- qF(mtcars$cyl) # Factor from atomic vector
cylG <- qG(mtcars$cyl) # Quick-group from atomic vector
cylG # See the simple structure of this object
cf <- qF(wlddev$country) # Bigger data
cf2 <- qF(wlddev$country, na.exclude = FALSE) # With na.included class
dat <- num_vars(wlddev)
# cf2 is faster in grouped operations because no missing value check is performed
library(microbenchmark)
microbenchmark(fmax(dat, cf), fmax(dat, cf2))
finteraction(mtcars$cyl, mtcars$vs) # Interacting two variables (can be factors)
head(finteraction(mtcars)) # A more crude example..
finteraction(mtcars$cyl, mtcars$vs, factor = FALSE) # Returns 'qG', by default unsorted
group(mtcars[c("cyl", "vs")]) # Same thing. Use whatever syntax is more convenient
```
