group: Fast Hash-Based Grouping

View source: R/GRP.R

groupR Documentation

Fast Hash-Based Grouping

Description

group() scans the rows of a data frame (or atomic vector / list of atomic vectors), assigning to each unique row an integer id - starting with 1 and proceeding in first-appearance order of the rows. The function is written in C and optimized for R's data structures. It is the workhorse behind functions like GRP / fgroup_by, collap, qF, qG, finteraction and funique, when called with argument sort = FALSE.

Usage

group(x, starts = FALSE, group.sizes = FALSE)

Arguments

x

an atomic vector or data frame / list of equal-length atomic vectors.

starts

logical. If TRUE, an additional attribute "starts" is attached giving a vector of group starts (= index of first-occurrence of unique rows).

group.sizes

logical. If TRUE, an additional attribute "group.sizes" is attached giving the size of each group.

Details

A data frame is grouped on a column-by-column basis, starting from the leftmost column. For each new column the grouping vector obtained after the previous column is also fed back into the hash function so that unique values are determined on a running basis. The algorithm terminates as soon as the number of unique rows reaches the size of the data frame. Missing values are also grouped just like any other values. Invoking arguments starts and/or group.sizes requires an additional pass through the final grouping vector.

Value

An object is of class 'qG' see qG.

Author(s)

The Hash Function and inspiration was taken from the excellent kit package by Morgan Jacob, the algorithm was developed by Sebastian Krantz.

See Also

GRPid, Fast Grouping and Ordering, Collapse Overview

Examples

# Let's replicate what funique does
g <- group(wlddev, starts = TRUE)
if(attr(g, "N.groups") == fnrow(wlddev)) wlddev else
   ss(wlddev, attr(g, "starts"))


collapse documentation built on Nov. 3, 2024, 9:08 a.m.