reduceDataFrame: Reduces and expands a 'DataFrame'

View source: R/reduce.R

reduceDataFrame    R Documentation

Reduces and expands a DataFrame

Description

A long DataFrame can be reduced by merging certain rows into a single one. The merged variables are constructed as a SimpleList containing all the original values. Invariant columns, i.e. columns that have the same value across all the rows that need to be merged, can be shrunk into new variables containing that invariant value (rather than into list columns). The grouping of rows, i.e. the rows that need to be shrunk together as one, is defined by a vector.

The opposite operation is expand. Note, however, that for a DataFrame to be expanded back, it must not have been simplified.
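The idea of reducing and expanding can be sketched in base R on a plain data.frame. This is only an illustration of the concept, not the package's implementation: the object names (d, reduced, y_simple, expanded) are hypothetical, and ordinary lists stand in for the SimpleList columns of a reduced DataFrame.

```r
## Hypothetical toy data: rows are already ordered by the grouping vector k
d <- data.frame(k = rep(1:3, each = 2),
                x = letters[1:6],
                y = rep(c("a", "b", "c"), each = 2),
                stringsAsFactors = FALSE)

## Reduce: gather the values of every column by group,
## giving one list element per group
reduced <- list(k = split(d$k, d$k),
                x = split(d$x, d$k),
                y = split(d$y, d$k))

## Number of original rows merged into each group
lengths(reduced$x)

## y holds a single distinct value in every group, so it could be
## stored as a plain vector instead of a list column
y_simple <- vapply(reduced$y, function(v) v[1], character(1))

## Expand: unlisting each list column restores one value per original row
## (this recovers the original order only because d was sorted by k)
expanded <- data.frame(lapply(reduced, unlist, use.names = FALSE),
                       stringsAsFactors = FALSE)
identical(expanded$x, d$x)
```

In this sketch only the list columns keep one value per original row, which is what makes unlisting them sufficient to get the long table back.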

Usage

reduceDataFrame(x, k, count = FALSE, simplify = TRUE, drop = FALSE)

expandDataFrame(x, k = NULL)

Arguments

x

The DataFrame to be reduced or expanded.

k

A ‘vector’ of length nrow(x) defining the grouping based on which the DataFrame will be shrunk.

count

A logical(1) specifying whether an additional column (called .n by default) with the tally of rows shrunk into one new row should be added. Note that if a column .n already exists, it will be silently overwritten. Default is FALSE.

simplify

A logical(1) defining whether invariant columns should be simplified, i.e. stored as the single invariant value rather than as list columns. Default is TRUE.

drop

A logical(1) specifying whether the non-invariant columns should be dropped altogether. Default is FALSE.
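As a rough illustration of what the count and drop arguments refer to, the following base R sketch computes a per-group tally and identifies the invariant columns of a plain data.frame. It is an assumption-laden toy, not the package's code: the names d, n, invariant and kept are hypothetical.

```r
## Hypothetical toy data; k is the grouping vector
d <- data.frame(k = c(2, 1, 2, 3, 2, 1),
                x = c(0.1, 0.2, 0.3, 0.4, 0.5, 0.6),
                y = c("b", "a", "b", "c", "b", "a"),
                stringsAsFactors = FALSE)

## Tally of rows per group: the kind of value a count column would hold
n <- lengths(split(seq_len(nrow(d)), d$k))

## Flag the columns that take a single distinct value within every group
invariant <- vapply(d, function(col)
    all(lengths(lapply(split(col, d$k), unique)) == 1L),
    logical(1))

## Keeping only the invariant columns mimics the effect described for drop = TRUE
kept <- names(d)[invariant]
kept
```

Here x varies within groups 1 and 2, so only k and y would survive dropping the non-invariant columns.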

Value

A reduced (reduceDataFrame) or expanded (expandDataFrame) DataFrame.

Missing values

Missing values have an important effect on reduction. Unless all the values to be reduced are missing, they will result in a non-invariant column, which will be dropped with drop = TRUE. See the example below.

The presence of missing values can have side effects in higher level functions that rely on reduction of DataFrame objects.
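The rule above matches how distinct values are counted in base R, where NA is treated as a value of its own. The short sketch below is an assumption about why a partly missing group stops being invariant, not a description of the package internals; the vector names are hypothetical.

```r
## One group where a single value is missing
partly_missing <- c(NA, "a", "a")
n_partly <- length(unique(partly_missing))   # 2: NA and "a" are distinct

## One group where every value is missing
all_missing <- c(NA, NA, NA)
n_all <- length(unique(all_missing))         # 1: only NA

## A group looks invariant only when it holds one distinct value
c(partly = n_partly == 1L, all = n_all == 1L)
```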

Author(s)

Laurent Gatto

Examples

library("IRanges")

k <- sample(100, 1e3, replace = TRUE)
df <- DataFrame(k = k,
                x = round(rnorm(length(k)), 2),
                y = seq_len(length(k)),
                z = sample(LETTERS, length(k), replace = TRUE),
                ir = IRanges(seq_along(k), width = 10),
                r = Rle(sample(5, length(k), replace = TRUE)),
                invar = k + 1)
df

## Shrinks the DataFrame
df2 <- reduceDataFrame(df, df$k)
df2

## With a tally of the number of members in each group
reduceDataFrame(df, df$k, count = TRUE)

## Much faster, but more crowded result
df3 <- reduceDataFrame(df, df$k, simplify = FALSE)
df3

## Drop all non-invariant columns
reduceDataFrame(df, df$k, drop = TRUE)

## Missing values
d <- DataFrame(k = rep(1:3, each = 3),
               x = letters[1:9],
               y = rep(letters[1:3], each = 3),
               y2 = rep(letters[1:3], each = 3))
d

## y is invariant and can be simplified
reduceDataFrame(d, d$k)
## y isn't dropped
reduceDataFrame(d, d$k, drop = TRUE)

## BUT with a missing value
d[1, "y"] <- NA
d

## y isn't invariant/simplified anymore
reduceDataFrame(d, d$k)
## y now gets dropped
reduceDataFrame(d, d$k, drop = TRUE)

lgatto/Features documentation built on Sept. 22, 2024, 7:13 p.m.