reduceDataFrame: Reduces and expands a 'DataFrame'
In lgatto/Features: Quantitative features for mass spectrometry data

reduceDataFrame

R Documentation

Reduces and expands a `DataFrame`

Description

A long dataframe can be reduced by mergeing certain rows into a single one. These new variables are constructed as a SimpleList containing all the original values. Invariant columns, i.e columns that have the same value along all the rows that need to be merged, can be shrunk into a new variables containing that invariant value (rather than in list columns). The grouping of rows, i.e. the rows that need to be shrunk together as one, is defined by a vector.

The opposite operation is expand. But note that for a DataFrame to be expanded back, it must not to be simplified.

Usage

reduceDataFrame(x, k, count = FALSE, simplify = TRUE, drop = FALSE)

expandDataFrame(x, k = NULL)

Arguments

`x`	The `DataFrame` to be reduced or expanded.
`k`	A ‘vector’ of length `nrow(x)` defining the grouping based on which the `DataFrame` will be shrunk.
`count`	`logical(1)` specifying of an additional column (called by default `.n`) with the tally of rows shrunk into on new row should be added. Note that if already existing, `.n` will be silently overwritten.
`simplify`	A `logical(1)` defining if invariant columns should be converted to simple lists. Default is `TRUE`.
`drop`	A `logical(1)` specifying whether the non-invariant columns should be dropped altogether. Default is `FALSE`.

Value

An expanded (reduced) DataFrame.

Missing values

Missing values do have an important effect on reduce. Unless all values to be reduces are missing, they will result in an non-invariant column, and will be dropped with drop = TRUE. See the example below.

The presence of missing values can have side effects in higher level functions that rely on reduction of DataFrame objects.

Author(s)

Laurent Gatto

Examples

library("IRanges")

k <- sample(100, 1e3, replace = TRUE)
df <- DataFrame(k = k,
                x = round(rnorm(length(k)), 2),
                y = seq_len(length(k)),
                z = sample(LETTERS, length(k), replace = TRUE),
                ir = IRanges(seq_along(k), width = 10),
                r = Rle(sample(5, length(k), replace = TRUE)),
                invar = k + 1)
df

## Shinks the DataFrame
df2 <- reduceDataFrame(df, df$k)
df2

## With a tally of the number of members in each group
reduceDataFrame(df, df$k, count = TRUE)

## Much faster, but more crowded result
df3 <- reduceDataFrame(df, df$k, simplify = FALSE)
df3

## Drop all non-invariant columns
reduceDataFrame(df, df$k, drop = TRUE)

## Missing values
d <- DataFrame(k = rep(1:3, each = 3),
               x = letters[1:9],
               y = rep(letters[1:3], each = 3),
               y2 = rep(letters[1:3], each = 3))
d

## y is invariant and can be simplified
reduceDataFrame(d, d$k)
## y isn't not dropped
reduceDataFrame(d, d$k, drop = TRUE)

## BUT with a missing value
d[1, "y"] <- NA
d

## y isn't invariant/simplified anymore
reduceDataFrame(d, d$k)
## y now gets dropped
reduceDataFrame(d, d$k, drop = TRUE)

lgatto/Features documentation built on June 9, 2025, 9:35 a.m.