# Aggregation in `matsbyname` In matsbyname: An Implementation of Matrix Mathematics

```knitr::opts_chunk\$set(
collapse = TRUE,
comment = "#>"
)
library(dplyr)
library(tibble)
library(tidyr)
library(matsbyname)
library(matsindf)
```

## Introduction

The `matsbyname` package provides several functions to assist renaming and aggregating rows and columns of matrices. This vignette shows how to use those functions.

## `aggregate_byname()`

By default (`aggregation_map = NULL, margin = c(1,2), pattern_type = "exact"`), `aggregate_byname()` sums all rows and columns with the same names, with the effect that remaining row and column names are unique.

```m <- matrix(c(1, 2, 3, 4,
5, 6, 7, 8,
9, 10, 11, 12), nrow = 3, ncol = 4, byrow = TRUE,
dimnames = list(c("duck", "duck", "goose"),
c("John", "Paul", "George", "Ringo")))
m
aggregate_byname(m)
```

An `aggregation_map` can be provided, giving instructions for rows or columns to aggregate and the names of the results. `aggregation_map` should be a list of named strings. The entries in the list give names of rows and columns to be aggregated. The names of the list entries provide the names of the resulting aggregates.

```m
aggregate_byname(m, aggregation_map = list(birds = c("duck", "goose"),
guitarists = c("John", "Paul", "George")))
```

The margin over which the aggregation is to be performed is given by the `margin` argument (`1` for rows, `2` for columns).

```m
aggregate_byname(m, aggregation_map = list(Beatles = c("John", "Paul", "George", "Ringo")),
margin = 2)
```

`aggregation_map` can use regular expressions to identify rows and columns to aggregate. Use `pattern_type = "literal"` for this feature.

```m
aggregate_byname(m, aggregation_map = list(guitarists = "^[JPG]"),
margin = 2, pattern_type = "literal")
```

Note that rows and columns of aggregated matrices are always sorted alphabetically.

```m
aggregate_byname(m, aggregation_map = list(birds = c("duck", "goose"),
zguitarists = c("John", "Paul", "George")))
```

It is an error to aggregate over a margin and leave identically named rows or columns. The following function call will fail, because it aggregates over both rows and columns (using the default `margin = c(1,2)`) with nothing in the `aggregation_map` to aggregate the two "duck" rows.

```# Not run
aggregate_byname(m, aggregation_map = list(Beatles = c("John", "Paul", "George", "Ringo")))
```

## Pieces

Commonly, row and column names are complex, carrying information in prefixes, suffixes, and prepositional phrases. `matsbyname` can aggregate by pieces of a name, using the `RCLabels` package internally.

We'll use the following matrix to demonstrate aggregating by pieces. The rows and columns use different notations for names ("bracket" notation for rows and "arrow" notation for columns). The renaming and aggregation capabilities of `matsbyname` still work, despite the different notations.

```m_pieces <- matrix(c(1, 2, 3,
4, 5, 6), nrow = 2, ncol = 3, byrow = TRUE,
dimnames = list(c("Electricity [from Coal]", "Electricity [from Solar]"),
c("Motors -> MD", "Cars -> MD", "LED lamps -> Light")))
m_pieces
```

## `rename_to_piece_byname()`

Rows and columns can be renamed to their prefixes, suffixes, or objects of prepositions, as demonstrated below.

```m_pieces
rename_to_piece_byname(m_pieces, piece = "pref", margin = 1,
notation = RCLabels::bracket_notation)
rename_to_piece_byname(m_pieces, piece = "suff", margin = 1,
notation = RCLabels::bracket_notation)
rename_to_piece_byname(m_pieces, piece = "from", margin = 1,
notation = RCLabels::bracket_notation)
rename_to_piece_byname(m_pieces, piece = "pref", margin = 2,
notation = RCLabels::arrow_notation)
rename_to_piece_byname(m_pieces, piece = "suff", margin = 2,
notation = RCLabels::arrow_notation)
```

These renamings can be used for aggregations in which identically named rows and columns are summed with `aggregate_byname()`.

## `aggregate_pieces_byname()`

`aggregate_pieces_byname()` bundles renaming and aggregating tasks in a single function call. First, rows and/or columns are renamed to the requested `piece` with the `rename_to_piece_byname()` function. Then, aggregation is performed via `aggregate_byname()`, according to an `aggregation_map` and the `pattern_type`, if provided. With the default `aggregation_map = NULL`, identically named pieces are aggregated together.

```m_pieces
# Aggregate Electricity in rows
aggregate_pieces_byname(m_pieces, piece = "pref", margin = 1,
notation = RCLabels::bracket_notation)
# Aggregate useful energy types in columns
aggregate_pieces_byname(m_pieces, piece = "suff", margin = 2,
notation = RCLabels::arrow_notation)
```

When an `aggregation_map` is supplied, it applies to the requested `piece`, not to the original row and/or column names, as shown below.

```m_pieces
# Aggregate by original energy type
aggregate_pieces_byname(m_pieces, piece = "from", margin = 1,
notation = RCLabels::bracket_notation,
aggregation_map = list(`All sources` = c("Coal", "Solar")))

aggregate_pieces_byname(m_pieces, piece = "suff", margin = 2,
notation = RCLabels::arrow_notation,
aggregation_map = list(`Transport` = "MD"))
```

## Aggregations of lists and data frames of matrices

The functions for renaming and aggregating can be used on lists and data frames of matrices.

```m_pieces
res <- rename_to_piece_byname(list(m_pieces, m_pieces),
piece = list("pref", "suff"),
margin = list(1, 2),
notation = list(RCLabels::bracket_notation,
RCLabels::arrow_notation))
res
df <- tibble::tibble(mats = list(m_pieces, m_pieces),
pce = list("suff", "pref"),
mgn = list(1, 2),
am = list(list(Sources = c("Coal", "Solar")),
list(Transport = c("Motors", "Cars"))),
notn = list(RCLabels::from_notation, RCLabels::arrow_notation))
df
res2 <- df %>%
dplyr::mutate(
aggregated = aggregate_pieces_byname(mats, piece = pce, margin = mgn,
aggregation_map = am, notation = notn)
)
res2
res2\$aggregated[]
res2\$aggregated[]
```

## Aggregation via `dplyr::summarise()`

Another type of aggregation is aided by the metadata columns of a `matsindf`-style data frame. With single numbers, an aggregation might look like this:

```df_simple <- tibble::tribble(~key, ~val,
"A", 1,
"A", 2,
"B", 10)
df_simple
df_simple %>%
dplyr::group_by(key) %>%
dplyr::summarise(val = sum(val))
```

The same aggregation gives unexpected results with the default arguments to the `sum_byname()` function (specifically, `.summarise = FALSE`), because `sum_byname()` is ambiguous for a data frame. Should the column be returned unchanged, because each element is interpreted as the augend for a series of sums that is missing addends, in which case the length of the returned object is the same as the length of the input? Or should the list of objects be summed down the column, returning only a single item (for each group), as in the `dplyr::summarise()` function? (See the vignette titled "Using summarise in matsbyname" for additional detail about this ambiguity.) In the example below, the grouping has no effect on the `summarise()` function, because `sum_byname(.summarise = FALSE)` assumes that each row of `val` is an augend without an addend.

```# 2 rows are expected. 3 are observed.
df_simple %>%
dplyr::group_by(key) %>%
dplyr::summarise(val = sum_byname(val), .groups = "drop")
```

To signal intention to summarise down the `val` column, set `.summarise = TRUE` in the call to `sum_byname()`. Note that `sum_byname(.summarise = TRUE)` always returns a list column, because if the summarised column were to contain matrices, it must be a list column.

```res <- df_simple %>%
dplyr::group_by(key) %>%
dplyr::summarise(val = sum_byname(val, .summarise = TRUE))
# res\$val is a list column.
res
res\$val
```

The `.summarise = TRUE` argument works when there are matrices in a `matsindf` data frame, too.

```m <- matrix(c(11, 12, 13,
21, 22, 23), nrow = 2, ncol = 3, byrow = TRUE,
dimnames = list(c("r1", "r2"), c("c1", "c2", "c3")))
df <- tibble::tibble(key = c("A", "A", "B"), m = list(m, m, m))
unexpected <- df %>%
dplyr::group_by(key) %>%
dplyr::summarise(m = sum_byname(m), .groups = "drop")
# 2 rows are expected. 3 are observed.
unexpected
res <- df %>%
dplyr::group_by(key) %>%
dplyr::summarise(m = sum_byname(m, .summarise = TRUE))
res
res\$m[]
res\$m[]
```

## Working with aggregation maps

An aggregation map is defined to be a named list. But the source of that named list is often a data frame in which many-to-one relationships are defined. `agg_table_to_agg_map()` assists converting from a two-column data frame to an aggregation map.

```df <- tibble::tribble(~member, ~role, ~band,
"John", "guitarists", "The Beatles",
"Paul", "guitarists", "The Beatles",
"George", "guitarists", "The Beatles",
"Ringo", "drummers", "The Beatles",
"Mick", "singers", "Rolling Stones",
"Keith", "guitarists", "Rolling Stones",
"Ronnie", "guitarists", "Rolling Stones",
"Bill", "guitarists", "Rolling Stones",
"Charlie", "drummers", "Rolling Stones")
df
bands_membs_agg_map <- agg_table_to_agg_map(df, few_colname = "band", many_colname = "member")
bands_membs_agg_map
agg_table_to_agg_map(df, few_colname = "role", many_colname = "member")
```

In a similar manner, an aggregation map can be converted to a data frame to assist with join operations with data frames.

```agg_map_to_agg_table(bands_membs_agg_map,
few_colname = "bands",
many_colname = "members")
```

## Summary

The `matsbyname` package simplifies aggregation of matrix rows and columns based on row and column names or pieces of row and column names. In particular the functions `aggregate_byname()`, `rename_to_piece_byname()`, and `aggregate_pieces_byname()` provide flexibility in how renaming and aggregation can be accomplished. When working with aggregation maps, the functions `agg_table_to_agg_map()` and `agg_map_to_agg_table()` assist conversion from one data shape to another.

## Try the matsbyname package in your browser

Any scripts or data that you put into this service are public.

matsbyname documentation built on April 2, 2022, 1:06 a.m.