knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) library(dplyr) library(tibble) library(matsbyname) library(matsindf)
The matsbyname
package provides several
functions to assist renaming and aggregating rows and columns of matrices.
This vignette shows how to use those functions.
aggregate_byname()
By default (aggregation_map = NULL, margin = c(1,2), pattern_type = "exact"
),
aggregate_byname()
sums all rows and columns with the same names,
with the effect that remaining row and column names are unique.
m <- matrix(c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12), nrow = 3, ncol = 4, byrow = TRUE, dimnames = list(c("duck", "duck", "goose"), c("John", "Paul", "George", "Ringo"))) m aggregate_byname(m)
An aggregation_map
can be provided,
giving instructions for rows or columns to aggregate and
the names of the results.
aggregation_map
should be a list of named strings.
The entries in the list give names of rows and columns to be aggregated.
The names of the list entries provide the names of the resulting aggregates.
m aggregate_byname(m, aggregation_map = list(birds = c("duck", "goose"), guitarists = c("John", "Paul", "George")))
The margin over which the aggregation is to be performed
is given by the margin
argument
(1
for rows, 2
for columns).
m aggregate_byname(m, aggregation_map = list(Beatles = c("John", "Paul", "George", "Ringo")), margin = 2)
aggregation_map
can use regular expressions to identify rows and columns to aggregate.
Use pattern_type = "literal"
for this feature.
m aggregate_byname(m, aggregation_map = list(guitarists = "^[JPG]"), margin = 2, pattern_type = "literal")
Note that rows and columns of aggregated matrices are always sorted alphabetically.
m aggregate_byname(m, aggregation_map = list(birds = c("duck", "goose"), zguitarists = c("John", "Paul", "George")))
It is an error to aggregate over a margin and leave
identically named rows or columns.
The following function call will fail,
because it aggregates over both rows and columns
(using the default margin = c(1,2)
)
with nothing in the aggregation_map
to aggregate the two "duck" rows.
The error is informative:
"Row names not unique. Duplicated row names are: duck".
# Not run aggregate_byname(m, aggregation_map = list(Beatles = c("John", "Paul", "George", "Ringo")))
Commonly, row and column names are complex,
carrying information
in prefixes, suffixes, and prepositional phrases.
matsbyname
can aggregate by pieces of a name,
using the RCLabels
package internally.
We'll use the following matrix to demonstrate aggregating by pieces.
The rows and columns use different notations for names
(RCLabels::bracket_notation
for rows and
RCLabels::arrow_notation
for columns).
The renaming and aggregation capabilities of matsbyname
still work, despite the different notations.
m_pieces <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2, ncol = 3, byrow = TRUE, dimnames = list(c("Electricity [from Coal]", "Electricity [from Solar]"), c("Motors -> MD", "Cars -> MD", "LED lamps -> Light"))) m_pieces
rename_to_piece_byname()
Rows and columns can be renamed to their prefixes, suffixes, or objects of prepositions, as demonstrated below.
m_pieces rename_to_piece_byname(m_pieces, piece = "pref", margin = 1, notation = RCLabels::bracket_notation) rename_to_piece_byname(m_pieces, piece = "suff", margin = 1, notation = RCLabels::bracket_notation) rename_to_piece_byname(m_pieces, piece = "from", margin = 1, notation = RCLabels::bracket_notation) rename_to_piece_byname(m_pieces, piece = "pref", margin = 2, notation = RCLabels::arrow_notation) rename_to_piece_byname(m_pieces, piece = "suff", margin = 2, notation = RCLabels::arrow_notation)
In the examples above,
renaming was accomplished by specifying the notation
for row or column names.
But notation for row and column labels can also be inferred
via RCLabels::infer_notation()
.
To infer notation when renaming,
set inf_notation = TRUE
(the default) and
give a list of notations from which the notation can be inferred
in the notation
argument.
By default, notation = list(RCLabels::notations_list)
, because
each notation in RCLabels::notations_list
is a list itself.
m_pieces rename_to_piece_byname(m_pieces, piece = "pref", margin = 1)
When inferring notation, both margins can be renamed at the same time, despite having different notations.
rename_to_piece_byname(m_pieces, piece = "pref", margin = c(1, 2))
But margin = list(c(1, 2))
is the default, so the code can be simpler still.
rename_to_piece_byname(m_pieces, piece = "pref") rename_to_piece_byname(m_pieces, piece = "suff")
Sometimes, a row or column label can match more than one possible notation
.
In the above example, the row names are inferred
to conform to RCLabels::bracket_notation
,
the first match in RCLabels::notations_list
.
To specify the most-specific notation,
set choose_most_specific = TRUE
.
With choose_most_specific = TRUE
, RCLabels::from_notation
is inferred,
the suffixes are different, and
the renamed rows no longer contain "from ".
rename_to_piece_byname(m_pieces, piece = "suff", choose_most_specific = TRUE)
Note that "noun" is a synonym for "pref".
rename_to_piece_byname(m_pieces, piece = "noun")
The margin can be specified using row or column types from which the numerical margin is inferred.
m_pieces_with_types <- m_pieces %>% setrowtype("Product") %>% setcoltype("Industry") m_pieces_with_types m_pieces_with_types %>% rename_to_piece_byname(piece = "pref", margin = "Product") m_pieces_with_types %>% rename_to_piece_byname(piece = "suff", margin = "Product") m_pieces_with_types %>% rename_to_piece_byname(piece = "from", margin = "Product") m_pieces_with_types %>% rename_to_piece_byname(piece = "suff", margin = "Product", choose_most_specific = TRUE) m_pieces_with_types %>% rename_to_piece_byname(piece = "suff", margin = "Industry")
Such renamings can be used for aggregations
in which identically named rows and columns are summed
with aggregate_byname()
,
as demonstrated in section below.
aggregate_pieces_byname()
aggregate_pieces_byname()
bundles
renaming and aggregating tasks in a single function call.
First, rows and/or columns are renamed to the requested piece
with the rename_to_piece_byname()
function.
Then, aggregation is performed via aggregate_byname()
,
according to an aggregation_map
and the pattern_type
,
if provided.
With the default aggregation_map = NULL
,
identically named pieces are aggregated together.
m_pieces # Aggregate Electricity in rows aggregate_pieces_byname(m_pieces, piece = "pref", margin = 1, notation = RCLabels::bracket_notation) # Aggregate useful energy types in columns aggregate_pieces_byname(m_pieces, piece = "suff", margin = 2, notation = RCLabels::arrow_notation)
When an aggregation_map
is supplied,
it applies to the requested piece
,
not to the original row and/or column names, as shown below.
m_pieces # Aggregate by original energy type aggregate_pieces_byname(m_pieces, piece = "from", margin = 1, notation = RCLabels::bracket_notation, aggregation_map = list(`All sources` = c("Coal", "Solar"))) aggregate_pieces_byname(m_pieces, piece = "suff", margin = 2, notation = RCLabels::arrow_notation, aggregation_map = list(`Transport` = "MD"))
The functions for renaming and aggregating can be used on lists and data frames of matrices.
m_pieces res <- rename_to_piece_byname(list(m_pieces, m_pieces), piece = list("pref", "suff"), margin = list(1, 2), notation = list(RCLabels::bracket_notation, RCLabels::arrow_notation)) res df <- tibble::tibble(mats = list(m_pieces, m_pieces), pce = list("suff", "pref"), mgn = list(1, 2), am = list(list(Sources = c("Coal", "Solar")), list(Transport = c("Motors", "Cars"))), notn = list(RCLabels::from_notation, RCLabels::arrow_notation)) df res2 <- df %>% dplyr::mutate( aggregated = aggregate_pieces_byname(mats, piece = pce, margin = mgn, aggregation_map = am, notation = notn) ) res2 res2$aggregated[[1]] res2$aggregated[[2]]
dplyr::summarise()
Another type of aggregation is aided by
the metadata columns of a matsindf
-style data frame.
With single numbers, an aggregation might look like this:
df_simple <- tibble::tribble(~key, ~val, "A", 1, "A", 2, "B", 10) df_simple df_simple %>% dplyr::group_by(key) %>% dplyr::summarise(val = sum(val))
The same aggregation gives unexpected results
with the default arguments to the sum_byname()
function
(specifically, .summarise = FALSE
),
because sum_byname()
is ambiguous for a data frame.
Should the column be returned unchanged, because each element is interpreted as
the augend for a series of sums that is missing addends,
in which case the length of the returned object is the same as the
length of the input?
Or should the list of objects be summed down the column,
returning only a single item (for each group),
as in the dplyr::summarise()
function?
(See the vignette titled "Using summarise in matsbyname"
for additional detail about this ambiguity.)
In the example below, the grouping has no effect
on the summarise()
function, because sum_byname(.summarise = FALSE)
assumes that each row of val
is an augend without an addend.
# 2 rows are expected. 3 are observed. df_simple %>% dplyr::group_by(key) %>% dplyr::summarise(val = sum_byname(val), .groups = "drop")
To signal intention to summarise down the val
column,
set .summarise = TRUE
in the call to sum_byname()
.
Note that sum_byname(.summarise = TRUE)
always returns a list column,
because if the summarised column were to contain matrices,
it must be a list column.
res <- df_simple %>% dplyr::group_by(key) %>% dplyr::summarise(val = sum_byname(val, .summarise = TRUE)) # res$val is a list column. res res$val
The .summarise = TRUE
argument works when there are matrices
in a matsindf
data frame, too.
m <- matrix(c(11, 12, 13, 21, 22, 23), nrow = 2, ncol = 3, byrow = TRUE, dimnames = list(c("r1", "r2"), c("c1", "c2", "c3"))) df <- tibble::tibble(key = c("A", "A", "B"), m = list(m, m, m)) unexpected <- df %>% dplyr::group_by(key) %>% dplyr::summarise(m = sum_byname(m), .groups = "drop") # 2 rows are expected. 3 are observed. unexpected res <- df %>% dplyr::group_by(key) %>% dplyr::summarise(m = sum_byname(m, .summarise = TRUE)) res res$m[[1]] res$m[[2]]
An aggregation map is defined to be a named list.
But the source of that named list is often a data frame
in which many-to-one relationships are defined.
agg_table_to_agg_map()
assists converting from a two-column data frame
to an aggregation map.
df <- tibble::tribble(~member, ~role, ~band, "John", "guitarists", "The Beatles", "Paul", "guitarists", "The Beatles", "George", "guitarists", "The Beatles", "Ringo", "drummers", "The Beatles", "Mick", "singers", "Rolling Stones", "Keith", "guitarists", "Rolling Stones", "Ronnie", "guitarists", "Rolling Stones", "Bill", "guitarists", "Rolling Stones", "Charlie", "drummers", "Rolling Stones") df bands_membs_agg_map <- agg_table_to_agg_map(df, few_colname = "band", many_colname = "member") bands_membs_agg_map agg_table_to_agg_map(df, few_colname = "role", many_colname = "member")
In a similar manner, an aggregation map can be converted to a data frame to assist with join operations with data frames.
agg_map_to_agg_table(bands_membs_agg_map, few_colname = "bands", many_colname = "members")
The matsbyname
package simplifies aggregation of matrix rows and columns
based on row and column names
or pieces of row and column names.
In particular the functions
aggregate_byname()
,
rename_to_piece_byname()
, and
aggregate_pieces_byname()
provide flexibility in how renaming and aggregation can be accomplished.
When working with aggregation maps,
the functions
agg_table_to_agg_map()
and agg_map_to_agg_table()
assist conversion from one data shape to another.
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.