knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) library(dplyr) library(matsbyname) library(matsindf) library(tidyr)
matsindf_apply()
is a powerful and versatile function
that enables analysis with lists and data frames by applying
FUN
in helpful ways.
The function is called matsindf_apply()
,
because it can be used to apply FUN
to a matsindf
data frame,
a data frame that contains matrices as individual entries in a data frame.
(A matsindf
data frame can be created by
calling collapse_to_matrices()
, as demonstrated below.)
But matsindf_apply()
can apply FUN
across much more:
data frames of single numbers,
lists of matrices,
lists of single numbers, and
individual numbers.
This vignette demonstrates matsindf_apply()
,
starting with simple examples and
proceeding toward sophisticated analyses.
The basis of all analyses conducted with matsindf_apply()
is a function (FUN
) to be applied across data
supplied in .dat
or ...
.
FUN
must return a named list of variables as
its result.
Here is an example function that both adds and subtracts its arguments,
a
and b
, and
returns a list containing its result, c
and d
.
example_fun <- function(a, b){ return(list(c = matsbyname::sum_byname(a, b), d = matsbyname::difference_byname(a, b))) }
Similar to lapply()
and its siblings,
additional argument(s) to matsindf_apply()
include
the data over which FUN
is to be applied.
These arguments can, in the first instance,
be supplied as named arguments to the ...
argument
of matsindf_apply()
.
All arguments in ...
must be named.
The ...
arguments to matsindf_apply()
are passed to FUN
according to their names.
In this case, the output of matsindf_apply()
is the the named list returned by FUN
.
matsindf_apply(FUN = example_fun, a = 2, b = 1)
Passing an additional argument (z = 2
)
causes an unused argument error,
because example_fun
does not have a z
argument.
tryCatch( matsindf_apply(FUN = example_fun, a = 2, b = 1, z = 2), error = function(e){e} )
Failing to pass a needed argument (b
)
causes an error that indicates the missing argument.
tryCatch( matsindf_apply(FUN = example_fun, a = 2), error = function(e){e} )
Alternatively, arguments to FUN
can be given
in a named list to .dat
, the first argument of matsindf_apply()
.
When a value is assigned to .dat
,
the return value from matsindf_apply()
contains all named variables in .dat
(in this case both a
and b
)
in addition to the results provided by FUN
(in this case both c
and d
).
matsindf_apply(list(a = 2, b = 1), FUN = example_fun)
Extra variables are tolerated in .dat
,
because .dat
is considered to be a store of data
from which variables can be drawn as needed.
matsindf_apply(list(a = 2, b = 1, z = 42), FUN = example_fun)
In contrast, arguments to ...
are named explicitly by the user,
so including an extra argument in ...
is considered an error,
as shown above.
If a named argument is supplied by both .dat
and ...
,
the argument in ...
takes precedence,
overriding the argument in .dat
.
matsindf_apply(list(a = 2, b = 1), FUN = example_fun, a = 10)
When supplying both .dat
and ...
,
...
can contain named strings of length 1
which are interpreted as mappings
from named items in .dat
to arguments in the signature of FUN
.
In the example below,
a = "z"
indicates that argument a
to FUN
should be supplied by item z
in .dat
.
matsindf_apply(list(a = 2, b = 1, z = 42), FUN = example_fun, a = "z")
If a named argument appears in both .dat
and the output of FUN
,
a name collision occurs in the output of matsindf_apply()
, and
a warning is issued.
tryCatch( matsindf_apply(list(a = 2, b = 1, c = 42), FUN = example_fun), warning = function(w){w} )
FUN
can accept more than just numerics.
example_fun_with_string()
accepts a character string and a numeric.
However, because ...
argument that is a character string
of length 1
has special meaning
(namely mapping variables in .dat
to arguments of FUN
),
passing a character string of length 1
can cause an error.
To get around the problem, wrap the single string
in a list, as shown below.
example_fun_with_string <- function(str_a, b) { a <- as.numeric(str_a) list(added = matsbyname::sum_byname(a, b), subtracted = matsbyname::difference_byname(a, b)) } # Causes an error tryCatch( matsindf_apply(FUN = example_fun_with_string, str_a = "1", b = 2), error = function(e){e} ) # To solve the problem, wrap "1" in list(). matsindf_apply(FUN = example_fun_with_string, str_a = list("1"), b = 2) matsindf_apply(FUN = example_fun_with_string, str_a = list("1"), b = list(2)) matsindf_apply(FUN = example_fun_with_string, str_a = list("1", "3"), b = list(2, 4)) matsindf_apply(.dat = list(str_a = list("1"), b = list(2)), FUN = example_fun_with_string) matsindf_apply(.dat = list(m = list("1"), n = list(2)), FUN = example_fun_with_string, str_a = "m", b = "n")
matsindf_apply()
and data frames.dat
can also contain a data frame (or tibble),
both of which are fancy lists.
When .dat
is a data frame or tibble,
the output of matsindf_apply()
is a tibble, and
FUN
acts like a specialized dplyr::mutate()
,
adding new columns at the right of .dat
.
matsindf_apply(.dat = data.frame(str_a = c("1", "3"), b = c(2, 4)), FUN = example_fun_with_string) matsindf_apply(.dat = data.frame(str_a = c("1", "3"), b = c(2, 4)), FUN = example_fun_with_string, str_a = "str_a", b = "b") matsindf_apply(.dat = data.frame(m = c("1", "3"), n = c(2, 4)), FUN = example_fun_with_string, str_a = "m", b = "n")
Additional niceties are available when .dat
is a data frame or a tibble.
matsindf_apply()
works when the data frame is filled with single numeric values,
as is typical.
df <- data.frame(a = 2:4, b = 1:3) matsindf_apply(df, FUN = example_fun)
But matsindf_apply()
also works with matsindf
data frames,
data frames in which each cell of the data frame is filled with a single matrix.
To demonstrate use of matsindf_apply()
with a matsindf
data frame,
we'll construct a simple matsindf
data frame (midf
)
using functions in this package.
# Create a tidy data frame containing data for matrices tidy <- tibble::tibble(Year = rep(c(rep(2017, 4), rep(2018, 4)), 2), matnames = c(rep("U", 8), rep("V", 8)), matvals = c(1:4, 11:14, 21:24, 31:34), rownames = c(rep(c(rep("p1", 2), rep("p2", 2)), 2), rep(c(rep("i1", 2), rep("i2", 2)), 2)), colnames = c(rep(c("i1", "i2"), 4), rep(c("p1", "p2"), 4))) |> dplyr::mutate( rowtypes = case_when( matnames == "U" ~ "Product", matnames == "V" ~ "Industry", TRUE ~ NA_character_ ), coltypes = case_when( matnames == "U" ~ "Industry", matnames == "V" ~ "Product", TRUE ~ NA_character_ ) ) tidy # Convert to a matsindf data frame midf <- tidy |> dplyr::group_by(Year, matnames) |> collapse_to_matrices(rowtypes = "rowtypes", coltypes = "coltypes") |> tidyr::pivot_wider(names_from = "matnames", values_from = "matvals") # Take a look at the midf data frame and some of the matrices it contains. midf midf$U[[1]] midf$V[[1]]
With midf
in hand, we can demonstrate use of
tidyverse
-style
functional programming to perform
matrix algebra within a data frame.
The functions of the matsbyname
package
(such as difference_byname()
below)
can be used for this purpose.
result <- midf |> dplyr::mutate( W = difference_byname(transpose_byname(V), U) ) result result$W[[1]] result$W[[2]]
This way of performing matrix calculations works equally well
within a 2-row matsindf
data frame
(as shown above) or
within a 1000-row matsindf
data frame.
matsindf_apply()
Users can write their own functions using matsindf_apply()
.
A flexible calc_W()
function can be written as follows.
calc_W <- function(.DF = NULL, U = "U", V = "V", W = "W") { # The inner function does all the work. W_func <- function(U_mat, V_mat){ # When we get here, U_mat and V_mat will be single matrices or single numbers, # not a column in a data frame or an item in a list. if (length(U_mat) == 0 & length(V_mat == 0)) { # Tolerate zero-length arguments by returning a zero-length # a list with the correct name and return type. return(list(numeric()) |> magrittr::setnames(W)) } # Calculate W_mat from the inputs U_mat and V_mat. W_mat <- matsbyname::difference_byname( matsbyname::transpose_byname(V_mat), U_mat) # Return a named list. list(W_mat) |> magrittr::set_names(W) } # The body of the main function consists of a call to matsindf_apply # that specifies the inner function in the FUN argument. matsindf_apply(.DF, FUN = W_func, U_mat = U, V_mat = V) }
This style of writing matsindf_apply()
functions is incredibly versatile,
leveraging the capabilities of both the matsindf
and matsbyname
packages.
(Indeed, the Recca
package
uses matsindf_apply()
heavily and
is built upon the functions in the matsindf
and matsbyname
packages.)
Functions written like calc_W()
can operate in ways similar to matsindf_apply()
itself.
To demonstrate, we'll use calc_W()
in all the ways that matsindf_apply()
can be used,
going in the reverse order to our demonstration of the capabilities of matsindf_apply()
above.
calc_W()
can be used as a specialized mutate
function
that operates on matsindf
data frames.
midf |> calc_W()
The added column could be given a different name from the default ("W
")
using the W
argument.
midf |> calc_W(W = "W_prime")
As with matsindf_apply()
,
column names in midf
can be mapped to the arguments of calc_W()
by the arguments to calc_W()
.
midf |> dplyr::rename(X = U, Y = V) |> calc_W(U = "X", V = "Y")
calc_W()
can operate on lists of single matrices, too.
This approach works, because the default values for the
U
and V
arguments to calc_W()
are
"U" and "V", respectively.
The input list members (in this case midf$U[[1]]
and midf$V[[1]]
)
are returned with the output, because
list(U = midf$U[[1]], V = midf$V[[1]])
is passed to the .dat
argument
of matsindf_apply()
.
calc_W(list(U = midf$U[[1]], V = midf$V[[1]]))
It may be clearer to name the arguments as required by the calc_W()
function
without wrapping in a list first,
as shown below.
But in this approach, the input matrices are not returned with the output,
because arguments U
and V
are passed to the ...
argument of matsindf_apply()
,
not the .dat
argument of matsindf_apply()
.
calc_W(U = midf$U[[1]], V = midf$V[[1]])
calc_W()
can operate on data frames containing single numbers.
data.frame(U = c(1, 2), V = c(3, 4)) |> calc_W()
Finally, calc_W()
can be applied to single numbers,
and the result is 1x1 matrix.
calc_W(U = 2, V = 3)
It is good practice to write internal functions
that tolerate zero-length inputs, as calc_W()
does.
Doing so, enables results from different calculations to be rbind
ed together.
calc_W(U = numeric(), V = numeric()) calc_W(list(U = numeric(), V = numeric())) res <- calc_W(list(U = c(2, 3, 4, 5), V = c(3, 4, 5, 6))) res0 <- calc_W(list(U = numeric(), V = numeric())) dplyr::bind_rows(res, res0)
This vignette demonstrated use of
the versatile matsindf_apply()
function.
Inputs to matsindf_apply()
can be
matsindf_apply()
can be used for programming, and
functions constructed as demonstrated above
share characteristics with matsindf_apply()
:
dplyr::mutate()
operators, andAdd the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.