A matsindf_apply primer
In matsindf: Matrices in Data Frames

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
library(dplyr)
library(matsbyname)
library(matsindf)
library(tidyr)

Introduction

matsindf_apply() is a powerful and versatile function that enables analysis with lists and data frames by applying a user-supplied function (FUN) to named arguments drawn from them. The function is called matsindf_apply() because it can be used to apply FUN to a matsindf data frame, that is, a data frame whose individual entries are matrices. (A matsindf data frame can be created by calling collapse_to_matrices(), as demonstrated below.)

But matsindf_apply() can apply FUN across much more: data frames of single numbers, lists of matrices, lists of single numbers, and individual numbers. This vignette demonstrates matsindf_apply(), starting with simple examples and proceeding toward sophisticated analyses.

The basics

The basis of all analyses conducted with matsindf_apply() is a function (FUN) to be applied across data supplied in .dat or .... FUN must return a named list of variables as its result. Here is an example function that both adds and subtracts its arguments, a and b, and returns a list containing its results, c and d.

example_fun <- function(a, b){
  return(list(c = matsbyname::sum_byname(a, b), 
              d = matsbyname::difference_byname(a, b)))
}

Similar to lapply() and its siblings, the additional arguments to matsindf_apply() include the data over which FUN is to be applied. In the simplest case, these are supplied as named arguments via the ... argument of matsindf_apply(). All arguments in ... must be named, and they are passed to FUN according to their names. In this case, the output of matsindf_apply() is the named list returned by FUN.

matsindf_apply(FUN = example_fun, a = 2, b = 1)

Passing an additional argument (z = 2) causes an unused argument error, because example_fun does not have a z argument.

tryCatch(
  matsindf_apply(FUN = example_fun, a = 2, b = 1, z = 2),
  error = function(e){e}
)

Failing to pass a needed argument (b) causes an error that indicates the missing argument.

tryCatch(
  matsindf_apply(FUN = example_fun, a = 2),
  error = function(e){e}
)

Alternatively, arguments to FUN can be given in a named list to .dat, the first argument of matsindf_apply(). When a value is assigned to .dat, the return value from matsindf_apply() contains all named variables in .dat (in this case both a and b) in addition to the results provided by FUN (in this case both c and d).

matsindf_apply(list(a = 2, b = 1), FUN = example_fun)

Extra variables are tolerated in .dat, because .dat is considered to be a store of data from which variables can be drawn as needed.

matsindf_apply(list(a = 2, b = 1, z = 42), FUN = example_fun)

In contrast, arguments to ... are named explicitly by the user, so including an extra argument in ... is considered an error, as shown above.

Some details

If a named argument is supplied by both .dat and ..., the argument in ... takes precedence, overriding the argument in .dat.

matsindf_apply(list(a = 2, b = 1), FUN = example_fun, a = 10)

When supplying both .dat and ..., ... can contain named strings of length 1 which are interpreted as mappings from named items in .dat to arguments in the signature of FUN. In the example below, a = "z" indicates that argument a to FUN should be supplied by item z in .dat.

matsindf_apply(list(a = 2, b = 1, z = 42),
               FUN = example_fun, a = "z")
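
The precedence and mapping rules above can be pictured with a small base-R sketch. This is only a simplified mental model, not the package's implementation: resolve_args() and f() are hypothetical names introduced here for illustration, f() uses plain + and - rather than the matsbyname functions, and the sketch covers only the case where .dat is supplied.

```r
# Hypothetical helper (not part of matsindf): a simplified mental model of
# how arguments for FUN could be gathered from .dat and ...
resolve_args <- function(.dat, FUN, ...) {
  dots <- list(...)
  args <- .dat
  for (nm in names(dots)) {
    val <- dots[[nm]]
    if (is.character(val) && length(val) == 1 && val %in% names(.dat)) {
      # A single string names the item in .dat that supplies this argument.
      args[[nm]] <- .dat[[val]]
    } else {
      # Any other value in ... overrides the same-named item in .dat.
      args[[nm]] <- val
    }
  }
  # Extra items in .dat are ignored: keep only the arguments FUN declares.
  args[names(formals(FUN))]
}

# Stand-in for example_fun that uses base arithmetic.
f <- function(a, b) list(c = a + b, d = a - b)
dat <- list(a = 2, b = 1, z = 42)

do.call(f, resolve_args(dat, f))           # a and b both come from dat
do.call(f, resolve_args(dat, f, a = 10))   # a = 10 overrides a in dat
do.call(f, resolve_args(dat, f, a = "z"))  # a is drawn from item z in dat
```

The sketch deliberately ignores error handling, such as what should happen when a mapped name is absent from .dat.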

If a named argument appears in both .dat and the output of FUN, a name collision occurs in the output of matsindf_apply(), and a warning is issued.

tryCatch(
  matsindf_apply(list(a = 2, b = 1, c = 42), FUN = example_fun),
  warning = function(w){w}
)

FUN can accept more than just numerics. example_fun_with_string() accepts a character string and a numeric. However, because a ... argument that is a character string of length 1 has special meaning (namely, mapping variables in .dat to arguments of FUN), passing a character string of length 1 can cause an error. To get around the problem, wrap the single string in a list, as shown below.

example_fun_with_string <- function(str_a, b) {
  a <- as.numeric(str_a)
  list(added = matsbyname::sum_byname(a, b), subtracted = matsbyname::difference_byname(a, b))
}

# Causes an error
tryCatch(
  matsindf_apply(FUN = example_fun_with_string, str_a = "1", b = 2),
  error = function(e){e}
)
# To solve the problem, wrap "1" in list().
matsindf_apply(FUN = example_fun_with_string, str_a = list("1"), b = 2)
matsindf_apply(FUN = example_fun_with_string, str_a = list("1"), b = list(2))
matsindf_apply(FUN = example_fun_with_string, 
               str_a = list("1", "3"), 
               b = list(2, 4))
matsindf_apply(.dat = list(str_a = list("1"), b = list(2)), FUN = example_fun_with_string)
matsindf_apply(.dat = list(m = list("1"), n = list(2)), FUN = example_fun_with_string, 
               str_a = "m", b = "n")

matsindf_apply() and data frames

.dat can also be a data frame or a tibble, both of which are fancy lists. When .dat is a data frame or tibble, the output of matsindf_apply() is a tibble, and FUN acts like a specialized dplyr::mutate(), adding new columns at the right of .dat.

matsindf_apply(.dat = data.frame(str_a = c("1", "3"), b = c(2, 4)), 
               FUN = example_fun_with_string)
matsindf_apply(.dat = data.frame(str_a = c("1", "3"), b = c(2, 4)), 
               FUN = example_fun_with_string, 
               str_a = "str_a", b = "b")
matsindf_apply(.dat = data.frame(m = c("1", "3"), n = c(2, 4)), 
               FUN = example_fun_with_string, 
               str_a = "m", b = "n")

Additional niceties are available when .dat is a data frame or a tibble. matsindf_apply() works when the data frame is filled with single numeric values, as is typical.

df <- data.frame(a = 2:4, b = 1:3)
matsindf_apply(df, FUN = example_fun)
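
One way to picture the data frame case is that FUN is evaluated once per row and the named results become new columns. The base-R sketch below illustrates only that idea. It is not how matsindf_apply() is implemented, plain_fun() is a hypothetical stand-in for example_fun that uses + and - instead of the matsbyname functions, and the column types it produces need not match those returned by matsindf_apply().

```r
df <- data.frame(a = 2:4, b = 1:3)

# Hypothetical stand-in for example_fun, using base arithmetic.
plain_fun <- function(a, b) list(c = a + b, d = a - b)

# Apply plain_fun to each row; each call returns a named list.
rows <- Map(plain_fun, df$a, df$b)

# Turn each named list into a one-row data frame, stack them,
# and attach the new columns at the right of the original data frame.
new_cols <- do.call(rbind, lapply(rows, as.data.frame))
out <- cbind(df, new_cols)
out
```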

But matsindf_apply() also works with matsindf data frames, in which each cell holds a single matrix. To demonstrate use of matsindf_apply() with a matsindf data frame, we'll construct a simple matsindf data frame (midf) using functions in this package.

# Create a tidy data frame containing data for matrices
tidy <- tibble::tibble(Year = rep(c(rep(2017, 4), rep(2018, 4)), 2),
                       matnames = c(rep("U", 8), rep("V", 8)),
                       matvals = c(1:4, 11:14, 21:24, 31:34),
                       rownames = c(rep(c(rep("p1", 2), rep("p2", 2)), 2), 
                                    rep(c(rep("i1", 2), rep("i2", 2)), 2)),
                       colnames = c(rep(c("i1", "i2"), 4), 
                                    rep(c("p1", "p2"), 4))) |>
  dplyr::mutate(
    rowtypes = case_when(
      matnames == "U" ~ "Product",
      matnames == "V" ~ "Industry", 
      TRUE ~ NA_character_
    ),
    coltypes = case_when(
      matnames == "U" ~ "Industry",
      matnames == "V" ~ "Product",
      TRUE ~ NA_character_
    )
  )

tidy

# Convert to a matsindf data frame
midf <- tidy |>  
  dplyr::group_by(Year, matnames) |> 
  collapse_to_matrices(rowtypes = "rowtypes", coltypes = "coltypes") |> 
  tidyr::pivot_wider(names_from = "matnames", values_from = "matvals")

# Take a look at the midf data frame and some of the matrices it contains.
midf
midf$U[[1]]
midf$V[[1]]

With midf in hand, we can demonstrate use of tidyverse-style functional programming to perform matrix algebra within a data frame. The functions of the matsbyname package (such as difference_byname() below) can be used for this purpose.

result <- midf |> 
  dplyr::mutate(
    W = difference_byname(transpose_byname(V), U)
  )
result
result$W[[1]]
result$W[[2]]

This way of performing matrix calculations works equally well within a 2-row matsindf data frame (as shown above) or within a 1000-row matsindf data frame.

Programming with matsindf_apply()

Users can write their own functions using matsindf_apply(). A flexible calc_W() function can be written as follows.

calc_W <- function(.DF = NULL, U = "U", V = "V", W = "W") {
  # The inner function does all the work.
  W_func <- function(U_mat, V_mat){
    # When we get here, U_mat and V_mat will be single matrices or single numbers, 
    # not a column in a data frame or an item in a list.
    if (length(U_mat) == 0 && length(V_mat) == 0) {
      # Tolerate zero-length arguments by returning a list that holds
      # a zero-length numeric with the correct name and return type.
      return(list(numeric()) |> magrittr::set_names(W))
    }
    # Calculate W_mat from the inputs U_mat and V_mat.
    W_mat <- matsbyname::difference_byname(
      matsbyname::transpose_byname(V_mat), 
      U_mat)
    # Return a named list.
    list(W_mat) |> magrittr::set_names(W)
  }
  # The body of the main function consists of a call to matsindf_apply
  # that specifies the inner function in the FUN argument.
  matsindf_apply(.DF, FUN = W_func, U_mat = U, V_mat = V)
}

This style of writing matsindf_apply() functions is incredibly versatile, leveraging the capabilities of both the matsindf and matsbyname packages. (Indeed, the Recca package uses matsindf_apply() heavily and is built upon the functions in the matsindf and matsbyname packages.)

Functions written like calc_W() can operate in ways similar to matsindf_apply() itself. To demonstrate, we'll use calc_W() in all the ways that matsindf_apply() can be used, going in the reverse order to our demonstration of the capabilities of matsindf_apply() above.

calc_W() can be used as a specialized mutate function that operates on matsindf data frames.

midf |> calc_W()

The added column could be given a different name from the default ("W") using the W argument.

midf |> calc_W(W = "W_prime")

As with matsindf_apply(), column names in midf can be mapped to the arguments of calc_W() by the arguments to calc_W().

midf |> 
  dplyr::rename(X = U, Y = V) |> 
  calc_W(U = "X", V = "Y")

calc_W() can operate on lists of single matrices, too. This approach works, because the default values for the U and V arguments to calc_W() are "U" and "V", respectively. The input list members (in this case midf$U[[1]] and midf$V[[1]]) are returned with the output, because list(U = midf$U[[1]], V = midf$V[[1]]) is passed to the .dat argument of matsindf_apply().

calc_W(list(U = midf$U[[1]], V = midf$V[[1]]))

It may be clearer to name the arguments as required by the calc_W() function without wrapping in a list first, as shown below. But in this approach, the input matrices are not returned with the output, because arguments U and V are passed to the ... argument of matsindf_apply(), not the .dat argument of matsindf_apply().

calc_W(U = midf$U[[1]], V = midf$V[[1]])

calc_W() can operate on data frames containing single numbers.

data.frame(U = c(1, 2), V = c(3, 4)) |> calc_W()

Finally, calc_W() can be applied to single numbers, and the result is a 1x1 matrix.

calc_W(U = 2, V = 3)

It is good practice to write internal functions that tolerate zero-length inputs, as calc_W() does. Doing so enables results from different calculations to be bound together by rows.

calc_W(U = numeric(), V = numeric())
calc_W(list(U = numeric(), V = numeric()))

res <- calc_W(list(U = c(2, 3, 4, 5), V = c(3, 4, 5, 6)))
res0 <- calc_W(list(U = numeric(), V = numeric()))
dplyr::bind_rows(res, res0)

Conclusion

This vignette demonstrated use of the versatile matsindf_apply() function. Inputs to matsindf_apply() can be

single numbers,
matrices, or
data frames with appropriately-named columns.

matsindf_apply() can be used for programming, and functions constructed as demonstrated above share characteristics with matsindf_apply():

they can be used as specialized dplyr::mutate() operators, and
they can be applied to single numbers, matrices, or data frames with appropriately-named columns.