knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) library(matrixset)
library(tidyverse) animals <- as.matrix(MASS::Animals) log_animals <- log(animals) animal_info <- MASS::Animals %>% rownames_to_column("Animal") %>% mutate(is_extinct = case_when(Animal %in% c("Dipliodocus", "Triceratops", "Brachiosaurus") ~ TRUE, TRUE ~ FALSE), class = case_when(Animal %in% c("Mountain beaver", "Guinea pig", "Golden hamster", "Mouse", "Rabbit", "Rat") ~ "Rodent", Animal %in% c("Potar monkey", "Gorilla", "Human", "Rhesus monkey", "Chimpanzee") ~ "Primate", Animal %in% c("Cow", "Goat", "Giraffe", "Sheep") ~ "Ruminant", Animal %in% c("Asian elephant", "African elephant") ~ "Elephantidae", Animal %in% c("Grey wolf") ~ "Canine", Animal %in% c("Cat", "Jaguar") ~ "Feline", Animal %in% c("Donkey", "Horse") ~ "Equidae", Animal == "Pig" ~ "Sus", Animal == "Mole" ~ "Talpidae", Animal == "Kangaroo" ~ "Macropodidae", TRUE ~ "Dinosaurs")) %>% select(-body, -brain) animals_ms <- matrixset(msr = animals, log_msr = log_animals, row_info = animal_info, row_key = "Animal") animals_ms <- animals_ms %>% annotate_column(unit = case_when(.colname == "body" ~ "kg", TRUE ~ "g"))
matrixset
MatricesThere are two ways to apply functions to the matrices of a matrixset
object.
The first one is through the apply_*
family, which will be covered here.
The second is through mutate_matrix()
, covered in the next section.
There are 3 functions in the apply_*
family:
apply_matrix()
: The functions must take a matrix as input. In base R, this
is similar to simply calling fun(matrix_object)
.apply_row()
: The functions must take a vector as input. The vector will be
a matrix row. In base R, this is akin to
apply(matrix_object, 1, fun, simplify = FALSE)
.apply_column()
: The functions must take a vector as input. The vector will
be a matrix column. In base R, this is similar to
apply(matrix_object, 2, fun, simplify = FALSE)
.Each of these function will loop on the matrixset
object's matrices to apply
the functions. In the case of apply_row()
and apply_column()
, an additional
loop on the margin (row or column, as applicable) is executed, so that the
functions are applied to each matrix and margin.
To see the functions in action, we will use the following object:
animals_ms
We will use the following custom printing functions for compactness purposes.
show_matrix <- function(x) { if (nrow(x) > 4) { newx <- head(x, 4) storage.mode(newx) <- "character" newx <- rbind(newx, rep("...", ncol(x))) } else newx <- x newx } show_vector <- function(x) { newx <- if (length(x) > 4) { c(as.character(x[1:4]), "...") } else x newx } show_lst <- function(x) { lapply(x, function(u) { if (is.matrix(u)) show_matrix(u) else if (is.vector(u)) show_vector(u) else u }) }
So now, let's see the apply_matrix()
in action.
library(magrittr) library(purrr) out <- animals_ms %>% apply_matrix(exp, ~ mean(.m, trim=.1), foo=asinh, pow = ~ 2^.m, reg = ~ { is_alive <- !is_extinct lm(.m ~ is_alive + class) }) # out[[1]] %>% map(~ if (is.matrix(.x)) {head(.x, 5)} else .x) show_lst(out[[1]])
We have showcased several features of the apply_*
functions:
You probably have noticed the use of .m
. This is a pronoun that is accessible
inside apply_matrix()
and refers to the current matrix in the internal loop.
Similar pronouns exists for apply_row()
and apply_column()
, and they are
respecticely .i
and .j
.
The returned object is a list of lists. The first layer is for each matrix and the second layer is for each function call.
Let's now showcase the row/column version with a apply_column()
example:
out <- animals_ms %>% apply_column(exp, ~ mean(.j, trim=.1), foo=asinh, pow = ~ 2^.j, reg = ~ { is_alive <- !is_extinct lm(.j ~ is_alive + class) }) out[[1]] %>% map(show_lst)
The idea is similar, but in the returned object, there is a third list layer:
the first layer for the matrices, the second layer for the columns (it would
be rows for apply_row()
) and the third layer for the functions.
Note as well the use of the .j
pronoun instead of .m
.
The apply_*
functions understand data grouping and will execute on the proper
matrix/vector subsets.
animals_ms %>% row_group_by(class) %>% apply_matrix(exp, ~ mean(.m, trim=.1), foo=asinh, pow = ~ 2^.m, reg = ~ { is_alive <- !is_extinct lm(.m ~ is_alive) })
As one can see, the output format differs in situation of grouping. We still end
up with a list with an element for each matrix, but each of these element is now
a tibble
.
Each tibble has a column called .vals
, where the function results are stored.
This column is a list, one element per group. The group labels are given by the
other columns of the tibble. For a given group, things are like the ungrouped
version: further sub-lists for rows/columns - if applicable - and function
values.
Similar to the apply()
function that has a simplify
argument, the output
structured can be simplified, baring two conditions:
is.vector
returns TRUE
.If the conditions are met, each apply_*
function has two simplified version
available: _dfl
and dfw
.
Below is the _dfl
flavor in action. We point out two things to notice:
apply_column_dfl
(and _dfw
), a .column
column stores the column ID
(.row
for apply_row_*
).lm
result in a list
so that the outcome is vector.animals_ms %>% apply_matrix_dfl(~ mean(.m, trim=.1), MAD=mad, reg = ~ { is_alive <- !is_extinct list(lm(.m ~ is_alive + class)) })
animals_ms %>% apply_column_dfl(~ mean(.j, trim=.1), MAD=mad, reg = ~ { is_alive <- !is_extinct list(lm(.j ~ is_alive + class)) })
If using apply_column_dfw
in this context, you wouldn't notice a difference in
output format.
The difference between the two lies when the vectors are of length > 1.
animals_ms %>% apply_row_dfl(rg = ~ range(.i), qt = ~ quantile(.i, probs = c(.25, .75)))
animals_ms %>% apply_row_dfw(rg = ~ range(.i), qt = ~ quantile(.i, probs = c(.25, .75)))
We can observe three things:
It may happen that you need to get information about the current group. For this reason, the following context functions are made available:
current_n_row()
and current_n_column()
. They each give the number of rows
and columns, respectively, of the current matrix.
They are the context equivalent of nrow()
and ncol()
.
current_row_info()
and current_column_info()
. They give access to the
current row/column annotation data frame. The are the context equivlent
of row_info()
and column_info()
.
row_pos()
and column_pos()
. They give the current row/column indices.
The indices are the the ones before matrix subsetting.
row_rel_pos()
and column_rel_pos()
. They give the row/column indices
relative to the current matrix. They are equivalent to
seq_len(current_n_row())
/seq_len(current_n_column())
.
For instance, a simple way of knowing the number of animals per group could be
animals_ms %>% row_group_by(class) %>% apply_matrix_dfl(n = ~ current_n_row()) %>% .$msr
The context functions can also be of use when one or more traits are shared (in name) between rows and columns.
Here's a pseudo-code example:
# ms_object %>% # apply_matrix( ~ { # ctrt <- current_column_info()$common_trait # rtrt <- current_row_info()$common_trait # # do something with ctrt and rtrt # })
It may happen that a variable in the calling environment shares its name with
a trait of a matrixset
object.
You can make it explicit which version of the variable you are using the pronouns
.data
(the trait annotation version) and .env
.
reg_expr <- expr({ is_alive <- !is_extinct list(lm(.j ~ is_alive + class)) }) animals_ms %>% apply_column_dfl(~ mean(.j, trim=.1), MAD=mad, reg = ~ !!reg_expr)
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.