join: Add meta info from another 'matrixset' or a 'data.frame'

join    R Documentation

Add meta info from another matrixset or a data.frame

Description

Meta info is added by joining the row meta info data.frame of .ms with y (or with y's row meta info data.frame if y is a matrixset object); this is what join_row_info() does. The function join_column_info() performs the equivalent operation on column meta info.

The default join is a left join (type = 'left'), but most of dplyr's joins are available: 'left', 'inner', 'right', 'full', 'semi' and 'anti'.
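To make the six join types concrete, here is a minimal base-R sketch on a toy meta info data.frame. It is not how matrixset implements the join (the text points to dplyr); it only uses merge() and %in% as stand-ins, and the data frames info and extra, with their tag column .rowname, are hypothetical. Note that merge() sorts its result by the key.

```r
# Hypothetical row meta info of .ms: ".rowname" is the tag column
info  <- data.frame(.rowname = c("s1", "s2", "s3"), age = c(10, 11, 12))
# Hypothetical y: overlaps info on s2 and s3, and adds s4
extra <- data.frame(.rowname = c("s2", "s3", "s4"), score = c(80, 90, 70))

left  <- merge(info, extra, by = ".rowname", all.x = TRUE)  # s1 (score NA), s2, s3
inner <- merge(info, extra, by = ".rowname")                # s2, s3
right <- merge(info, extra, by = ".rowname", all.y = TRUE)  # s2, s3, s4
full  <- merge(info, extra, by = ".rowname", all = TRUE)    # s1, s2, s3, s4
# Filtering joins keep only the columns of info
semi  <- info[info$.rowname %in% extra$.rowname, , drop = FALSE]     # s2, s3
anti  <- info[!(info$.rowname %in% extra$.rowname), , drop = FALSE]  # s1

sapply(list(left = left, inner = inner, right = right,
            full = full, semi = semi, anti = anti), nrow)
```

Only the left join keeps exactly the original rows s1, s2 and s3; the other types drop or add rows, which is the kind of change the adjust argument governs.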

The matrixset paradigm of unique row/column names is enforced: if a row of the .ms meta info data.frame matches more than one row of y, the default behavior is to signal an error.

This behavior can be changed by creating new, unique tag names via the names_glue argument.
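The uniqueness rule can be checked before joining: a row of .ms is a problem exactly when its key occurs more than once in y. The helper would_duplicate_tags() below is a hypothetical base-R sketch (not a matrixset function), and the toy tables mimic a case where one student has two measurements in y.

```r
info  <- data.frame(.rowname = c("s1", "s2", "s3"), age = c(10, 11, 12))
meta  <- data.frame(sample = c("s2", "s2"), msr = c("height", "weight"),
                    value = c(145, 32))
extra <- data.frame(.rowname = c("s2", "s3"), score = c(80, 90))

# Hypothetical helper: TRUE if some row of `x` would match more than one row
# of `y` on the key columns, i.e. the join would duplicate tag names.
would_duplicate_tags <- function(x, y, by_x, by_y = by_x) {
  # Build one string key per row, even for multi-column keys
  key <- function(df, cols) do.call(paste, c(unname(as.list(df[cols])), sep = "\r"))
  counts <- table(key(y, by_y))
  any(key(x, by_x) %in% names(counts)[counts > 1])
}

would_duplicate_tags(info, meta, ".rowname", "sample")  # "s2" matches twice
would_duplicate_tags(info, extra, ".rowname")           # every match is unique
```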

Usage

join_row_info(
  .ms,
  y,
  type = "left",
  by = NULL,
  adjust = FALSE,
  names_glue = NULL,
  suffix = c(".x", ".y"),
  na_matches = c("na", "never")
)

join_column_info(
  .ms,
  y,
  type = "left",
  by = NULL,
  adjust = FALSE,
  names_glue = NULL,
  suffix = c(".x", ".y"),
  na_matches = c("na", "never")
)

Arguments

.ms

A matrixset object

y

A matrixset object or a data.frame.

type

Joining type, one of 'left', 'inner', 'right', 'full', 'semi' or 'anti'.

by

The names of the variables to join by. The default, NULL, behaves slightly differently depending on whether y is a matrixset or a data.frame. If y is a matrixset, the meta info tag of each object is used for by (the tag is the column that holds the row names or column names in the meta info data frame, typically ".rowname" or ".colname" unless specified otherwise at matrixset creation). If y is a data.frame, a natural join is used, i.e., all variables whose names are common to both data frames. For more details, see dplyr's dplyr::join(). Note that cross joins are not available.
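To see what a natural join means in practice, the base-R sketch below contrasts it with a join on the tag column only. It uses merge(), whose default by is the set of shared column names; the tables are hypothetical. When the shared columns disagree (s2 has class "A" in one table and "B" in the other), the natural join finds no match for that row, whereas the tag-only join keeps both class columns with .x/.y suffixes.

```r
info <- data.frame(.rowname = c("s1", "s2", "s3"),
                   class    = c("A", "A", "B"))
y    <- data.frame(.rowname = c("s1", "s2", "s3"),
                   class    = c("A", "B", "B"),
                   score    = c(70, 80, 90))

# A natural join uses every column name the two tables share
common <- intersect(names(info), names(y))
nat <- merge(info, y, all.x = TRUE)  # by defaults to the shared columns

# Joining on the tag only: the two 'class' columns get suffixed
tag_only <- merge(info, y, by = ".rowname", all.x = TRUE)

common          # ".rowname" "class"
nat             # s2 gets score NA: its class differs between the tables
names(tag_only) # ".rowname" "class.x" "class.y" "score"
```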

adjust

A logical. By default (FALSE), the join operation is not permitted to remove rows from, or add rows to, the meta info data frame. If TRUE, this is allowed. When rows are added to the data frame, the matrices of .ms are augmented accordingly by padding with NAs (except for NULL matrices).
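The NA padding can be pictured as re-indexing each matrix by the new set of row names. The helper pad_rows_na() is a hypothetical base-R sketch, not matrixset's internal code; it handles both added rows (filled with NA) and removed rows (dropped), and leaves NULL matrices as NULL.

```r
# Hypothetical helper: re-index the rows of `m` to `new_rows`,
# filling rows that `m` does not have with NA.
pad_rows_na <- function(m, new_rows) {
  if (is.null(m)) return(NULL)  # NULL matrices are left untouched
  out <- matrix(NA, nrow = length(new_rows), ncol = ncol(m),
                dimnames = list(new_rows, colnames(m)))
  keep <- intersect(new_rows, rownames(m))
  out[keep, ] <- m[keep, , drop = FALSE]  # copy the rows that already exist
  out
}

m <- matrix(1:6, nrow = 3,
            dimnames = list(c("s1", "s2", "s3"), c("t1", "t2")))
padded <- pad_rows_na(m, c("s1", "s2", "s3", "s4"))  # e.g. after a full join
padded["s4", ]  # NA NA
```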

Alternatively, adjust can be a single string, either 'pad_x' or 'from_y'. Choosing "pad_x" is equivalent to TRUE. When choosing "from_y", padding is done using values from y, but only

  1. if y is a matrixset,

  2. for matrices of y that have the same name in x, and

  3. when padding rows, for columns common to x and y. The same logic is applied when padding columns.

Other values are padded with NA.
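The "from_y" rule can be sketched the same way. pad_rows_from_y() below is a hypothetical base-R helper; it assumes mx and my are matrices of x and y that share a name (condition 2), so a caller would apply it only to such pairs. Rows already in x keep x's values, new rows take y's values for the columns x and y have in common, and everything else is NA.

```r
# Hypothetical helper: pad the rows of `mx` to `new_rows`, taking values from
# `my` for new rows, restricted to the columns shared by mx and my.
pad_rows_from_y <- function(mx, my, new_rows) {
  out <- matrix(NA, nrow = length(new_rows), ncol = ncol(mx),
                dimnames = list(new_rows, colnames(mx)))
  in_x <- intersect(new_rows, rownames(mx))
  out[in_x, ] <- mx[in_x, , drop = FALSE]
  # Rows new to x that y can supply, and the columns both matrices share
  new_only    <- intersect(setdiff(new_rows, rownames(mx)), rownames(my))
  common_cols <- intersect(colnames(mx), colnames(my))
  if (length(new_only) > 0 && length(common_cols) > 0)
    out[new_only, common_cols] <- my[new_only, common_cols, drop = FALSE]
  out  # remaining cells stay NA
}

mx <- matrix(c(1, 2, 3, 4), nrow = 2,
             dimnames = list(c("s1", "s2"), c("t1", "t2")))
my <- matrix(c(30, 40), nrow = 1, dimnames = list("s3", c("t2", "t3")))
res <- pad_rows_from_y(mx, my, c("s1", "s2", "s3"))
res["s3", ]  # t1 is NA (not in y), t2 is 30 (taken from y)
```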

names_glue

A parameter that can allow multiple matches. By default (NULL), multiple matches are not allowed, since the resulting tag names would no longer be unique.

The value of names_glue can be logical, with FALSE being equivalent to NULL. If TRUE, the new tag names are made unique by adding a number index, i.e., a number index is glued to the tag names (hence the argument name).

Finally, names_glue can be a string supplying a glue specification that uses the variable names found in y (columns for data frames, traits for matrixsets) to create custom new tag names. The special value .tag gives access to the original tag name. Note that currently only curly brackets ({}) can be used in the glue specification.

When making the tag names unique, only the non-unique names are modified. Also, adjust = TRUE must be set for names_glue to work.
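Here is a base-R sketch of how names_glue could rename duplicated tags, under stated assumptions. make_unique_tags() is hypothetical; in particular, the "_" separator and the running index used for names_glue = TRUE are guesses, not matrixset's documented format, and the template branch only substitutes the {.tag} and {column} placeholders it is given explicitly.

```r
# Hypothetical helper mimicking names_glue: only duplicated tags are renamed.
make_unique_tags <- function(tags, names_glue = TRUE, data = NULL) {
  dup <- tags %in% tags[duplicated(tags)]
  if (isTRUE(names_glue)) {
    # Running index within each repeated tag: "s2" -> "s2_1", "s2_2"
    # (the "_" separator is an assumption)
    idx <- ave(seq_along(tags), tags, FUN = seq_along)
    new <- paste(tags, idx, sep = "_")
  } else {
    # Template such as "{.tag}_{msr}": substitute each {name}, row by row
    vals <- c(list(.tag = tags), as.list(data))
    new  <- rep(names_glue, length(tags))
    for (nm in names(vals)) {
      new <- mapply(function(s, v) gsub(paste0("{", nm, "}"), v, s, fixed = TRUE),
                    new, as.character(vals[[nm]]), USE.NAMES = FALSE)
    }
  }
  tags[dup] <- new[dup]  # unique tags are kept as they are
  tags
}

# Joined meta info: s2 matched two rows of y
joined <- data.frame(.rowname = c("s1", "s2", "s2"),
                     msr = c(NA, "height", "weight"))
make_unique_tags(joined$.rowname)                                 # s1 s2_1 s2_2
make_unique_tags(joined$.rowname, "{.tag}_{msr}", joined["msr"])  # s1 s2_height s2_weight
make_unique_tags(joined$.rowname, "{msr}.{.tag}", joined["msr"])  # s1 height.s2 weight.s2
```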

suffix

Suffixes added to disambiguate trait variables. See dplyr's dplyr::join().
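Base R's merge() has an analogous suffixes argument with the same default, c(".x", ".y"), which makes the effect easy to see on a toy example (the tables are hypothetical):

```r
info <- data.frame(.rowname = c("s1", "s2"), score = c(1, 2))
y    <- data.frame(.rowname = c("s1", "s2"), score = c(10, 20))

# 'score' is in both tables but not in `by`, so it gets suffixed
res  <- merge(info, y, by = ".rowname", all.x = TRUE)
res2 <- merge(info, y, by = ".rowname", all.x = TRUE,
              suffixes = c("_ms", "_y"))

names(res)   # ".rowname" "score.x" "score.y"
names(res2)  # ".rowname" "score_ms" "score_y"
```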

na_matches

How to handle missing values when matching. See dplyr's dplyr::join().
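With dplyr's na_matches, "na" treats two missing keys as equal, while "never" treats them as different. Base R's merge() matches missing keys by default, like "na"; passing incomparables = NA (intended for single-column keys) gives "never"-like behavior. The tables below are hypothetical.

```r
x <- data.frame(key = c("a", NA), v = 1:2)
y <- data.frame(key = c("a", NA), w = 3:4)

na_match    <- merge(x, y, by = "key")                      # NA matches NA
never_match <- merge(x, y, by = "key", incomparables = NA)  # NA matches nothing

nrow(na_match)     # 2
nrow(never_match)  # 1
```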

Value

A matrixset with updated row or column meta info, containing all .ms traits and all y traits. If some traits share the same name and were not included in by, suffixes are appended to these names.

If adjustment was allowed, the dimensions of the new matrixset may differ from the original one.

Groups

When y is a matrixset, only the groups of .ms, if any, are used. Groups are updated in the same way as in dplyr.

Examples

ms1 <- remove_row_annotation(student_results, class, teacher)
ms <- join_row_info(ms1, student_results)

ms <- join_row_info(ms1, student_results, by = c(".rowname", "previous_year_score"))

# This will throw an error
ms2 <- remove_row_annotation(filter_row(student_results, class %in% c("classA", "classC")),
                             class, teacher, previous_year_score)
ms <- tryCatch(join_row_info(ms2, student_results, type = "full"),
               error = function(e) e)
is(ms, "error") # TRUE
ms$message

# Now it works.
ms <- join_row_info(ms2, student_results, type = "full", adjust = TRUE)
dim(ms2)
dim(ms)
matrix_elm(ms, 1)

# Similarly, this will fail because tag names are no longer unique
meta <- tibble::tibble(sample = c("student 2", "student 2"),
                      msr = c("height", "weight"),
                      value = c(145, 32))
ms <- tryCatch(join_row_info(student_results, meta, by = c(".rowname"="sample")),
               error = function(e) e)
is(ms, "error") # TRUE
ms$message

# This works, by forcing the tag names to be unique. Notice that we suppress
# the warning for now. We'll come back to it.
suppressWarnings(
   join_row_info(student_results, meta, by = c(".rowname"="sample"),
                 adjust = TRUE, names_glue = TRUE)
)
# Here's the warning: we're being told there was a change in tag names
(purrr::quietly(join_row_info)(student_results, meta,
                               by = c(".rowname"="sample"), adjust = TRUE,
                               names_glue = TRUE))$warnings

# You can have better control on how the tag change occurs, for instance by
# appending the msr value to the name
suppressWarnings(
   join_row_info(student_results, meta, by = c(".rowname"="sample"),
                 adjust = TRUE, names_glue = "{.tag}_{msr}")
)
# In this specific example, the {.tag} was superfluous, since the default is
# to append after the tag name
suppressWarnings(
   join_row_info(student_results, meta, by = c(".rowname"="sample"),
                 adjust = TRUE, names_glue = "{msr}")
)
# But the keyword is useful if you want to change the order
suppressWarnings(
   join_row_info(student_results, meta, by = c(".rowname"="sample"),
                 adjust = TRUE, names_glue = "{msr}.{.tag}")
)

# You are warned when there is a change in traits
meta <- tibble::tibble(sample = c("student 2", "student 2"),
                       class = c("classA", "classA"),
                       msr = c("height", "weight"),
                       value = c(145, 32))
(purrr::quietly(join_row_info)(student_results, meta,
                               by = c(".rowname"="sample"), adjust = TRUE,
                               names_glue = TRUE))$warnings[2]

# Groups are automatically adjusted
sr_gr <- row_group_by(student_results, class)
gr_orig <- row_group_meta(sr_gr) |> tidyr::unnest(.rows)
suppressWarnings(
  new_gr <- join_row_info(sr_gr, meta, by = c(".rowname" = "sample", "class"),
                          adjust = TRUE, names_glue = TRUE) |>
    row_group_meta() |> tidyr::unnest(.rows)
)
list(gr_orig, new_gr)

# In the example above, the join operation changed the class of 'class',
# which in turn changed the grouping meta info. You are warned of both.
(purrr::quietly(join_row_info)(sr_gr, meta,
                               by = c(".rowname"="sample", "class"),
                               adjust = TRUE, names_glue = TRUE))$warnings

# A change in the name of a trait used for grouping will result in losing the
# grouping. You are warned of the change in grouping structure.
(purrr::quietly(join_row_info)(sr_gr, meta,
                               by = c(".rowname"="sample"),
                               adjust = TRUE, names_glue = TRUE))$warnings


matrixset documentation built on April 3, 2025, 6:32 p.m.