join: Add meta info from another 'matrixset' or a 'data.frame'
In matrixset: Creating, Manipulating and Annotating Matrix Ensemble

join	R Documentation

Description

The operation is done through a join operation between the row meta info data.frame (join_row_info()) of .ms and y (or its row meta info data.frame if it is a matrixset object). The function join_column_info() does the equivalent operation for column meta info.

The default join operation is a left join (type = 'left'), but most of dplyr's joins are available: 'left', 'inner', 'right', 'full', 'semi' or 'anti'.
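
As a rough illustration of how the join type affects the number of rows in the meta info, the following base R sketch mimics each type with merge() and simple subsetting. It does not use matrixset itself; the data frames x and y and their columns are made up for the example.

```r
# Hypothetical meta info: 'x' plays the role of the .ms meta info, 'y' the other table.
x <- data.frame(id = c("a", "b", "c"), trait_x = 1:3)
y <- data.frame(id = c("b", "c", "d"), trait_y = c(TRUE, FALSE, TRUE))

left  <- merge(x, y, by = "id", all.x = TRUE)  # every row of x is kept
right <- merge(x, y, by = "id", all.y = TRUE)  # every row of y is kept
inner <- merge(x, y, by = "id")                # only ids found in both
full  <- merge(x, y, by = "id", all = TRUE)    # ids found in either
semi  <- x[x$id %in% y$id, , drop = FALSE]     # rows of x with a match, x columns only
anti  <- x[!x$id %in% y$id, , drop = FALSE]    # rows of x without a match

# Number of rows produced by each join type
sapply(list(left = left, right = right, inner = inner,
            full = full, semi = semi, anti = anti), nrow)
```

Any type that ends up with a different set of rows than x is a case where the adjust argument comes into play.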

The matrixset paradigm of unique row/column names is enforced, so if a row of the .ms meta info data.frame matches multiple rows in y, the default behavior is to raise an error.

This behavior can be changed by setting new tag names via the argument names_glue.
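
To see why multiple matches conflict with unique names, here is a minimal base R sketch (not the package's implementation). The data frames are hypothetical, and the renaming step only imitates the idea behind a glue specification such as "{.tag}_{msr}".

```r
# Hypothetical row meta info: one row per matrix row, keyed by a unique tag.
row_meta <- data.frame(.rowname = c("student 1", "student 2", "student 3"),
                       class = c("classA", "classA", "classB"),
                       stringsAsFactors = FALSE)

# 'y' holds two records for "student 2".
y <- data.frame(sample = c("student 2", "student 2"),
                msr = c("height", "weight"),
                value = c(145, 32),
                stringsAsFactors = FALSE)

# Left join: "student 2" now appears twice, so the tag is no longer unique.
joined <- merge(row_meta, y, by.x = ".rowname", by.y = "sample", all.x = TRUE)
anyDuplicated(joined$.rowname) > 0

# Build new tags by appending 'msr', modifying only the non-unique names.
dup <- joined$.rowname %in% joined$.rowname[duplicated(joined$.rowname)]
new_tag <- ifelse(dup, paste(joined$.rowname, joined$msr, sep = "_"),
                  joined$.rowname)
new_tag
```

Since the join adds a row, the matrices need an extra row as well, which is why names_glue is tied to adjust.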

Usage

join_row_info(
  .ms,
  y,
  type = "left",
  by = NULL,
  adjust = FALSE,
  names_glue = NULL,
  suffix = c(".x", ".y"),
  na_matches = c("na", "never")
)

join_column_info(
  .ms,
  y,
  type = "left",
  by = NULL,
  adjust = FALSE,
  names_glue = NULL,
  suffix = c(".x", ".y"),
  na_matches = c("na", "never")
)

Arguments

`.ms`	A `matrixset` object
`y`	A `matrixset` object or a `data.frame`.
`type`	Joining type, one of 'left', 'inner', 'right', 'full', 'semi' or 'anti'.
`by`	The names of the variables to join by. The default, `NULL`, results in slightly different behavior depending on whether `y` is a `matrixset` or a `data.frame`. If a `matrixset`, the meta info tag of each object (the tag is the column that holds the row names/column names in the meta info data frame, typically ".rowname" or ".colname" unless specified otherwise at `matrixset` creation) is used for `by`. If a `data.frame`, a natural join is used. For more details, see `dplyr`'s `dplyr::join()`. Note that the cross-join is not available.
`adjust`	A logical. By default (`FALSE`), the join operation is not permitted to filter or augment the number of rows of the meta info data frame. If `TRUE`, this is allowed. In the case where the data frame is augmented, the matrices of `.ms` are augmented accordingly by padding with `NA`s (except for the `NULL` matrices). Alternatively, `adjust` can be a single string, one of 'pad_x' or 'from_y'. Choosing "pad_x" is equivalent to `TRUE`. When choosing "from_y", padding is done using values from `y`, but only if `y` is a `matrixset` and only for `y` matrices that are named the same in `.ms`. If padding rows, only columns common to `.ms` and `y` use `y` values. The same logic is applied when padding columns. Other values are padded with `NA`.
`names_glue`	A parameter that may allow multiple matches. By default (`NULL`), no multiple matches are allowed, since the resulting tag names would no longer be unique. The value of `names_glue` can be `logical`, with the value `FALSE` being equivalent to `NULL`. If `TRUE`, the new tag names are made unique by gluing a number index to the tag names (hence the argument name). Finally, `names_glue` can be a string, where you supply a glue specification that uses the variable names found in `y` (columns for data frames, traits for matrixsets) to create a custom new tag name. A special value `.tag` allows you to access the original tag name. Note that currently only curly brackets (`{}`) can be used in the glue specification. When making the tag names unique, only the non-unique names are modified. Also, `adjust = TRUE` must be set for `names_glue` to work.
`suffix`	Suffixes added to disambiguate trait variables. See `dplyr`'s `dplyr::join()`.
`na_matches`	How to handle missing values when matching. See `dplyr`'s `dplyr::join()`.
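
The padding behavior described for `adjust` can be pictured with plain matrices. The sketch below is only an illustration in base R of the "pad_x" and "from_y" ideas as described above, not the package's code; the matrices and their names are hypothetical.

```r
# Hypothetical matrix from .ms, with rows r1 and r2.
m <- matrix(1:6, nrow = 2, dimnames = list(c("r1", "r2"), c("a", "b", "c")))

# Suppose the join added a row "r3" to the row meta info.
new_rows <- c("r1", "r2", "r3")

# "pad_x" idea: rows that do not exist in the matrix are filled with NA.
pad_x <- m[match(new_rows, rownames(m)), , drop = FALSE]
rownames(pad_x) <- new_rows

# "from_y" idea: a same-named matrix in y supplies values for the new row,
# but only for the columns the two matrices have in common.
y_m <- matrix(c(10, 20), nrow = 1, dimnames = list("r3", c("a", "d")))
from_y <- pad_x
common <- intersect(colnames(from_y), colnames(y_m))
from_y["r3", common] <- y_m["r3", common]

pad_x
from_y
```

In this sketch only column "a" of the new row is taken from y; columns "b" and "c" stay NA, and column "d" of y is ignored.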

Value

A matrixset with updated row or column meta info, with all .ms traits and y traits. If some traits share the same names - and were not included in by - suffixes will be appended to these names.

If adjustment was allowed, the dimensions of the new matrixset may differ from the original one.

Groups

When y is a matrixset, only groups from .ms are used, if any. Group update is the same as in dplyr.

Examples

ms1 <- remove_row_annotation(student_results, class, teacher)
ms <- join_row_info(ms1, student_results)

ms <- join_row_info(ms1, student_results, by = c(".rowname", "previous_year_score"))

# This will throw an error
ms2 <- remove_row_annotation(filter_row(student_results, class %in% c("classA", "classC")),
                             class, teacher, previous_year_score)
ms <- tryCatch(join_row_info(ms2, student_results, type = "full"),
               error = function(e) e)
is(ms, "error") # TRUE
ms$message

# Now it works.
ms <- join_row_info(ms2, student_results, type = "full", adjust = TRUE)
dim(ms2)
dim(ms)
matrix_elm(ms, 1)

# Similarly, this will fail because tag names are no longer unique
meta <- tibble::tibble(sample = c("student 2", "student 2"),
                      msr = c("height", "weight"),
                      value = c(145, 32))
ms <- tryCatch(join_row_info(student_results, meta, by = c(".rowname"="sample")),
               error = function(e) e)
is(ms, "error") # TRUE
ms$message

# This works, by forcing the tag names to be unique. Notice that we suppress
# the warning for now. We'll come back to it.
suppressWarnings(
   join_row_info(student_results, meta, by = c(".rowname"="sample"),
                 adjust = TRUE, names_glue = TRUE)
)
# Here's the warning: we're being told there was a change in tag names
(purrr::quietly(join_row_info)(student_results, meta,
                               by = c(".rowname"="sample"), adjust = TRUE,
                               names_glue = TRUE))$warnings

# You can have better control on how the tag change occurs, for instance by
# appending the msr value to the name
suppressWarnings(
   join_row_info(student_results, meta, by = c(".rowname"="sample"),
                 adjust = TRUE, names_glue = "{.tag}_{msr}")
)
# In this specific example, the {.tag} was superfluous, since the default is
# to append after the tag name
suppressWarnings(
   join_row_info(student_results, meta, by = c(".rowname"="sample"),
                 adjust = TRUE, names_glue = "{msr}")
)
# But the keyword is useful if you want to shuffle order
suppressWarnings(
   join_row_info(student_results, meta, by = c(".rowname"="sample"),
                 adjust = TRUE, names_glue = "{msr}.{.tag}")
)

# You are warned when there is a change in traits
meta <- tibble::tibble(sample = c("student 2", "student 2"),
                       class = c("classA", "classA"),
                       msr = c("height", "weight"),
                       value = c(145, 32))
(purrr::quietly(join_row_info)(student_results, meta,
                               by = c(".rowname"="sample"), adjust = TRUE,
                               names_glue = TRUE))$warnings[2]

# Groups are automatically adjusted
sr_gr <- row_group_by(student_results, class)
gr_orig <- row_group_meta(row_group_by(student_results, class)) |> tidyr::unnest(.rows)
suppressWarnings(
  new_gr <- join_row_info(sr_gr, meta, by = c(".rowname" = "sample", "class"),
                          adjust = TRUE, names_glue = TRUE) |>
   row_group_meta() |> tidyr::unnest(.rows)
)
list(gr_orig, new_gr)

# In the example above, the join operation changed the class of 'class',
# which in turn changed the grouping meta info. You are warned of both.
(purrr::quietly(join_row_info)(sr_gr, meta,
                               by = c(".rowname"="sample", "class"),
                               adjust = TRUE,  names_glue = TRUE))$warnings

# A change in the name of a trait that was used for grouping will result in
# losing the grouping. You are warned of the change in grouping structure.
(purrr::quietly(join_row_info)(sr_gr, meta,
                               by = c(".rowname"="sample"),
                               adjust = TRUE,  names_glue = TRUE))$warnings
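
The examples above all use join_row_info(). A minimal sketch of the column counterpart follows. It assumes that column_info() returns the column meta info of student_results and that the column tag is the default ".colname"; the name_length trait is made up for illustration.

```r
library(matrixset)

# Build a small data frame keyed by the column tag (assumed to be ".colname").
col_meta <- data.frame(.colname = column_info(student_results)$.colname,
                       stringsAsFactors = FALSE)
# Hypothetical trait: number of characters in each column name.
col_meta$name_length <- nchar(col_meta$.colname)

# Left join on the tag: each column has exactly one match, so the number of
# columns is unchanged and no adjustment is needed.
ms_col <- join_column_info(student_results, col_meta, by = ".colname")
column_info(ms_col)
```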

matrixset documentation built on April 3, 2025, 6:32 p.m.