se_rbind: Combine SummarizedExperiment objects by row

se_rbind    R Documentation

Combine SummarizedExperiment objects by row

Description

Combine SummarizedExperiment objects by row, using rbind() logic.

Usage

se_rbind(
  se_list,
  colnames_from = "_(n|p|neg|pos)_",
  colnames_to = "_X_",
  colnames_keep = NULL,
  colData_action = c("identical", "all"),
  colData_sep = ";",
  verbose = FALSE,
  ...
)

Arguments

se_list

list of SummarizedExperiment objects.

colnames_from

character vector of patterns used with gsub() to convert colnames() for each object in se_list to an identifier that will be shared across all objects in se_list.

colnames_to

character vector of replacements used with gsub() alongside each entry in colnames_from to convert colnames() for each object in se_list to an identifier that will be shared across all objects in se_list.
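
How colnames_from and colnames_to pair up can be illustrated with base R alone. The sketch below assumes each pattern is applied in order with its matching replacement; the helper name convert_colnames() is hypothetical, it is not part of jamses, and the internals of se_rbind() may differ.

```r
# Hypothetical helper (not part of jamses): apply each pattern/replacement
# pair in order using gsub().
convert_colnames <- function(x, from="_(n|p|neg|pos)_", to="_X_") {
   # recycle 'to' so that every pattern has a replacement
   to <- rep(to, length.out=length(from))
   for (i in seq_along(from)) {
      x <- gsub(from[i], to[i], x)
   }
   x
}

converted <- convert_colnames(c("sample_p_12", "sample_n_12", "sample_neg_3"))
converted
# "sample_X_12" "sample_X_12" "sample_X_3"
```

Columns that map to the same converted identifier are the ones that can be treated as equivalent across objects in se_list.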

colData_action

character string indicating the action used to combine colData() across se_list:

  • "identical": retain only those columns in colData() which are identical in all se_list objects.

  • "all": retain all columns, but convert columns with mismatched values to character, storing the differing values delimited by colData_sep.

colData_sep

character string used as the delimiter when colData_action="all" and the values in a colData() column differ across objects in se_list. Only values that differ are delimited, to minimize redundancy.
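
The delimiting behavior can be sketched in base R. This is an illustration under assumptions, not the package implementation: combine_column() is a hypothetical helper, and it assumes that "only values that differ are delimited" means unique values are pasted together per sample.

```r
# Hypothetical helper (not part of jamses): combine one colData column
# across objects, supplied as a list of equal-length vectors.
combine_column <- function(values, sep=";") {
   # one column per object, one row per sample
   m <- do.call(cbind, lapply(values, as.character))
   # per sample, keep each distinct value once and join with 'sep'
   apply(m, 1, function(v) paste(unique(v), collapse=sep))
}

sample_id <- c("sample_X_1", "sample_X_2")
same <- combine_column(list(sample_id, sample_id))
same
# "sample_X_1" "sample_X_2"

differ <- combine_column(list(
   c("sample_p_1", "sample_p_2"),
   c("sample_n_1", "sample_n_2")))
differ
# "sample_p_1;sample_n_1" "sample_p_2;sample_n_2"
```

Identical values pass through unchanged, so a column that agrees across all objects is not inflated with repeated entries.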

...

additional arguments are ignored.

Details

This function is intended to simplify calling SummarizedExperiment::rbind() on objects whose colnames() or colData() do not match exactly.

The process:

  1. Convert colnames() for each entry in se_list using colnames_from and colnames_to. This step is useful when objects in se_list use different sets of colnames(). For example, "sample_p_12" and "sample_n_12" might be equivalent, so renaming them with colnames_from=c("_[np]_") and colnames_to=c("_X_") would convert both values to "sample_X_12".

  2. Subset each object in se_list using only shared colnames().

  3. Determine how to handle colData() columns that are not identical:

    • colData_action="identical": will only keep columns whose values are identical across all objects in se_list.

    • colData_action="all": will keep all columns in colData(); non-identical columns will be converted to character, with differing values delimited by colData_sep.

  4. Perform rbind().
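
Steps 1, 2, and 4 above can be sketched with plain matrices standing in for assay data. This is a minimal illustration using base R only; it assumes columns are aligned by name before binding, and it does not reproduce the colData() handling in step 3 or the actual se_rbind() internals.

```r
# Two matrices standing in for assay data, with different column naming
m1 <- matrix(1:6, nrow=2, dimnames=list(c("row_1", "row_2"),
   c("sample_p_1", "sample_p_2", "sample_p_3")))
m2 <- matrix(7:10, nrow=2, dimnames=list(c("row_3", "row_4"),
   c("sample_n_2", "sample_n_1")))
mats <- list(m1, m2)

# Step 1: convert colnames to a shared identifier
mats <- lapply(mats, function(m) {
   colnames(m) <- gsub("_(n|p|neg|pos)_", "_X_", colnames(m))
   m
})

# Step 2: keep only shared colnames, in a consistent order
shared <- Reduce(intersect, lapply(mats, colnames))
mats <- lapply(mats, function(m) m[, shared, drop=FALSE])

# Step 4: row-bind the aligned matrices
m12 <- do.call(rbind, mats)
dim(m12)
# 4 2
```

Here "sample_X_3" is dropped because it is present only in the first matrix, and the columns of the second matrix are reordered to match the first before binding.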

TODO:

  • Write equivalent se_cbind() - it will wait until there is a driving use case.

  • Consider retaining only shared assayNames() across se_list.

  • Consider optionally retaining user-defined assayNames(). (Alternatively, the user can subset the assayNames upfront, though it might be tedious). The recommended pattern in that case:

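# assay_names: user-defined character vector of assayNames() to retain
se <- se_rbind(
   se_list=lapply(se_list, function(se){
      # subset assays in each object before combining
      SummarizedExperiment::assays(se) <- SummarizedExperiment::assays(se)[assay_names]
      se
   })
)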

Value

SummarizedExperiment object whose colData() has been processed according to colData_action: either keeping only columns with identical values, or keeping all columns, with differing values stored as a character string delimited by colData_sep.

See Also

Other jamses utilities: fold_to_log2fold(), log2fold_to_fold(), mark_stat_hits(), se_collapse_by_column(), se_collapse_by_row(), shrinkDataFrame(), shrink_df(), strsplitOrdered(), sub_split_vector(), update_function_params(), update_list_elements()

Examples

m1 <- matrix(rnorm(100), ncol=10);
colnames(m1) <- paste0("sample_p_", 1:10);
rownames(m1) <- paste0("row_", 1:10);
m2 <- matrix(rnorm(100), ncol=10);
colnames(m2) <- paste0("sample_n_", 1:10);
rownames(m2) <- paste0("row_", 11:20);
sample_id <- gsub("_[np]_", "_X_", colnames(m1));
m1
m2
se1 <- SummarizedExperiment::SummarizedExperiment(
   assays=list(counts=m1),
   rowData=data.frame(measurement=rownames(m1)),
   colData=data.frame(sample=colnames(m1),
      sample_id=sample_id))
se2 <- SummarizedExperiment::SummarizedExperiment(
   assays=list(counts=m2),
   rowData=data.frame(measurement=rownames(m2)),
   colData=data.frame(sample=colnames(m2),
      sample_id=sample_id))
# this step fails because colnames are not shared
# do.call(SummarizedExperiment::rbind, list(se1, se2))

# keep only identical colData columns
se12 <- se_rbind(list(se1, se2))
SummarizedExperiment::colData(se12)

# keep all colData columns
se12all <- se_rbind(list(se1, se2),
   colData_action="all")
SummarizedExperiment::colData(se12all)


jmw86069/jamses documentation built on May 31, 2024, 1:36 p.m.