merge_proteomics_se: merge proteomics SE objects

merge_proteomics_se    R Documentation

merge proteomics SE objects

Description

Merge two proteomics SummarizedExperiment (SE) objects into one output object.

Usage

merge_proteomics_se(
  SE1,
  SE2,
  rowname1 = "SYMBOL",
  rowname2 = "SYMBOL",
  rowData_colnames_intersect = TRUE,
  colData_colnames_intersect = TRUE,
  rowData_colnames_unique = c("percentCoverage", "numPepsUnique", "scoreUnique"),
  assay_names = NULL,
  se_names = c("A", "B"),
  startN = 2,
  verbose = TRUE,
  ...
)

Arguments

SE1, SE2

SummarizedExperiment objects to be merged into one output object.

rowname1, rowname2

character string that describes which SummarizedExperiment::rowData() annotation to use to create appropriate rownames to be merged. This approach is useful when merging data based upon gene symbol, instead of a protein accession or peptide sequence. The intent is to allow "equivalent" rows to be combined across SE1 and SE2, while non-equivalent rows unique to SE1 or SE2 are represented on their own row.

The default values assume each proteomics SE object contains a rowData column "SYMBOL" with the official gene symbol represented on each row. This column is appropriate if the proteomics data represents abundance measurements that were already aggregated to the protein level (i.e. gene locus level). The data will therefore be merged based upon the gene symbol. In the event that multiple rows represent the same gene symbol, they will be renamed using jamba::makeNames(..., renameFirst=FALSE) so that the entries will be merged in the order they appear in each dataset.

However, if the input data contains peptide-level measurements, the appropriate column should contain the peptide sequence, so that the data is merged based upon equivalent peptide sequences.

If rowname1 or rowname2 contain multiple values, and/or are not equal to each other, a new column "merge_key" is created in both SE1 and SE2, and populated with relevant values. When multiple columns are indicated, they are concatenated using jamba::pasteByRow() to fill the column "merge_key". Then both rowname1 and rowname2 are redefined to "merge_key". Note that any pre-existing "merge_key" column will be overwritten.

A combination of "rownames" and colnames(rowData()) can be used.

Each value of the argument should be one of:

  1. colnames(rowData()) for the relevant object SE1 or SE2, representing a row annotation to use as the merge key. Note that any empty values (NA or blank string "") will be replaced by existing rownames().

  2. "rownames" to indicate that existing rownames() of the relevant object SE1 or SE2 should be used as the merge key. Note that if a column "rownames" already exists in rowData() it will be used as-is.
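
The rules above for deriving a merge key can be sketched in base R. This is an illustrative stand-in, not the platjam implementation: the helper make_merge_key() is hypothetical, and the "_" separator is an assumption standing in for whatever jamba::pasteByRow() produces.

```r
# Hypothetical helper, not part of platjam: derive one merge key per row.
make_merge_key <- function(row_annot, row_names, key_cols, sep = "_") {
  parts <- lapply(key_cols, function(k) {
    # "rownames" means the existing row names, unless a column of that
    # name already exists in the annotation, in which case it is used as-is
    if (k == "rownames" && !"rownames" %in% colnames(row_annot)) {
      return(row_names)
    }
    v <- as.character(row_annot[[k]])
    # empty values (NA or "") fall back to the existing row names
    empty <- is.na(v) | v == ""
    v[empty] <- row_names[empty]
    v
  })
  # concatenate the parts row by row (assumed separator)
  do.call(paste, c(parts, list(sep = sep)))
}

# toy row annotation with one blank and one missing gene symbol
row_annot <- data.frame(
  SYMBOL = c("ACTB", "", NA, "TP53"),
  stringsAsFactors = FALSE)
row_names <- c("r1", "r2", "r3", "r4")

key_one <- make_merge_key(row_annot, row_names, "SYMBOL")
key_two <- make_merge_key(row_annot, row_names, c("SYMBOL", "rownames"))
print(key_one)
print(key_two)
```

With a single key column, rows lacking a symbol keep their original row name as the key; with two key columns, the values are joined into one string per row, which is the role the "merge_key" column plays.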

rowData_colnames_intersect, colData_colnames_intersect

logical indicating whether to retain only the colnames(rowData()) and colnames(colData()), respectively, that are shared by SE1 and SE2 in the output rowData and colData.

  • TRUE: only the intersection is retained in the output data, default.

  • FALSE: not yet implemented.
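
As a small illustration of the TRUE behavior, the retained annotation columns are those present in both inputs. The column names below are made-up examples, and the snippet only shows the set operation, not the function's internal code.

```r
# Illustrative rowData column names for two inputs (assumed values)
rowData1_cols <- c("SYMBOL", "Description", "scoreUnique")
rowData2_cols <- c("SYMBOL", "scoreUnique", "numPepsUnique")

# only columns present in both inputs are retained
shared_cols <- intersect(rowData1_cols, rowData2_cols)
print(shared_cols)
```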

rowData_colnames_unique

character vector with optional colnames(rowData()) which should be retained in a uniquely-named output column, to keep its values distinct between SE1 and SE2. This argument is useful for something like "score" where independent datasets are expected to have unique values, and which may be important to compare. Note that columns not already being retained will be ignored.

assay_names

character vector with one or more specific assay names to retain in the output data. By default, all assay names are retained.

se_names

character vector of length 2 defining the output labels used to indicate which rows and columns were present in SE1 and SE2.

startN

integer number passed to jamba::makeNames() to define the suffix number for the first versioned output. Note that renameFirst=FALSE so the first occurrence of a character string will not be renamed. When startN=2, subsequent repeated entries will have suffix "_v2", then "_v3" and so on.
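
The versioning described here can be mimicked in base R for illustration. The helper version_names() below is hypothetical and only reproduces the documented behavior (first occurrence unchanged, repeats numbered from startN with suffix "_v"); it is not the jamba::makeNames() implementation.

```r
# Hypothetical stand-in for the documented renaming behavior
version_names <- function(x, startN = 2, suffix = "_v") {
  # position of each value among its own duplicates: 1, 2, 3, ...
  occ <- ave(seq_along(x), x, FUN = seq_along)
  # first occurrence is kept; the second receives startN, and so on
  ifelse(occ == 1, x, paste0(x, suffix, occ - 2 + startN))
}

symbols <- c("ACTB", "GAPDH", "ACTB", "ACTB")
versioned <- version_names(symbols, startN = 2)
print(versioned)
```

Because each dataset is versioned in its own row order, the second "ACTB" row of SE1 and the second "ACTB" row of SE2 would receive the same versioned name and can therefore be matched during the merge.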

...

additional arguments are passed to jamba::makeNames().

Details

See the argument descriptions above for details of how data is merged with respect to rows and rowData(), and columns and colData().

The general strategy is to merge equivalent rows across SE1 and SE2 into shared rows, while forcing columns (sample measurements) to remain unique across SE1 and SE2.

This process is somewhat similar to calling cbind(), in that the sample columns are extended. However, the rows are merged where possible.

No assay measurement values are lost during this process.
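
The row-merge and column-extend strategy can be sketched with plain matrices. This is a minimal sketch under stated assumptions: merge_assays() is a hypothetical helper, row and column names are assumed unique within each input and column names distinct across inputs, and filling unmeasured combinations with NA is an assumption rather than confirmed behavior of merge_proteomics_se().

```r
# Hypothetical helper: union of rows, side-by-side columns
merge_assays <- function(m1, m2) {
  all_rows <- union(rownames(m1), rownames(m2))
  out <- matrix(NA_real_,
    nrow = length(all_rows),
    ncol = ncol(m1) + ncol(m2),
    dimnames = list(all_rows, c(colnames(m1), colnames(m2))))
  # copy each input into its own block of columns, matched by row name
  out[rownames(m1), colnames(m1)] <- m1
  out[rownames(m2), colnames(m2)] <- m2
  out
}

m1 <- matrix(c(10, 20, 11, 21), nrow = 2,
  dimnames = list(c("ACTB", "GAPDH"), c("A_s1", "A_s2")))
m2 <- matrix(c(30, 40), nrow = 2,
  dimnames = list(c("GAPDH", "TP53"), c("B_s1")))

merged <- merge_assays(m1, m2)
print(merged)
```

The shared row "GAPDH" carries values from both inputs, rows found in only one input are kept on their own row, and every original measurement appears exactly once in the result.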

See Also

Other jam utility functions: cardinality(), color_complement(), convert_PD_df_to_SE(), convert_imputed_assays_to_na(), curate_se_colData(), curate_to_df_by_pattern(), design2layout(), get_numeric_transform(), handle_df_args(), nmat_summary(), nmatlist_summary(), rmd_tab_iterator(), rowNormScale(), summit_from_vector()


jmw86069/platjam documentation built on Sept. 26, 2024, 3:31 p.m.