aggregate_by_new_id: Aggregate expression values based on new identifiers

View source: R/aggregate_genes.R

aggregate_by_new_id    R Documentation

Description

For statistical analysis, original gene identifiers (e.g. vendor-specific probe set identifiers) often need to be mapped to new gene identifiers (e.g. Ensembl gene identifiers or HGNC gene symbols). When several original identifiers map to the same new gene identifier, this function aggregates their expression values into a single entry, e.g. by selecting the original identifier with the largest median or mean expression across all samples (see argument method).
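
The aggregation described above can be sketched in base R. This is a minimal illustration and not the package's implementation: the matrix `expr`, the vector `new_id` and the helper `pick_representative` are hypothetical names, and the sketch assumes that "max_median" keeps, for each new identifier, the original row with the largest median across samples.

```r
# Toy expression matrix: 4 probe sets (rows) x 3 samples (columns).
expr <- matrix(c(1, 2, 3,
                 4, 5, 6,
                 9, 8, 7,
                 2, 2, 2),
               nrow = 4, byrow = TRUE,
               dimnames = list(c("p1", "p2", "p3", "p4"),
                               c("s1", "s2", "s3")))

# Hypothetical mapping of each probe set to a new identifier.
new_id <- c("GENE_A", "GENE_A", "GENE_B", "GENE_B")

# Hypothetical helper (assumption, not part of QCnormSE): for each new
# identifier keep the row with the largest summary statistic
# (median by default; pass stat = mean for a "max_mean"-like variant).
pick_representative <- function(expr, new_id, stat = median) {
    score <- apply(expr, 1, stat)
    groups <- split(seq_len(nrow(expr)), new_id)
    keep <- vapply(groups,
                   function(i) i[which.max(score[i])],
                   integer(1))
    out <- expr[keep, , drop = FALSE]
    rownames(out) <- names(keep)
    out
}

agg <- pick_representative(expr, new_id)
print(agg)
```

In this toy example the row for GENE_A is taken from p2 and the row for GENE_B from p3, because these probe sets have the larger median within their group.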

Usage

aggregate_by_new_id(
  se,
  assay = 1,
  col.new = "symbol",
  sep = "///",
  method = "max_median"
)

Arguments

se

RangedSummarizedExperiment-class object

assay

Character or integer. Name or number of assay used for aggregating.

col.new

Character or integer. Name or number of column in rowData to be used as new gene identifier.

sep

Character. Separator for multiple gene identifiers or names (default: "///" used by GEO).

method

Character. Method used for aggregating: "max_median" (default) or "max_mean".
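
To illustrate the sep argument: annotation columns obtained from GEO can list several symbols for one probe set, joined by "///". The following sketch shows how such entries can be split with base R strsplit(). The example values are made up, and how aggregate_by_new_id() itself treats entries with multiple symbols is not stated on this page.

```r
# Made-up annotation values in the style of a GEO "Gene.symbol" column.
symbols <- c("XIST", "DDX3Y///UTY", "KDM5D")

# fixed = TRUE treats "///" as a literal string, not a regular expression.
parts <- strsplit(symbols, split = "///", fixed = TRUE)

# Number of symbols per entry; entries with more than one symbol
# do not map to a single new identifier.
n_symbols <- lengths(parts)
print(n_symbols)
```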

Value

RangedSummarizedExperiment-class object with aggregated expression values

Examples

library(SummarizedExperiment)
data("se.probeset")

## restrict to subset of probesets (for illustration only)
genes = c("DDX3Y", "EIF1AY", "KDM5D", "NLGN4Y",
          "RPS4Y1", "TXLNG2P", "UTY", "XIST")
ind = unlist(sapply(genes, function(g) {
    grep(g, rowData(se.probeset)$Gene.symbol)}))
se.probeset = se.probeset[ind, ]
print(se.probeset)

## aggregate by gene symbol
se.gene = aggregate_by_new_id(se = se.probeset,
                              assay = "exprs.log",
                              col.new = "Gene.symbol",
                              sep = "///")
print(se.gene)

szymczak-lab/QCnormSE documentation built on March 25, 2023, 1:05 p.m.