aggregate_by_new_id: Aggregate expression values based on new identifiers
In szymczak-lab/QCnormSE: Quality Control of Normalized Gene Expression Data

Description

For statistical analysis, original gene identifiers (e.g. vendor-specific probe set identifiers) often need to be mapped to new gene identifiers (e.g. Ensembl gene identifiers or HGNC gene symbols). Several original identifiers can map to the same new gene identifier. This function aggregates their expression values, for example by selecting the original identifier with the largest median (default) or mean expression across all samples (see `method`).
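
The selection rule described above can be sketched in base R on a plain matrix. This is only an illustration under stated assumptions, not the package's implementation: the toy matrix `expr`, the mapping vector `new_id`, and the helper `pick_max_median()` are hypothetical names introduced here, and ties are resolved by taking the first matching row.

```r
# Toy expression matrix: 4 probe sets (rows) x 3 samples (columns).
expr <- matrix(
  c(1, 2, 3,
    4, 5, 6,
    7, 8, 9,
    2, 2, 2),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("p1", "p2", "p3", "p4"), c("s1", "s2", "s3"))
)

# Hypothetical mapping of each probe set to a new gene identifier.
new_id <- c(p1 = "GENE_A", p2 = "GENE_A", p3 = "GENE_B", p4 = "GENE_B")

# For each new identifier, keep the row with the largest median across samples.
pick_max_median <- function(expr, new_id) {
  row_median <- apply(expr, 1, median)
  keep <- vapply(
    split(names(row_median), new_id[names(row_median)]),
    function(ids) ids[which.max(row_median[ids])],
    character(1)
  )
  out <- expr[keep, , drop = FALSE]
  rownames(out) <- names(keep)  # rows are now labelled by the new identifier
  out
}

agg <- pick_max_median(expr, new_id)
print(agg)
```

In this toy case `p2` is kept for `GENE_A` and `p3` for `GENE_B`, so the result has one row per new identifier.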

Usage

aggregate_by_new_id(
  se,
  assay = 1,
  col.new = "symbol",
  sep = "///",
  method = "max_median"
)

Arguments

`se`	`RangedSummarizedExperiment-class` object
`assay`	Character or integer. Name or number of assay used for aggregating.
`col.new`	Character or integer. Name or number of column in rowData to be used as new gene identifier.
`sep`	Character. Separator for multiple gene identifiers or names (default: "///", as used by GEO).
`method`	Character. Method used for aggregating: "max_median" (default) or "max_mean".
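
The `method` and `sep` arguments can be illustrated with a small base R sketch. It assumes that "max_median" and "max_mean" rank the original identifiers by their row-wise median or mean, as the description suggests. The probe names and gene symbols are hypothetical, and the sketch does not show how the function itself treats entries that contain several identifiers.

```r
# Two hypothetical probe sets mapping to the same gene, three samples.
expr <- rbind(
  probe_a = c(1, 1, 10),  # median 1, mean 4
  probe_b = c(3, 3, 3)    # median 3, mean 3
)

# Row-wise summaries corresponding to the two `method` choices.
by_median <- names(which.max(apply(expr, 1, median)))
by_mean   <- names(which.max(rowMeans(expr)))

# A GEO-style annotation entry holding two symbols joined by "///".
# fixed = TRUE makes strsplit() treat the separator literally, not as a regex.
symbols <- strsplit("GENE_A///GENE_B", split = "///", fixed = TRUE)[[1]]

print(c(max_median = by_median, max_mean = by_mean))
print(symbols)
```

Here the two rules disagree: the median favours `probe_b`, while the mean is pulled up by the single high value in `probe_a`. This is one reason a median-based default can be more robust to outlying samples.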

Value

`RangedSummarizedExperiment-class` object with aggregated expression values.

Examples

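library(SummarizedExperiment)
library(QCnormSE)  # provides aggregate_by_new_id() and the example data
data("se.probeset")

## restrict to subset of probesets (for illustration only)
genes <- c("DDX3Y", "EIF1AY", "KDM5D", "NLGN4Y",
           "RPS4Y1", "TXLNG2P", "UTY", "XIST")
## grep() matches patterns, so entries that contain several symbols
## separated by "///" are selected as well
ind <- unlist(sapply(genes, function(g) {
    grep(g, rowData(se.probeset)$Gene.symbol)}))
se.probeset <- se.probeset[ind, ]
print(se.probeset)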

## aggregate by gene symbol
se.gene <- aggregate_by_new_id(se = se.probeset,
                               assay = "exprs.log",
                               col.new = "Gene.symbol",
                               sep = "///")
print(se.gene)

szymczak-lab/QCnormSE documentation built on March 25, 2023, 1:05 p.m.