anno_eset: Annotate Gene Expression Matrix and Remove Duplicated Genes

View source: R/anno_eset.R

anno_eset R Documentation

Description

Annotates an expression matrix with gene symbols using the provided annotation data, filters out missing or invalid symbols, resolves duplicate gene symbols, and removes uninformative rows (all zeros, all NAs, or a missing value in the first column). Several aggregation methods are available for resolving duplicate gene symbols.

Usage

anno_eset(
  eset,
  annotation,
  symbol = "symbol",
  probe = "probe_id",
  method = "mean"
)

Arguments

eset

Expression matrix or ExpressionSet object containing gene expression data.

annotation

Data frame containing annotation information for probes. Built-in options include 'anno_hug133plus2', 'anno_rnaseq', and 'anno_illumina'.

symbol

Character string specifying the column name in 'annotation' that represents gene symbols. Default is '"symbol"'.

probe

Character string specifying the column name in 'annotation' that represents probe identifiers. Default is '"probe_id"'.

method

Character string specifying the aggregation method for duplicate gene symbols. Options are '"mean"', '"sum"', or '"sd"'. Default is '"mean"'.
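
To illustrate how the choice of 'method' can change the outcome for duplicated symbols, here is a minimal base-R sketch. It rests on an assumption: duplicates are resolved by scoring each row with the chosen statistic and keeping the highest-scoring row per symbol. The documentation does not state whether duplicate rows are combined or a single representative row is kept, so treat this as one possible reading. 'resolve_duplicates' is a hypothetical helper, not a function of the package.

```r
# Hypothetical helper (not part of the package): keep one row per gene symbol,
# choosing the row whose per-row statistic (mean, sum, or sd) is largest.
# ASSUMPTION: "keep the top-scoring row" is only one possible reading of `method`.
resolve_duplicates <- function(mat, symbols, method = c("mean", "sum", "sd")) {
  method <- match.arg(method)
  stat_fun <- switch(method,
    mean = function(x) mean(x, na.rm = TRUE),
    sum  = function(x) sum(x, na.rm = TRUE),
    sd   = function(x) stats::sd(x, na.rm = TRUE)
  )
  score <- apply(mat, 1, stat_fun)            # one score per row
  ord <- order(score, decreasing = TRUE)      # highest score first
  mat <- mat[ord, , drop = FALSE]
  symbols <- symbols[ord]
  keep <- !duplicated(symbols)                # first occurrence has the best score
  out <- mat[keep, , drop = FALSE]
  rownames(out) <- symbols[keep]
  out
}

# Toy data: rows p1 and p2 both map to Gene1
mat <- rbind(
  p1 = c(1, 2, 3),
  p2 = c(4, 5, 6),
  p3 = c(0, 1, 0)
)
dedup <- resolve_duplicates(mat, c("Gene1", "Gene1", "Gene2"), method = "mean")
print(dedup)
```

Under this reading, '"mean"' and '"sum"' favor the most strongly expressed row, while '"sd"' favors the row that varies most across samples.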

Details

The function performs the following operations:

  1. Filters out probes whose symbol is missing or labeled '"NA_NA"'

  2. Matches probes between expression set and annotation data

  3. Merges annotation with expression data

  4. Handles duplicate gene symbols using specified aggregation method

  5. Removes rows with all zeros, all NAs, or missing values in the first column
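
The filtering and matching steps above can be sketched in base R as follows. This is an illustrative approximation, not the package's implementation: 'annotate_sketch' and the toy data are hypothetical, step 4 (duplicate handling) is left out, and the exact order and edge-case behavior of the real function may differ.

```r
# Hypothetical sketch of steps 1-3 and 5 in base R; not the package's code.
annotate_sketch <- function(eset, annotation,
                            symbol = "symbol", probe = "probe_id") {
  # Step 1: drop annotation rows with a missing symbol or the "NA_NA" label
  ok <- !is.na(annotation[[symbol]]) & annotation[[symbol]] != "NA_NA"
  annotation <- annotation[ok, , drop = FALSE]

  # Step 2: keep only probes present in both the matrix and the annotation
  shared <- intersect(rownames(eset), annotation[[probe]])
  eset <- eset[shared, , drop = FALSE]

  # Step 3: look up the gene symbol for each remaining probe
  symbols <- annotation[[symbol]][match(shared, annotation[[probe]])]

  # Step 5: drop rows that are all NA, all zero, or NA in the first column
  all_na   <- rowSums(!is.na(eset)) == 0
  all_zero <- rowSums(eset != 0, na.rm = TRUE) == 0
  first_na <- is.na(eset[, 1])
  drop_row <- all_na | all_zero | first_na

  list(eset = eset[!drop_row, , drop = FALSE], symbols = symbols[!drop_row])
}

# Toy data (assumed for illustration)
eset <- rbind(A = c(1, 2), B = c(0, 0), C = c(NA, 3), D = c(5, 6), E = c(7, 8))
colnames(eset) <- c("S1", "S2")
anno <- data.frame(
  probe_id = c("A", "B", "C", "D", "X"),
  symbol   = c("Gene1", "Gene2", "Gene3", "NA_NA", "Gene5"),
  stringsAsFactors = FALSE
)

res <- annotate_sketch(eset, anno)
print(res$eset)
print(res$symbols)
```

In this toy input, probe D is dropped for its '"NA_NA"' symbol, probe E has no annotation, probe B is all zeros, and probe C has a missing value in the first column, so only probe A survives.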

Value

Annotated and cleaned gene expression matrix with gene symbols as row names.

Author(s)

Dongqiang Zeng

Examples

library(IOBR)

# Create a small example expression matrix (seeded so the values are reproducible)
set.seed(123)
eset_mat <- matrix(runif(100), nrow = 10, ncol = 10)
rownames(eset_mat) <- paste0("Probe", 1:10)
colnames(eset_mat) <- paste0("Sample", 1:10)

# Create a matching annotation data frame; Probe1 and Probe2 both map to Gene1
anno_df <- data.frame(
  probe_id = paste0("Probe", 1:10),
  symbol = c("Gene1", "Gene1", "Gene2", "Gene3", "Gene4",
             "Gene5", "Gene6", "Gene7", "Gene8", "Gene9")
)

# Annotate the matrix; the duplicated Gene1 rows are resolved with the default method = "mean"
result <- anno_eset(eset = eset_mat, annotation = anno_df)
head(result)

IOBR documentation built on May 30, 2026, 5:07 p.m.