remove_duplicate_genes: Remove Duplicate Gene Symbols in Gene Expression Data

remove_duplicate_genes    R Documentation

Description

This function resolves duplicate gene symbols in a gene expression dataset by aggregating the expression data of the duplicated entries. The 'method' argument selects the aggregation: mean, standard deviation, or sum. This is useful when preparing data for downstream analyses that require unique gene symbols, for example when the symbols are used as row names.

Usage

remove_duplicate_genes(eset, column_of_symbol, method = "mean")

Arguments

eset

A data frame or matrix representing gene expression data, with gene symbols as one of the columns.

column_of_symbol

The name of the column containing gene symbols in 'eset'.

method

The aggregation method to apply for duplicate gene symbols: "mean" for averaging, "sd" for standard deviation, or "sum" for the sum of values. Default is "mean".

Value

A modified version of 'eset' where duplicate gene symbols have been aggregated according to the specified method. The gene symbols are set as row names in the returned data frame or matrix.
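
As an illustration of the behaviour described above, the following base R sketch collapses duplicated symbols in a toy table. It is not IOBR's implementation, and the package may handle details differently; the helper name collapse_duplicates and the toy data are hypothetical and exist only for this example.

```r
# Toy expression table: gene symbols in column "id", two samples.
eset <- data.frame(
  id = c("TP53", "TP53", "EGFR"),
  s1 = c(2, 4, 10),
  s2 = c(6, 8, 20)
)

# Hypothetical helper (not part of IOBR): collapse rows that share a symbol.
collapse_duplicates <- function(eset, column_of_symbol, method = "mean") {
  # Map the method name to an aggregation function; fail on anything else.
  fun <- switch(method,
    mean = mean,
    sd = stats::sd,
    sum = sum,
    stop("method must be 'mean', 'sd', or 'sum'")
  )
  value_cols <- setdiff(colnames(eset), column_of_symbol)
  # Aggregate every expression column within each gene symbol.
  out <- stats::aggregate(
    eset[, value_cols, drop = FALSE],
    by = list(symbol = eset[[column_of_symbol]]),
    FUN = fun
  )
  # Use the symbols as row names and drop the grouping column.
  rownames(out) <- as.character(out$symbol)
  out[, value_cols, drop = FALSE]
}

res <- collapse_duplicates(eset, "id", method = "mean")
print(res)
```

In this sketch the two TP53 rows are averaged into one row (s1 = 3, s2 = 7), while EGFR is unchanged. With method = "sd", a symbol that occurs only once yields NA, because the standard deviation of a single value is undefined.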

Author(s)

Dongqiang Zeng

Examples


# Load the package and the example expression set
library(IOBR)
data("eset_stad", package = "IOBR")
# Annotate, then move the row names into an "id" column
eset_stad <- anno_eset(eset = eset_stad, annotation = anno_rnaseq)
eset_stad <- tibble::rownames_to_column(eset_stad, var = "id")

# Create a duplicate gene symbol in rows 2 and 3
eset_stad[2:3, "id"] <- "MT-CO1"
# Count duplicated symbols before de-duplication
summary(duplicated(eset_stad$id))
# De-duplicate rows that share a gene symbol, using the mean
eset_stad <- remove_duplicate_genes(eset = eset_stad, column_of_symbol = "id", method = "mean")
# Gene symbols are now row names, so check those instead of the "id" column
summary(duplicated(rownames(eset_stad)))

IOBR/IOBR documentation built on April 3, 2025, 2:19 p.m.