View source: R/remove_duplicate_genes.R
remove_duplicate_genes | R Documentation
This function resolves duplicate gene symbols in a gene expression dataset by aggregating the expression values of the duplicated entries. Users can choose mean, standard deviation, or sum as the aggregation method. This is useful when preparing data for downstream analyses that require each gene symbol to appear only once.
remove_duplicate_genes(eset, column_of_symbol, method = "mean")
eset | A data frame or matrix representing gene expression data, with gene symbols as one of the columns.
column_of_symbol | The name of the column containing gene symbols in 'eset'.
method | The aggregation method to apply for duplicate gene symbols: "mean" for averaging, "sd" for standard deviation, or "sum" for the sum of values. Default is "mean".
A modified version of 'eset' where duplicate gene symbols have been aggregated according to the specified method. The gene symbols are set as row names in the returned data frame or matrix.
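To make the described behaviour concrete, the following is a minimal base R sketch of mean aggregation over duplicated symbols. It is an illustration only and not the package's implementation, which is not shown here and may differ. The data frame `toy`, its columns `s1` and `s2`, and the object `agg` are hypothetical names invented for this example.

```r
# Hypothetical toy expression table (not IOBR data): "MT-CO1" appears twice
toy <- data.frame(
  id = c("TP53", "MT-CO1", "MT-CO1", "EGFR"),
  s1 = c(1, 2, 4, 8),
  s2 = c(10, 20, 40, 80)
)

# Average every sample column within each gene symbol (base R only)
agg <- aggregate(toy[, c("s1", "s2")], by = list(id = toy$id), FUN = mean)

# Move the symbols into row names, mirroring the documented return shape
rownames(agg) <- agg$id
agg$id <- NULL

# The two "MT-CO1" rows collapse into one row holding their column means
agg["MT-CO1", ]
```

Swapping `FUN = mean` for `sum` or `sd` gives the other two aggregations in this sketch. Note that base R's `sd()` returns `NA` for a single value, so symbols that occur only once would yield `NA` under this naive approach.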
Dongqiang Zeng
# Attach the packages used below (rownames_to_column() comes from tibble)
library(IOBR)
library(tibble)
# Load the example expression set
data("eset_stad", package = "IOBR")
# Annotate the expression set with gene symbols
eset_stad <- anno_eset(eset = eset_stad, annotation = anno_rnaseq)
eset_stad <- rownames_to_column(eset_stad, var = "id")
# Create a duplicate gene name for demonstration
eset_stad[2:3, "id"] <- "MT-CO1"
# Count the duplicated names before de-duplication
summary(duplicated(eset_stad$id))
# De-duplicate rows that share a gene name, using the mean value
eset_stad <- remove_duplicate_genes(eset = eset_stad, column_of_symbol = "id", method = "mean")
# Gene symbols are now row names, so check those for remaining duplicates
summary(duplicated(rownames(eset_stad)))