R/tcgaConvRownames.R

#' @title Convert the rownames of a data matrix from 'ENSEMBL IDs' to 'HUGO Gene Symbols'
#'
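#' @description \code{tcgaConvRownames} converts the rownames of a data matrix (gene identifiers) from 'ENSEMBL IDs' to 'HUGO Gene Symbols', and then summarizes the expression of duplicated genes by taking the average. Version suffixes on the ENSEMBL IDs (the part after the dot) are ignored during mapping, and rows whose ID does not map to a symbol are dropped.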
#'
#' @param data A data matrix, with rows referring to genes and columns to samples. Can be the output from \code{\link[mirNet]{tcgaTableGenerator}}.
#'
#' @return A data frame with genes in rows and samples in columns, with gene identifiers converted from 'ENSEMBL IDs' to 'HUGO Gene Symbols' and one row per gene symbol.
#'
#' @seealso \code{\link[mirNet]{tcgaTableGenerator}} for generating a gene expression data matrix from single FPKM files downloaded from GDC Data Portal.
#'
#' @import org.Hs.eg.db
#' @importFrom magrittr "%>%"
#' @importFrom dplyr group_by
#' @importFrom dplyr summarise_all
#' @importFrom AnnotationDbi mapIds
#' 
#' @export tcgaConvRownames
#'
#' @examples
#' \dontrun{
#' # `data` is an expression matrix with ENSEMBL IDs as rownames
#' tcgaConvRownames(data)
#' }



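tcgaConvRownames <- function(data){

    # Strip the ".<version>" suffix from each ENSEMBL ID before mapping to symbols
    ensembl <- sapply(strsplit(rownames(data), "\\."), '[', 1)
    symbols <- mapIds(org.Hs.eg.db, keys = ensembl, keytype = 'ENSEMBL', column = 'SYMBOL')

    # Keep only the rows whose ENSEMBL ID maps to a gene symbol;
    # drop = FALSE keeps the matrix shape when a single row remains
    id <- which(!is.na(symbols))
    data2 <- data[id, , drop = FALSE]
    rownames(data2) <- symbols[id]

    # Average the rows that share a gene symbol
    data3 <- as.data.frame(data2) %>% group_by(rownames(data2)) %>% summarise_all(mean) %>% as.data.frame(stringsAsFactors = FALSE)
    rownames(data3) <- data3[, 1]
    # drop = FALSE keeps a data frame even when there is a single sample column
    data3[, -1, drop = FALSE]
}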
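
# A minimal, dependency-free sketch of the two steps performed above (strip the
# version suffix and map, then average duplicated symbols), for illustration
# only. The toy matrix, the made-up IDs and the `lookup` vector are hypothetical
# stand-ins: `lookup` plays the role of the mapIds() result, and base R's
# rowsum() is swapped in for the dplyr pipeline used by tcgaConvRownames().
toy <- matrix(c(1, 3, 5, 7,    # sampleA
                2, 4, 6, 8),   # sampleB
              nrow = 4,
              dimnames = list(c("ENSG01.1", "ENSG02.3", "ENSG03.2", "ENSG04.5"),
                              c("sampleA", "sampleB")))
lookup <- c(ENSG01 = "GENE1", ENSG02 = "GENE1", ENSG03 = "GENE2", ENSG04 = NA_character_)

ids  <- sub("\\..*$", "", rownames(toy))   # drop the ".<version>" suffix
sym  <- unname(lookup[ids])                # hypothetical ID -> symbol mapping
keep <- !is.na(sym)                        # unmapped rows are discarded

# Sum rows per symbol, then divide by the number of rows in each group
sums    <- rowsum(toy[keep, , drop = FALSE], group = sym[keep])
counts  <- as.vector(table(sym[keep])[rownames(sums)])
toy_avg <- sums / counts
# toy_avg has one row per symbol: GENE1 = (2, 3), GENE2 = (5, 6)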
YC3/mirNet documentation built on Sept. 3, 2020, 3:25 a.m.