R/genes_freq.R

Defines functions genes_freq

Documented in genes_freq

#' @title Extract numbers of genes containing single or multiple transcripts
#'
#' @description Takes a data frame of GENCODE annotations as input and counts the transcripts annotated for each gene. The output is a data frame giving, for each transcript count (1, 2, 3, ...), the number of genes with that many isoforms and their percentage.
#' @usage genes_freq(x)
#' @param x A data frame of annotations loaded from a GTF file downloaded from the GENCODE website
#' @export
#' @return A data frame with the number of transcripts per gene, the number of genes with that count, and their percentage
#' @examples
#' \dontrun{
#' # You don't have to run this
#' load_gtf("gencode.v27.lncRNAs.gtf")
#' genes_freq(gencode.v27.lncRNAs.gtf)
#' }
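genes_freq <- function(x) {
  # Keep only the transcript-level rows of the annotation
  bb <- subset(x, x$type == "transcript")
  cc <- subset(bb, select = c("gene_id", "transcript_id"))
  # Count transcripts per gene, then count genes per transcript number
  dd <- as.data.frame(table(cc$gene_id))
  ee <- as.data.frame(table(dd$Freq))
  colnames(ee) <- c("num_of_transcripts", "genes_freq")
  ee$percentage <- round(ee$genes_freq / sum(ee$genes_freq) * 100, digits = 3)
  # Store the result as `genes_freq_df` in the global environment
  # and return it invisibly
  assign("genes_freq_df", ee, envir = .GlobalEnv)
  invisible(ee)
}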
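
The counting logic can be checked without downloading a GTF file. The following is a minimal base-R sketch of the same two-step tabulation on a small, made-up annotation table; `toy_gtf` and `toy_freq` are hypothetical names used only for illustration, and the rows are invented rather than taken from GENCODE.

```r
# Toy annotation table standing in for a parsed GENCODE GTF (hypothetical data).
toy_gtf <- data.frame(
  type          = c("gene", "transcript", "transcript", "gene", "transcript",
                    "gene", "transcript", "transcript", "exon"),
  gene_id       = c("G1", "G1", "G1", "G2", "G2", "G3", "G3", "G3", "G3"),
  transcript_id = c(NA, "T1", "T2", NA, "T3", NA, "T4", "T5", "T5"),
  stringsAsFactors = FALSE
)

# Step 1: keep transcript rows only, then count transcripts per gene.
transcripts <- toy_gtf[toy_gtf$type == "transcript", ]
per_gene <- table(transcripts$gene_id)

# Step 2: count how many genes share each transcript count.
toy_freq <- as.data.frame(table(as.vector(per_gene)), stringsAsFactors = FALSE)
colnames(toy_freq) <- c("num_of_transcripts", "genes_freq")
toy_freq$percentage <- round(toy_freq$genes_freq / sum(toy_freq$genes_freq) * 100,
                             digits = 3)

print(toy_freq)
```

Here G2 has one transcript while G1 and G3 have two each, so one gene falls in the single-transcript row and two genes fall in the two-transcript row. Non-transcript rows (`gene`, `exon`) are dropped before counting, which is why the filter on `type` comes first.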
monahton/GencodeInterrogator documentation built on Dec. 24, 2019, 1:31 p.m.