parse_GTF_info: Process GTF file.


View source: R/process_files.r

Description

Parses the info column of an imported GTF file to extract the gene name and length.

Usage

parse_GTF_info(gtf, gene_names = TRUE, gene_length = TRUE,
  remove_duplicates = TRUE)

Arguments

gtf

A GTF file that has been imported using import_GTF. It must include a column named info, a character vector whose entries are in the following format: gene_id "ENSMUSG00000102693"; gene_version "1"; gene_name "4933401J01Rik"; gene_source "havana"; gene_biotype "TEC"

gene_names

If TRUE, add a column containing the gene name.

gene_length

If TRUE, add a column containing the gene length.

remove_duplicates

If TRUE, remove rows with duplicated gene names, keeping the first instance of each gene. To remove duplicates based on something other than gene name, use the separate remove_duplicates function.
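The "keep the first instance" rule described for remove_duplicates can be illustrated with base R. This is a minimal sketch, not the package's implementation: the toy data frame and its column names (gene_name, start) are assumptions made up for the illustration.

```r
# Toy data frame; the column names and values are hypothetical.
genes <- data.frame(
  gene_name = c("GeneA", "GeneB", "GeneA", "GeneC"),
  start     = c(100, 500, 900, 1500),
  stringsAsFactors = FALSE
)

# duplicated() flags the second and later occurrences of a value,
# so negating it keeps only the first row seen for each gene name.
deduplicated <- genes[!duplicated(genes$gene_name), ]
```

Here the second GeneA row (start 900) is dropped and the first (start 100) is kept, which matches the behaviour described above under the stated assumptions.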

Value

A data frame containing all the genes from gtf, with the gene name and gene length columns added as requested.

Examples

parse_GTF_info(gtf_file)
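To make the expected shape of the info column concrete, the sketch below pulls one attribute out of info strings using base R regular expressions. It is only an illustration of the attribute format and is not how parse_GTF_info is implemented: the helper extract_attribute is hypothetical, and the second info string is made-up toy data.

```r
# Two info strings: the first is the format shown in the Arguments section,
# the second is made-up toy data.
info <- c(
  'gene_id "ENSMUSG00000102693"; gene_version "1"; gene_name "4933401J01Rik"; gene_source "havana"; gene_biotype "TEC"',
  'gene_id "TOY0000000002"; gene_version "1"; gene_name "ToyGene"; gene_source "toy"; gene_biotype "toy"'
)

# Hypothetical helper: return the quoted value that follows `key`,
# or NA where the key is absent from the string.
extract_attribute <- function(info, key) {
  pattern <- paste0('.*\\b', key, ' "([^"]*)".*')
  matched <- grepl(pattern, info, perl = TRUE)
  value <- sub(pattern, "\\1", info, perl = TRUE)
  value[!matched] <- NA_character_
  value
}

gene_names <- extract_attribute(info, "gene_name")
```

Returning NA for strings that lack the key keeps the result the same length as the input, so it could be attached to a data frame as a new column.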

laurabiggins/GOcategoryStats documentation built on Oct. 27, 2019, 11:36 a.m.