process_gencodefile: Preprocess gencode file
In svenstringer/genematrix: Create a Table with Gene Annotation and Output in Convenient Format


Filters a gencode file to include only genes on chromosomes 1-22, X, Y, and M, and reformats it before returning it as a data.table.

process_gencodefile(gencode_path)

gencode_path

file path of downloaded .gtf.gz file

Gencode files are downloaded from http://www.gencodegenes.org/releases/grch37_mapped_releases.html

Assumed input format .gtf.gz:

1) chromosome name chr{1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,X,Y,M} or GRC accession
2) annotation source {ENSEMBL,HAVANA}
3) feature type {gene,transcript,exon,CDS,UTR,start_codon,stop_codon,Selenocysteine}
4) genomic start location integer-value (1-based)
5) genomic end location integer-value
6) score (not used)
7) genomic strand {+,-}
8) genomic phase (for CDS features) {0,1,2,.}
9) additional information as key-value pairs
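
The nine-column layout above could be read into a table along the following lines. This is a minimal base R sketch under stated assumptions, not the package's own code: the column names in gtf_cols and the two sample records are made up for illustration, and base R's read.delim stands in for whatever reader the package uses.

```r
# Hypothetical column names for the nine GTF fields listed above.
gtf_cols <- c("chr", "source", "feature_type", "start", "end",
              "score", "strand", "phase", "info")

# Made-up records standing in for the contents of a .gtf.gz file.
gtf_lines <- c(
  "##description: toy example",
  paste("chr1", "HAVANA", "gene", "11869", "14412", ".", "+", ".",
        "gene_id \"ENSG00000000001.4\"; gene_type \"pseudogene\";",
        sep = "\t"),
  paste("chrX", "ENSEMBL", "exon", "100", "200", ".", "-", ".",
        "gene_id \"ENSG00000000002.1\"; gene_type \"protein_coding\";",
        sep = "\t")
)

# For a real file, pass the path to the .gtf.gz file as the first argument
# instead of `text = gtf_lines`.
# quote = "" keeps the double quotes inside the key-value column intact;
# comment.char = "#" skips the header lines that start with '#'.
gtf <- read.delim(text = gtf_lines, header = FALSE, sep = "\t",
                  comment.char = "#", quote = "", col.names = gtf_cols,
                  stringsAsFactors = FALSE)
```

The start and end columns come back as integers, while score and phase stay character because they can hold the placeholder '.'.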

First, the entries are filtered on feature_type == 'gene' and status == 'KNOWN'. This mostly excludes transcripts. The 'chr' prefix is removed from chromosome values, and any chromosomes other than 1-22, X, Y, M are removed. The resulting chromosome values are cast into an ordered factor (ordering: 1-22, X, Y, M). Then additional columns are extracted from the key-value pairs in the info column. Any genes with a gene_type in c('misc_RNA','snoRNA','snRNA') are removed. Finally, the redundant columns score, phase, and info are removed, and a new column ensembl_gene_id is created from gene_id with the subnumbering stripped (i.e. the id is x instead of x.y). The resulting table still contains duplicate gene names, but these are removed after the merge with the canonical hgnc data.
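
The steps described above can be sketched roughly as follows. This is an illustration in base R on a made-up table, not the package's actual implementation: the helper get_attr, the column names, the toy gene records, and the assumption that the status is stored under the key gene_status in the info column are all hypothetical.

```r
# Hypothetical helper: pull the value of one key out of the key-value column.
get_attr <- function(info, key) {
  pattern <- paste0('.*\\b', key, ' "([^"]*)".*')
  ifelse(grepl(pattern, info, perl = TRUE),
         sub(pattern, "\\1", info, perl = TRUE),
         NA_character_)
}

# Toy input with made-up rows.
genes <- data.frame(
  chr = c("chr1", "chrX", "chr2", "chrGL000192.1", "chrM", "chr3"),
  feature_type = c("gene", "gene", "transcript", "gene", "gene", "gene"),
  score = ".", phase = ".",
  info = c(
    'gene_id "ENSG00000000001.4"; gene_type "protein_coding"; gene_status "KNOWN";',
    'gene_id "ENSG00000000002.1"; gene_type "snRNA"; gene_status "KNOWN";',
    'gene_id "ENSG00000000003.2"; gene_type "protein_coding"; gene_status "KNOWN";',
    'gene_id "ENSG00000000004.1"; gene_type "protein_coding"; gene_status "KNOWN";',
    'gene_id "ENSG00000000005.2"; gene_type "Mt_rRNA"; gene_status "KNOWN";',
    'gene_id "ENSG00000000006.1"; gene_type "lincRNA"; gene_status "NOVEL";'
  ),
  stringsAsFactors = FALSE
)

# 1) Keep only gene entries with status KNOWN (%in% treats NA as FALSE).
keep <- genes$feature_type == "gene" &
  get_attr(genes$info, "gene_status") %in% "KNOWN"
genes <- genes[keep, ]

# 2) Drop the 'chr' prefix, keep chromosomes 1-22, X, Y, M, and order them.
chrom_levels <- c(1:22, "X", "Y", "M")
genes$chr <- sub("^chr", "", genes$chr)
genes <- genes[genes$chr %in% chrom_levels, ]
genes$chr <- factor(genes$chr, levels = chrom_levels, ordered = TRUE)

# 3) Extract additional columns from the key-value pairs.
genes$gene_id <- get_attr(genes$info, "gene_id")
genes$gene_type <- get_attr(genes$info, "gene_type")

# 4) Remove the excluded gene types.
genes <- genes[!genes$gene_type %in% c("misc_RNA", "snoRNA", "snRNA"), ]

# 5) Drop redundant columns and strip the subnumbering (x.y becomes x).
genes <- genes[, !names(genes) %in% c("score", "phase", "info")]
genes$ensembl_gene_id <- sub("\\.[0-9]+$", "", genes$gene_id)
```

In this toy input only the chr1 and chrM genes survive: the transcript row and the NOVEL gene fail the first filter, the GL accession is not among the retained chromosomes, and the snRNA gene is dropped by the gene type filter.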

The processed gencode table as a data.table.

## Not run: 
gencode_data <- process_gencodefile(gencode_path)

## End(Not run)

svenstringer/genematrix documentation built on May 30, 2019, 8:48 p.m.
