annotate_gr_from_gtf: Annotates a granges object with overlapping genes from gtf...

View source: R/Annotate.R

annotate_gr_from_gtfR Documentation

Annotates a granges object with overlapping genes from gtf file.

Description

gr is the genomic ranges that need to be annotation. Ideally original input should be in the format:

Usage

annotate_gr_from_gtf(
  gr,
  invert_strand = FALSE,
  gtf_gr = NULL,
  annotationType = "any",
  transcriptDetails = FALSE,
  gtf_TxDb,
  annotation_correction = TRUE,
  genome = NULL,
  pA_motif_max_position = 50,
  AAA_motif_min_position = 10,
  polystretch_length = 13,
  max_mismatch = 1
)

Arguments

gr

a granges object of peaks to annotate

invert_strand

Boolean to signifiy if strand of gr peaks should be inversed

gtf_gr

granges gtf file that contains annotation information

annotationType

can be assigned "any" or "within". Default is "any" which states that the peak with gr must overlap annotation feature (eg exon)

transcriptDetails

Boolean. If false will only return gene name. If true will return internal transcript position feature (eg exon/intron)

gtf_TxDb

same as gtf_gr but as a TxDb object.

annotation_correction

Boolean. When multiple overlapping genes are identified will prioritise gene based on annotation. 3'UTR annotation trumps all other annotation.

genome

genome object. If NOT NULL then will perform pA motif analysis.

pA_motif_max_position

Any AAUAAA after this position are not considered (default 50nt)

AAA_motif_min_position

Any polyA/polyT stretches before this postion are not considered (default 10)

polystretch_length

: the length of A or T to search for (default 13)

max_mismatch

: The number of mismatches tolerated in polystretch

Details

chr8:70331172-70331574:+ # chr:start-end:strand

This could already exist within an R object or you can copy it in via readClipboard.

gr <- GRanges(readClipboard())

You need to run the following code: gtf_file <- "u:/Reference/hg38/hg38_gene.gtf.gz" gtf_file <- "u:/Reference/mm10/mm10_gene.gtf.gz" gtf_file <- "u:/Reference/mm10/cellranger_genes.gtf.gz" gtf_gr <- rtracklayer::import(gtf_file) gtf_TxDb <- GenomicFeatures::makeTxDbFromGFF(gtf_file, format="gtf")

annotationType can be c("any", "start", "end", "within", "equal"),

Value

a dataframe with appended columns containing annotation

Examples

library(Sierra)

 # Generate peaks for Cxcl12, Arhgap10, Mast4,  using mm10 coordinates:
 gr_peaks <- GenomicRanges::GRanges(c("chr6:117174600-117175065:+",
            "chr6:117180975-117181367:+",
            "chr8:77250366-77250686:-",
            "chr8:77426400-77517833:-",
            "chr13:102905701-102906230:-",
            "chr13:103139934-103171545:-"))
            
 # Load other files from vignette           
 extdata_path <- system.file("extdata",package = "Sierra")
 reference.file <- paste0(extdata_path,"/Vignette_cellranger_genes_subset.gtf")
 
 # convert gtf file to both granges and a TXDb object
 gtf_gr <- rtracklayer::import(reference.file)
 gtf_TxDb <- GenomicFeatures::makeTxDbFromGFF(reference.file, format="gtf")
 
 genome <- BSgenome.Mmusculus.UCSC.mm10::BSgenome.Mmusculus.UCSC.mm10          
  
 annotate_gr_from_gtf(gr = gr_peaks, gtf_gr = gtf_gr,
                      gtf_TxDb = gtf_TxDb, genome = genome)

 annotate_gr_from_gtf(gr = gr_peaks, gtf_gr = gtf_gr,
                      gtf_TxDb = gtf_TxDb, genome = genome, transcriptDetails=TRUE)                                            

VCCRI/Sierra documentation built on July 3, 2023, 6:39 a.m.