annotate_gr_from_gtf | R Documentation |
gr is the genomic ranges that need to be annotation. Ideally original input should be in the format:
annotate_gr_from_gtf(
gr,
invert_strand = FALSE,
gtf_gr = NULL,
annotationType = "any",
transcriptDetails = FALSE,
gtf_TxDb,
annotation_correction = TRUE,
genome = NULL,
pA_motif_max_position = 50,
AAA_motif_min_position = 10,
polystretch_length = 13,
max_mismatch = 1
)
gr |
a granges object of peaks to annotate |
invert_strand |
Boolean to signifiy if strand of gr peaks should be inversed |
gtf_gr |
granges gtf file that contains annotation information |
annotationType |
can be assigned "any" or "within". Default is "any" which states that the peak with gr must overlap annotation feature (eg exon) |
transcriptDetails |
Boolean. If false will only return gene name. If true will return internal transcript position feature (eg exon/intron) |
gtf_TxDb |
same as gtf_gr but as a TxDb object. |
annotation_correction |
Boolean. When multiple overlapping genes are identified will prioritise gene based on annotation. 3'UTR annotation trumps all other annotation. |
genome |
genome object. If NOT NULL then will perform pA motif analysis. |
pA_motif_max_position |
Any AAUAAA after this position are not considered (default 50nt) |
AAA_motif_min_position |
Any polyA/polyT stretches before this postion are not considered (default 10) |
polystretch_length |
: the length of A or T to search for (default 13) |
max_mismatch |
: The number of mismatches tolerated in polystretch |
chr8:70331172-70331574:+ # chr:start-end:strand
This could already exist within an R object or you can copy it in via readClipboard.
gr <- GRanges(readClipboard())
You need to run the following code: gtf_file <- "u:/Reference/hg38/hg38_gene.gtf.gz" gtf_file <- "u:/Reference/mm10/mm10_gene.gtf.gz" gtf_file <- "u:/Reference/mm10/cellranger_genes.gtf.gz" gtf_gr <- rtracklayer::import(gtf_file) gtf_TxDb <- GenomicFeatures::makeTxDbFromGFF(gtf_file, format="gtf")
annotationType can be c("any", "start", "end", "within", "equal"),
a dataframe with appended columns containing annotation
library(Sierra)
# Generate peaks for Cxcl12, Arhgap10, Mast4, using mm10 coordinates:
gr_peaks <- GenomicRanges::GRanges(c("chr6:117174600-117175065:+",
"chr6:117180975-117181367:+",
"chr8:77250366-77250686:-",
"chr8:77426400-77517833:-",
"chr13:102905701-102906230:-",
"chr13:103139934-103171545:-"))
# Load other files from vignette
extdata_path <- system.file("extdata",package = "Sierra")
reference.file <- paste0(extdata_path,"/Vignette_cellranger_genes_subset.gtf")
# convert gtf file to both granges and a TXDb object
gtf_gr <- rtracklayer::import(reference.file)
gtf_TxDb <- GenomicFeatures::makeTxDbFromGFF(reference.file, format="gtf")
genome <- BSgenome.Mmusculus.UCSC.mm10::BSgenome.Mmusculus.UCSC.mm10
annotate_gr_from_gtf(gr = gr_peaks, gtf_gr = gtf_gr,
gtf_TxDb = gtf_TxDb, genome = genome)
annotate_gr_from_gtf(gr = gr_peaks, gtf_gr = gtf_gr,
gtf_TxDb = gtf_TxDb, genome = genome, transcriptDetails=TRUE)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.