View source: R/makeGRangesFromGff.R
makeGRangesFromGff | R Documentation |
GRanges
) from a GFF/GTF fileMake genomic ranges (GRanges
) from a GFF/GTF file
makeGRangesFromGff(
file,
level = c("genes", "transcripts"),
ignoreVersion = FALSE,
extraMcols = TRUE
)
file |
|
level |
|
ignoreVersion |
|
extraMcols |
|
Remote URLs and compressed files are supported.
GRanges
.
The GFF (General Feature Format) format consists of one line per feature, each containing 9 columns of data, plus optional track definition lines.
The GTF (General Transfer Format) format is identical to GFF version 2.
The UCSC website has detailed conventions on the GFF3 format, including the metadata columns.
CDS
: CoDing Ssequence. A contiguous sequence that contains a
genomic interval bounded by start and stop codons. CDS refers to the
portion of a genomic DNA sequence that is translated, from the start codon
to the stop codon.
exon
: Genomic interval containing 5' UTR (five_prime_UTR
), CDS, and
3' UTR (three_prime_UTR
).
mRNA
: Processed (spliced) mRNA transcript.
See also:
Currently makeGRangesFromGff()
supports genomes from these sources:
Ensembl
GENCODE
RefSeq
UCSC
FlyBase
WormBase
Note that makeGRangesFromEnsembl()
offers native support for Ensembl genome
builds and returns additional useful metadata that isn't defined inside a
GFF/GTF file.
If you must load a GFF/GTF file directly, then use makeGRangesFromGff()
.
Example URLs:
Example URLs:
Annotations available from Ensembl and GENCODE are very similar.
The GENCODE annotation is made by merging the manual gene annotation produced by the Ensembl-Havana team and the Ensembl-genebuild automated gene annotation. The GENCODE annotation is the default gene annotation displayed in the Ensembl browser. The GENCODE releases coincide with the Ensembl releases, although GENCODE can skip an Ensembl release if there is no update to the annotation with respect to the previous release. In practical terms, the GENCODE annotation is essentially identical to the Ensembl annotation.
However, GENCODE handles pseudoautosomal regions (PAR) differently than Ensembl. The Ensembl GTF file only includes this annotation once, for chromosome X. However, GENCODE GTF/GFF3 files include the annotation in the PAR regions of both chromosomes. You'll see these genes contain a "_PAR_Y" suffix.
Additionally, GENCODE GFF/GTF files import with a gene identifier containing
a suffix, which differs slightly from the Ensembl GFF/GTF spec
(e.g. GENCODE: ENSG00000000003.14
; Ensembl: ENSG00000000003
).
The GENCODE FAQ has additional details.
Refer to the current RefSeq spec for details.
Example URLs:
See also:
https://ftp.ncbi.nih.gov/gene/DATA/gene2refseq.gz
Example URLs:
UCSC Homo sapiens hg38 GTF files: hg38.knownGene.gtf.gz, hg38.ncbiRefSeq.gtf.gz,
Related URLs:
Example URLs:
Example URLs:
Updated 2023-12-04.
rtracklayer::import()
.
GenomicFeatures::makeTxDbFromGRanges()
.
GenomicFeatures::makeTxDbFromGFF()
.
GenomicFeatures:::.make_splicings()
.
tximeta:::getRanges()
.
AnnotationDbi::columns()
.
GenomeInfoDb::GenomeDescription-class
, which describes useful organism
,
commonName
, providerVersion
, provider
, and releaseDate
accessors.
## Ensembl ====
file <- AcidBase::pasteUrl(
"ftp.ensembl.org",
"pub",
"release-108",
"gtf",
"homo_sapiens",
"Homo_sapiens.GRCh38.108.gtf.gz",
protocol = "ftp"
)
genes <- makeGRangesFromGff(file = file, level = "genes")
summary(genes)
## > transcripts <- makeGRangesFromGff(file = file, level = "transcripts")
## > summary(transcripts)
## GENCODE ====
## > file <- AcidBase::pasteUrl(
## > "ftp.ebi.ac.uk",
## > "pub",
## > "databases",
## > "gencode",
## > "Gencode_human",
## > "release_42",
## > "gencode.v42.annotation.gtf.gz",
## > protocol = "ftp"
## > )
## > genes <- makeGRangesFromGff(file = file, level = "genes")
## > summary(genes)
## > transcripts <- makeGRangesFromGff(file = file, level = "transcripts")
## > summary(transcripts)
## RefSeq ====
## > file <- AcidBase::pasteUrl(
## > "ftp.ncbi.nlm.nih.gov",
## > "genomes",
## > "refseq",
## > "vertebrate_mammalian",
## > "Homo_sapiens",
## > "all_assembly_versions",
## > "GCF_000001405.40_GRCh38.p14",
## > "GCF_000001405.40_GRCh38.p14_genomic.gff.gz",
## > protocol = "ftp"
## > )
## > genes <- makeGRangesFromGff(file = file, level = "genes")
## > summary(genes)
## > transcripts <- makeGRangesFromGff(file = file, level = "transcripts")
## > summary(transcripts)
## UCSC ====
## > file <- AcidBase::pasteUrl(
## > "hgdownload.soe.ucsc.edu",
## > "goldenPath",
## > "hg38",
## > "bigZips",
## > "genes",
## > "hg38.knownGene.gtf.gz",
## > protocol = "ftp"
## > )
## > genes <- makeGRangesFromGff(file = file, level = "genes")
## > summary(genes)
## > transcripts <- makeGRangesFromGff(file = file, level = "transcripts")
## > summary(transcripts)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.