tr2g_gtf | R Documentation |
This function reads a GTF file and extracts the transcript ID and
corresponding gene ID. This function assumes that the GTF file is properly
formatted. See http://mblab.wustl.edu/GTF2.html for a detailed
description of proper GTF format. Note that GFF3 files have a somewhat
different and more complicated format in the attribute field, which this
function does not support. See http://gmod.org/wiki/GFF3 for a detailed
description of proper GFF3 format. To extract transcript and gene information
from GFF3 files, see the function tr2g_gff3
in this package.
tr2g_gtf(
  file,
  Genome = NULL,
  get_transcriptome = TRUE,
  out_path = ".",
  write_tr2g = TRUE,
  transcript_id = "transcript_id",
  gene_id = "gene_id",
  gene_name = "gene_name",
  transcript_version = "transcript_version",
  gene_version = "gene_version",
  version_sep = ".",
  transcript_biotype_col = "transcript_biotype",
  gene_biotype_col = "gene_biotype",
  transcript_biotype_use = "all",
  gene_biotype_use = "all",
  chrs_only = TRUE,
  compress_fa = FALSE,
  save_filtered_gtf = TRUE,
  overwrite = FALSE
)
file |
Path to a GTF file to be read. The file can remain gzipped. Use
|
Genome |
Either a |
get_transcriptome |
Logical, whether to extract transcriptome from
genome with the GTF file. If filtering biotypes or chromosomes, the filtered
|
out_path |
Directory to save the outputs written to disk. If this directory does not exist, then it will be created. Defaults to the current working directory. |
write_tr2g |
Logical, whether to write tr2g to disk. If |
transcript_id |
Character vector of length 1. Tag in |
gene_id |
Character vector of length 1. Tag in |
gene_name |
Character vector of length 1. Tag in |
transcript_version |
Character vector of length 1. Tag in |
gene_version |
Character vector of length 1. Tag in |
version_sep |
Character that separates the main ID from the version number. Defaults to ".", as in Ensembl. |
transcript_biotype_col |
Character vector of length 1. Tag in
|
gene_biotype_col |
Character vector of length 1. Tag in |
transcript_biotype_use |
Character, can be "all" or
a vector of transcript biotypes to be used. Transcript biotypes aren't
entirely the same as gene biotypes. For instance, in Ensembl annotation,
|
gene_biotype_use |
Character, can be "all", "cellranger", or
a vector of gene biotypes to be used. If "cellranger", then the biotypes
used by Cell Ranger's reference are used. See |
chrs_only |
Logical, whether to include chromosomes only. GTF and
GFF files can contain annotations for scaffolds, which are not incorporated
into chromosomes. This will also exclude haplotypes. Defaults to |
compress_fa |
Logical, whether to compress the output fasta file. If
|
save_filtered_gtf |
Logical. If filtering type, biotypes, and/or
chromosomes, whether to save the filtered |
overwrite |
Logical, whether to overwrite if files with names of outputs written to disk already exist. |
Transcript and gene versions may not be present in all GTF files, so these
arguments are optional. This function has arguments for transcript and gene
version numbers because Ensembl IDs have version numbers. For Ensembl IDs, we
recommend including the version number, since a change in version number
signals a change in the entity referred to by the ID after reannotation. If a
version is used, then it will be appended to the ID, separated by version_sep.
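As a minimal sketch of what appending a version means, the base R snippet below joins made-up IDs and version numbers with a separator. The values in `ids` and `versions` are illustrative only, and this is not the package's internal code.

```r
# Illustrative IDs and versions, in the style of an Ensembl GTF file
ids <- c("ENST00000456328", "ENST00000450305")
versions <- c("2", "2")
version_sep <- "."

# Append the version to the main ID, separated by version_sep
versioned <- paste(ids, versions, sep = version_sep)
print(versioned)

# Strip the version again; this pattern assumes "." is the separator
unversioned <- sub("\\.[0-9]+$", "", versioned)
print(unversioned)
```

Keeping the separator in one variable makes it easy to match IDs against other resources that may or may not carry version numbers.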
The attribute field (the last field) of GTF files can be complicated and
inconsistent across different sources. Please check the attribute tags in
your GTF file and consider the arguments of this function carefully. The
defaults are set according to Ensembl GTF files; defaults may not work for
files from other sources. Due to the general lack of standards for the
attribute field, you may need to further clean up the output of this function.
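To check which tags a file uses, it can help to inspect one attribute string by hand. The sketch below uses only base R; the attribute string is a made-up Ensembl-style example and `get_tag` is a hypothetical helper, not a function from this package.

```r
# One attribute string in the style of an Ensembl GTF file (made-up example)
attribute <- paste0(
  'gene_id "ENSG00000223972"; gene_version "5"; ',
  'transcript_id "ENST00000456328"; transcript_version "2"; ',
  'gene_name "DDX11L1";'
)

# Hypothetical helper: return the quoted value that follows a tag,
# or NA when the tag is absent from the attribute string
get_tag <- function(x, tag) {
  pattern <- paste0('(^|; *)', tag, ' "([^"]*)"')
  hits <- regmatches(x, regexec(pattern, x))
  vapply(hits, function(hit) {
    if (length(hit) == 3) hit[3] else NA_character_
  }, character(1))
}

gene_id_value <- get_tag(attribute, "gene_id")
transcript_id_value <- get_tag(attribute, "transcript_id")
missing_value <- get_tag(attribute, "gene_biotype")
print(c(gene_id_value, transcript_id_value, missing_value))
```

If a tag you expect comes back as NA, the file probably uses a different tag name, and the corresponding argument of tr2g_gtf should be changed to match.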
A data frame with at least 2 columns: gene for gene ID, transcript for
transcript ID, and optionally, gene_name for gene names.
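The sketch below builds a small data frame with the columns named above and runs two sanity checks that are often useful on such a mapping. The rows are illustrative values, and the column order is an assumption; real output comes from calling tr2g_gtf on a GTF file.

```r
# Hypothetical transcript-to-gene table with the documented column names
tr2g_example <- data.frame(
  gene = c("ENSG00000223972.5", "ENSG00000223972.5", "ENSG00000227232.5"),
  transcript = c("ENST00000456328.2", "ENST00000450305.2",
                 "ENST00000488147.1"),
  gene_name = c("DDX11L1", "DDX11L1", "WASH7P"),
  stringsAsFactors = FALSE
)

# Each transcript should map to exactly one gene, so no duplicates are expected
n_duplicated <- anyDuplicated(tr2g_example$transcript)

# Count how many transcripts belong to each gene
tx_per_gene <- table(tr2g_example$gene)
print(tx_per_gene)
```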
ensembl_gene_biotypes, ensembl_tx_biotypes, cellranger_biotypes
Other functions to retrieve transcript and gene info: sort_tr2g(), tr2g_EnsDb(), tr2g_TxDb(), tr2g_ensembl(), tr2g_fasta(), tr2g_gff3(), transcript2gene()
# Locate the toy GTF file shipped with the package
toy_path <- system.file("testdata", package = "BUSpaRse")
file_use <- file.path(toy_path, "gtf_test.gtf")
# Default
tr2g <- tr2g_gtf(file = file_use, get_transcriptome = FALSE,
                 write_tr2g = FALSE, save_filtered_gtf = FALSE)
# Excluding version numbers
tr2g <- tr2g_gtf(file = file_use, transcript_version = NULL,
                 gene_version = NULL, get_transcriptome = FALSE,
                 write_tr2g = FALSE, save_filtered_gtf = FALSE)