tr2g_ensembl | R Documentation |
This function queries Ensembl biomart to convert transcript IDs to gene IDs.
tr2g_ensembl(
species,
type = c("vertebrate", "metazoa", "plant", "fungus", "protist"),
out_path = ".",
write_tr2g = TRUE,
other_attrs = NULL,
use_gene_name = TRUE,
use_transcript_version = TRUE,
use_gene_version = TRUE,
transcript_biotype_col = "transcript_biotype",
gene_biotype_col = "gene_biotype",
transcript_biotype_use = "all",
gene_biotype_use = "all",
chrs_only = TRUE,
ensembl_version = NULL,
overwrite = FALSE,
verbose = TRUE,
...
)
species |
Character vector of length 1, Latin name of the species of interest. |
type |
Character, must be one of "vertebrate", "metazoa", "plant", "fungus" and "protist". Passing "vertebrate" will use the default www.ensembl.org host. Gene annotations of some common invertebrate model organisms, such as Drosophila melanogaster, are also available on www.ensembl.org, so "vertebrate" can be used for these organisms. Passing any value other than "vertebrate" will use one of the other Ensembl hosts. For animals absent from www.ensembl.org, try "metazoa". |
out_path |
Directory to save the outputs written to disk. If this directory does not exist, then it will be created. Defaults to the current working directory. |
write_tr2g |
Logical, whether to write tr2g to disk. If |
other_attrs |
Character vector. Other attributes to get from Ensembl,
such as gene symbol and position on the genome.
Use |
use_gene_name |
Logical, whether to get gene names. |
use_transcript_version |
Logical, whether to include version number in
the Ensembl transcript ID. To decide whether to
include transcript version number, check whether version numbers are included
in the |
use_gene_version |
Logical, whether to include version number in the Ensembl gene ID. Unlike transcript version number, it's up to you whether to include gene version number. |
transcript_biotype_col |
Character vector of length 1. Tag in
|
gene_biotype_col |
Character vector of length 1. Tag in |
transcript_biotype_use |
Character, can be "all" or
a vector of transcript biotypes to be used. Transcript biotypes aren't
entirely the same as gene biotypes. For instance, in Ensembl annotation,
|
gene_biotype_use |
Character, can be "all", "cellranger", or
a vector of gene biotypes to be used. If "cellranger", then the biotypes
used by Cell Ranger's reference are used. See |
chrs_only |
Logical, whether to include chromosomes only. GTF and
GFF files can contain annotations for scaffolds, which are not incorporated
into chromosomes. This will also exclude haplotypes. Defaults to TRUE. |
ensembl_version |
Integer version number of Ensembl (e.g. 94 for the
October 2018 release). This argument defaults to NULL. |
overwrite |
Logical, whether to overwrite existing files that have the same names as the outputs to be written to disk. |
verbose |
Logical, whether to display progress. |
... |
Other arguments to be passed to |
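To illustrate what use_transcript_version and use_gene_version control: an Ensembl ID can carry a version number as a suffix after a period. The sketch below uses base R only and made-up IDs to show the difference between the versioned and unversioned forms. The helper strip_version is hypothetical and is not part of this package, nor is it necessarily how the function handles versions internally.

```r
# Hypothetical versioned Ensembl-style transcript IDs (made up for illustration)
versioned <- c("ENSDART00000000001.3", "ENSDART00000000002.1")

# Hypothetical helper: drop a trailing ".<digits>" version suffix
strip_version <- function(ids) sub("\\.[0-9]+$", "", ids)

unversioned <- strip_version(versioned)
print(unversioned)
```

Whichever form is chosen, the transcript IDs in the returned data frame need to match the IDs used elsewhere in the analysis, otherwise lookups by transcript ID will fail.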
A data frame with at least 2 columns: gene for gene ID, transcript for transcript ID, and optionally gene_name for gene names. If other_attrs has been specified, then those will also be columns in the data frame returned.
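As a rough sketch of how a data frame with these columns can be used, the example below builds a small mock table by hand, so no query to Ensembl is made. All IDs and gene names in it are invented for illustration. It counts transcripts per gene and looks up the gene for a set of transcripts.

```r
# Mock data frame with the columns described above (all values are made up)
tr2g_mock <- data.frame(
  transcript = c("TX1.1", "TX2.1", "TX3.2"),
  gene = c("G1.4", "G1.4", "G2.1"),
  gene_name = c("abc1", "abc1", "xyz2"),
  stringsAsFactors = FALSE
)

# Number of transcripts annotated to each gene
n_tx <- table(tr2g_mock$gene)

# Look up the gene ID for each transcript of interest, preserving query order
query <- c("TX3.2", "TX1.1")
genes <- tr2g_mock$gene[match(query, tr2g_mock$transcript)]
print(genes)
```

Because one gene can have several transcripts, the gene column generally contains repeated values while the transcript column does not.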
dl_transcriptome
Other functions to retrieve transcript and gene info: sort_tr2g(), tr2g_EnsDb(), tr2g_TxDb(), tr2g_fasta(), tr2g_gff3(), tr2g_gtf(), transcript2gene()
tr2g <- tr2g_ensembl(species = "Danio rerio",
other_attrs = "description", write_tr2g = FALSE)
# This will use plants.ensembl.org as host instead of www.ensembl.org
tr2g <- tr2g_ensembl(species = "Arabidopsis thaliana", type = "plant",
write_tr2g = FALSE)