tr2g_ensembl: Get transcript and gene info from Ensembl


View source: R/tr2g.R

Description

This function queries Ensembl BioMart to map transcript IDs to gene IDs.

Usage

tr2g_ensembl(species, type = c("vertebrate", "metazoa", "plant",
  "fungus", "protist"), other_attrs = NULL, use_gene_name = TRUE,
  use_transcript_version = TRUE, use_gene_version = TRUE,
  ensembl_version = NULL, verbose = TRUE, ...)

Arguments

species

Character vector of length 1, Latin name of the species of interest.

type

Character, must be one of "vertebrate", "metazoa", "plant", "fungus", or "protist". Passing "vertebrate" uses the default www.ensembl.org host. Gene annotations of some common invertebrate model organisms, such as Drosophila melanogaster, are also available on www.ensembl.org, so "vertebrate" can be used for those organisms. Any value other than "vertebrate" uses a different Ensembl host. For animals absent from www.ensembl.org, try "metazoa".
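
The host selection described above can be pictured as a simple lookup. The following is only a sketch, not the package's implementation: the helper name ensembl_host_for is hypothetical, and the non-vertebrate host names are the public Ensembl Genomes sites, assumed here to be the ones the function targets.

```r
# Hypothetical helper: map the `type` argument to an Ensembl host name.
# It mirrors the documented behaviour only; the real function may differ.
ensembl_host_for <- function(type = c("vertebrate", "metazoa", "plant",
                                      "fungus", "protist")) {
  type <- match.arg(type)  # errors on anything outside the allowed values
  hosts <- c(vertebrate = "www.ensembl.org",
             metazoa    = "metazoa.ensembl.org",
             plant      = "plants.ensembl.org",
             fungus     = "fungi.ensembl.org",
             protist    = "protists.ensembl.org")
  unname(hosts[type])
}

ensembl_host_for("plant")
ensembl_host_for()  # no argument: falls back to the first choice, "vertebrate"
```

Using match.arg keeps the set of accepted values in one place, so an unsupported value such as "bacteria" fails early with an error instead of silently querying the wrong host.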

other_attrs

Character vector. Other attributes to get from Ensembl, such as gene symbol and position on the genome. Use listAttributes to see which attributes are available.

use_gene_name

Logical, whether to get gene names.

use_transcript_version

Logical, whether to include the version number in the Ensembl transcript ID. To decide, check whether version numbers are included in the transcripts.txt file in the kallisto output directory. If that file includes version numbers, then transcript version numbers must be included here as well. If that file does not include version numbers, then transcript version numbers must not be included here.
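
One way to make the check described above is to test whether the IDs in transcripts.txt end in a dot followed by digits. This is a minimal sketch under that assumption; has_transcript_version and kallisto_out_dir are hypothetical names, and the IDs below are placeholders rather than real Ensembl records.

```r
# Hypothetical helper: TRUE only if every non-empty ID ends in ".<digits>".
has_transcript_version <- function(ids) {
  ids <- ids[nzchar(ids)]  # ignore blank lines
  length(ids) > 0 && all(grepl("\\.[0-9]+$", ids))
}

# Placeholder IDs for illustration only. In practice, read the file, e.g.
# ids <- readLines(file.path(kallisto_out_dir, "transcripts.txt"))
with_version    <- c("ENST00000000001.1", "ENST00000000002.3")
without_version <- c("ENST00000000001", "ENST00000000002")

has_transcript_version(with_version)     # suggests use_transcript_version = TRUE
has_transcript_version(without_version)  # suggests use_transcript_version = FALSE
```

A file that mixes versioned and unversioned IDs makes the helper return FALSE, so it is worth inspecting such a file by hand before choosing a value.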

use_gene_version

Logical, whether to include version number in the Ensembl gene ID. Unlike transcript version number, it's up to you whether to include gene version number.

ensembl_version

Integer version number of Ensembl (e.g. 94 for the October 2018 release). This argument defaults to NULL, which uses the current release of Ensembl. Use listEnsemblArchives to see the version number corresponding to the Ensembl release of a particular date. The version specified here must match the Ensembl version from which the transcriptome used to build the kallisto index was downloaded.

verbose

Logical, whether to display progress.

...

Other arguments to be passed to useEnsembl, such as mirror. Note that setting a mirror other than the default, e.g. uswest, does not work for archived versions.

Value

A data frame with at least 2 columns: gene for gene ID, transcript for transcript ID, and optionally gene_name for gene names. If other_attrs has been specified, then those will also be columns in the data frame returned.
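
To show how a data frame of this shape might be used downstream, the sketch below builds a small mock with the documented column names and looks up genes by transcript. All IDs and names are placeholders, the object name tr2g_mock is hypothetical, and the column order is arbitrary here.

```r
# Mock data frame with the documented columns; all values are placeholders,
# not real Ensembl records.
tr2g_mock <- data.frame(
  gene       = c("GENE0001.1", "GENE0001.1", "GENE0002.4"),
  transcript = c("TX0001.2", "TX0002.1", "TX0003.1"),
  gene_name  = c("geneA", "geneA", "geneB"),
  stringsAsFactors = FALSE
)

# Look up the gene for each transcript, in the order of the query.
query <- c("TX0003.1", "TX0001.2")
tr2g_mock$gene[match(query, tr2g_mock$transcript)]

# Count how many transcripts map to each gene.
table(tr2g_mock$gene)
```

Because match returns NA for IDs that are not found, a transcript missing from the table shows up as NA in the result, which is a convenient way to spot a version mismatch.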

See Also

Other functions to retrieve transcript and gene info: sort_tr2g, tr2g_EnsDb, tr2g_TxDb, tr2g_fasta, tr2g_gff3, tr2g_gtf, transcript2gene

Examples

tr2g <- tr2g_ensembl(species = "Felis catus", other_attrs = "description")
# This will use plants.ensembl.org as host instead of www.ensembl.org
tr2g <- tr2g_ensembl(species = "Arabidopsis thaliana", type = "plant")

sarangian/RNASeqDEA documentation built on Dec. 8, 2019, 5:24 p.m.