transcript2gene: Map Ensembl transcript ID to gene ID

View source: R/tr2g.R

Description

This function is a shortcut for obtaining a correctly sorted data frame of transcript IDs and their corresponding gene IDs, either from Ensembl BioMart or from Ensembl transcriptome FASTA files. For a BioMart query, it calls tr2g_ensembl and then sort_tr2g. For FASTA files, it calls tr2g_fasta and then sort_tr2g. Unlike tr2g_ensembl and tr2g_fasta, it accepts multiple species, which is useful when cells from different species were sequenced together. Use this function only if the kallisto index was built with transcriptomes from Ensembl. When querying BioMart, set ensembl_version to the Ensembl release from which the transcriptomes were downloaded, so that the transcript IDs and their version numbers match those in the index.
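To illustrate what the sorting step accomplishes, here is a minimal base R sketch. It assumes that the kallisto bus output directory contains transcripts.txt, listing one transcript ID per line in index order. reorder_tr2g() is a hypothetical helper written for illustration; it is not the implementation of sort_tr2g.

```r
# Hypothetical helper (not BUSpaRse code): reorder a tr2g data frame so its
# rows follow the transcript order recorded by kallisto bus in transcripts.txt.
reorder_tr2g <- function(tr2g, kallisto_out_path) {
  # Assumption: transcripts.txt holds one transcript ID per line, in index order
  tx_order <- readLines(file.path(kallisto_out_path, "transcripts.txt"))
  idx <- match(tx_order, tr2g$transcript)
  # Stop if the index has transcripts absent from tr2g, which can happen
  # when the Ensembl release or version suffixes do not match
  if (anyNA(idx)) {
    stop("Transcripts in the index are missing from tr2g: ",
         paste(head(tx_order[is.na(idx)]), collapse = ", "))
  }
  out <- tr2g[idx, , drop = FALSE]
  rownames(out) <- NULL
  out
}

# Toy usage: write a fake transcripts.txt to a temporary directory
out_dir <- tempfile("kb_out")
dir.create(out_dir)
writeLines(c("ENST0002.1", "ENST0001.1", "ENST0003.2"),
           file.path(out_dir, "transcripts.txt"))
tr2g <- data.frame(gene = c("ENSG0001.1", "ENSG0001.1", "ENSG0002.3"),
                   transcript = c("ENST0001.1", "ENST0002.1", "ENST0003.2"),
                   stringsAsFactors = FALSE)
sorted <- reorder_tr2g(tr2g, out_dir)
sorted$transcript
```

Matching by ID rather than assuming the two sources share an order is what makes a release mismatch surface as an error instead of a silently misaligned gene count matrix.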

Usage

transcript2gene(
  species,
  fasta_file,
  kallisto_out_path,
  type = "vertebrate",
  ...
)

Arguments

species

A character vector of Latin names of the species present in this scRNA-seq dataset. This is used to retrieve Ensembl information from BioMart.

fasta_file

Character vector of paths to the transcriptome FASTA files used to build the kallisto index. Supply exactly one of species and fasta_file; the other should be left missing.

kallisto_out_path

Path to the kallisto bus output directory.

type

A character vector indicating the type of each species. Each element must be one of "vertebrate", "metazoa", "plant", "fungus", or "protist". If it has length 1, that type is used for all species. Can be missing if fasta_file is specified.

...

Other arguments passed to tr2g_ensembl, such as other_attrs and ensembl_version, as well as arguments passed to useMart. If fasta_file is supplied instead of species, these are extra arguments to tr2g_fasta, such as use_transcript_version and use_gene_version.
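The rules for species, fasta_file, and type can be summarized in a short sketch. resolve_inputs() is hypothetical and is not BUSpaRse code: it uses NULL defaults in place of missing arguments for simplicity, and the length check for a type vector longer than 1 is an assumption, since the documentation only states what happens when type has length 1.

```r
# Hypothetical sketch of the argument rules; not BUSpaRse's implementation.
valid_types <- c("vertebrate", "metazoa", "plant", "fungus", "protist")

resolve_inputs <- function(species = NULL, fasta_file = NULL,
                           type = "vertebrate") {
  # Exactly one of species / fasta_file must be supplied
  if (is.null(species) == is.null(fasta_file)) {
    stop("Supply exactly one of 'species' or 'fasta_file'.")
  }
  if (!is.null(fasta_file)) {
    # type is not needed when reading FASTA files
    return(list(source = "fasta", fasta_file = fasta_file))
  }
  bad <- setdiff(type, valid_types)
  if (length(bad) > 0) stop("Unknown type: ", paste(bad, collapse = ", "))
  if (length(type) == 1) {
    # A single type applies to every species
    type <- rep(type, length(species))
  } else if (length(type) != length(species)) {
    # Assumption: otherwise one type per species is required
    stop("'type' must have length 1 or the same length as 'species'.")
  }
  list(source = "biomart", species = species, type = type)
}

res <- resolve_inputs(c("Homo sapiens", "Mus musculus"))
res$type
```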

Value

A data frame with two columns, gene and transcript, containing Ensembl gene and transcript IDs (with version numbers), in the same order as the transcripts in the index used by kallisto.
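As a rough illustration of how such a gene/transcript data frame can be derived from a FASTA file, the sketch below parses Ensembl cDNA header lines, where the transcript ID is the first token and the gene ID follows a "gene:" tag. headers_to_tr2g() is hypothetical and is not how tr2g_fasta is implemented; its use_transcript_version and use_gene_version arguments only mirror the tr2g_fasta argument names listed under Arguments. The header lines use made-up placeholder IDs.

```r
# Hypothetical sketch: build a gene/transcript data frame from Ensembl
# cDNA FASTA header lines. Not the tr2g_fasta implementation.
headers_to_tr2g <- function(headers, use_transcript_version = TRUE,
                            use_gene_version = TRUE) {
  headers <- sub("^>", "", headers)
  # Transcript ID: first whitespace-delimited token
  transcript <- sub("\\s.*$", "", headers, perl = TRUE)
  # Gene ID: the value after the " gene:" tag (NA if the tag is absent)
  has_gene <- grepl("\\sgene:", headers, perl = TRUE)
  gene <- sub("^.*\\sgene:(\\S+).*$", "\\1", headers, perl = TRUE)
  gene[!has_gene] <- NA_character_
  # Optionally strip trailing ".N" version suffixes
  if (!use_transcript_version) {
    transcript <- sub("\\.\\d+$", "", transcript, perl = TRUE)
  }
  if (!use_gene_version) gene <- sub("\\.\\d+$", "", gene, perl = TRUE)
  # Rows keep the FASTA order
  data.frame(gene = gene, transcript = transcript, stringsAsFactors = FALSE)
}

# Made-up headers following the Ensembl cDNA header layout
hdr <- c(
  ">ENST00000000001.1 cdna chromosome:GRCh38:1:100:200:1 gene:ENSG00000000001.3 gene_biotype:protein_coding transcript_biotype:protein_coding",
  ">ENST00000000002.2 cdna chromosome:GRCh38:1:150:300:1 gene:ENSG00000000001.3 gene_biotype:protein_coding transcript_biotype:protein_coding"
)
tr <- headers_to_tr2g(hdr)
tr_nover <- headers_to_tr2g(hdr, use_transcript_version = FALSE,
                            use_gene_version = FALSE)
```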

Note

This function has been superseded by the newer versions of the tr2g_* functions, which can extract the transcriptome for only the specified biotypes and only the standard chromosomes. These newer functions also sort the transcriptome so that the tr2g data frame and the transcriptome list transcripts in the same order.

See Also

Other functions to retrieve transcript and gene info: sort_tr2g(), tr2g_EnsDb(), tr2g_TxDb(), tr2g_ensembl(), tr2g_fasta(), tr2g_gff3(), tr2g_gtf()

Examples

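# Load BUSpaRse, plus TENxBUSData for the example data
library(BUSpaRse)
library(TENxBUSData)
# Download a dataset already in BUS format into the working directory
TENxBUSData(".", dataset = "hgmm100")
# Query BioMart for both species, using Ensembl release 99
tr2g <- transcript2gene(c("Homo sapiens", "Mus musculus"),
  type = "vertebrate", save_filtered = FALSE,
  ensembl_version = 99, kallisto_out_path = "./out_hgmm100")
# Clean up files from the example (recursive = TRUE is needed to delete a directory)
unlink("out_hgmm100", recursive = TRUE)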

BUStools/BUSpaRse documentation built on March 3, 2024, 9:11 a.m.