tr2g_TxDb | R Documentation |
The genome and gene annotations of some species can be obtained directly
from Bioconductor packages. This is more convenient than downloading GTF
files from Ensembl and reading them into R. In these packages, the gene
annotation is stored in a TxDb object, which has standardized names for
gene IDs, transcript IDs, exon IDs, and so on. In GTF and GFF3 files, by
contrast, this information is stored in metadata fields whose names are not
standardized.
This function extracts transcript and corresponding gene information from
the gene annotation stored in a TxDb object.
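As a rough illustration of the kind of lookup described above, the sketch below pulls transcript names and gene IDs out of a TxDb object with the generic AnnotationDbi interface. It is not the internal implementation of tr2g_TxDb(); it assumes the TxDb.Hsapiens.UCSC.hg38.knownGene package is installed, and the renaming to gene, transcript, and tx_id is only there to mirror the columns named on this page.

```r
library(AnnotationDbi)
library(TxDb.Hsapiens.UCSC.hg38.knownGene)

txdb <- TxDb.Hsapiens.UCSC.hg38.knownGene

# Look up the transcript name and gene ID for every internal transcript ID.
tx_keys <- keys(txdb, keytype = "TXID")
res <- select(txdb, keys = tx_keys, keytype = "TXID",
              columns = c("TXNAME", "GENEID"))

# Rename to the column names used on this help page (illustrative only).
tr2g <- data.frame(gene = res$GENEID,
                   transcript = res$TXNAME,
                   tx_id = res$TXID)
head(tr2g)
```

Transcripts that are not assigned to a gene in the annotation appear with NA in the gene column in this sketch.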
tr2g_TxDb(
  txdb,
  Genome = NULL,
  get_transcriptome = TRUE,
  out_path = ".",
  write_tr2g = TRUE,
  chrs_only = TRUE,
  compress_fa = FALSE,
  overwrite = FALSE
)
txdb |
A |
Genome |
Either a |
get_transcriptome |
Logical, whether to extract transcriptome from
genome with the GTF file. If filtering biotypes or chromosomes, the filtered
|
out_path |
Directory to save the outputs written to disk. If this directory does not exist, then it will be created. Defaults to the current working directory. |
write_tr2g |
Logical, whether to write tr2g to disk. If |
chrs_only |
Logical, whether to include chromosomes only, since GTF and
GFF files can contain annotations for scaffolds that are not incorporated
into chromosomes. This will also exclude haplotypes. Defaults to |
compress_fa |
Logical, whether to compress the output fasta file. If
|
overwrite |
Logical, whether to overwrite existing files that have the same names as the outputs written to disk. |
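To make the effect of chrs_only more concrete, here is a minimal base R sketch of filtering sequence names down to assembled chromosomes. It is not how the function itself does the filtering; it assumes UCSC-style hg38 names, where scaffolds and haplotypes carry an underscore suffix, and the seq_names vector is purely illustrative.

```r
# Illustrative sequence names in the UCSC hg38 style.
seq_names <- c("chr1", "chr2", "chrX", "chrM",
               "chr1_KI270706v1_random",  # unlocalized scaffold
               "chrUn_KI270302v1",        # unplaced scaffold
               "chr1_KI270762v1_alt")     # alternate haplotype

# Assumption: only scaffold and haplotype names contain an underscore.
is_chromosome <- !grepl("_", seq_names, fixed = TRUE)
seq_names[is_chromosome]
```

This naming rule is specific to UCSC-style names; for other naming schemes, GenomeInfoDb::keepStandardChromosomes() is a more general option.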
A data frame with 3 columns: gene for gene ID, transcript for transcript ID,
and tx_id for internal transcript IDs used to avoid duplicate transcript
names. For TxDb packages from Bioconductor, gene ID is Entrez ID, while
transcript IDs are Ensembl IDs with version numbers for
TxDb.Hsapiens.UCSC.hg38.knownGene. In some cases, the transcript IDs have
duplicates, and this is resolved by adding numbers to make the IDs unique.
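One way to picture how duplicated IDs can be made unique by adding numbers is base R's make.unique(). Whether tr2g_TxDb() uses this function or this exact suffix format is an assumption; the transcript IDs below are only illustrative input.

```r
# Illustrative transcript IDs with one duplicate.
tx <- c("ENST00000456328.2", "ENST00000456328.2", "ENST00000450305.2")

# make.unique() leaves the first occurrence alone and appends a
# numeric suffix to each later duplicate.
tx_unique <- make.unique(tx, sep = "_")
tx_unique
```

Here the second copy of the duplicated ID gets the suffix "_1", while the first copy and the non-duplicated ID are left unchanged.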
A data frame with 3 columns: gene for gene ID, transcript for transcript ID,
and gene_name for gene names. If other_attrs has been specified, then those
will also be columns in the data frame returned.
Other functions to retrieve transcript and gene info:
sort_tr2g(), tr2g_EnsDb(), tr2g_ensembl(), tr2g_fasta(), tr2g_gff3(),
tr2g_gtf(), transcript2gene()
library(TxDb.Hsapiens.UCSC.hg38.knownGene)
library(BSgenome.Hsapiens.UCSC.hg38)
tr2g_TxDb(TxDb.Hsapiens.UCSC.hg38.knownGene, BSgenome.Hsapiens.UCSC.hg38)
# Clean up
file.remove("transcriptome.fa", "tr2g.tsv")