dl_transcriptome | R Documentation |
This function downloads the cDNA fasta file from a specific version of Ensembl. It can also filter the fasta file by gene and transcript biotype and remove scaffolds and haplotypes.
dl_transcriptome(
species,
out_path = ".",
type = c("vertebrate", "metazoa", "plant", "fungus", "protist"),
transcript_biotype_use = "all",
gene_biotype_use = "all",
chrs_only = TRUE,
ensembl_version = NULL,
verbose = TRUE,
...
)
species |
Character vector of length 1, Latin name of the species of interest. |
out_path |
Directory to save the outputs written to disk. If this directory does not exist, then it will be created. Defaults to the current working directory. |
type |
Character, must be one of "vertebrate", "metazoa", "plant", "fungus", or "protist". Passing "vertebrate" will use the default www.ensembl.org host. Gene annotations of some common invertebrate model organisms, such as Drosophila melanogaster, are also available on www.ensembl.org, so "vertebrate" can be used for these organisms. Passing any other value will use one of the other Ensembl hosts. For animals absent from www.ensembl.org, try "metazoa". |
transcript_biotype_use |
Character, can be "all" or
a vector of transcript biotypes to be used. Transcript biotypes aren't
entirely the same as gene biotypes. For instance, in Ensembl annotation,
|
gene_biotype_use |
Character, can be "all", "cellranger", or
a vector of gene biotypes to be used. If "cellranger", then the biotypes
used by Cell Ranger's reference are used. See |
chrs_only |
Logical, whether to include chromosomes only. GTF and
GFF files can contain annotations for scaffolds, which are not incorporated
into chromosomes. This will also exclude haplotypes. Defaults to |
ensembl_version |
Integer version number of Ensembl (e.g. 94 for the
October 2018 release). This argument defaults to |
verbose |
Whether to display progress. |
... |
Other arguments passed to |
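The usage above lists every allowed value of type as its default, which in R usually means that the first value ("vertebrate") is used when the argument is omitted. The following is a minimal sketch of how such an argument can be resolved to a host. resolve_host is a hypothetical helper, not part of this package, and the host names other than www.ensembl.org are assumptions based on the public Ensembl Genomes sites; the function's actual internals may differ.

```r
# Hypothetical helper for illustration only; not part of this package.
resolve_host <- function(type = c("vertebrate", "metazoa", "plant",
                                  "fungus", "protist")) {
  # match.arg() returns the first choice when `type` is left at its default
  # and raises an error for values outside the allowed set.
  type <- match.arg(type)
  # Host names below are assumptions, except www.ensembl.org, which the
  # documentation names as the default host.
  switch(type,
    vertebrate = "www.ensembl.org",
    metazoa    = "metazoa.ensembl.org",
    plant      = "plants.ensembl.org",
    fungus     = "fungi.ensembl.org",
    protist    = "protists.ensembl.org"
  )
}

default_host <- resolve_host()           # `type` omitted: first choice is used
metazoa_host <- resolve_host("metazoa")  # explicit choice
```

An unsupported value such as resolve_host("bacteria") stops with an error from match.arg() rather than silently falling back to a default.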
Invisibly returns the path to the fasta file. The following files are
written to disk, in the out_path directory:
The cDNA fasta file from Ensembl, from the specified version.
The filtered cDNA fasta file, only keeping the
desired biotypes and without scaffolds and haplotypes (if
chrs_only = TRUE). This file will not be written if all gene and transcript
biotypes are used and scaffolds and haplotypes are not removed.
The transcript to gene file, without headers, so that it can be
used directly with bustools.
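Because the transcript to gene file has no header row, it must be read with header = FALSE when it is loaded back into R. The following is a minimal sketch that uses a stand-in file. The two-column, tab-separated layout, the column names, and the IDs are assumptions made for illustration, not the guaranteed format of the file this function writes.

```r
# Stand-in for a transcript to gene file: tab-separated, no header row.
# The IDs are invented and the two-column layout is an assumption.
tr2g_path <- tempfile(fileext = ".tsv")
writeLines(c("TX0001.1\tGENE0001.1",
             "TX0002.1\tGENE0001.1",
             "TX0003.1\tGENE0002.1"),
           tr2g_path)

# header = FALSE keeps the first record as data instead of column names.
tr2g <- read.table(tr2g_path, header = FALSE, sep = "\t",
                   col.names = c("transcript", "gene"),
                   stringsAsFactors = FALSE)

# Number of transcripts annotated to each gene.
n_tx_per_gene <- table(tr2g$gene)

# Remove the stand-in file.
file.remove(tr2g_path)
```

Since the fasta path is returned invisibly, it is only printed or kept if the call is assigned, for example fa_path <- dl_transcriptome(...).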
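# Download the Drosophila melanogaster cDNA fasta, keep only the gene
# biotypes used by Cell Ranger, and retain scaffolds and haplotypes
dl_transcriptome("Drosophila melanogaster", gene_biotype_use = "cellranger",
                 chrs_only = FALSE)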
# Clean up
file.remove("Drosophila_melanogaster.BDGP6.32.cdna.all.fa.gz")