get_genome_gtf: Download genome (fasta), annotation (GTF) and contaminants

Description Usage Arguments Details Value See Also Examples

View source: R/genome_download_helper.R

Description

This function automatically downloads (if files not already exists) genomes and contaminants specified for genome alignment. Will create a R transcript database (TxDb object) from the annotation.
It will also index the genome for you
If you misspelled something or crashed, delete wrong files and run again.
Do remake = TRUE, to do it all over again.

Usage

1
get_genome_gtf(GTF, output.dir, organism, assembly_type, gunzip, genome)

Arguments

GTF

logical, default: TRUE, download gtf of organism specified in "organism" argument. If FALSE, check if the downloaded file already exist. If you want to use a custom gtf from you hard drive, set GTF = FALSE, and assign:
annotation <- getGenomeAndAnnotation(gtf = FALSE)
annotation["gtf"] = "path/to/gtf.gtf".
Only db = "ensembl" allowed for GTF.

output.dir

directory to save downloaded data

organism

scientific name of organism, Homo sapiens, Danio rerio, Mus musculus, etc. See biomartr:::get.ensembl.info() for full list of supported organisms.

assembly_type

a character string specifying from which assembly type the genome shall be retrieved from (ensembl only, else this argument is ignored): Default is assembly_type = "primary_assembly"). This will give you all no copies of any chromosomes. As an example, the primary_assembly fasta genome in human is only a few GB uncompressed.
assembly_type = "toplevel"). This will give you all multi-chromosomes (copies of the same chromosome with small variations). As an example the toplevel fasta genome in human is over 70 GB uncompressed. To get primary assembly with 1 chromosome variant per chromosome:

gunzip

logical, default TRUE, uncompress downloaded files that are zipped when downloaded, should be TRUE!

genome

a character path, default NULL. if set, must be path to genome fasta file, must be indexed. If you want to make sure chromosome naming of the GTF matches the genome. Not necessary if you downloaded from same source. If value is NULL or FALSE, will be ignored.

Details

If you want custom genome or gtf from you hard drive, assign it after you run this function, like this:
annotation <- getGenomeAndAnnotation(GTF = FALSE, genome = FALSE)
annotation["genome"] = "path/to/genome.fasta"
annotation["gtf"] = "path/to/gtf.gtf"

Value

a named character vector of path to genomes and gtf downloaded, and additional contaminants if used. If merge_contaminants is TRUE, will not give individual fasta files to contaminants, but only the merged one.

See Also

Other STAR: STAR.align.folder(), STAR.align.single(), STAR.allsteps.multiQC(), STAR.index(), STAR.install(), STAR.multiQC(), STAR.remove.crashed.genome(), install.fastp()

Examples

1
2
3
4
5
output.dir <- "/Bio_data/references/zebrafish"
#getGenomeAndAnnotation("Danio rerio", output.dir)

## Get Phix contamints to deplete during alignment
#getGenomeAndAnnotation("Danio rerio", output.dir, phix = TRUE)

ORFik documentation built on March 27, 2021, 6 p.m.