View source: R/genome_download.R
This function automatically downloads the genomes and contaminants
specified for genome alignment, skipping files that already exist.
It creates an R transcript database (TxDb object) from the annotation.
It also indexes the genome for you.
If you misspelled something or the run crashed, delete the wrong files and
run again.
Set remake = TRUE to redo everything from scratch.
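The skip-if-present behaviour described above can be pictured with a small sketch. The helper `fetch_if_missing()` and the `fake_download()` callback are hypothetical names used only for illustration; this is not the function's actual implementation.

```r
# Hypothetical helper (not part of the package): run `fetch` only when
# `path` is missing, or when `remake = TRUE` forces a redo.
fetch_if_missing <- function(path, fetch, remake = FALSE) {
  if (file.exists(path) && !remake) {
    return(FALSE)  # file already present, nothing to do
  }
  fetch(path)      # create (or overwrite) the file
  TRUE
}

# Stand-in for a real download: writes a tiny FASTA record.
fake_download <- function(path) writeLines(c(">chr1", "ACGT"), path)

demo_file <- tempfile(fileext = ".fasta")
first  <- fetch_if_missing(demo_file, fake_download)                 # fetched
second <- fetch_if_missing(demo_file, fake_download)                 # skipped
third  <- fetch_if_missing(demo_file, fake_download, remake = TRUE)  # redone
```

Deleting a wrong file by hand has the same effect as `remake = TRUE` for that one file: the next call no longer finds it and fetches it again.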
organism |
scientific name of the organism, e.g. Homo sapiens,
Danio rerio, Mus musculus. See |
output.dir |
directory to save downloaded data |
db |
database to use for genome and GTF, default (advised): "ensembl" (will contain haplotypes, large file!). Alternatives: "refseq" (primary assembly) and "genbank" (mix) |
GTF |
logical, default: TRUE, download the gtf of the organism specified
in the "organism" argument. If FALSE, check whether the downloaded
file already exists. If you want to use a custom gtf from your hard drive,
set GTF = FALSE,
and assign: |
genome |
logical, default: TRUE, download the genome of the organism
specified in the "organism" argument. If FALSE, check whether the downloaded
file already exists. If you want to use a custom genome from your hard drive,
set genome = FALSE,
and assign: |
merge_contaminants |
logical, default TRUE. Will merge the specified contaminants into one fasta file; this saves considerable space and is much quicker to align with STAR than each contaminant on its own. If no contaminants are specified, this is ignored. |
phix |
logical, default FALSE, download the phix sequence to filter reads against. Phix is used as a contaminant genome. Only use for Illumina sequencing data, since phix is used in Illumina sequencers for sequencing quality control. Genome is: refseq, Escherichia virus phiX174 |
ncRNA |
logical or character, default FALSE (not used, no download),
ncRNA is used as a contaminant genome.
If TRUE, will try to find ncRNA sequences from the gtf file, usually represented as
lncRNA (long noncoding RNAs). Will let you know if no ncRNA sequences were found in
the gtf. |
tRNA |
logical or character, default FALSE (not used, no download),
tRNA is used as a contaminant genome.
If TRUE, will try to find tRNA sequences from the gtf file, usually represented as
Mt_tRNA (mitochondrial tRNAs). Will let you know if no tRNA sequences were found in
the gtf. If not found, try character input: |
rRNA |
logical or character, default FALSE (not used, no download),
rRNA is used as a contaminant genome.
If TRUE, will try to find rRNA sequences from the gtf file, usually represented as
rRNA (ribosomal RNAs). Will let you know if no rRNA sequences were found in
the gtf. If not found, you can try character input: |
gunzip |
logical, default TRUE, uncompress downloaded files that arrive compressed; should be TRUE! |
remake |
logical, default: FALSE, if TRUE remake everything specified |
assembly_type |
a character string specifying which assembly type
the genome is retrieved from (ensembl only; otherwise this argument is ignored):
Default is
|
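As a rough sketch of how the arguments combine, the call below requests the zebrafish genome and gtf from Ensembl with phix and rRNA as merged contaminants. Every argument name comes from the table above; the values are illustrative choices. It assumes the function is exported by the ORFik package, which this page does not state, and the download is switched off by default through the hypothetical `run_download` flag.

```r
# Argument names are those documented above; the values are examples only.
args <- list(
  organism = "Danio rerio",
  output.dir = file.path(tempdir(), "zebrafish"),
  db = "ensembl",
  GTF = TRUE,
  genome = TRUE,
  merge_contaminants = TRUE,  # one combined contaminant fasta
  phix = TRUE,                # Illumina spike-in control
  ncRNA = FALSE,
  tRNA = FALSE,
  rRNA = TRUE,                # look for rRNA entries in the gtf
  gunzip = TRUE,
  remake = FALSE
)

# Hypothetical switch: set to TRUE to actually download (large files).
run_download <- FALSE

# Assumption: getGenomeAndAnnotation() lives in the ORFik package.
if (run_download && requireNamespace("ORFik", quietly = TRUE)) {
  annotation <- do.call(ORFik::getGenomeAndAnnotation, args)
}
```

Keeping the arguments in a list makes it easy to rerun the same request later with only `remake` changed.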
If you want to use a custom genome or gtf from your hard drive, assign it
after you run this function, like this:
annotation <- getGenomeAndAnnotation(GTF = FALSE, genome = FALSE)
annotation["genome"] = "path/to/genome.fasta"
annotation["gtf"] = "path/to/gtf.gtf"
A named character vector of paths to the downloaded genome and gtf, plus any contaminants used. If merge_contaminants is TRUE, it contains only the merged contaminant fasta file, not the individual contaminant fasta files.
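A minimal sketch of working with such a return value, using a mock vector instead of a real download. The element names "gtf" and "genome" follow the Details section; the file names and the temporary directory are placeholders, not paths the function produces.

```r
# Mock of the returned value: a named character vector of file paths.
ref_dir <- tempfile("reference_")
dir.create(ref_dir)
annotation <- c(
  gtf    = file.path(ref_dir, "annotation.gtf"),
  genome = file.path(ref_dir, "genome.fasta")
)

# Pretend only the genome has been downloaded so far.
file.create(annotation["genome"])

# Look up a single path by name.
genome_path <- unname(annotation["genome"])

# Names of the listed files that are not present on disk.
not_found <- names(annotation)[!file.exists(annotation)]
```

A check like `not_found` is a cheap way to confirm the vector is complete before handing its paths to an aligner.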
Other STAR: STAR.align.folder(), STAR.align.single(), STAR.allsteps.multiQC(), STAR.index(), STAR.install(), STAR.multiQC(), STAR.remove.crashed.genome(), install.fastp()
output.dir <- "/Bio_data/references/zebrafish"
#getGenomeAndAnnotation("Danio rerio", output.dir)
## Get Phix contaminants to deplete during alignment
#getGenomeAndAnnotation("Danio rerio", output.dir, phix = TRUE)