knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.path = "man/figures/README-", out.width = "100%" )
The goal of utilitybeltrefseq is to make it easy to find and download the best refseq assemblies available for any given species. All you need to know is the species NCBI taxonomy ID
You can install the development version of utilitybeltrefseq from GitHub with:
# install.packages("devtools") devtools::install_github("selkamand/utilitybeltrefseq")
Say you want to find the best reference genome for Escherichia coli.
First we load the library then download a list of available assemblies from refseq.
library(utilitybeltrefseq)
update_refseq_data_cache()
For best results, run the above command every couple of months to stay up to date with whats currently in refseq.
Next, we need to find the NCBI taxonomy ID of our species of interest. We can find that here (taxid: 562). We could also have used the taxize package if we wanted to stay within R.
Now that we have the species level taxid, we can run choose_best_assembly
choose_best_assembly(taxid_of_interest = 562)
Note that we get a warning that there are multiple 'best' assemblies. If there are multiple strains in a species with complete assemblies - it his hard to know which to return. We choose the most recently added assembly. To return all of the high quality assemblies you can run:
choose_best_assembly( taxid_of_interest = 562, break_ties_based_on_newest_sequence_added = FALSE, return_accession_only = TRUE # set this to false to get more info about each assembly )
Keep an eye on the intraspecific name column. Often can use this to choose which strain you're after. See ?choose_best_assembly
for details.
Once we have the assembly accession we're interested in, we can download it:
download_assembly( target_assembly_accession = GCF_000005845.2 )
If we just want to quickly download a ref-genome of our species of interest we can just run
best_assembly = download_best_assembly(taxid_of_interest = 562)
If you have bwa-mem2 installed, this R package can also be used to align paired reads to the best assembly. This can be installed with brew install bwa2
. Can also find linux binaries here. To make a bam index file you will also need samtools
installed and available on PATH
forward_reads = "path/to/forward_reads.fq" reverse_reads = "path/to/reverse_reads.fq" taxid_of_interest = 562 outfile_prefix = "E. coli" taxid_of_interest |> download_best_assembly() |> prep_for_alignment() %>% align_reads(forward_reads = forward_reads, reverse_reads = reverse_reads, outfile_prefix = outfile_prefix) |> bam_index() # remove this line and |> line if you don't want an indexed bam / don't have samtools installed
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.