Once you have downloaded the isolationgWorkflow package, you need to load the package before using it.
library(isolationWorkflow)
Once loaded, you can use ?
to find out more about each function in the package.
?abif_fasta() ?parse_blast()
This step includes: Check if sequences were successful, and allows you to exlude failed sequences Trim ends off of sequences, as they are usually strings of Ns * Make a fasta file containing all sequences from your Big Dye plate so can send in one Blastn query
All of this is done using the abif_fasta()
. I recommend running this command twice. The first time I use to evaluate all of my sequences, and the second time I generate a "clean" fasta file.
This will give you:
a file containing sequences before and after trimming
a fasta file containing trimmed sequences (if output = FALSE
)
Note: all output files saved in same location as sanger files
When evaluating sequences, look for the following: sequence length -- too short means the sequence failed (i.e. DNA not released from boiling, no PCR product or BigDye failed) distribution of Ns -- Ns throughout the majority of the read means the isolate was not pure * satisfaction with the trimming algorithm -- compare the raw vs. trimmmed version of the sequence to ensure you agree with the trimming. If you don't, manually edit the final FASTA file before Blasting
# Check for sequence quality, and trimming abif_fasta(folder = "/path/to/sanger/files/", exclude = NULL, trim=TRUE, trim.check=FALSE, export.check=TRUE, show.prog=TRUE, output=FALSE) # prepare sequences for Blast ## exclude parameter accepts multiple entries to exclude multiple samples abif_fasta(folder = "/path/to/sanger/files/", exclude = c('sample1.ab1','sample2.ab1'), trim=TRUE, trim.check=FALSE, export.check=TRUE, show.prog=TRUE, output='myIsolates-clean.FASTA')
Use [Blastn](https://blast.ncbi.nlm.nih.gov/Blast.cgi](https://blast.ncbi.nlm.nih.gov/Blast.cgi) to classify your sequences. Setting the database to 16S ribosomal RNA sequences (Bacteria and Archaea) helps with specificity of the search.
Download your blast result as an XML file and save in same location as sanger files. Then using the parse_blast()
, we can read our results and save as a csv table. Output is saved in same location as your downloaded xml file.
# include path to blast result if necessary parse_blast(filename='/path/to/downloaded/blast/result/######-Alignment.xml', ngroups=1, tophit=FALSE, output='blast_result.csv')
You can use gen_strainlib()
to both generate and update your strain library, depending on how the parameters are used. Regardless, when you use gen_strainlib()
the following happens:
Regarding the last point, the blast result being examined will be cross referenced with the strain library. So you can compare and align your blast results with strains already in your library. These species-based fasta files allow you to align the sequences that are classified to the same species (see next step)
Note: a fasta file of your library strains is also generated because that is a useful thing to have.
``` {r, echo = TRUE, eval = FALSE}
gen _strainlib(blast_file = "/include/path/if/necessary/blast_result.csv", lib_path = "/path/to/strain/library", lib_file = NULL, lib_name = 'mystrainlibrary')
gen _strainlib(blast_file = "/include/path/if/necessary/blast_result.csv", lib_path = "/path/to/strain/library", lib_file = "mystrainlibrary.csv", lib_name = NULL)
***************** # 5) Align redundant species and update strain library This steps aligns the sequences of isolates that were classified as the same species, and allows you to pick which isolates you want to add to the library. Alignment is done using `AlignSeqs()` in the R package, `DECIPHER`. The alignment will automatically pop up as an HTML file in your web browser, with the nucleotides colour-coded to help identify any mis-matches.The alignemnt file is automatically saved in the same location as the fasta file that was aligned. When choosing which strains to keep, consider the following: * is the difference due to an N? if so, then you don't have enough information to say the sequences are different at that base * a strain that is different from others, ONLY because of Ns may have poor sequence quality and you should consider redoing the sequencing for that isolate. ```r # supply more than one fasta to align to save time choose_alignment(lib_path = "/path/to/strain/library", lib_file = 'mystrainlibrary.csv', blast_file = "/include/path/if/necessary/blast_result.csv", fasta_path = "/path/to/species/fasta/files", fastatoalign = c("species1.fasta", "species2.fasta", "species3.fasta"))
Since isolation results usually comes back in waves, it is helpful to be checking your results as you go, and freezing strains down as needed (rather than waiting for all results to come back, then deciding whichs strains to keep). This workflow is designed so that you are able to examine sanger sequences as they come in, and easily cross-reference them with the strains you have already decided to keep in your strain library.
Note, the final strain library file is in the same format as the blast result. This may have extra info you may not find useful in your final strain library. I recommend you don't modify the format, but rather, copy and paste into a seperate file to edit as you please.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.