README.md

WGS Analysis and Detection of Molecular Markers (WADE)

What is WADE?

WADE provides a flexible and customizable method to extract specific genes from a large number of genomes at once, using BLAST to interrogate assembled genomes. Current molecular analyses include antimicrobial resistance, toxins, virulence profiles and several multi-locus sequence typing (MLST) schemes. The Virulence Factor DataBase (VFDB) and the antimicrobial resistance factor databases CARD, ARG-ANNOT and ResFinder have also been made available.

Tabular results are output in a format that is compatible with LabWare uploads. These results can consist of simple "Positive-Negative" results corresponding to the presence or absence of a queried gene. Curated multi-fasta lookup files can be provided for molecular determinants to create molecular profiles of affective mutations. Fasta file outputs of gene sequences extracted from the genomes can readily be loaded into sequence aligners to correlate nucleotide differences with phenotypic observations.

Getting Started

This tool can be run using RStudio (available at https://www.rstudio.com/).

Prerequisites

This tool requires the R packages plyr, tidyverse, tidyselect, tidysq, Biostrings, shiny, shinyWidgets, DT, readxl and beepr, which can be loaded (together with wade itself) using:

library(plyr)  
library(tidyverse)  
library(tidyselect)  
library(tidysq)  
library(Biostrings)  
library(shiny)  
library(shinyWidgets)  
library(DT)  
library(readxl)  
library(beepr)  
library(wade)  

and the use of the BLAST+ executable from NCBI: https://blast.ncbi.nlm.nih.gov/Blast.cgi?PAGE_TYPE=BlastDocs&DOC_TYPE=Download
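
Before launching WADE it can help to confirm that these prerequisites are in place. The following is a minimal sketch and not part of WADE itself: the helper name `missing_packages` is hypothetical, and the `blastn` check assumes the BLAST+ binaries have been added to the system PATH.

```r
# Packages listed in the Prerequisites section.
required <- c("plyr", "tidyverse", "tidyselect", "tidysq", "Biostrings",
              "shiny", "shinyWidgets", "DT", "readxl", "beepr")

# Hypothetical helper: return the requested packages that are not installed.
missing_packages <- function(pkgs) {
  setdiff(pkgs, rownames(utils::installed.packages()))
}

missing <- missing_packages(required)
if (length(missing) > 0) {
  message("Not installed: ", paste(missing, collapse = ", "))
}

# Sys.which() returns "" when the executable is not found on the PATH.
if (!nzchar(Sys.which("blastn"))) {
  message("blastn was not found on the PATH; check the BLAST+ installation.")
}
```

Any package reported as missing can be installed as described in the Installation section.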

Installation

  1. Install R from https://www.r-project.org/

  2. Install RStudio from https://www.rstudio.com

  3. Install WADE:

    a. from github using:

    install.packages("devtools")
    library("devtools")
    install_github("phac-nml/wade")

    OR

    b. clone the wade git repository into a directory and run the following:

    install.packages("C:/path/to/GitHub/wade", repos=NULL, type="source")

  4. Install required packages

    install.packages("plyr")
    install.packages("tidyverse")
    install.packages("tidyselect")
    install.packages("tidysq")
    install.packages("stringr")
    install.packages("shiny")
    install.packages("shinyWidgets")
    install.packages("DT")
    install.packages("readxl")
    install.packages("beepr")

  5. Install Biostrings (https://bioconductor.org/install/)

    if (!requireNamespace("BiocManager", quietly = TRUE))
        install.packages("BiocManager")
    BiocManager::install(c("ggtree", "Biostrings"))

  6. Install the BLAST+ executable from NCBI: https://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/
     Download and install the *.exe file from the FTP directory.

Setup

Copy the WADE directory from the github/wade folder into the location from which you would like to run the program (e.g. the C:/ drive)

OR

Set up the following directory:

C:/WADE/

with two subdirectories:

C:/WADE/Output/  
C:/WADE/temp/  
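
The directory layout above can also be created from R. This is a minimal sketch using base R only; the helper name `setup_wade_dirs` is hypothetical and is not a function provided by the wade package.

```r
# Hypothetical helper: create the WADE root with its Output and temp folders.
setup_wade_dirs <- function(root = "C:/WADE") {
  dirs <- file.path(root, c("Output", "temp"))
  for (d in dirs) {
    # recursive = TRUE also creates the root folder; showWarnings = FALSE
    # keeps the call quiet when a folder already exists.
    dir.create(d, recursive = TRUE, showWarnings = FALSE)
  }
  # TRUE only if both subdirectories exist afterwards.
  all(dir.exists(dirs))
}

# Example call (uncomment to create the folders on the C:/ drive):
# setup_wade_dirs("C:/WADE")
```

The function returns `TRUE` when both subdirectories exist afterwards, so it is safe to call again on an existing setup.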

This molecular analysis tool queries pre-assembled fasta files. The locations of the contig files, vcf files, and wade data files need to be listed in "DirectoryLocations.csv" on the line for the corresponding organism ID.

The contig files must have the file extension ".fasta" (e.g. MySampleNo_contig.fasta).
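
A quick way to spot files that do not follow this naming rule is to list everything in the contig folder that does not end in ".fasta". The sketch below is illustrative only: `non_fasta_files` is a hypothetical helper, and the match is case-sensitive on the assumption that the extension must be written exactly as shown.

```r
# Hypothetical helper: list files in a contig folder that lack the .fasta extension.
non_fasta_files <- function(contig_dir) {
  files <- list.files(contig_dir)
  # Ignore subdirectories; only regular files are checked.
  files <- files[!dir.exists(file.path(contig_dir, files))]
  files[!grepl("\\.fasta$", files)]
}

# Example call (the folder path is a placeholder):
# non_fasta_files("C:/path/to/contigs")
```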

To use the multiple sample list option, a sample list file must be located in the WADE folder (e.g. C:/WADE/list.csv). list.csv must have the following structure:

| SampleNo | Variable |
|----------|----------|
| 12345    | 4 ug/ml  |
| 12346    | 8 ug/ml  |
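
A sample list with this structure can be checked before it is passed to WADE. The following is a minimal sketch: `read_sample_list` is a hypothetical helper, and it assumes that the two column headers must be spelled exactly "SampleNo" and "Variable" as in the example table.

```r
# Hypothetical helper: read a sample list and verify its two columns.
read_sample_list <- function(path) {
  samples <- read.csv(path, stringsAsFactors = FALSE, check.names = FALSE)
  expected <- c("SampleNo", "Variable")
  if (!identical(names(samples), expected)) {
    stop("list.csv must have the columns: ", paste(expected, collapse = ", "))
  }
  samples
}

# Write the example rows from the table to a temporary file and read them back.
example <- data.frame(SampleNo = c(12345, 12346),
                      Variable = c("4 ug/ml", "8 ug/ml"))
tmp <- tempfile(fileext = ".csv")
write.csv(example, tmp, row.names = FALSE)
samples <- read_sample_list(tmp)
```

For real use, replace the temporary file with the path to the list.csv file in the WADE folder.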

Usage

  1. Open the WADE.R file in RStudio. WADE.R is an RShiny interface that facilitates usage of WADE.
  2. Change the current working directory in WADE.R to the folder where your "DirectoryLocations.csv" file and the Output and temp subdirectories are located:
     Line 20: curr_work_dir <- "C:/WADE/"
  3. Click on the "Run App" button.
  4. Select the desired organism.
  5. Select the desired analysis.
  6. Choose the query locus. The default is "list", which queries all loci in the chosen category; otherwise, type in the desired locus.
  7. If this is the first time you have run this particular analysis, or if the allele lookup files have been recently updated, the BLAST databases must be indexed by pressing the "MakeBlastdb" button.
  8. Enter a sample number, or "list" to query multiple samples.
  9. Click the "Go" button. The Output button will display a spreadsheet containing the results of the analysis. These results can also be found in the Output folder.
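
The "MakeBlastdb" button indexes the lookup files as BLAST databases. How WADE calls BLAST+ internally is not described here, so the sketch below only illustrates what indexing a nucleotide FASTA file with the `makeblastdb` program from BLAST+ generally looks like from R. The helper name `makeblastdb_args` and the example file path are hypothetical.

```r
# Hypothetical helper: build the command-line arguments for makeblastdb.
makeblastdb_args <- function(fasta, dbtype = "nucl") {
  # shQuote() protects paths that contain spaces.
  c("-in", shQuote(fasta), "-dbtype", dbtype)
}

lookup <- "C:/WADE/lookup/example_alleles.fasta"  # hypothetical path
args <- makeblastdb_args(lookup)

# Only call BLAST+ when the executable is on the PATH and the file exists.
if (nzchar(Sys.which("makeblastdb")) && file.exists(lookup)) {
  system2("makeblastdb", args = args)
}
```

Within the WADE interface this step is handled by the button, so running `makeblastdb` by hand should not normally be necessary.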

Troubleshooting

When running this program on some Windows machines, the MakeBlastdb program can give an error. If this happens, the environment variable settings will need to be changed as follows:

  1. Go to Windows Settings and search for "Environment Variables"

  2. In the System Properties dialogue box, click on the "Environment Variables" button

  3. In the "User Variables for..." box, click the "New..." button

  4. Input the following:

    Variable Name: BLASTDB_LMDB_MAP_SIZE
    Variable Value: 1000000
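
The same variable can also be set for a single R session, which may be useful on machines where the Windows settings cannot be changed. This is a sketch under the assumption that BLAST+ is started as a child process of the R session, so that it inherits the value; the Windows dialogue described above remains the persistent fix.

```r
# Set the variable for the current R session only (value taken from the steps above).
Sys.setenv(BLASTDB_LMDB_MAP_SIZE = "1000000")

# Confirm the value that processes started from this R session will inherit.
Sys.getenv("BLASTDB_LMDB_MAP_SIZE")
```

The setting is lost when R is closed, so it would need to be run again in each new session.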

WADE Standalone Programs

The wade/standalones folder contains standalone versions of MasterBlastR, SerotypeR and WamR-Pneumo. Use of these tools through the WADE interface is recommended; however, the standalone versions are available for use. Instructions for each tool can be found in the readme file within that tool's folder.

Legal

Copyright Government of Canada 2024

Written by: National Microbiology Laboratory, Public Health Agency of Canada

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this work except in compliance with the License. You may obtain a copy of the License at:

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Contact

Shelley Peterson: Shelley.Peterson@phac-aspc.gc.ca



phac-nml/wade documentation built on Sept. 13, 2024, 2:41 p.m.