SEROTYPE_pipeline: Serotyping pipeline for WGS assemblies December 21 2023,...

View source: R/SerotypeR.R

SEROTYPE_pipeline    R Documentation

Serotyping pipeline for WGS assemblies December 21 2023, Walter Demczuk & Shelley Peterson

Description

Takes an organism, a sample number and a locus, and uses them to query a contig.fasta file.

Usage

SEROTYPE_pipeline(Org_id, SampleNo, LocusID, Test_id, curr_work_dir)

Arguments

Org_id

Organism to query: GAS, PNEUMO or GONO

SampleNo

Sample number associated with contig.fasta file

LocusID

The locus to query, or enter "list" to use a list of alleles

Test_id

Test to run: AMR, TOXINS, VIRULENCE or NGSTAR; use MASTER for 16S id

curr_work_dir

Start-up directory of the pipeline project, used to locate the system file structure

Details

How it works:

Pneumococcus serotyping is based on the PneumoCaT and SeroBA libraries.

Copies the contig assembly from the Warehouse drive.
BLASTs the copied assembly against the reference data FASTA of whole CPS regions for each serotype.
If a CPS region is found that requires SNP-based analysis, queries each reference gene for that serogroup against the loci list.
Interprets SNPs to the serotype level.

The algorithm first determines the serogroup by BLASTing CPS_reference. Each locus in the locus list (temp folder) associated with that serogroup is then BLASTed.

There are 4 stages of locus analysis:

1) result = POS/NEG (presence/absence)
2) pseudo = pseudogene (disrupted/intact)
3) mutations = serotype-determining amino acid substitutions as listed in locus_mutations
4) allele = entire gene sequence match of conserved serotype-determining genes as found in the allele_lookup folders
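The four result types can be pictured as a dispatch on the type listed for each locus. The following is a minimal sketch only, not the package's code: the function `interpret_locus`, the `hit` list and all of its field names (`found`, `intact`, `substitutions`, `allele_match`) are hypothetical stand-ins for whatever the pipeline derives from its BLAST output.

```r
# Illustrative only: `hit` is a hypothetical summary of one locus,
# with invented field names; it is not a structure used by the package.
interpret_locus <- function(result_type, hit) {
  switch(result_type,
    # 1) presence/absence of the locus
    result    = if (hit$found) "POS" else "NEG",
    # 2) pseudogene status
    pseudo    = if (hit$intact) "intact" else "disrupted",
    # 3) serotype-determining amino acid substitutions, joined into one string
    mutations = paste(hit$substitutions, collapse = "/"),
    # 4) name of the matching allele of a conserved gene
    allele    = hit$allele_match,
    stop("Unknown result type: ", result_type)
  )
}

# Made-up example values for a single locus.
hit <- list(found = TRUE, intact = FALSE,
            substitutions = c("S195N"), allele_match = "allele_1")

interpret_locus("result", hit)
interpret_locus("pseudo", hit)
```

Keeping the result type as a plain string mirrors the idea that the relevant type is read from a table rather than hard-coded per locus.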

The relevant result type is listed in the locus list table (temp folder). As each result is evaluated for each relevant locus, the serotype lookup table locus_lookup (temp folder) is filtered. If a single row is left in the lookup table by the end of the sample, that is the serotype; otherwise a Fail<serogroup> answer is returned.

Supporting files such as BLAST outputs, results and extracted FASTA sequences can be found in the user's local output or temp folders.
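The lookup-table filtering described above can be sketched in a few lines of base R. This is an illustration under stated assumptions, not the package's implementation: the table contents, the column names, the helper functions `filter_lookup` and `call_serotype`, and the exact form of the failure string are all invented for the example.

```r
# Hypothetical lookup table: one row per serotype, one column per locus.
# Column names and values are made up for illustration.
locus_lookup <- data.frame(
  Serotype = c("6A", "6B", "6C"),
  wciP     = c("S195", "N195", "S195"),
  wciN     = c("POS", "POS", "NEG"),
  stringsAsFactors = FALSE
)

# Hypothetical per-locus results for one sample.
sample_results <- list(wciP = "S195", wciN = "POS")

# Keep only the rows that agree with each locus result in turn.
filter_lookup <- function(lookup, results) {
  for (locus in names(results)) {
    lookup <- lookup[lookup[[locus]] == results[[locus]], , drop = FALSE]
  }
  lookup
}

# A single remaining row gives the serotype; any other outcome is reported
# as a failure tagged with the serogroup (string format assumed here).
call_serotype <- function(lookup, results, serogroup) {
  remaining <- filter_lookup(lookup, results)
  if (nrow(remaining) == 1) remaining$Serotype else paste0("Fail", serogroup)
}

call_serotype(locus_lookup, sample_results, "6")
```

Filtering one locus at a time means both an ambiguous result (several rows left) and a contradictory one (no rows left) end in the same failure answer, which matches the single-row rule described above.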

Value

A data frame containing the results of the query


phac-nml/wade documentation built on March 16, 2024, 8:32 a.m.