knitr::opts_chunk$set(echo = TRUE)
This analytical workflow was designed to perform the study described in the manuscript. Multiple stutter types were characterized based on MPS data from single source samples, using the Verogen ForenSeq™ DNA Signature Prep kit. A beta regression model was used to investigate the relationship between the stutter proportion and candidate explanatory variables.
In this markdown document, it is explained step by step how to create and prepare a data table with all the stutter information needed for the second part: statistical analysis with beta regression.
DNA sequences were processed as follows:
FastQ files were retrieve from the UAS server.
STRait $Razor^1$ v3.0 was used for allele identification. Source code and details on this software can be obtained from github in: https://github.com/Ahhgust/STRaitRazor
STRait Razor output txt file was then processed with lusSTR v0.3. This is a bioinformatic tool written in Python that converts DNA sequences from NGS data into different formats. Also, the analysis returns different allele designations. The tool and more information is available in https://github.com/bioforensics/lusSTR.
An example of a final output txt file from lusSTR "lusSTR_ExampleFile.txt" can be found on the package folder ins/extdata. To access it:
lusSTR_ExampleFile <- system.file("extdata", "lusSTR_ExampleFile.txt", package = "brackconv") # Or: library(brackconv) path = path.package("brackconv") #get package install folder filePath = paste0("inst/extdata/lusSTR_ExampleFile.txt") # lusSTR file path lusSTR_ExampleFile = paste0(path, filePath) # create object with file
library(brackconv) # projname = "MPSproto_stutterCharPaper" # path = path.package("MPSproto") #get package install folder # datfold = paste0(path,"/",projname,"/data") # calibfile = paste0(datfold,"/calibrated_",projname) #file name of the calibrated object # datfn_read = paste0(datfold,"/data_sumPerMarker.csv") #used for marker efficiency calibration # datfn_stuttermodel = paste0(datfold,"/calibrated_StutterCoef.csv") #the stutter model is already calibrated # datfn_noise = paste0(datfold,"/data_noise.csv") #used for noise model calibration (part 1) # datfn_stutterFew = paste0(datfold,"/data_excludedStutters.csv") #used for noise model calibration (part 2)
readlusSTRFile <- function(pathToYourFile,yourlusSTRFile) { initial_df <- read.table(paste0(path,lusSTRFile), sep="\t", header = TRUE) # Create new variable with only digits from SampleID, create Run variable with values 3 and 4: initial_df <- initial_df %>% mutate(SampleNumber = as.numeric(gsub("[^[:digit:]]", "", SampleID))) %>% select(-SampleNumber) }
# Parameter settings MinimumReads <- 10 Hb_Threshold <- 30 # Run function and create list with output data frames dfList_initialdf <- setNames(getInitialDataframe(initial_df,MinimumReads,Hb_Threshold), c("initial_df2","noise_underMinimumReads")) # setNames define the names of the data frames inside of the list list2env(dfList_initialdf,.GlobalEnv) # list2env: From a named list x, create an environment containing all list components as objects, or "multi-assign" from x into a pre-existing environment # Allele 2 drop outs quality check dfList_allele2DropOuts <- setNames(getAllele2Drops(initial_df2, Hb_Threshold), c("possibleAllele2_DropOuts","initial_df2")) list2env(dfList_allele2DropOuts,.GlobalEnv)
# Stutter table dfListStutterTab <- setNames(initialStutterTable(initial_df2), c("stutterInfoTable","included_SequencesDF","excluded_SequencesDF")) list2env(dfListStutterTab,.GlobalEnv)
# Parameter settings Hb_min <- Hb_Threshold Hb_max <- 50 # New stutter table dfListFinalStutterTab <- setNames(getStuttersAllele2Pos(stutterInfoTable,initial_df2,Hb_min, Hb_max), c("allele1StutterPosMin_MaxHb","stutterInfoTable_Final", "initial_df2")) list2env(dfListFinalStutterTab,.GlobalEnv)
dfList_Variables <- setNames(AddVariables(initial_df2,included_SequencesDF,excluded_SequencesDF), c("TotalExcluded","TotalIncluded","ExludedPriori","initial_df2")) list2env(dfList_Variables,.GlobalEnv)
# Stutter filtering. Obtains new data frames: dfList_StutterFiltering <- setNames(selectStutters(stutterInfoTable_Final), c("ambiguous_stutter","unambiguous_stutter", "infrequent_stutters", "frequent_stutters")) list2env(dfList_StutterFiltering,.GlobalEnv)
final_StutterTable <- getLUS(frequent_stutters)
final_StutterTable <- AddComplexATContVar(final_StutterTable,STR_complexity_df)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.