In HIVDiversity/MotifBinner2: Processes NGS Primer ID Datasets

primerSeqErr report for `r data_source_name`

fig.cap1 <- 'Figure: Substitution error rate by quality score'

figure_path <- paste('figure_', result$config$op_full_name, '/', sep = '')
#, fig.path=figure_path

error_parameters <- result$metrics$error_parameters[[data_source_name]]
subs_rate <- error_parameters$subs_rate
del_rate <- error_parameters$del_rate
ins_rate <- error_parameters$ins_rate
rate_mat <- error_parameters$rate_mat
subs_by_qual <- error_parameters$subs_by_qual

options(scipen=99)
p1 <- ggplot(subs_by_qual, aes(x = num_qual, y = rate)) +
  geom_point() +
  geom_errorbar(aes(ymax = UB, ymin = LB)) +
  ylab('Error Rate') +
  xlab('Quality Score')
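
The plot above expects `subs_by_qual` to carry the columns `num_qual`, `rate`, `LB` and `UB`. How MotifBinner2 derives the bounds is not shown in this report, so the following is only a minimal sketch of one way such a table could be built, assuming exact (Clopper-Pearson) binomial intervals from `stats::binom.test`. The names `toy_counts`, `binom_bounds` and `toy_subs_by_qual`, and all the counts, are hypothetical.

```r
# Hypothetical per-quality-score counts; not taken from any real dataset.
toy_counts <- data.frame(
  num_qual = c(10, 20, 30, 40),
  n_bases  = c(2000, 5000, 20000, 80000),
  n_subs   = c(180, 60, 25, 10)
)

# Assumption: exact 95% binomial interval for the substitution rate
# at one quality score. The package may use a different method.
binom_bounds <- function(x, n, conf_level = 0.95) {
  ci <- as.numeric(stats::binom.test(x, n, conf.level = conf_level)$conf.int)
  c(LB = ci[1], UB = ci[2])
}

# mapply returns a 2 x k matrix (rows LB, UB); transpose to k x 2.
bounds <- t(mapply(binom_bounds, toy_counts$n_subs, toy_counts$n_bases))

toy_subs_by_qual <- data.frame(
  num_qual = toy_counts$num_qual,
  rate     = toy_counts$n_subs / toy_counts$n_bases,
  LB       = bounds[, "LB"],
  UB       = bounds[, "UB"]
)
```

A data frame shaped like `toy_subs_by_qual` could be passed to the same `ggplot()` call in place of `subs_by_qual`.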

Substitution Rate: `r round(subs_rate,5)` - Benchmark: 1 in 100 = 0.01

Deletion Rate: `r round(del_rate,5)` - Benchmark: 1 in 50 000 = 0.00002

Insertion Rate: `r round(ins_rate,5)` - Benchmark: 1 in 20 000 = 0.00005

For more detail on the benchmarks, refer to Schirmer et al., "Insight into biases and sequencing errors for amplicon sequencing with the Illumina MiSeq platform", Nucleic Acids Research, 2015.

Note: The insertion and deletion rates are consistently 10 times higher than the benchmark. This is probably an artifact of the alignment step rather than a true signal of increased indels. Fixing this is on the to-do list.
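
The comparison against the benchmarks can also be made explicit as a fold difference. The sketch below is illustrative only: `compare_to_benchmark` is a hypothetical helper, not part of MotifBinner2, and the `observed` values are invented to mimic the pattern described in the note. Only the benchmark values come from this report.

```r
# Benchmarks quoted in this report (Schirmer et al., 2015).
benchmarks <- c(subs = 1 / 100, del = 1 / 50000, ins = 1 / 20000)

# Hypothetical observed rates, for illustration only.
observed <- c(subs = 0.008, del = 0.0002, ins = 0.0005)

# Hypothetical helper: tabulate observed rate, benchmark, fold difference
# (observed / benchmark) and the observed rate expressed as "1 in N".
compare_to_benchmark <- function(observed, benchmarks) {
  common <- intersect(names(observed), names(benchmarks))
  data.frame(
    type      = common,
    observed  = unname(observed[common]),
    benchmark = unname(benchmarks[common]),
    fold      = unname(observed[common] / benchmarks[common]),
    one_in    = round(1 / unname(observed[common])),
    stringsAsFactors = FALSE
  )
}

comparison <- compare_to_benchmark(observed, benchmarks)
print(comparison)
```

In a real report, `observed` would be replaced with `c(subs = subs_rate, del = del_rate, ins = ins_rate)`. Reporting the fold difference or a "1 in N" figure also sidesteps the coarse display that rounding to five decimal places gives for rates near 0.00002.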