primerSeqErr report for r data_source_name

fig.cap1 <- 'Figure: Substitution error rate by quality score'

figure_path <- paste('figure_', result$config$op_full_name, '/', sep = '')
#, fig.path=figure_path
error_parameters <- result$metrics$error_parameters[[data_source_name]]
subs_rate <- error_parameters$subs_rate
del_rate <- error_parameters$del_rate
ins_rate <- error_parameters$ins_rate
rate_mat <- error_parameters$rate_mat
subs_by_qual <- error_parameters$subs_by_qual

options(scipen=99)
p1 <- 
ggplot(subs_by_qual, aes(x=num_qual, y = rate)) +
  geom_point() +
  geom_errorbar(aes(ymax = UB, ymin = LB)) +
  ylab('Error Rate') +
  xlab('Quality Score')

Substitution Rate: r round(subs_rate,5) - Benchmark: 1 in 100 = 0.01

Deletion Rate: r round(del_rate,5) - Benchmark: 1 in 50 000 = 0.00002

Insertion Rate: r round(ins_rate,5) - Benchmark: 1 in 20 000 = 0.00005

More more detail on the benchmarks refer to an article named: "Insight into biases and sequencing errors for amplicon sequencing with the Illumina MiSeq platform" in Nucleic acids research in 2015 by Schirmer et al.

Note: The insertion and deletion rates are consistently 10 times higher than the benchmark - this is probably an issue with the alignment step rather than an accurate signal for increased indels. Fixing this is on the todo list.

print(format_table(rate_mat))
print(p1)
cat('\n\n---\n\n')


HIVDiversity/MotifBinner2 documentation built on May 6, 2019, 6:44 p.m.