denoise: Run the denoiser pipeline for a sequence read.


View source: R/pipeline.r

Description

This function runs the complete denoising pipeline for an input sequence, together with its corresponding name and phred scores. By default it is set up to interface with fastq files (the standard output format of most sequencers).

Usage

denoise(x, ...)

## Default S3 method:
denoise(x, ..., name = character(), phred = NULL,
  dir_check = TRUE, double_pass = TRUE, min_match = 100,
  max_inserts = 400, censor_length = 7, added_phred = "*",
  adjust_limit = 5, ambig_char = "N", to_file = FALSE,
  keep_flanks = TRUE, keep_phred = TRUE, outformat = "fastq",
  terminate_rejects = TRUE, outfile = NULL, phred_placeholder = "#",
  aa_check = TRUE, trans_table = 0, frame_offset = 0,
  append = TRUE)

Arguments

x

a DNA sequence string.

...

additional arguments to be passed between methods.

name

an optional character string. Identifier for the sequence.

phred

an optional character string. The phred score string corresponding to the nucleotide string. If passed then the input phred scores will be modified along with the nucleotides and carried through to the sequence output. Default = NULL.

dir_check

A boolean indicating if both the forward and reverse complements of a sequence should be checked against the PHMM. Default is TRUE.

double_pass

A boolean indicating if a second pass through the Viterbi algorithm should be conducted for sequences that had leading nucleotides not matching the PHMM. This improves the accurate establishment of reading frame and will reduce false rejections by the amino acid check, but this comes at a cost of additional processing time. Default is TRUE.

min_match

The minimum number of sequential matches to the PHMM required for a sequence to be denoised. Sequences with fewer matches are flagged as rejects. Default is 100.

max_inserts

The maximum number of sequential insert states occurring in a sequence (including the flanking regions). If this number is exceeded, then the entire read will be discarded if terminate_rejects = TRUE. Default is 400.

censor_length

the number of base pairs in either direction of a PHMM correction to convert to placeholder characters. Default is 7.

added_phred

The phred character to use for characters inserted into the original sequence. Default is "*".

adjust_limit

the maximum number of corrections that can be applied to a sequence read. If this number is exceeded then the entire read is rejected. Default is 5.

ambig_char

The character to use for ambiguous positions in the sequence that is output to the file. Default is N.

to_file

Boolean indicating whether the sequence should be written to a file. Default is FALSE.

keep_flanks

Should the regions of the input sequence outside of the barcode region be re-added to the denoised sequence before it is output to the file? Options are TRUE, FALSE and 'right'. The 'right' option keeps the trailing flank but removes the leading flank. FALSE results in only the denoised sequence for the 657bp barcode region being output to the file. Default is TRUE.

keep_phred

Should the original PHRED scores be kept in the output? Default is TRUE.

outformat

The format of the output file. Options are fasta or fastq (default) format.

terminate_rejects

Boolean indicating if analysis of sequences that fail to meet phred quality score or path match thresholds should be terminated early (prior to sequence adjustment and writing to file). Default is TRUE.

outfile

The name of the file to output the data to. The default filename is denoised.fasta or denoised.fastq, depending on outformat.

phred_placeholder

The character to input for the phred score line. Default is '#'. Used with write_fastq and keep_phred = FALSE only.

aa_check

Boolean indicating whether the amino acid sequence should be generated and assessed for stop codons. Default = TRUE.

trans_table

The translation table to use for translating from nucleotides to amino acids. Default is 0, meaning that censored translation is performed (ambiguous codons ignored). Used only when aa_check = TRUE.

frame_offset

The offset to the reading frame to be applied for translation. By default the offset is zero, so the first character in the framed sequence is considered the first nucleotide of the first codon. Passing frame_offset = 1 would offset the sequence by one and therefore make the second character in the framed sequence the first nucleotide of the first codon. Used only when aa_check = TRUE.

append

Should the denoised sequence be appended to the output file (TRUE), or should it overwrite the output file (FALSE)? Default is TRUE.
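The two index-based arguments described above, censor_length and frame_offset, can be illustrated with a short base R sketch. The helpers censor_around() and split_codons() are hypothetical names written for this illustration and are not part of debar. The sketch assumes that the corrected position itself is masked along with its neighbours, which this page does not state.

```r
# Hypothetical helper (not a debar function): mask `censor_length`
# characters on either side of a corrected position `pos` (1-based).
# Assumption: the corrected position itself is masked as well.
censor_around <- function(x, pos, censor_length = 7, ambig_char = "N") {
  chars <- strsplit(x, "")[[1]]
  lo <- max(1, pos - censor_length)              # clamp at sequence start
  hi <- min(length(chars), pos + censor_length)  # clamp at sequence end
  chars[lo:hi] <- ambig_char
  paste(chars, collapse = "")
}

# Hypothetical helper (not a debar function): split a framed sequence
# into complete codons after skipping `frame_offset` leading characters.
split_codons <- function(x, frame_offset = 0) {
  framed <- substring(x, frame_offset + 1)
  n <- nchar(framed) %/% 3                       # complete codons only
  if (n == 0) return(character(0))
  starts <- seq(1, by = 3, length.out = n)
  substring(framed, starts, starts + 2)
}

censor_around("ACGTACGTACGT", pos = 6, censor_length = 2)
split_codons("ATGGCCTAA")
split_codons("ATGGCCTAA", frame_offset = 1)
```

With frame_offset = 1 the second character becomes the first nucleotide of the first codon, and any incomplete trailing codon is dropped in this sketch.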

Details

Since the pipeline is designed for receiving or outputting either fasta or fastq data, this function is heavily parameterized. Note that not all parameters will affect all use cases (e.g. if your outformat is fasta, then the phred_placeholder parameter is ignored, as this option only pertains to fastq outputs). The user is encouraged to read the vignette for a detailed walkthrough of the denoiser pipeline that will help identify the parameters that relate to their given needs.
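To make the fasta/fastq distinction above concrete, the following base R sketch builds the lines of a single output record. format_record() is a hypothetical helper, not debar's writer. It assumes a two-line fasta record and a four-line fastq record, and it fills the quality line with phred_placeholder only when no phred string is supplied.

```r
# Hypothetical helper (not a debar function): build the text lines of
# one output record in either fasta or fastq layout.
format_record <- function(name, seq, phred = NULL,
                          outformat = "fastq", phred_placeholder = "#") {
  if (outformat == "fasta") {
    # fasta has no quality line, so phred_placeholder is never used here
    return(c(paste0(">", name), seq))
  }
  if (is.null(phred)) {
    # no quality string available: repeat the placeholder per base
    phred <- strrep(phred_placeholder, nchar(seq))
  }
  c(paste0("@", name), seq, "+", phred)
}

format_record("read_1", "ACGT", outformat = "fasta")
format_record("read_1", "ACGT", outformat = "fastq")
```

This mirrors why phred_placeholder has no effect on fasta output: the record layout has no line for it to fill.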

Value

an object of class "DNAseq"

Examples

# Denoise example sequence with default parameters.
ex_data = denoise(example_nt_string_errors, 
                  name = 'example_sequence_1', 
                  keep_phred = FALSE, 
                  to_file = FALSE)

#fastq data from a file
#previously run
fastq_example_file = system.file('extdata/coi_sequel_data_subset.fastq', 
                                 package = 'debar')
data = read_fastq(fastq_example_file)
#denoise the first sequence in the file
#use a custom censor length and no amino acid check
dn_dat_1 = denoise(x = data$sequence[[1]], 
                    name = data$header[[1]], 
                    phred = data$quality[[1]], 
                    censor_length = 11, 
                    aa_check = FALSE, 
                    to_file = FALSE)

debar documentation built on Jan. 11, 2020, 9:31 a.m.