Description
The function trims poor-quality bases and unknown bases (N) from the ends of the reads. Reads that are too short after trimming, or that still contain unknown bases, are discarded from the filtered output.
Usage
filterBadSeqs(dataFile, minlength = 30, Phred = 25, blockSize = 1e+08,
              readerBlockSize = 1e+05, mc.cores = 1)
Arguments
dataFile
An R data frame describing the data to be processed. It is a standard-format object and must contain the following columns: File, PE, Sample, Replicate, FilteredFile. More information about the file is available at
minlength
An integer specifying the minimum read length. Reads shorter than this after trimming are discarded. Default is 30 nucleotides.
Phred
An integer specifying the Phred quality-score threshold (quality scores are stored as ASCII characters in the fastq file). Bases with a score below this threshold are treated as poor quality; see Details for how they are trimmed. Default is 25.
blockSize
An integer specifying the number of reads to read and process at a time. Default is 1e8.
readerBlockSize
An integer specifying the number of bytes (characters) to read from the file at one time. Smaller
mc.cores
The number of cores to use for parallel processing. Default is 1 (i.e. no parallelisation).
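A minimal sketch of a dataFile with the required columns, followed by a check that none is missing. The file names, sample labels and PE values are hypothetical placeholders, and the expected encoding of PE is not stated on this page, so check it before use. The filterBadSeqs call is left commented out because it needs the package and the fastq files.

```r
# Hypothetical input table: file names, samples and PE values are placeholders.
dataFile <- data.frame(
  File         = c("A_L001.fastq", "A_L002.fastq", "B_L001.fastq"),
  PE           = c(FALSE, FALSE, FALSE),  # placeholder; check the expected encoding
  Sample       = c("A", "A", "B"),
  Replicate    = c(1, 1, 1),
  FilteredFile = c("A_filtered.fastq", "A_filtered.fastq", "B_filtered.fastq"),
  stringsAsFactors = FALSE
)

# Stop early if any column required by filterBadSeqs is missing
required <- c("File", "PE", "Sample", "Replicate", "FilteredFile")
stopifnot(all(required %in% names(dataFile)))

# With the package loaded and the fastq files in the working directory:
# filterBadSeqs(dataFile, minlength = 30, Phred = 25, mc.cores = 1)
```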
Details
Run the function from the working directory that contains all of the fastq files.
filterBadSeqs iterates over each file listed in dataFile and trims and filters its reads for quality. Each fastq file is processed in chunks of reads rather than all at once; the chunk size is controlled by the blockSize and readerBlockSize parameters. More information about how this is done is available in the ShortRead package.
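Processing in chunks keeps memory use bounded by the chunk size rather than by the size of the file. ShortRead offers this kind of streaming through FastqStreamer(); the base R sketch below is only an illustration of the idea, not the package's code. The function name process_fastq_in_chunks is hypothetical, it assumes plain four-line fastq records, and it models the read count (blockSize) but not the byte buffer controlled by readerBlockSize.

```r
# Read a fastq file at most `n_reads` records at a time and pass each chunk
# to `process`. Returns the number of reads in each chunk.
process_fastq_in_chunks <- function(path, n_reads, process) {
  con <- file(path, open = "r")
  on.exit(close(con))
  chunk_sizes <- integer(0)
  repeat {
    lines <- readLines(con, n = 4L * n_reads)  # 4 lines per fastq record
    if (length(lines) == 0L) break             # end of file
    if (length(lines) %% 4L != 0L) stop("incomplete fastq record")
    m <- matrix(lines, nrow = 4L)              # one column per record
    chunk <- data.frame(id = m[1, ], seq = m[2, ], qual = m[4, ],
                        stringsAsFactors = FALSE)
    process(chunk)
    chunk_sizes <- c(chunk_sizes, nrow(chunk))
  }
  chunk_sizes
}

# Usage: three reads processed in chunks of two
tmp <- tempfile(fileext = ".fastq")
writeLines(c("@r1", "ACGT", "+", "IIII",
             "@r2", "ACGA", "+", "IIII",
             "@r3", "TTTT", "+", "IIII"), tmp)
sizes <- process_fastq_in_chunks(tmp, n_reads = 2, process = function(chunk) NULL)
sizes  # 2 1
```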
Within each chunk, the function:
* removes any leading or trailing N's from each sequence,
* removes any reads that still contain N's,
* trims the trailing (3') end of a read when it finds at least 2 poor-quality bases within a window of 5 bases. A base is poor quality when its score is below the "Phred" threshold. The Phred score Q is logarithmically related to the probability p that a base was called incorrectly, Q = -10 log10(p), so Q = 20 corresponds to a 1% error probability and Q = 30 to 0.1%,
* removes any reads shorter than a minimum length (specified by the "minlength" parameter).
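To make the four steps concrete, here is a base R sketch that applies them to a single read. It assumes qualities use Phred+33 encoding (Sanger / Illumina 1.8+), treats a base as poor when its score is below Phred, and cuts the read just before the first poor base of the first failing window. The helper names (phred_scores, error_prob, filter_read) are hypothetical and the package's exact cut point may differ; ShortRead's trimTailw() provides a comparable window-based trimmer.

```r
# Decode a quality string, assuming Phred+33 encoding.
phred_scores <- function(qual) utf8ToInt(qual) - 33L

# Error probability from a Phred score, inverting Q = -10 * log10(p).
error_prob <- function(q) 10^(-q / 10)

# Apply the four steps to one read. Returns NULL if the read is discarded,
# otherwise a list with the trimmed sequence and quality string.
filter_read <- function(seq, qual, Phred = 25, minlength = 30,
                        k = 2L, window = 5L) {
  bases <- strsplit(seq, "")[[1]]
  q <- phred_scores(qual)

  # Step 1: strip leading and trailing N's.
  called <- which(bases != "N")
  if (length(called) == 0L) return(NULL)
  keep <- min(called):max(called)
  bases <- bases[keep]
  q <- q[keep]

  # Step 2: discard reads that still contain an N.
  if (any(bases == "N")) return(NULL)

  # Step 3: scan windows from the 5' end; at the first window holding at
  # least k poor bases, cut just before the first poor base in that window.
  n <- length(bases)
  if (n >= window) {
    for (i in seq_len(n - window + 1L)) {
      idx <- i:(i + window - 1L)
      poor <- idx[q[idx] < Phred]
      if (length(poor) >= k) {
        cut <- poor[1] - 1L
        bases <- bases[seq_len(cut)]
        q <- q[seq_len(cut)]
        break
      }
    }
  }

  # Step 4: discard reads shorter than minlength.
  if (length(bases) < minlength) return(NULL)
  list(seq = paste(bases, collapse = ""), qual = intToUtf8(q + 33L))
}

# Usage (a small minlength so the short demo reads survive)
error_prob(30)                                               # 0.001
filter_read("NNACGTACGTNN", "!!IIIIIIII!!", minlength = 5)  # seq "ACGTACGT"
filter_read("ACGNACGT", "IIIIIIII", minlength = 5)          # NULL: internal N
filter_read("ACGTACGTAC", "IIIIII##I#", minlength = 5)      # seq "ACGTAC"
```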
The function produces a new set of filtered fastq files. The output file for each input file must be given in the "FilteredFile" column of the data file. The same output file may be given for several input files: the new output is then appended to the existing file, which allows reads from a sample sequenced on several lanes to be combined into a single file.
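Appending lets several inputs contribute to one output. The sketch below, with hypothetical file names, shows which inputs end up in each filtered file (via split()) and how opening a connection in append mode ("a") concatenates records from two lanes. The helper append_records is hypothetical; this illustrates the idea rather than the package's code.

```r
# Hypothetical rows: two lanes of sample A share one output file.
dataFile <- data.frame(
  File         = c("A_L001.fastq", "A_L002.fastq", "B_L001.fastq"),
  FilteredFile = c("A_filtered.fastq", "A_filtered.fastq", "B_filtered.fastq"),
  stringsAsFactors = FALSE
)

# Input files that feed each output file
groups <- split(dataFile$File, dataFile$FilteredFile)

# Append fastq records to a file, creating it if it does not exist yet
append_records <- function(records, path) {
  con <- file(path, open = "a")
  on.exit(close(con))
  writeLines(records, con)
}

out <- tempfile(fileext = ".fastq")
append_records(c("@lane1_read1", "ACGT", "+", "IIII"), out)  # from lane 1
append_records(c("@lane2_read1", "TGCA", "+", "IIII"), out)  # from lane 2
length(readLines(out))  # 8: both lanes' records are in one file
```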
A new R object, QualityFilterResults, is created. It contains references to the input and output fastq files, together with a summary of how many reads were trimmed or removed.
Value
A data frame summarising, for each file, how many sequences were trimmed or removed.
See Also
https://en.wikipedia.org/wiki/Phred_quality_score for more about quality scores.
ShortRead for more information about blockSize (n) and readerBlockSize.