Text parsing functions for reading sequences in the FASTA or FASTQ format into R.
readFASTQ(file = file.choose(), bin = TRUE)

readFASTA(
  file = file.choose(),
  bin = TRUE,
  residues = "DNA",
  alignment = FALSE
)
file: the name of the FASTA or FASTQ file from which the sequences are to be read. By default, file.choose() opens an interactive file-selection dialog.
bin: logical indicating whether the returned object should be in binary/raw byte format (i.e. "DNAbin" or "AAbin" objects for nucleotide and amino acid sequences, respectively). If FALSE, a vector of named character strings is returned.
residues: character string indicating whether the sequences to be read are composed of nucleotides ("DNA"; default) or amino acids ("AA"). Only required for readFASTA.
alignment: logical indicating whether the sequences represent an alignment to be parsed as a matrix. Only applies to readFASTA.
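Aligned sequences share a common length once gap characters are inserted, which is what makes a matrix representation (one row per sequence, one column per alignment position) possible. The base-R sketch below illustrates that idea with a hypothetical helper, as_alignment_matrix, which is not part of the package; the matrix that readFASTA itself returns when alignment = TRUE (for example a DNAbin matrix when bin = TRUE) may differ in storage type and details.

```r
## Hypothetical helper (not part of the package): turn equal-length aligned
## sequences into a character matrix, one row per sequence and one column
## per alignment position.
as_alignment_matrix <- function(seqs) {
  widths <- nchar(seqs)
  if (length(unique(widths)) != 1L) {
    stop("aligned sequences must all have the same length")
  }
  ## strsplit gives a list of single-character vectors; rbind stacks them as rows
  mat <- do.call(rbind, strsplit(seqs, split = ""))
  rownames(mat) <- names(seqs)
  mat
}

aln <- c(seqA = "ACG-T", seqB = "ACGAT", seqC = "A-GAT")
m <- as_alignment_matrix(aln)
dim(m)   # 3 rows (sequences) by 5 columns (positions)
m[, 2]   # the second alignment column across all sequences
```

Column-wise access such as m[, 2] is the main convenience of the matrix form: each column holds the residues (or gaps) that the alignment places at the same homologous position.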
The FASTQ convention is somewhat ambiguous, with several slightly different interpretations appearing in the literature. For now, this function supports the Illumina convention for FASTQ files, in which each sequence and its associated metadata occupy four lines of the text file as follows: (1) the run and cluster metadata, preceded by an @ symbol; (2) the sequence itself, in capitals without spaces; (3) a single "+" symbol; and (4) the Phred quality scores from 0 to 93, encoded as ASCII characters with an offset of 33 (i.e. "!" to "~"). For more information on this convention, see the Illumina help page (https://help.basespace.illumina.com/articles/descriptive/fastq-files/).
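The four-line layout means records can be located purely by position. The sketch below parses FASTQ text held in a character vector using a hypothetical helper, parse_fastq_lines; it is not the package's implementation, it assumes every record has exactly four lines, and it keeps the quality strings unconverted in a "quality" attribute.

```r
## Hypothetical helper (not the package's implementation): parse FASTQ text
## that follows the four-line convention into a named character vector of
## sequences, with the quality strings kept in a "quality" attribute.
parse_fastq_lines <- function(lines) {
  lines <- lines[nzchar(lines)]            # drop blank lines, e.g. a trailing one
  if (length(lines) %% 4L != 0L) {
    stop("number of non-empty lines is not a multiple of four")
  }
  idx <- seq(1L, length(lines), by = 4L)   # first line of each record
  headers <- lines[idx]
  seqs    <- lines[idx + 1L]
  plus    <- lines[idx + 2L]
  quals   <- lines[idx + 3L]
  if (!all(startsWith(headers, "@"))) stop("header lines must start with '@'")
  if (!all(startsWith(plus, "+"))) stop("separator lines must start with '+'")
  if (any(nchar(seqs) != nchar(quals))) {
    stop("each quality string must be as long as its sequence")
  }
  names(seqs) <- substring(headers, 2)     # strip the leading '@'
  attr(seqs, "quality") <- quals
  seqs
}

fq <- c("@read1 1:N:0:1", "ACGTAC", "+", "IIIII#",
        "@read2 1:N:0:1", "TTGCA",  "+", "55555")
reads <- parse_fastq_lines(fq)
names(reads)
attr(reads, "quality")
```

Parsing by position rather than by searching for lines that start with "@" matters because "@" (ASCII 64, i.e. Phred 31) is also a valid quality character, so a quality line can begin with it.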
For optimal memory efficiency and compatibility with other functions,
it is recommended to store sequences in raw byte format
as either DNAbin or AAbin objects.
For FASTQ files when bin = TRUE, a vector of quality scores
(also in raw byte format) is attached to each sequence as a "quality" attribute.
These can be converted back to numeric quality scores with as.integer.
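Because each sequence carries its raw quality bytes, per-read summaries only need as.integer. The sketch below computes the mean Phred score of each read and keeps reads at or above a threshold. The helper names, the toy stand-in data (placeholder sequence bytes, not real DNAbin codes), the mean-based criterion and the cut-off of 30 are all assumptions for illustration, not the package's own filtering method.

```r
## Hypothetical helpers: compute each read's mean Phred score from its raw
## "quality" attribute, and keep reads whose mean meets a chosen threshold.
mean_quality <- function(x) {
  vapply(x, function(s) mean(as.integer(attr(s, "quality"))), numeric(1))
}

filter_by_quality <- function(x, min_mean = 30) {
  x[mean_quality(x) >= min_mean]
}

## Toy stand-in for a bin = TRUE result: raw vectors (placeholder bytes)
## each carrying a raw "quality" attribute.
make_read <- function(n, q) {
  s <- as.raw(rep(0, n))
  attr(s, "quality") <- as.raw(q)
  s
}
x <- list(good = make_read(4, c(35, 36, 34, 33)),
          poor = make_read(4, c(12, 20, 8, 2)))
mean_quality(x)
names(filter_by_quality(x))
```

A mean is a crude summary, since a few very poor bases can hide behind otherwise high scores; position-wise trimming is a common alternative.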
For FASTQ files when bin = FALSE, the function returns a vector with each
sequence as a single concatenated string, together with a similarly concatenated
"quality" attribute composed of the same ASCII characters used in the FASTQ coding scheme.
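Those ASCII characters can be decoded to numeric Phred scores in base R. This sketch assumes the offset-33 encoding of the Illumina convention described above (scores 0 to 93 map to the characters "!" to "~"); decode_phred33 and encode_phred33 are hypothetical helpers, not package functions.

```r
## Hypothetical helper: convert a FASTQ quality string (offset-33 encoding)
## to integer Phred scores.
decode_phred33 <- function(qstring) {
  utf8ToInt(qstring) - 33L
}

## Inverse hypothetical helper: integer scores (0 to 93) back to a quality string.
encode_phred33 <- function(scores) {
  stopifnot(all(scores >= 0 & scores <= 93))
  intToUtf8(as.integer(scores) + 33L)
}

q <- decode_phred33("II5#")
q                   # 40 40 20 2
encode_phred33(q)   # "II5#"

## Error probability of each base call, from the Phred definition Q = -10 log10(P)
10^(-q / 10)
```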
This function can be slow on large FASTQ files; a multithreading option may be added in a future version.
Either a vector of character strings (if bin = FALSE) or a list of raw ("DNAbin" or "AAbin") vectors (if bin = TRUE). For FASTQ input, each element of the list has a "quality" attribute.
Shaun Wilkinson
Bokulich NA, Subramanian S, Faith JJ, Gevers D, Gordon JI, Knight R, Mills DA, Caporaso JG (2013) Quality-filtering vastly improves diversity estimates from Illumina amplicon sequencing. Nat Methods, 10(1), 57-59.
Illumina help page: https://help.basespace.illumina.com/articles/descriptive/fastq-files/
writeFASTQ and writeFASTA for writing sequences to text in the FASTA or FASTQ format.
See also read.dna in the ape package.
## load the package that provides readFASTQ
library(insect)
## download and extract example FASTQ file to temporary directory
td <- tempdir()
URL <- "https://www.dropbox.com/s/71ixehy8e51etdd/insect_tutorial1_files.zip?dl=1"
dest <- file.path(td, "insect_tutorial1_files.zip")
download.file(URL, destfile = dest, mode = "wb")
unzip(dest, exdir = td)
## read the sequences (bin = TRUE by default, so quality scores are raw attributes)
x <- readFASTQ(file.path(td, "COI_sample2.fastq"))