read_fastq | R Documentation |
Efficiently load headers, sequences & qualities from a fastq file.
read_fastq(file,
include_headers = TRUE,
include_sequences = TRUE,
include_qualities = TRUE,
include_phred_scores = FALSE,
include_error_probs = FALSE,
truncate_headers_at = NULL,
phred_offset = NULL,
max_sequences = Inf,
max_lines = Inf)
file |
A character, path to the input fastq file. This file may be gzipped with extension ".gz". |
include_headers |
Logical, whether to load the headers. If you don't need the headers you can set this to FALSE. |
include_sequences |
Logical, whether to load the sequences. If you don't need the sequences you can set this to FALSE. |
include_qualities |
Logical, whether to load the raw qualities, encoded as ASCII characters. If you don't need the raw qualities you can set this to FALSE. |
include_phred_scores |
Logical, whether to compute and return the Phred quality scores, in the form of integers. These contain the same information as the raw qualities, but converted from character representation to integer scores. Also see option phred_offset. |
include_error_probs |
Logical, whether to compute and return the nominal error probabilities, based on the qualities. The nominal error probability of each nucleobase is computed as 10^(-Q/10), where Q is the Phred score. |
truncate_headers_at |
Optional character, needle at which to truncate headers. Everything at and after the first instance of the needle will be removed from the headers. |
phred_offset |
Optional integer, Phred offset to assume for converting raw quality characters to Phred scores. If |
max_sequences |
Optional integer, maximum number of sequences to load. Note that in the case of a gzipped input file the whole file is temporarily decompressed (up to |
max_lines |
Optional integer, maximum number of lines to load. Any trailing sequence truncated due to this limit will be discarded.
In contrast to |
This function is a fast and simple fastq loader. It can be used to load entire files into memory, or to only sample a small portion of sequences without reading the entire file (using max_lines).
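As a rough illustration of sampling only part of a file, the sketch below writes a tiny made-up fastq file to a temporary location and loads just its first record. It assumes that read_fastq is provided by the castor package (this page does not name the package), that castor is installed, and that a Phred offset of 33 is appropriate for the made-up data. Only arguments and return elements documented on this page are used.

```r
# Write a tiny two-record fastq file to a temporary location (made-up data).
fastq_path = tempfile(fileext = ".fastq")
writeLines(c("@read1 sample=A", "ACGT", "+", "IIII",
             "@read2 sample=B", "TTGA", "+", "5555"), fastq_path)

# Assumption: read_fastq lives in the castor package.
if (requireNamespace("castor", quietly = TRUE)) {
  fastq = castor::read_fastq(file = fastq_path,
                             include_qualities = FALSE,    # skip raw quality strings
                             include_phred_scores = TRUE,  # request integer scores
                             truncate_headers_at = " ",    # drop everything from the first space
                             phred_offset = 33,            # assumed encoding of the made-up data
                             max_sequences = 1)            # load only the first record
  if (fastq$success) {
    cat(sprintf("Loaded %d sequence(s)\n", fastq$Nsequences))
    print(fastq$headers)
    print(fastq$phred_scores[[1]])
  }
}
```

Turning off the elements you do not need (here the raw qualities) keeps the returned list small, and max_sequences limits how many records are loaded.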
A named list with the following elements:
success |
Logical, indicating whether the file was loaded successfully. If FALSE, then an error message will be specified by the element |
headers |
Character vector, listing the loaded headers in the order encountered. Only included if include_headers was TRUE. |
sequences |
Character vector, listing the loaded sequences in the order encountered. Only included if include_sequences was TRUE. |
qualities |
Character vector, listing the loaded raw qualities in the order encountered. Only included if include_qualities was TRUE. |
phred_scores |
List of integer vectors, listing the loaded Phred scores in the order encountered. Hence, |
error_probs |
List of numeric vectors, listing the loaded error probabilities in the order encountered. Hence, |
Nlines |
Integer, number of lines encountered. |
Nsequences |
Integer, number of sequences loaded. |
Stilianos Louca
read_fasta, read_tree
## Not run:
# load a gzipped fastq file, considering only the first 1000 lines
# error probabilities are only returned if include_error_probs=TRUE
fastq = read_fastq(file="mysequences.fastq.gz", include_error_probs=TRUE, max_lines=1000)
# print the first sequence and its error probabilities
cat(fastq$sequences[1])
print(fastq$error_probs[[1]])
## End(Not run)
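The relationship between raw qualities, Phred scores and nominal error probabilities can be sketched in base R, independently of read_fastq. The helper names quality_to_phred and phred_to_error_prob are hypothetical and not part of the package, and the offset of 33 is an assumption that holds for the common Sanger-style encoding; other encodings need a different offset.

```r
# Hypothetical helpers for illustration only, not part of the package.
# Assumes a Phred offset of 33; adjust for other encodings.
quality_to_phred = function(quality, phred_offset = 33L) {
  # utf8ToInt() maps each character of the string to its integer code point
  utf8ToInt(quality) - phred_offset
}

phred_to_error_prob = function(phred) {
  # standard Phred definition: P = 10^(-Q/10)
  10^(-phred / 10)
}

raw_quality = "II5+!"                      # made-up quality string for a 5-base read
phred = quality_to_phred(raw_quality)
error_probs = phred_to_error_prob(phred)
print(phred)        # 40 40 20 10 0
print(error_probs)  # 1e-04 1e-04 1e-02 1e-01 1e+00
```

A higher Phred score means a lower nominal error probability; a score of 0 corresponds to a probability of 1.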