View source: R/summarize_quality_scores.R
| summarize_quality_scores | R Documentation |
For each base pair position, summarizes read length, Phred quality score, and the cumulative probability that all bases were called correctly.
summarize_quality_scores(
  forward_files,
  reverse_files,
  n.total = 10000,
  n.each = ceiling(n.total/length(forward_files)),
  seed = NULL,
  FUN = mean,
  ...
)
forward_files |
A character vector of file paths to FASTQ files containing forward DNA sequence reads. |
reverse_files |
A character vector of file paths to FASTQ files containing reverse DNA sequence reads. |
n.total |
Numeric. The number of read pairs to randomly sample from the input FASTQ files. Ignored if n.each is provided. |
n.each |
Numeric. The number of read pairs to randomly sample from each pair of input FASTQ files. The default is ceiling(n.total/length(forward_files)). |
seed |
Numeric. The seed for randomly sampling read pairs. If |
FUN |
A function to compute summary statistics of the quality scores. The default is mean. |
... |
Additional arguments passed to FUN. |
For each combination of base pair position and read direction, calculates summary statistics of read length, Phred quality score, and the cumulative probability that all bases were called correctly. The cumulative probability is calculated from the first base pair up to the current position. Quality scores are assumed to be encoded in Sanger format.
Read pairs are selected by randomly sampling up to n.each read pairs from each pair of input FASTQ files. By default, n.each is derived from n.total, and n.total will be ignored if n.each is provided.
By default, mean is used to compute the summary statistics, but the user may provide another summary function instead (e.g., median). Functions which return multiple summary statistics are also supported (e.g., summary and quantile). Arguments in ... are passed to the summary function.
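The cumulative probability described above can be illustrated for a single read. This is a minimal sketch, not the package's implementation: the helper name phred_to_cumulative is hypothetical, and it assumes the standard Sanger convention (ASCII offset 33, error probability 10^(-Q/10)).

```r
# Hypothetical helper for illustration only; it is not part of LocaTT.
# Assumes Sanger encoding: Phred score = ASCII code - 33.
phred_to_cumulative <- function(quality_string) {
  # Decode each quality character to its Phred score.
  scores <- utf8ToInt(quality_string) - 33L
  # Probability that each individual base was called correctly.
  p_correct <- 1 - 10^(-scores / 10)
  # Probability that all bases from position 1 up to each position are correct.
  data.frame(
    Position = seq_along(scores),
    Score = scores,
    Probability = cumprod(p_correct)
  )
}

# "I", "5", and "+" decode to Phred scores 40, 20, and 10, respectively.
result <- phred_to_cumulative("II5+")
print(result)
```

Because the cumulative probability is a running product of values no greater than 1, it can only stay level or decrease along the read, which is why it is useful for choosing a truncation position.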
Returns a data frame containing summary statistics of read length and quality score at each base pair position. The returned data frame contains the following fields:
Direction: The read direction (i.e., "Forward" or "Reverse").
Position: The base pair position.
Length: The summary statistic(s) of read lengths. If FUN returns multiple summary statistics, then a matrix of the summary statistics will be stored in this field, which can be accessed with $Length.
Score: The summary statistic(s) of Phred quality scores. If FUN returns multiple summary statistics, then a matrix of the summary statistics will be stored in this field, which can be accessed with $Score.
Probability: The summary statistic(s) of the cumulative probability that all bases were called correctly. If FUN returns multiple summary statistics, then a matrix of the summary statistics will be stored in this field, which can be accessed with $Probability.
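When FUN returns several statistics, each of these fields holds a matrix rather than a plain vector. The sketch below builds a small data frame of that shape by hand to show how such a column is indexed. The scores are made-up illustrative values, not output of summarize_quality_scores, and the exact column names the function produces are an assumption based on what quantile returns.

```r
# Made-up Phred scores at three positions (illustrative values only).
scores_by_position <- list(
  c(30, 32, 34),
  c(28, 30, 35),
  c(20, 25, 30)
)

# Apply a multi-valued summary function at each position, then transpose so
# that rows are positions and columns are the returned statistics.
score_matrix <- t(sapply(scores_by_position, quantile, probs = c(0.25, 0.75)))

# Store the matrix as a single field of the data frame.
summary_table <- data.frame(Direction = "Forward", Position = 1:3)
summary_table$Score <- score_matrix

# The whole matrix is retrieved with $Score; one statistic with column indexing.
summary_table$Score
summary_table$Score[, "25%"]
```

Note that the data frame still has three fields (Direction, Position, Score), even though Score spans two columns when printed.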
decode_quality_scores for decoding quality scores.
# Get example forward FASTQ files.
forward_files <- system.file("extdata",
                             paste0("S0", 1:3, "F.fastq"),
                             package = "LocaTT",
                             mustWork = TRUE)
# Get example reverse FASTQ files.
reverse_files <- system.file("extdata",
                             paste0("S0", 1:3, "R.fastq"),
                             package = "LocaTT",
                             mustWork = TRUE)
# Summarize quality scores.
summarize_quality_scores(forward_files, reverse_files)