require(ggfastqc)
knitr::opts_chunk$set(
  comment = "#",
  error = FALSE,
  tidy = FALSE,
  cache = FALSE,
  collapse = TRUE)
# options(datatable.auto.index=FALSE)
The ggfastqc package allows quick summary plots of FastQC reports from
Next Generation Sequencing data.
There are four functions for plotting various summary statistics:
plot_gc_stats() -- GC percentage
plot_dup_stats() -- Sequence duplication percentage
plot_total_sequence_stats() -- Total sequenced reads
plot_sequence_quality() -- Per base sequence quality
The function fastqc() loads the entire report as an object of class fastqc
which can be used to generate any additional plots that are required.
The fastqc() function loads data from FastQC generated reports via the
argument sample_info which should be a file containing info about samples.
The file should contain at least these three columns:
sample -- contains the sample name.
pair -- in case of paired end reads, 1 or 2 corresponding to first and
second pair, and in case of single end reads, NA.
path -- full path to the fastqc summary report (.txt file) for each sample.
If just the file name (.txt) is provided, it is assumed that the file is
in the same folder as the input file provided to sample_info argument.
It can also optionally contain a group column. If present, the plots
generated will take it into account and color / facet accordingly.
It is recommended to have a group column.
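As a rough illustration of the layout described above, the sketch below writes a small annotation file and resolves bare file names relative to it. The sample names, groups, report file names and the tab delimiter are all assumptions made for the example, and the path handling only mimics the rule stated above; it is not the package's own code.

```r
# Hypothetical sample sheet: names, groups and report paths are made up.
ann = data.frame(
  sample = c("s1", "s1", "s2", "s2"),
  pair   = c(1L, 2L, 1L, 2L),
  path   = c("s1_R1_fastqc_data.txt", "s1_R2_fastqc_data.txt",
             "/data/qc/s2_R1_fastqc_data.txt", "/data/qc/s2_R2_fastqc_data.txt"),
  group  = c("control", "control", "treated", "treated"),
  stringsAsFactors = FALSE
)

# Write it as a tab-separated file (the delimiter is an assumption).
ann_path = file.path(tempdir(), "annotation.txt")
write.table(ann, ann_path, sep = "\t", quote = FALSE, row.names = FALSE)

# Read it back and check that the three required columns are present.
ann_in = read.delim(ann_path, stringsAsFactors = FALSE)
required = c("sample", "pair", "path")
stopifnot(all(required %in% names(ann_in)))

# Bare file names are taken to live in the same folder as the annotation file;
# anything that already carries a directory is left untouched.
is_bare  = basename(ann_in$path) == ann_in$path
resolved = ifelse(is_bare,
                  file.path(dirname(ann_path), ann_in$path),
                  ann_in$path)
```

For single end reads, the pair column would hold NA instead of 1 or 2.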
path = system.file("tests/fastqc-sample", package="ggfastqc")
ann_file = file.path(path, "annotation.txt")
path = "./"
ann_file = file.path(path, "annotation.txt")
Here's what an annotation file might look like.
data.table::fread(ann_file)
fastqc() to load reports
obj = fastqc(ann_file)
obj
class(obj)
obj is an object of class fastqc.
Each element of value is itself a data.table.
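To make the idea of a table whose elements are themselves tables more concrete, here is a minimal base R sketch. It parses a tiny, made-up excerpt laid out like a FastQC text report (the `>>Module name` / `>>END_MODULE` layout is assumed) and stores each module as a data.frame in a list column. The column name `module` is hypothetical, plain data.frames stand in for data.tables, and none of this is taken from the package's implementation.

```r
# A tiny, made-up excerpt laid out like a FastQC text report.
report = c(
  "##FastQC\t0.11.9",
  ">>Basic Statistics\tpass",
  "#Measure\tValue",
  "Total Sequences\t1000",
  "%GC\t48",
  ">>END_MODULE",
  ">>Per base sequence quality\tpass",
  "#Base\tMean\tMedian",
  "1\t32.1\t33.0",
  "2\t32.4\t33.0",
  ">>END_MODULE"
)

# Locate where each module starts and ends.
starts = grep("^>>", report)
starts = starts[report[starts] != ">>END_MODULE"]
ends   = which(report == ">>END_MODULE")

# Turn the lines between a module's start and end into a data.frame.
parse_module = function(from, to) {
  body = report[(from + 1L):(to - 1L)]
  body[1L] = sub("^#", "", body[1L])   # the header line starts with '#'
  read.delim(text = body, sep = "\t",
             stringsAsFactors = FALSE, check.names = FALSE)
}
modules = Map(parse_module, starts, ends)
names(modules) = sub("^>>([^\t]*)\t.*$", "\\1", report[starts])

# One row per module, with the parsed table stored in a list column.
nested = data.frame(module = names(modules), stringsAsFactors = FALSE)
nested$value = modules
```

An individual table can then be pulled out with, for example, `nested$value[[2]]`.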
plot_gc_stats() provides a plot of GC percentage in each of the samples. By
default the argument interactive = TRUE, in which case it will try to plot a
jitter plot using the plotly package. Jitter plots are possible only when
interactive = TRUE.
The other two types of plots possible are point and bar. Plots can be
interactive or static for these two types of plots. If static, the function
returns a ggplot2 plot.
plotly
plot_gc_stats(sample=obj)
pl = plot_gc_stats(sample=obj)
ll = htmltools::tagList()
ll[[1L]] = plotly::as.widget(pl)
ll
Note that the facet is automatically named sample, which was the name
given to the input argument. More than one such fastqc object can be
passed to a single function call to generate a faceted plot like the one
shown above, e.g., plot_gc_stats(s1 = obj1, s2 = obj2).
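The way argument names can end up as facet labels may be easier to see in a small sketch. The helper below, `stack_named()`, is hypothetical and not part of ggfastqc; it only shows one common R pattern for capturing named `...` arguments and recording each name in a column that a plot could later facet on. The toy data frames stand in for fastqc objects.

```r
# Hypothetical helper: collect named inputs and stack them, recording the
# argument name of each input in a 'facet' column.
stack_named = function(...) {
  inputs = list(...)
  stopifnot(length(inputs) > 0L,
            !is.null(names(inputs)),
            all(nzchar(names(inputs))))
  tagged = Map(function(df, nm) { df$facet = nm; df },
               inputs, names(inputs))
  out = do.call(rbind, tagged)
  rownames(out) = NULL
  out
}

# Toy stand-ins for two fastqc objects.
obj1 = data.frame(sample = c("a", "b"), gc_pct = c(48, 51))
obj2 = data.frame(sample = c("c", "d"), gc_pct = c(45, 47))

combined = stack_named(s1 = obj1, s2 = obj2)
```

Here `combined$facet` holds the argument names s1 and s2, which is the kind of label a faceted plot would display.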
Setting interactive=FALSE produces a static ggplot2 plot, but the jitter
geom is not available in that case.
ggplot2
plot_gc_stats(sample=obj, interactive=FALSE, geom="point") # or "bar"
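Since the static variant returns a ggplot2 plot, it should be possible to extend it with the usual ggplot2 layers, labels and themes. The sketch below uses made-up GC percentages and builds its own point plot rather than calling ggfastqc, so the data, column names and the 50% reference line are all assumptions for illustration.

```r
library(ggplot2)

# Made-up GC percentages standing in for what a static plot might show.
gc_df = data.frame(
  sample = c("s1", "s2", "s3", "s4"),
  group  = c("control", "control", "treated", "treated"),
  gc_pct = c(48.2, 47.9, 51.3, 50.8)
)

# A static point plot, similar in spirit to geom = "point".
p = ggplot(gc_df, aes(x = sample, y = gc_pct, colour = group)) +
  geom_point(size = 3)

# A regular ggplot object accepts further layers, labels and themes.
p2 = p +
  geom_hline(yintercept = 50, linetype = "dashed") +
  labs(x = "Sample", y = "GC content (%)",
       title = "GC percentage per sample") +
  theme_bw()

# ggsave("gc_stats.png", p2, width = 6, height = 4)  # optional: write to disk
```

The same approach would apply to any of the plotting functions when they are called with interactive=FALSE.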
plot_dup_stats() provides a plot of sequence duplication percentage in each
of the samples. The usage is identical to plot_gc_stats().
plotly
plot_dup_stats(sample=obj)
pl = plot_dup_stats(sample=obj)
ll = htmltools::tagList()
ll[[1L]] = plotly::as.widget(pl)
ll
ggplot2
plot_dup_stats(sample=obj, interactive=FALSE, geom="point") # or "bar"
plot_total_sequence_stats() provides a plot of total reads sequenced. The
usage is also identical to plot_gc_stats.
plotly
plot_total_sequence_stats(sample=obj)
pl = plot_total_sequence_stats(sample=obj)
ll = htmltools::tagList()
ll[[1L]] = plotly::as.widget(pl)
ll
ggplot2
plot_total_sequence_stats(sample=obj, interactive=FALSE, geom="bar") # or "point"
plot_sequence_quality() provides a plot of per base sequence quality. The only
geom implemented is line. Both interactive and non-interactive plots are
possible, as shown below.
plotly
plot_sequence_quality(sample=obj)
pl = plot_sequence_quality(sample=obj)
ll = htmltools::tagList()
ll[[1L]] = plotly::as.widget(pl)
ll
ggplot2
plot_sequence_quality(sample=obj, interactive=FALSE)