Waterfall-class: Class Waterfall
In GenVisR: Genomic Visualizations in R


An S4 class for the waterfall plot object. This class is under development.

Waterfall(
  input,
  labelColumn = NULL,
  samples = NULL,
  coverage = NULL,
  mutation = NULL,
  genes = NULL,
  mutationHierarchy = NULL,
  recurrence = NULL,
  geneOrder = NULL,
  geneMax = NULL,
  sampleOrder = NULL,
  plotA = c("frequency", "burden", NULL),
  plotATally = c("simple", "complex"),
  plotALayers = NULL,
  plotB = c("proportion", "frequency", NULL),
  plotBTally = c("simple", "complex"),
  plotBLayers = NULL,
  gridOverlay = FALSE,
  drop = TRUE,
  labelSize = 5,
  labelAngle = 0,
  sampleNames = TRUE,
  clinical = NULL,
  sectionHeights = NULL,
  sectionWidths = NULL,
  verbose = FALSE,
  plotCLayers = NULL
)

`input`	Object of class `MutationAnnotationFormat`, `VEP`, or `GMS`, or alternatively a data frame/data table with column names "sample", "gene", "mutation".
`labelColumn`	Character vector specifying a column name from which to extract label names for cells, must be a column within the object passed to input.
`samples`	Character vector specifying samples to plot. If not NULL, all samples in "input" not specified with this parameter are removed. Samples that are specified but not present in the data will be added.
`coverage`	Integer specifying the size in base pairs of the genome covered by sequence data from which mutations could be called. Required for the mutation burden sub-plot (see details and vignette). Optionally a named vector of integers corresponding to each sample can be supplied for more accurate calculations.
`mutation`	Character vector specifying mutations to keep. If defined, mutations not listed are removed from the main plot.
`genes`	Character vector specifying genes to keep. If not NULL, all genes not specified are removed. Genes that are specified but not present in the data will be added.
`mutationHierarchy`	data.table/data.frame object with rows specifying the order of mutations from most to least deleterious and containing column names "mutation" and "color". Used to change the default colors and/or to give priority to a mutation for the same gene/sample (see details and vignette).
`recurrence`	Numeric value between 0 and 1 specifying a mutation recurrence cutoff. Genes which do not have mutations in the proportion of samples defined are removed.
`geneOrder`	Character vector specifying the order in which to plot genes.
`geneMax`	Integer specifying the maximum number of genes to be plotted. Genes kept will be chosen based on the recurrence of mutations in samples, unless geneOrder is specified.
`sampleOrder`	Character vector specifying the order in which to plot samples.
`plotA`	String specifying the type of plot for the top sub-plot, one of "burden", "frequency", or NULL for a mutation burden (requires coverage to be specified), frequency of mutations, or no plot respectively.
`plotATally`	String specifying one of "simple" or "complex" for a simplified or complex tally of mutations respectively.
`plotALayers`	list of ggplot2 layers to be passed to the plot.
`plotB`	String specifying the type of plot for the left sub-plot, one of "proportion", "frequency", or NULL for a plot of gene proportions, gene frequencies, or no plot respectively.
`plotBTally`	String specifying one of "simple" or "complex" for a simplified or complex tally of genes respectively.
`plotBLayers`	list of ggplot2 layers to be passed to the plot.
`gridOverlay`	Boolean specifying if a grid should be overlaid on the waterfall plot. This is not recommended for large cohorts.
`drop`	Boolean specifying if mutations not in the main plot should be dropped from the legend. If FALSE the legend will be based on mutations in the data before any subsets occur.
`labelSize`	Integer specifying the size of label text within each cell if "labelColumn" has been specified.
`labelAngle`	Numeric value specifying the angle of label text if "labelColumn" has been specified.
`sampleNames`	Boolean specifying if samples should be labeled on the x-axis of the plot.
`clinical`	Object of class `Clinical`, used for adding a clinical data subplot.
`sectionHeights`	Numeric vector specifying relative heights of each plot section, should sum to one. Expects a value for each section.
`sectionWidths`	Numeric vector specifying relative widths of each plot section, should sum to one. Expects a value for each section.
`verbose`	Boolean specifying if status messages should be reported.
`plotCLayers`	list of ggplot2 layers to be passed to the main plot.
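
The `recurrence` cutoff can be illustrated with a short base-R sketch. This is not GenVisR's internal code: the data frame `toyDF` and the other variable names are hypothetical, and treating the cutoff as inclusive (`>=`) is an assumption made for the illustration.

```r
# Toy data (hypothetical): one row per observed mutation
toyDF <- data.frame(
  sample = c("s1", "s2", "s3", "s4", "s1", "s2", "s1"),
  gene   = c("tp53", "tp53", "tp53", "tp53", "egfr", "egfr", "apc"),
  stringsAsFactors = FALSE
)

# Proportion of samples carrying at least one mutation in each gene
nSamples <- length(unique(toyDF$sample))
geneBySample <- unique(toyDF[, c("sample", "gene")])
geneProportion <- table(geneBySample$gene) / nSamples

# Keep genes at or above the cutoff (inclusive comparison is an assumption)
recurrenceCutoff <- 0.5
keptGenes <- names(geneProportion)[as.vector(geneProportion) >= recurrenceCutoff]
print(keptGenes)
```

With this toy data, "apc" is mutated in one of four samples and would fall below a cutoff of 0.5, while "egfr" and "tp53" would be retained.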

'Waterfall()' is designed to visualize the mutations seen in a cohort. As input, the function takes an object of class MutationAnnotationFormat, VEP, or GMS. Alternatively, a user can provide a data.table or data.frame, as long as its column names include "sample", "gene", and "mutation". When supplying a data.table or data.frame, the user must also provide input to the 'mutationHierarchy' parameter.
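
Before passing a data.frame to 'Waterfall()', it can help to confirm that the required column names are present. The following is a minimal base-R sketch of such a check; `candidateDF`, `requiredColumns`, and `missingColumns` are hypothetical names used only for illustration and are not part of GenVisR.

```r
# Hypothetical input; any data.frame or data.table could be checked this way
candidateDF <- data.frame(
  sample   = c("sample_1", "sample_2"),
  gene     = c("tp53", "egfr"),
  mutation = c("missense", "frame_shift"),
  stringsAsFactors = FALSE
)

# Column names the documentation lists as required for data.frame input
requiredColumns <- c("sample", "gene", "mutation")
missingColumns <- setdiff(requiredColumns, colnames(candidateDF))

# Fail early with a readable message if any required column is absent
if (length(missingColumns) > 0) {
  stop("input is missing column(s): ", paste(missingColumns, collapse = ", "))
}
```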

The 'mutationHierarchy' parameter expects either a data.table or data.frame object containing the column names "mutation" and "color". Each row should match a mutation type given in the 'input' parameter. The 'mutationHierarchy' parameter is intended both to change the colors of mutations on the plot and to set a hierarchy of which mutation type to plot if there is more than one mutation type for the same gene/sample combination.
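
The idea of a hierarchy giving one mutation type priority for a gene/sample combination can be sketched in base R. This only illustrates the concept and is not necessarily how GenVisR resolves ties internally; `callsDF`, `hierarchy`, and `resolved` are hypothetical names.

```r
# Hypothetical calls: sample_1/tp53 carries two mutation types
callsDF <- data.frame(
  sample   = c("sample_1", "sample_1", "sample_2"),
  gene     = c("tp53", "tp53", "egfr"),
  mutation = c("missense", "frame_shift", "missense"),
  stringsAsFactors = FALSE
)

# Rows ordered from most to least deleterious
hierarchy <- data.frame(
  mutation = c("frame_shift", "splice_site", "missense"),
  color    = c("#BDC581", "#6A006A", "#3B3B98"),
  stringsAsFactors = FALSE
)

# Every mutation type in the input should appear in the hierarchy
stopifnot(all(callsDF$mutation %in% hierarchy$mutation))

# Rank each call by its row position in the hierarchy, then keep the
# top-ranked call for each sample/gene pair
callsDF$rank <- match(callsDF$mutation, hierarchy$mutation)
callsDF <- callsDF[order(callsDF$sample, callsDF$gene, callsDF$rank), ]
resolved <- callsDF[!duplicated(callsDF[, c("sample", "gene")]), ]
print(resolved[, c("sample", "gene", "mutation")])
```

In this toy case, "frame_shift" is listed above "missense" in the hierarchy, so it is the type retained for sample_1/tp53. The `stopifnot()` check also catches a misspelled mutation type in the hierarchy before any plotting is attempted.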

PlotA: gtable object for the top sub-plot.
PlotB: gtable object for the left sub-plot.
PlotC: gtable object for the main plot.
PlotD: gtable object for the bottom sub-plot.
Grob: gtable object for the arranged plot.
primaryData: data.table object storing the primary data, should have column names sample, gene, mutation, label.
simpleMutationCounts: data.table object storing simplified mutation counts, should have column names sample, mutation, Freq, mutationBurden.
complexMutationCounts: data.table object storing mutation counts per mutation type, should have column names sample, mutation, Freq, mutationBurden.
geneData: data.table object storing gene counts, should have column names gene, mutation, count.
ClinicalData: data.table object storing the data used to plot the clinical sub-plot.
mutationHierarchy: data.table object storing the hierarchy of mutation type in order of most to least important and the mapping of mutation type to color. Should have column names mutation, color, and label.

MutationAnnotationFormat, VEP, GMS, Clinical

set.seed(426)

# create a data frame with required column names
mutationDF <- data.frame("sample"=sample(c("sample_1", "sample_2", "sample_3"), 10, replace=TRUE),
                         "gene"=sample(c("egfr", "tp53", "rb1", "apc"), 10, replace=TRUE),
                         "mutation"=sample(c("missense", "frame_shift", "splice_site"), 10, replace=TRUE))

# set the mutation hierarchy (required for DF)
hierarchyDF <- data.frame("mutation"=c("missense", "frame_shift", "splice_site"),
                          "color"=c("#3B3B98", "#BDC581", "#6A006A"))
                          
# Run the Waterfall Plot and draw the output
Waterfall.out <- Waterfall(mutationDF, mutationHierarchy=hierarchyDF)
drawPlot(Waterfall.out)