Waterfall-class: Class Waterfall

Description

An S4 class for the waterfall plot object. This class is still under development.

Usage

Waterfall(
  input,
  labelColumn = NULL,
  samples = NULL,
  coverage = NULL,
  mutation = NULL,
  genes = NULL,
  mutationHierarchy = NULL,
  recurrence = NULL,
  geneOrder = NULL,
  geneMax = NULL,
  sampleOrder = NULL,
  plotA = c("frequency", "burden", NULL),
  plotATally = c("simple", "complex"),
  plotALayers = NULL,
  plotB = c("proportion", "frequency", NULL),
  plotBTally = c("simple", "complex"),
  plotBLayers = NULL,
  gridOverlay = FALSE,
  drop = TRUE,
  labelSize = 5,
  labelAngle = 0,
  sampleNames = TRUE,
  clinical = NULL,
  sectionHeights = NULL,
  sectionWidths = NULL,
  verbose = FALSE,
  plotCLayers = NULL
)

Arguments

input

Object of class MutationAnnotationFormat, VEP, or GMS, or alternatively a data frame/data table with column names "sample", "gene", "mutation".

labelColumn

Character vector specifying a column name from which to extract label names for cells, must be a column within the object passed to input.

samples

Character vector specifying samples to plot. If not NULL, all samples in "input" not specified with this parameter are removed. In addition, samples that are specified but not present in the data will be added.

coverage

Integer specifying the size in base pairs of the genome covered by sequence data from which mutations could be called. Required for the mutation burden sub-plot (see details and vignette). Optionally a named vector of integers corresponding to each sample can be supplied for more accurate calculations.

mutation

Character vector specifying mutations to keep. If defined, mutations not supplied are removed from the main plot.

genes

Character vector specifying genes to keep. If not NULL, all genes not specified are removed. In addition, genes that are specified but not present in the data will be added.

mutationHierarchy

data.table/data.frame object with rows specifying the order of mutations from most to least deleterious and containing column names "mutation" and "color". Used to change the default colors and/or to give priority to a mutation for the same gene/sample (see details and vignette).

recurrence

Numeric value between 0 and 1 specifying a mutation recurrence cutoff. Genes which do not have mutations in the proportion of samples defined are removed.

geneOrder

Character vector specifying the order in which to plot genes.

geneMax

Integer specifying the maximum number of genes to be plotted. Genes kept will be chosen based on the recurrence of mutations in samples, unless geneOrder is specified.

sampleOrder

Character vector specifying the order in which to plot samples.

plotA

String specifying the type of plot for the top sub-plot, one of "burden", "frequency", or NULL for a mutation burden (requires coverage to be specified), frequency of mutations, or no plot respectively.

plotATally

String specifying one of "simple" or "complex" for a simplified or complex tally of mutations respectively.

plotALayers

list of ggplot2 layers to be passed to the plot.

plotB

String specifying the type of plot for the left sub-plot, one of "proportion", "frequency", or NULL for a plot of gene proportions, gene frequencies, or no plot respectively.

plotBTally

String specifying one of "simple" or "complex" for a simplified or complex tally of genes respectively.

plotBLayers

list of ggplot2 layers to be passed to the plot.

gridOverlay

Boolean specifying if a grid should be overlaid on the waterfall plot. This is not recommended for large cohorts.

drop

Boolean specifying if mutations not in the main plot should be dropped from the legend. If FALSE the legend will be based on mutations in the data before any subsets occur.

labelSize

Integer specifying the size of label text within each cell if "labelColumn" has been specified.

labelAngle

Numeric value specifying the angle of label text if "labelColumn" has been specified.

sampleNames

Boolean specifying if samples should be labeled on the x-axis of the plot.

clinical

Object of class Clinical, used for adding a clinical data subplot.

sectionHeights

Numeric vector specifying relative heights of each plot section, should sum to one. Expects a value for each section.

sectionWidths

Numeric vector specifying relative widths of each plot section, should sum to one. Expects a value for each section.

verbose

Boolean specifying if status messages should be reported.

plotCLayers

list of ggplot2 layers to be passed to the main plot.
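
The arithmetic behind the "coverage" and "recurrence" arguments can be illustrated with a small base R sketch. This is not the package's internal code: it assumes that mutation burden is reported as mutations per megabase of covered sequence, and that the recurrence cutoff is applied with a greater-than-or-equal comparison. Neither detail is stated on this page. The toy data and all variable names are hypothetical.

```r
# Toy mutation calls; column names follow the "input" argument.
mutations <- data.frame(
  sample = c("sample_1", "sample_1", "sample_2", "sample_3", "sample_3"),
  gene = c("tp53", "egfr", "tp53", "tp53", "apc"),
  mutation = c("missense", "missense", "frame_shift", "missense", "splice_site"),
  stringsAsFactors = FALSE
)

# Per-sample coverage in base pairs (illustrative values only).
coverage <- c(sample_1 = 30000000, sample_2 = 40000000, sample_3 = 50000000)

# Assumed burden definition: mutations per megabase of covered sequence.
mutationCounts <- table(mutations$sample)
burden <- as.numeric(mutationCounts) / coverage[names(mutationCounts)] * 1e6

# Recurrence: proportion of samples with at least one mutation in each gene.
nSamples <- length(unique(mutations$sample))
samplesPerGene <- tapply(mutations$sample, mutations$gene,
                         function(s) length(unique(s)))
geneRecurrence <- samplesPerGene / nSamples

# Keep genes mutated in at least half of the samples (assumed >= comparison).
keptGenes <- names(geneRecurrence)[geneRecurrence >= 0.5]

print(burden)
print(keptGenes)
```

Supplying a named coverage vector, as in the sketch, lets each sample be normalized by its own covered territory rather than by a single cohort-wide value.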

Details

'Waterfall()' is designed to visualize the mutations seen in a cohort. As input, the function takes an object of class MutationAnnotationFormat, VEP, or GMS. Alternatively, a user can provide either a data.table or a data.frame, as long as the column names of that object include "sample", "gene", and "mutation". When supplying an object of class data.table or data.frame, the user must also provide input to the 'mutationHierarchy' parameter.
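
As a quick pre-flight check, the column requirement described above can be tested before calling 'Waterfall()'. The helper below is hypothetical and not part of GenVisR; it only verifies that the three required column names are present.

```r
# Hypothetical helper (not part of GenVisR): check that a data frame
# carries the three column names required for data.frame/data.table input.
hasWaterfallColumns <- function(x) {
  required <- c("sample", "gene", "mutation")
  is.data.frame(x) && all(required %in% colnames(x))
}

goodInput <- data.frame(sample = c("sample_1", "sample_2"),
                        gene = c("tp53", "egfr"),
                        mutation = c("missense", "frame_shift"))
badInput <- data.frame(id = "sample_1", gene = "tp53")

print(hasWaterfallColumns(goodInput))
print(hasWaterfallColumns(badInput))
```

Because a data.table also inherits from data.frame, the same check should apply to both input types.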

The 'mutationHierarchy' parameter expects either a data.table or data.frame object containing the column names "mutation" and "color". Each row should match a mutation type given in the 'input' parameter. The 'mutationHierarchy' parameter is intended both to change the colors of mutations on the plot and to set a hierarchy of which mutation type to plot if there is more than one mutation type for the same gene/sample combination.
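
The idea of resolving several mutation types for one gene/sample pair by hierarchy can be sketched in base R as follows. This illustrates the concept only and is not the package's implementation; the row order chosen for the hierarchy and all variable names are assumptions made for the example.

```r
# Toy data: sample_1 carries two mutation types in tp53.
mutations <- data.frame(
  sample = c("sample_1", "sample_1", "sample_2"),
  gene = c("tp53", "tp53", "egfr"),
  mutation = c("missense", "frame_shift", "missense"),
  stringsAsFactors = FALSE
)

# Hierarchy rows ordered from most to least deleterious (illustrative order).
hierarchyDF <- data.frame(
  mutation = c("frame_shift", "splice_site", "missense"),
  color = c("#BDC581", "#6A006A", "#3B3B98"),
  stringsAsFactors = FALSE
)

# Rank each mutation by its row position in the hierarchy (1 = top priority).
mutations$rank <- match(mutations$mutation, hierarchyDF$mutation)

# Sort by rank, then keep the first row for each sample/gene pair.
byPriority <- mutations[order(mutations$rank), ]
resolved <- byPriority[!duplicated(byPriority[, c("sample", "gene")]), ]

print(resolved)
```

In this sketch the frame shift is kept for sample_1/tp53 because it sits above missense in the hierarchy.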

Slots

PlotA

gtable object for the top sub-plot.

PlotB

gtable object for the left sub-plot.

PlotC

gtable object for the main plot.

PlotD

gtable object for the bottom sub-plot.

Grob

gtable object for the arranged plot.

primaryData

data.table object storing the primary data, should have column names sample, gene, mutation, label.

simpleMutationCounts

data.table object storing simplified mutation counts, should have column names sample, mutation, Freq, mutationBurden.

complexMutationCounts

data.table object storing mutation counts per mutation type, should have column names sample, mutation, Freq, mutationBurden.

geneData

data.table object storing gene counts, should have column names gene, mutation, count.

ClinicalData

data.table object storing the data used to plot the clinical sub-plot.

mutationHierarchy

data.table object storing the hierarchy of mutation type in order of most to least important and the mapping of mutation type to color. Should have column names mutation, color, and label.
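
Slots of an S4 object are read with the `@` operator or with `slot()`. The sketch below uses a hypothetical stand-in class, `ToyWaterfall`, with a single slot named after `primaryData`, so that it runs without GenVisR installed; the same access pattern would be expected to apply to an object returned by 'Waterfall()'.

```r
library(methods)

# Hypothetical stand-in class with one slot named like the real one.
setClass("ToyWaterfall", representation(primaryData = "data.frame"))

toy <- new("ToyWaterfall",
           primaryData = data.frame(sample = "sample_1", gene = "tp53",
                                    mutation = "missense", label = NA))

print(slotNames(toy))            # names of all slots on the object
print(toy@primaryData)           # direct access with @
print(slot(toy, "primaryData"))  # equivalent access by slot name
```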

See Also

MutationAnnotationFormat, VEP, GMS, Clinical

Examples

set.seed(426)

# create a data frame with required column names
mutationDF <- data.frame("sample"=sample(c("sample_1", "sample_2", "sample_3"), 10, replace=TRUE),
                         "gene"=sample(c("egfr", "tp53", "rb1", "apc"), 10, replace=TRUE),
                         "mutation"=sample(c("missense", "frame_shift", "splice_site"), 10, replace=TRUE))

# set the mutation hierarchy (required for DF)
hierarchyDF <- data.frame("mutation"=c("missense", "frame_shift", "splice_site"),
                          "color"=c("#3B3B98", "#BDC581", "#6A006A"))

# Run the Waterfall Plot and draw the output
Waterfall.out <- Waterfall(mutationDF, mutationHierarchy=hierarchyDF)
drawPlot(Waterfall.out)

GenVisR documentation built on Dec. 28, 2020, 2 a.m.