Construct a waterfall plot

Share:

Description

Given a data frame construct a water fall plot showing the mutation burden and mutation type on a gene and sample level.

Usage

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
waterfall(x, mainRecurCutoff = 0, mainGrid = TRUE, mainXlabel = FALSE,
  main_geneLabSize = 8, mainLabelCol = NULL, mainLabelSize = 4,
  mainLabelAngle = 0, mainDropMut = FALSE, mainPalette = NULL,
  mainLayer = NULL, mutBurden = NULL, plotMutBurden = TRUE,
  coverageSpace = 44100000, mutBurdenLayer = NULL, clinData = NULL,
  clinLegCol = 1, clinVarOrder = NULL, clinVarCol = NULL,
  clinLayer = NULL, sampRecurLayer = NULL, plotGenes = NULL,
  geneOrder = NULL, plotSamples = NULL, sampOrder = NULL,
  maxGenes = NULL, rmvSilent = FALSE, fileType = "MAF",
  variant_class_order = NULL, out = "plot")

Arguments

x

Object of class data frame representing annotated mutations. The data frame supplied must have one of the following sets of column names ("Tumor_Sample_Barcode", "Hugo_Symbol", "Variant_Classification") for fileType="MAF", ("sample","gene_name","trv_type") for fileType="MGI" or ("sample", "gene", "variant_class") for fileType="Custom". This columns should represent samples in a cohort, gene with mutation, and the mutation type respectively.

mainRecurCutoff

Numeric value between 0 and 1 specifying a mutation recurrence cutoff. Genes which do not have mutations in the proportion os samples defined are removed.

mainGrid

Boolean specifying if a grid should be overlayed on the main plot. Not recommended if the number of genes or samples to be plotted is large.

mainXlabel

Boolean specifying whether to label the x-axis with sample names. Not recommended if the number of samples to be plotted is large.

main_geneLabSize

Intenger specifying the size of gene names displayed on the y-axis.

mainLabelCol

Character string specifying a column name from the argument supplied to parameter 'x' from which to derive cell labels from (see details and vignette).

mainLabelSize

Integer specifying the size of text labels for cells in the main plot. Valid only if argument is supplied to the parameter 'mainLabelCol'.

mainLabelAngle

Integer specifying the degree of rotation for text labels. Valid only if argument is supplied to the parameter 'mainLabelCol'.

mainDropMut

Boolean specifying whether to drop unused "mutation type" levels from the legend.

mainPalette

Character vector specifying colours for mutation types plotted in the main plot, must specify a colour for each mutation type plotted.

mainLayer

Valid ggplot2 layer to be added to the main plot.

mutBurden

Object of class data frame containing columns "sample", "mut_burden" with sample levels matching those supplied in x.

plotMutBurden

Boolean specify if the mutation burden sub-plot should be displayed.

coverageSpace

Integer specifying the size in bp of the genome covered by sequence data from which mutations could be called (see details and vignette).

mutBurdenLayer

Valid ggplot2 layer to be added to the top sub-plot.

clinData

Object of class data frame with rows representing clinical data. The data frame should be in "long format" and columns must be names as "sample", "variable", and "value" (optional see details and vignette).

clinLegCol

Integer specifying the number of columns in the legend for the clinical data, only valid if argument is supplied to parameter clinData.

clinVarOrder

Character vector specifying the order in which to plot variables in the variable column of the argument given to the parameter clinData. The argument supplied to this parameter should have the same unique length and values as in the variable column of the argument supplied to parameter clinData (see vignette).

clinVarCol

Named character vector specifying the mapping of colours to variables in the variable column of the data frame supplied to clinData (ex. "variable"="colour").

clinLayer

Valid ggplot2 layer to be added to the clinical sub-plot.

sampRecurLayer

Valid ggplot2 layer to be added to the left sub-plot.

plotGenes

Character vector specifying genes to plot. If not null genes not specified within this character vector are removed.

geneOrder

Character vector specifying the order in which to plot genes.

plotSamples

Character vector specifying samples to plot. If not null all other samples not specified within this parameter are removed.

sampOrder

Character vector specifying the order of the samples to plot.

maxGenes

Integer specifying the maximum number of genes to be plotted. Genes kept will be choosen based on the reccurence of mutations in samples.

rmvSilent

Boolean specifying if silent mutations should be removed from the plot.

fileType

Character string specifying the file format of the data frame specified to parameter 'x', one of "MGI", "MAF", "Custom" (see details and vignette).

variant_class_order

Character vector specifying the hierarchical order of mutation types to plot, required if file_type == "Custom" (see details and vignette).

out

Character vector specifying the the object to output, one of "data", "grob", or "plot", defaults to "plot" (see returns).

Details

waterfall is a function designed to visualize the mutations seen in a cohort. The function takes a data frame with appropriate column names (see fileType parameter) and plots the mutations within. In cases where multiple mutations occur in the same cell the most deleterious mutation is given priority (see vignette for default priority). If the fileType parameter is set to "Custom" the user most supply this priority via the 'variant_class_order' parameter with the highest priorities occuring first. Additionally this parameter will override the default orders of MGI and MAF file types.

Various data subsets are allowed via the waterfall function (see above), all of these subsets will occur independently of the mutation burden calculation. To clarify the removal of genes and mutations will only occur after the mutation burden is calculated. The mutation burden calculation is only meant to provide a rough estimate and assumes that the coverage breadth within the cohort is aproximately equal. For more accurate calculations it is recommended to supply this information via the mutBurden parameter which. Note that the mutation burden calculation relies on the 'coverageSpace' parameter (see vignette).

It is possible to display additional information within the plot via cell labels. The 'mainLabelCol' parameter will look for an additional column in the data frame and plot text within cells based on those values (see vignette).

Value

One of the following, a list of dataframes containing data to be plotted, a grob object, or a plot.

Examples

1
2
# Plot the data
waterfall(brcaMAF, plotGenes=c("PIK3CA", "TP53", "USH2A", "MLL3", "BRCA1"))

Want to suggest features or report bugs for rdrr.io? Use the GitHub issue tracker.