mismatchPlot: mismatchPlot

View source: R/plotting.R

Description

Plotting function that returns a ggplot2 object representing the mismatches and coverages of the specified samples in the specified region.

Usage

mismatchPlot(
  data, sampledata, samples = sampledata$Sample,
  windowsize = NULL, position = NULL, range = NULL,
  plotReference = TRUE, refHeight = 8,
  printReference = TRUE, printRefSize = 2,
  tickSpacing = c(10, 10)
)

Arguments

data

The data to be plotted. Returned by h5dapply or h5readBlock.

sampledata

The sample data for the cohort represented by data. Returned by getSampleData.

samples

A character vector listing the names of the samples to be plotted. Defaults to all samples described in sampledata.

windowsize

Size of the window to plot on each side of position. The total plotted interval is [position - windowsize, position + windowsize].

position

The position at which the plot is centered.

range

Integer vector of two elements specifying a range of coordinates to be plotted. Use either position plus windowsize or range; if both are provided, range overrides position and windowsize.

plotReference

Boolean flag specifying whether a reference track should be plotted. It only takes effect if there is a slot named Reference in the data object passed to the function.

refHeight

Height of the reference track in coverage units. The default of 8 makes the reference track as high as a coverage of 8 reads would be in the plot of a sample.

printReference

Boolean parameter indicating whether a text representation of the reference should be overlaid on the reference track. It can only be TRUE if plotReference is TRUE.

printRefSize

Size parameter of the geom_text layer used to print the reference. This value is unitless and needs to be manually optimised for a given plot.

tickSpacing

Integer vector of two elements, specifying the spacing of ticks along the x and y axes respectively.

Details

If position and windowsize are specified, this function creates a plot centered on position using the coverage and mismatch counts stored in data, annotating it with sample information provided in the data.frame sampledata and showing all samples listed in samples. If range is specified, the plot covers the positions from range[1] to range[2]. The two modes differ only in the labelling of the x-axis and the coordinate system used on it. With range, the x-axis uses the genomic coordinates given in range. With position and windowsize, the x-axis coordinates run from -windowsize through +windowsize, and position 0 is marked with the value provided in the position parameter. In addition, when position and windowsize are provided, two black lines marking the center position are drawn (this is useful for visualising SNVs).
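The relationship between the two x-axis conventions can be sketched with plain base R arithmetic. This only illustrates the coordinate mapping described above and is not the code h5vc uses internally; the variable names plottedRange, relativeX and genomicX are hypothetical.

```r
# Illustrative only: how the two x-axis conventions relate to each other.
position   <- 29979628
windowsize <- 30

# Interval covered when position and windowsize are given
plottedRange <- c(position - windowsize, position + windowsize)

# Relative coordinates used on the x-axis in this mode (0 marks `position`)
relativeX <- seq(-windowsize, windowsize)

# Genomic coordinates used on the x-axis when `range` is given instead
genomicX <- seq(plottedRange[1], plottedRange[2])

# Both systems describe the same 2 * windowsize + 1 positions
stopifnot(length(relativeX) == length(genomicX))
stopifnot(all(genomicX - position == relativeX))
```

Passing plottedRange as the range argument should therefore show the same region as passing position and windowsize, just with genomic labels on the x-axis.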

If neither range nor position and windowsize are specified, the function will try to extract this information from the data object. If data is the return value of a call to h5dapply or h5readBlock, this works automatically.

The plot has the genomic position on the x-axis. The y-axis shows the counts, with positive values for the forward strand and negative values for the reverse strand. The coverage is shown in grey, deletions in purple, and mismatches in the colors specified in the legend. Note that for each possible mismatch there is an additional color for low-quality counts (coming from the first and last sequencing cycles), so e.g. C is filled dark red and C_lq light red.
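The sign convention on the y-axis can be illustrated with a small base R sketch. The numbers below are made up for illustration and the layout of the data.frame is an assumption, not the structure h5vc builds internally.

```r
# Hypothetical per-position coverage for one sample (not real h5vc data)
coverageFwd <- c(12, 15, 14)
coverageRev <- c(10, 9, 13)

# Forward-strand values are drawn above zero, reverse-strand values below
signedCoverage <- data.frame(
  position = rep(1:3, times = 2),
  strand   = rep(c("forward", "reverse"), each = 3),
  value    = c(coverageFwd, -coverageRev)
)
print(signedCoverage)
```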

If data is the result of a call to h5dapply representing multiple blocks of data, as defined in the range parameter to h5dapply, then the plot will contain the mismatch plots of the individual ranges next to each other.
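To see which blocks such a multi-range call produces, the block boundaries can be computed by hand. The sketch below uses base R only and reuses the numbers from the Examples section (starts every 20 bases, width 30); the names blocks and overlap are hypothetical.

```r
# Same numbers as in the Examples section
position   <- 29979628
windowsize <- 30

# Block starts every 20 bases, each block 30 bases wide
starts <- seq(position - windowsize, position + windowsize, by = 20)
ends   <- starts + 30 - 1

blocks <- data.frame(start = starts, end = ends)
print(blocks)

# Overlap (in bases) between each block and the next one
overlap <- head(ends, -1) - tail(starts, -1) + 1
print(overlap)
```

Each block would appear as its own panel, so positions in the overlapping stretches are drawn twice.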

Value

A ggplot object containing the mismatch plot. It can be used like any other ggplot object, i.e. additional layers and styles may be applied by simply adding them to the plot.
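As a minimal sketch of what adding layers and styles looks like, the block below uses a stand-in ggplot object built from made-up data so that it runs without h5vc; with a real call, p would be the return value of mismatchPlot. The title text is arbitrary.

```r
library(ggplot2)

# Stand-in for the object returned by mismatchPlot(); made-up data
p <- ggplot(data.frame(x = 1:3, y = c(2, -1, 3)), aes(x = x, y = y)) +
  geom_col()

# Add a title and switch the theme by adding components to the plot
p2 <- p + ggtitle("Mismatch plot (stand-in)") + theme_bw()
```

Calling print(p2) renders the modified plot, exactly as print(p) does for the unmodified one.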

Author(s)

Paul Pyl

Examples

  # loading library and example data
  library(h5vc)
  tallyFile <- system.file( "extdata", "example.tally.hfs5", package = "h5vcData" )
  sampleData <- getSampleData( tallyFile, "/ExampleStudy/16" )
  position <- 29979628
  windowsize <- 30
  samples <- sampleData$Sample[sampleData$Patient == "Patient8"]
  data <- h5readBlock(
    filename = tallyFile,
    group = "/ExampleStudy/16",
    names = c("Coverages", "Counts", "Deletions", "Reference"),
    range = c(position - windowsize, position + windowsize)
  )
  #Plotting with position and windowsize
  p <- mismatchPlot(
    data = data,
    sampledata = sampleData,
    samples = samples,
    windowsize = windowsize,
    position = position
  )
  print(p)
  #plotting with range and modified tickSpacing and refHeight
  p <- mismatchPlot(
    data = data,
    sampledata = sampleData,
    samples = samples,
    range = c(position - windowsize, position + windowsize),
    tickSpacing = c(20, 5),
    refHeight = 5
  )
  print(p)
  #plotting without specifying range or position
  p <- mismatchPlot(
    data = data,
    sampledata = sampleData,
    samples = samples
  )
  print(p)
  #Plotting multiple regions (with small overlaps)
  library(IRanges)
  dataList <- h5dapply(
    filename = tallyFile,
    group = "/ExampleStudy/16",
    names = c("Coverages", "Counts", "Deletions", "Reference"),
    range = IRanges(start = seq( position - windowsize, position + windowsize, 20), width = 30 )
  )
  p <- mismatchPlot(
    data = dataList,
    sampledata = sampleData,
    samples = samples
  )
  print(p)

h5vc documentation built on Nov. 8, 2020, 4:56 p.m.