Plot_Coverage: RNA-seq Coverage Plots and Genome Tracks

View source: R/Coverage.R

Plot_Coverage    R Documentation

Description

Generate plotly / ggplot RNA-seq genome and coverage plots from the command line. For quick working examples, see the Examples section below.

Usage

Plot_Coverage(
  se,
  Event,
  Gene,
  seqname,
  start,
  end,
  coordinates,
  strand = c("*", "+", "-"),
  zoom_factor,
  bases_flanking = 100,
  tracks,
  track_names = tracks,
  condition,
  selected_transcripts,
  condense_tracks = FALSE,
  stack_tracks = FALSE,
  t_test = FALSE,
  norm_event
)

Plot_Genome(
  se,
  reference_path,
  Gene,
  seqname,
  start,
  end,
  coordinates,
  zoom_factor,
  bases_flanking = 100,
  selected_transcripts,
  condense_tracks = FALSE
)

as_egg_ggplot(p_obj)

Arguments

se

A NxtSE object, created by MakeSE. COV files must be linked to the NxtSE object. To do this, see the example in MakeSE. Required by Plot_Coverage.

Event

The EventName of the IR / alternative splicing event to be displayed. Use rownames(se) to display a list of valid events.

Gene

Whether to use the range for the given Gene. If given, overrides Event (but Event or norm_event will be used to normalise by condition). Valid Gene entries include gene_id (Ensembl ID) or gene_name (Gene Symbol).

seqname, start, end

The chromosome (string) and genomic start/end coordinates (numeric) of the region to display. If present, overrides both Event and Gene. E.g. for a given region of chr1:10000-11000, use the parameters: seqname = "chr1", start = 10000, end = 11000

coordinates

A string specifying genomic coordinates can be given instead of seqname,start,end. Must be of the format "chr:start-end", e.g. "chr1:10000-11000"
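The relationship between coordinates and seqname / start / end can be sketched in base R. The helper parse_coordinates below is hypothetical (it is not an NxtIRF function) and only illustrates how a "chr:start-end" string maps onto the three separate arguments:

```r
# Hypothetical helper (not part of NxtIRF): split "chr:start-end" into parts.
parse_coordinates <- function(coordinates) {
    pattern <- "^([^:]+):([0-9]+)-([0-9]+)$"
    if (!grepl(pattern, coordinates)) {
        stop("coordinates must look like 'chr1:10000-11000'")
    }
    list(
        seqname = sub(pattern, "\\1", coordinates),
        start = as.numeric(sub(pattern, "\\2", coordinates)),
        end = as.numeric(sub(pattern, "\\3", coordinates))
    )
}

region <- parse_coordinates("chr1:10000-11000")
str(region)
```

Under this sketch, coordinates = "chr1:10000-11000" is equivalent to seqname = "chr1", start = 10000, end = 11000.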

strand

Whether to show coverage of both strands "*" (default), or from the "+" or "-" strand only.

zoom_factor

Zoom out from event. Each level of zoom widens the view by a factor of 3. E.g. for a query region of chr1:10000-11000, if a zoom_factor of 1.0 is given, chr1:9000-12000 will be displayed.

bases_flanking

(Default = 100) How many bases flanking the zoomed window. Useful when used in conjunction with zoom_factor == 0. E.g. for a given region of chr1:10000-11000, if zoom_factor = 0 and bases_flanking = 100, the region chr1:9900-11100 will be displayed.
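The window arithmetic for zoom_factor and bases_flanking can be sketched as follows. This is an illustration under stated assumptions, not the NxtIRF implementation: the hypothetical helper display_window assumes that the region width is scaled by 3^zoom_factor about its midpoint and that bases_flanking is then added to each side. The package may round or order these steps differently.

```r
# Hypothetical helper (not part of NxtIRF).
# Assumption: zooming scales the region width by 3^zoom_factor about the
# midpoint, and bases_flanking is then added to both sides.
display_window <- function(start, end, zoom_factor = 0, bases_flanking = 100) {
    centre <- (start + end) / 2
    half_width <- (end - start) / 2 * 3^zoom_factor
    c(
        start = max(1, floor(centre - half_width) - bases_flanking),
        end = ceiling(centre + half_width) + bases_flanking
    )
}

# No zoom, 100 flanking bases on each side
w0 <- display_window(10000, 11000, zoom_factor = 0, bases_flanking = 100)
w0

# One level of zoom, no flanking bases
w1 <- display_window(10000, 11000, zoom_factor = 1, bases_flanking = 0)
w1
```

With these assumptions, the first call gives 9900 to 11100, matching the bases_flanking example, and the second gives a window three times as wide as the query region.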

tracks

The names of individual samples, or the names of the different conditions to be plotted. For the latter, set condition to the specified condition category.

track_names

The names of the tracks to be displayed. If omitted, the track_names will default to the input in tracks

condition

To display normalised coverage per condition, set this to the condition category. If omitted, tracks are assumed to refer to the names of individual samples.

selected_transcripts

(Optional) A vector containing transcript ID or transcript names of transcripts to be displayed on the gene annotation track. Useful to remove minor isoforms that are not relevant to the samples being displayed.

condense_tracks

(default FALSE) Whether to collapse the transcript track annotations by gene.

stack_tracks

(default FALSE) Whether to graph all the conditions on a single coverage track. If set to TRUE, each condition will be displayed in a different colour on the same track. Ignored if condition is not set.

t_test

(default FALSE) Whether to perform a pair-wise T-test. Only used if there are TWO condition tracks.
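As a rough illustration of what a per-nucleotide comparison between two conditions involves, the sketch below runs stats::t.test at each position of a toy matrix of normalised coverage. The data are invented, and the use of R's default Welch test is an assumption: this page does not state which t-test variant Plot_Coverage applies.

```r
# Toy normalised coverage: rows are positions, columns are samples.
cond <- rep(c("A", "B"), each = 3)
norm_cov <- rbind(
    pos1 = c(1.00, 1.02, 0.98, 1.01, 0.99, 1.00),
    pos2 = c(0.10, 0.12, 0.11, 0.50, 0.52, 0.49)
)

# Assumption: Welch two-sample t-test (the t.test default) at each position.
p_values <- apply(norm_cov, 1, function(x) {
    t.test(x[cond == "A"], x[cond == "B"])$p.value
})

# Larger values indicate stronger evidence of a difference between conditions.
-log10(p_values)
```

Here pos1 has similar coverage in both conditions, whereas pos2 (for example, a retained intron) differs clearly and yields a much smaller p-value.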

norm_event

Whether to normalise by an event different to that given in "Event". The difference between this and Event is that the genomic coordinates can be centered around a different Event, Gene or region as given in seqname/start/end. If norm_event is different to Event, norm_event will be used for normalisation and Event will be used to define the genomic coordinates of the viewing window. norm_event is required if Event is not set and condition is set.

reference_path

The path of the reference generated by BuildReference. Required by Plot_Genome if a NxtSE object is not specified.

p_obj

For as_egg_ggplot: the output of Plot_Coverage. All tracks are plotted as a single static plot using the ggarrange function of the egg package. Requires egg to be installed.

Details

In RNA sequencing, alignments to spliced transcripts will "skip" over the genomic regions of introns. This can be illustrated in a plot using a horizontal genomic axis, with the vertical axis representing the number of alignments covering each nucleotide. As a result, the coverage "hills" represent the expression of exons, and the "valleys" correspond to introns.

Different alternatively-spliced isoforms thus produce different coverage patterns. The change in coverage across an alternate exon relative to its constitutively-included flanking exons, for example, represents its alternative inclusion or skipping. Similarly, elevation of intron valleys represents increased intron retention.
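To make the "hills and valleys" picture concrete, the following base R sketch computes per-nucleotide depth from a handful of invented alignment blocks. It is only an illustration of the concept and says nothing about how NxtIRF reads COV files; the data frame blocks and the 15-nucleotide region are made up.

```r
# Toy data: each row is one aligned block of a read. A spliced read
# contributes one block per exon it covers. Coordinates are 1-based, inclusive.
blocks <- data.frame(
    start = c(1, 3, 11, 12),
    end   = c(5, 5, 15, 14)
)
region_length <- 15

# Add +1 where a block starts and -1 just past where it ends,
# then take the running sum to obtain the depth at every nucleotide.
delta <- numeric(region_length + 1)
for (i in seq_len(nrow(blocks))) {
    delta[blocks$start[i]] <- delta[blocks$start[i]] + 1
    delta[blocks$end[i] + 1] <- delta[blocks$end[i] + 1] - 1
}
coverage <- cumsum(delta)[seq_len(region_length)]
coverage
```

Positions 1-5 and 11-15 form the exon "hills", while positions 6-10, which no block covers, form the intron "valley".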

Across replicate samples, coverage depends on library size and gene expression. To compare alternative splicing ratios, the coverage of the alternate exon (or alternatively retained intron) must be normalised relative to its constitutive flanking exons. There is no established method for this normalisation, and it can be confounded in situations where the flanking elements are themselves alternatively spliced.

NxtIRF performs this coverage normalisation using the same method it uses to estimate spliced / intronic transcript abundance, namely the SpliceOverMax method (see the Details section of CollateData). This normalisation can be applied to correct for library size and gene expression differences between samples of the same experimental condition. After normalisation, the mean and variance of coverage can be computed as ratios relative to total transcript abundance. This method can visualise alternatively included genomic regions including cassette exons, alternate splice site usage, and intron retention.

Plot_Coverage generates plots showing depth of alignments to the genomic axis. Plots can be generated for individual samples or samples grouped by experimental conditions. In the latter, mean and 95% confidence intervals are shown.
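The idea of normalising each sample and then summarising by condition can be sketched in base R as below. The coverage matrix and the per-sample normalising depths are invented, and the t-based interval is an assumption: this page does not specify how NxtIRF derives its 95% confidence intervals, nor does the sketch reproduce the SpliceOverMax calculation itself.

```r
# Toy coverage matrix: rows are nucleotide positions, columns are samples
# belonging to one experimental condition.
cov_mat <- cbind(
    s1 = c(10, 10, 2, 2, 10),
    s2 = c(20, 20, 6, 6, 20),
    s3 = c(40, 40, 8, 8, 40)
)
# Assumed per-sample normalising depth (a stand-in for transcript abundance).
norm_depth <- c(s1 = 10, s2 = 20, s3 = 40)

# Divide each sample's coverage by its own normalising depth.
norm_cov <- sweep(cov_mat, 2, norm_depth, "/")

n <- ncol(norm_cov)
mean_cov <- rowMeans(norm_cov)
sem <- apply(norm_cov, 1, sd) / sqrt(n)

# Assumption: a t-based 95% interval; the interval NxtIRF draws may differ.
half_width <- qt(0.975, df = n - 1) * sem
summary_df <- data.frame(
    mean = mean_cov,
    lower = mean_cov - half_width,
    upper = mean_cov + half_width
)
summary_df
```

After normalisation the three samples agree at the exon positions (ratio 1) despite a four-fold difference in raw depth, so the remaining variation at positions 3 and 4 reflects differences in relative intron coverage rather than library size.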

Plot_Genome generates genome transcript tracks only. Protein-coding regions are denoted by thick rectangles, whereas non-protein coding transcripts or untranslated regions are denoted with thin rectangles. Introns are denoted as lines.

Value

A list containing two objects. final_plot is the plotly object. ggplot is a list of ggplot tracks, with:

  • ggplot[[n]] is the nth track (where n = 1, 2, 3 or 4).

  • ggplot[[5]] contains the T-test track if one is generated.

  • ggplot[[6]] always contains the genome track.

Functions

  • Plot_Coverage: Generates plots showing depth of alignments to the genomic axis. Plots can be generated for individual samples or samples grouped by experimental conditions. In the latter, mean and 95% confidence intervals are shown.

  • Plot_Genome: Generates a plot of transcripts within a given genomic region, or belonging to a specified gene.

  • as_egg_ggplot: Coerce the 'Plot_Coverage()' output as a vertically stacked ggplot, using egg::ggarrange

Examples

se <- NxtIRF_example_NxtSE()

# Plot the genome track only, with specified gene:
p <- Plot_Genome(se, Gene = "SRSF3")
p$ggplot

# View the genome track, specifying a genomic region via coordinates:
p <- Plot_Genome(se, coordinates = "chrZ:10000-20000")
p$ggplot

# Annotate the samples with their experimental conditions:

colData(se)$treatment <- rep(c("A", "B"), each = 3)

# Verify that the COV files are linked to the NxtSE object:
covfile(se)

# Return a list of ggplot and plotly objects
p <- Plot_Coverage(
    se = se,
    Event = rowData(se)$EventName[1],
    tracks = colnames(se)[1:4]
)

# Display a static ggplot / egg::ggarrange stacked plot:

as_egg_ggplot(p)

# Display the plotly-based interactive Coverage plot:
p$final_plot

# Plot the same event but by condition "treatment"
p <- Plot_Coverage(
    se, rowData(se)$EventName[1],
    tracks = c("A", "B"), condition = "treatment"
)
as_egg_ggplot(p)

alexchwong/NxtIRFcore documentation built on Oct. 31, 2022, 9:14 a.m.