getPlotSetArray: Process genomic signal

Description Usage Arguments Details Value See Also Examples

View source: R/getPlotSetArray.R

Description

Function to process genomic signal from tracks and/or motif data, calculate statistics. This function should be used as the entry point to the SeqPlots pipeline and followed by plotting function(s).

Usage

1
2
3
4
getPlotSetArray(tracks, features, refgenome, bin = 10L, rm0 = FALSE,
  ignore_strand = FALSE, xmin = 2000L, xmax = 2000L,
  xanchored = 1000L, type = "pf", add_heatmap = TRUE,
  verbose = FALSE, stat = "mean", lvl1m = message, lvl2m = message)

Arguments

tracks

Character vector or list of BigWig track paths. For motif density plots MotifSetup class containing motifs setup and/or BigWig track paths (see details)

features

Character vector or list containing feature file paths (BED or GFF).

refgenome

The UCSC code of reference genome, e.g. 'hg19' for Homo sapiens (see details)

bin

Binning window size in base pairs, defaults to 1L

rm0

Remove zeros from mean/error estimate calculation, 0 in track file will be treated as missing data, defaults to FALSE

ignore_strand

If TRUE the directionality is ignored, that is all features' strands, regardless of annotation in GFF/BED file, are treated as undetermined ("*"), defaults to FALSE

xmin

Upstream calculation distance in base pairs, defaults to 200L

xmax

Downstream calculation distance in base pairs, defaults to 2000L

xanchored

Anchored, feature body pseudo length in base pairs. The features will be extended or shrunk using linear approximation. Used only if /codetype="af", defaults to 1000L

type

The type of the calculation, "pf" for point features (default), "mf" for midpoint features, "ef" for endpoint features and "af" for anchored features, see details

add_heatmap

Add the heatmap data to output, must be on to plot heatmap form output PlotSetArray class, defauls to TRUE

verbose

Print various messages and warnings, defaults to FALSE

stat

If set to "median" the median is used as summarizing statistic for linear plots instead of mean

lvl1m

function to handle lvl 1 messages, useful when invoked in Shiny GUI environment, defaults to message

lvl2m

function to handle lvl 2 messages, useful when invoked in Shiny GUI environment, defaults to message

Details

This function takes genomic coordinates in BED or GFF format, and extracts the signal from track files (BigWig) and/or calculates motif density in these regions. Then it computes the statistics required for average and heatmap plots. Returns the PlotSetArray class, which can be further subsisted, and used for plotting.

Modes of operation

The function operate in three modes, determined by type parameter:

Binning the track

bin numeric parameter determines the resolution of data acquisition. The default value 10 means that 10bp intervals within the plotting range will be summarized by calculating the mean. Higher values increases the speed of calculation and produces smoother plots, but decreases resolution.

DNA motifs

The MotifSetup class allows to calculate and plot the density of any user-defined motif around the chosen genomic feature using the reference sequence package. Motif plots can be mixed with track files' signal plots. The MotifSetup can be initialized in following way:

ms <- MotifSetup()
ms$addMotif("TATA", window=200L, heatmap=TRUE, revcomp=TRUE, name=pattern)
ms$addMotif("GAGA", window=100L)$addBigWig("path/to/file.bw")

The addMotiff methods accepts following parameters:

motif

The DNA motif sequence.

window

Sliding window size in base pairs [bp] - the size of the sliding window for motif calculation. The value (number of matching motifs within the window) is reported in the middle of the window, e.g. if window is set to 200bp, DNA motif is "GC" and there are 8 CpGs in first 200 bp of the chromosome the value 8 will be reported at 100th bp.

name

Display name - The name of the motif that will be shown in key and heatmap labels. Leave blank to use DNA motif value.

heatmap

Plot heatmap or error estimates - this checkbox determines if heatmap matrix and error estimates should be calculated. If unchecked much faster algorithm will be used for motif density calculation, but only the average plot without the error estimates will be available.

revcomp

Match reverse complement as well - select if reverse complement motif should be reported as well. For example the TATA motif will report both TATA and ATAT with this option selected.

Reference genomes

The refgenome parameter determines the reference genome to be used chromosome naming convention (e.g. chrX vs. X) and chromosome lengths. Also for motif plots the genomic sequence is used to calculate motif density tracks. To check which genomic packages are installed in current R session use installed.genomes function. available.genomes gives the list of all reference genome packages currently supplied by BioConductor. Please refer to BSgenome package documentation for installing and forging new genome packages.

Value

The PlotSetArray object.

See Also

Other plotting functions: plotAverage, plotHeatmap, plot

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
  
# Get the paths of example files                      
bed1 <- system.file("extdata", 
    "Transcripts_ce10_chrI_100Kb.bed", package="seqplots")
bed2 <- system.file("extdata", 
    "GSM1208361_chrI_100Kb_PeakCalls.bed", package="seqplots")
bw1 <- system.file("extdata", 
    "GSM1208360_chrI_100Kb_q5_sample.bw", package="seqplots")

#If required install C. elegans genomic package from Bioconductor
if(!"BSgenome.Celegans.UCSC.ce10" %in% BSgenome::installed.genomes()) {
    if(.Platform$OS.type != "windows" || .Machine$sizeof.pointer != 4) {
        if (!requireNamespace("BiocManager", quietly=TRUE))
        install.packages("BiocManager")
        BiocManager::install("BSgenome.Celegans.UCSC.ce10")
    }
}

#Get getPlotSetArray for track and feature files
#Does not work on Windows i386 (32 bit)
if(.Platform$OS.type != "windows" || .Machine$sizeof.pointer != 4) {
    plotset1 <- getPlotSetArray(bw1, c(bed1, bed2), 'ce10')
} else {
    load(system.file("extdata", "precalc_plotset.Rdata", package="seqplots"))
}
plot(plotset1) #Average plot
plot(plotset1[1,], what='h') #Heatmap

#Get getPlotSetArray for motifs, track and feature files
ms <- MotifSetup()
ms <- MotifSetup()
ms$addMotif('GAGA')
ms$addMotif('TATA')
ms$addBigWig(bw1)
if(.Platform$OS.type != "windows" || .Machine$sizeof.pointer != 4) {
    plotset2 <- getPlotSetArray(ms, c(bed1, bed2), 'ce10')
}
plot(plotset2) #Average plot
plot(plotset2[1,], what='h') #Heatmap
 

seqplots documentation built on Nov. 8, 2020, 4:52 p.m.