In kkdey/Logolas: EDLogo Plots Featuring String Logos and Adaptive Scaling of Position-Weight Matrices

We present an elaborate guided tutorial of how to use the Logolas R package.

knitr::opts_chunk$set(dpi = 90,dev = "png")

Features of Logolas

Logolas offers two new features that makes logo visualization a more generic tool with potential applications in a much wider scope of problems.

Enrichment Depletion Logo (EDLogo) : General logo plotting softwares (seqLogo) primarily highlight only enrichment of symbols at each position of the logo, but Logolas allows the user to highlight both enrichment and depletion of symbols at any position, leading to more parsimonious and visually appealing representation.
String symbols : General logo plot softwares have limited library of symbols, typically restricted to English alphabets. Logolas allows the user to plot symbols for any alphanumeric string, and also allows for combination of strings and charcaters.

Installation

The most recent version of Logolas is available from Github using devtools R package.First, you would require to install the following Bioconductor packages.

source("https://bioconductor.org/biocLite.R")
biocLite(c("Biostrings","BiocStyle","Biobase","seqLogo","ggseqlogo"))

Then install Logolas as follows

library(devtools)
install_github("kkdey/Logolas",build_vignettes = TRUE)

Once you have installed the package, load the package in R by entering

library(Logolas)

To get an overview of the package, enter

help(package = "Logolas")

Accepted Data Types

Logolas accepts two data formats as input

a vector of aligned character sequences (may be DNA, RNA or amino acid sequences), each of same length (see Example 1 below)
a positional frequency (weight) matrix, termed PFM (PWM), with the symbols to be plotted along the rows and the positions of aligned sequences, from which the matrix is generated, along the columns. (see Example 2)

Illustration of Logolas

String set input

Consider aligned strings of characters

sequence <- c("CTATTGT", "CTCTTAT", "CTATTAA", "CTATTTA", "CTATTAT", "CTTGAAT",
              "CTTAGAT", "CTATTAA", "CTATTTA", "CTATTAT", "CTTTTAT", "CTATAGT",
              "CTATTTT", "CTTATAT", "CTATATT", "CTCATTT", "CTTATTT", "CAATAGT",
              "CATTTGA", "CTCTTAT", "CTATTAT", "CTTTTAT", "CTATAAT", "CTTAGGT",
              "CTATTGT", "CTCATGT", "CTATAGT", "CTCGTTA", "CTAGAAT", "CAATGGT")

The standard logo plot, akin to seqLogo R package can be plotted using the \textbf{logomaker()} function with type=Logo.

logomaker(sequence, type = "Logo")

To plot the EDLogo plot for the same example, use the type=EDLogo option instead.

logomaker(sequence, type = "EDLogo")

Instead of DNA.RNA sequence as above, one can also use amino acid character sequences.

library(ggseqlogo)
data(ggseqlogo_sample)
sequence <- seqs_aa$AKT1
logomaker(sequence, type = "EDLogo")

Positional Frequency (Weight) Matrix

We now see an example of positional weight matrix (PWM) as input to logomaker().

data("seqlogo_example")
seqlogo_example

We plot the logo plots for this PWM matrix.

logomaker(seqlogo_example, type = "Logo", return_heights = TRUE)
logomaker(seqlogo_example, type = "EDLogo", return_heights = TRUE)

The r return_heights = TRUE outputs the information content at each position for the standard logo plot (type = "Logo") and the heights of the stacks along the positive and negative Y axis, along with the breakdown of the height due to different characters for the EDLogo plot (type = "EDLogo").

Coloring schemes

The logomaker() function provides three arguments to set the colors for the logos, a color_type specifying the scheme of coloring used,

colors denoting the cohort of colors used and a color_seed argument determining how sampling is done from this cohort.

The color_type argument can be of three types, per_row, per_column and per_symbol.

colors element is a cohort/library of colors (chosen suitably large) from which distinct colors are chosen based on distinct color_type. The number of colors chosen is of same length as number of rows in table for per_row (assigning a color to each string), of same length as number of columns in table for per_column (assuming a color for each column), or a distinct color for a distinct symbol in per_symbol. The length of colors must be as large as the number of colors to be chosen in each scenario.

The default color_type is per-row and default colors comprises of a large cohort of nearly 70 distinct colors from which colors are sampled using the color_seed argument. The user can ignore the colors option unless he is not happy with the set of colors used, and play around with the color_seed instead and choose the one that presents a set of colors of his liking.

An example of using default colors and just specifying color_seed.

logomaker(seqlogo_example, type = "EDLogo", color_seed = 2000)

An alternative example of using an user-specified colors.

logomaker(seqlogo_example, color_type = "per_row",
          colors = c("#7FC97F", "#BEAED4", "#FDC086", "#386CB0"),
          type = "EDLogo")

Styles of symbols

Besides the default style with filled symbols for each character, one can also use characters with border styles. For the standard logo plot, this is accomplished by the tofill control argument.

logomaker(seqlogo_example, type = "Logo",
          logo_control = list(control = list(tofill= FALSE)), color_seed = 4000)

For an EDLogo plot, the arguments tofill_pos and tofill_neg represent the coloring scheme for the positive and the negative axes in an EDLogo plot.

logomaker(seqlogo_example, type = "EDLogo",
          logo_control = list(control = list(tofill_pos = TRUE,
                                             tofill_neg = FALSE)))

Background Info

Logolas allows the user to scale the data based on a specified background information - specified through the input bg. The default value is NULL, in which case equal probability is assigned to each symbol.

Alternatively, the user can specify a vector (equal to in length to the number of symbols) which specifies the background probability for each symbol and assumes this same background probability for each position (column of PWM matrix). See examplw below.

bg <- c(0.05, 0.90, 0.03, 0.05)
names(bg) <- c("A", "C", "G", "T")
logomaker(seqlogo_example, bg=bg, type = "EDLogo")

As a second alternative, the user can specify bg as a matrix of same dimension as the positional freuqncy matrix underlying the input data, which is flexible in having separate background probabilities for each position.

logomaker(seqlogo_example, bg=(seqlogo_example+1e-02), type = "EDLogo")

String symbols

Logolas allows the user to plot symbols not just for characters as in previous examples, but for any alphanumeric string. We present two such examples below.

In the first example, demonstrating the relative frequencies of different mutation signatures (mismatch type at the center and bases flanking it), we see how Logolas handles data with a mix of strings and characters, with flanking positions being purely characters and the center position being purely string based (data from Shiraishi et al 2015).

data("mutation_sig")
logomaker(mutation_sig, type = "Logo", color_seed = 2000)

The user may want to have distinct colors for distinct symbols.characters in this case than a string. This is where we make use of the per_symbol option for color_type.

logomaker(mutation_sig, type = "EDLogo", color_type = "per_symbol",  color_seed = 2000)

In the second example, we use EDLogo plots to highlight enrichment and depletion of histone marks in different regions of the genome, with respect to a specific background bg(data from Koch et al 2007).

data("histone_marks")
logomaker(histone_marks$mat, bg=histone_marks$bgmat, type = "EDLogo")

Extras

Multiple panels plots

Logolas plots can be plotted in multiple panels, as depicted below.

sequence <- c("CTATTGT", "CTCTTAT", "CTATTAA", "CTATTTA", "CTATTAT", "CTTGAAT",
              "CTTAGAT", "CTATTAA", "CTATTTA", "CTATTAT")
Logolas::get_viewport_logo(1, 2, heights.val = 20)
library(grid)
seekViewport(paste0("plotlogo", 1))
logomaker(sequence, type = "Logo", logo_control = list(newpage = FALSE))

seekViewport(paste0("plotlogo", 2))
logomaker(sequence, type = "EDLogo", logo_control = list(newpage = FALSE))

In the same way, ggplot2 graphics can also be combined with Logolas plots.

PSSM logos

While logomaker() takes a PFM, PWM or a set of aligned sequences as input, sometimes, some position specific scores are only available to the user. In this case, one can use the logo_pssm() in Logolas to plot the scoring matrix.

data(pssm)
logo_pssm(pssm, control = list(round_off = 0))

The round_off comtrol argument specifies the number of points after decimal allowed in the axes of the plot.

Acknowledgements

The authors would like to acknowledge Oliver Bembom, the author of seqLogo for acting as an inspiration and providing the foundation on which this package is created. We also thank Peter Carbonetto, Edward Wallace and John Blischak for helpful feedback and discussions.

\section{Session Info}

sessionInfo()

Thank you for using Logolas !

If you have any questions, you can either open an issue in our Github page or write to Kushal K Dey (kkdey@uchicago.edu). Also please feel free to contribute to the package. You can contribute by submitting a pull request or by communicating with the said person.

kkdey/Logolas documentation built on May 20, 2019, 10:30 a.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

kkdey/Logolas
EDLogo Plots Featuring String Logos and Adaptive Scaling of Position-Weight Matrices

In kkdey/Logolas: EDLogo Plots Featuring String Logos and Adaptive Scaling of Position-Weight Matrices

Features of Logolas

Installation

Accepted Data Types

Illustration of Logolas

String set input

Positional Frequency (Weight) Matrix

Coloring schemes

Styles of symbols

Background Info

String symbols

Extras

Multiple panels plots

PSSM logos

Acknowledgements

R Package Documentation

Browse R Packages

We want your feedback!

kkdey/Logolas EDLogo Plots Featuring String Logos and Adaptive Scaling of Position-Weight Matrices

In kkdey/Logolas: EDLogo Plots Featuring String Logos and Adaptive Scaling of Position-Weight Matrices

Features of Logolas

Installation

Accepted Data Types

Illustration of Logolas

String set input

Positional Frequency (Weight) Matrix

Coloring schemes

Styles of symbols

Background Info

String symbols

Extras

Multiple panels plots

PSSM logos

Acknowledgements

R Package Documentation

Browse R Packages

We want your feedback!

kkdey/Logolas
EDLogo Plots Featuring String Logos and Adaptive Scaling of Position-Weight Matrices