plot_bits_logo: Plot sequence information content as bits.

View source: R/pwm_utils.R

plot_bits_logo    R Documentation

Description

This function makes a publication-quality sequence logo. If the sequences contain an alphabet with letters in both upper and lower case, the Shannon entropy is calculated in a case-insensitive way, meaning that the maximum bit value is log2 of the number of distinct letters in the alphabet, ignoring case. In the plot, however, the letters preserve their case.

Usage

plot_bits_logo(
  df,
  ID_col,
  alphabet,
  small_n_correction = FALSE,
  y_lims = c(0, NA),
  anno_width = 0.75,
  InfoContent_thrshld,
  highlight_colour = "grey74",
  axis_txt_size = 10,
  ttl_txt = NULL,
  ...
)

Arguments

df

A data.frame with a column called Sequence and another column defined in ID_col. Use the wrapper fasta2df() to import a FASTA file into a data.frame that is compatible with this function.

ID_col

A character specifying the column name in df to be used as the sequence ID.

alphabet

A character vector containing the alphabet letters present in Sequence. Guessed by default from df.

small_n_correction

Logical. Whether to apply a small correction to the Shannon entropy when the number of input sequences is low. This parameter is passed to df2bits; type ?df2bits for details. Default FALSE.

y_lims

A numeric vector of length 2 specifying the minimum and maximum values of the Y-axis. Default c(0, NA).

anno_width

A small number that defines the width of the vertical highlighting bar. Default 0.75.

InfoContent_thrshld

The information content (bits) threshold used to decide whether a letter position is highlighted. Positions whose summed letter information is lower than InfoContent_thrshld are highlighted by a vertical bar in the plot.

highlight_colour

A colour name to fill the letter highlighting rectangle. Default grey74 with alpha = 0.5.

axis_txt_size

A number specifying the size of the axis text in the plot. Default 10.

ttl_txt

A character string specifying the plot title. Default NULL.

...

Advanced parameters passed to renumber_logo_seq_breaks.

Details

This function uses geom_logo to plot the logo.
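The case-insensitive information content and the thresholding behind InfoContent_thrshld can be illustrated with a minimal base R sketch. It is not the package's implementation: bits_per_position() is a hypothetical helper, the example sequences are made up, and the formula used (log2 of the alphabet size minus the Shannon entropy of each column) is the usual sequence logo definition, assumed here without the small-sample correction. The actual df2bits function may differ in details.

```r
# Hypothetical helper, not part of the package: information content (bits)
# per alignment column, treating upper- and lower-case letters as one symbol.
bits_per_position <- function(seqs) {
  # All sequences must be aligned, i.e. have the same length
  stopifnot(length(unique(nchar(seqs))) == 1)
  # Character matrix: rows = sequences, columns = positions (case folded)
  chars <- do.call(rbind, strsplit(toupper(seqs), ""))
  # Alphabet guessed from the data, ignoring case
  alphabet <- sort(unique(as.vector(chars)))
  max_bits <- log2(length(alphabet))
  apply(chars, 2, function(column) {
    p <- table(column) / length(column)   # observed letter frequencies
    entropy <- -sum(p * log2(p))          # Shannon entropy of the column
    max_bits - entropy                    # information content in bits
  })
}

# Made-up aligned sequences mixing upper- and lower-case letters
seqs <- c("ACGTa", "ACGTc", "ACcTg", "ACgTt")
ic <- bits_per_position(seqs)

# Positions whose information content falls below the threshold are the
# ones that would be highlighted by a vertical bar
InfoContent_thrshld <- 0.5
low_info_positions <- which(ic < InfoContent_thrshld)
```

With a four-letter alphabet the maximum is 2 bits: fully conserved columns reach it, while a column in which every letter occurs equally often carries 0 bits and falls below the threshold.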

Value

A ggplot object containing the sequence logo.

Examples

plot_bits_logo(df = df_w_seqs, ID_col = 'Species')

plot_bits_logo(df = df_w_seqs, ID_col = 'Species', 
               InfoContent_thrshld = 0.5, anno_width = 0.5, 
               highlight_colour = 'lightblue', 
               uppercase_spacer = 5, lowercase_spacer = 6)
               
# Compare the plots with and without a small number (epsilon) added to the Shannon entropy formula.
# Without the correction (this is the default):
plot_bits_logo(df = suz12_ex4_eutheria, ID_col = 'Species', y_lims = c(0, 2),
               small_n_correction = FALSE, ttl_txt = 'Without small correction')

# With the correction (optional, but recommended when there are few input sequences):
plot_bits_logo(df = suz12_ex4_eutheria, ID_col = 'Species', y_lims = c(0, 2),
               small_n_correction = TRUE, ttl_txt = 'With small correction')

Ni-Ar/niar documentation built on Feb. 3, 2025, 9:25 a.m.