sequenceMap: Sequence Map Function

View source: R/sequenceMap.R

sequenceMap    R Documentation

Sequence Map Function

Description

A graphical function that visualizes data along an amino acid sequence.
It displays the entire sequence and colors each residue according to a property, which may help identify important residues in a protein. It was designed for discrete values and has since been expanded to support numeric (continuous) values.

Usage

sequenceMap(
  sequence,
  property,
  nbResidues = 30,
  labelType = "both",
  everyN = c(1, 10),
  labelLocation = c("on", "below"),
  rotationAngle = c(0, 0),
  customColors = NA
)

Arguments

sequence

amino acid sequence as a single character string, a vector of single characters, or an AAString object. It also supports a single character string that specifies the path to a .fasta or .fa file.

property

a vector with length equal to the sequence length. These are the values visualized in the plot. They can be discrete or continuous.

nbResidues

numeric value, 30 by default. The number of residues to display on each row of the plot. Values above 50 or below 10 are not recommended for standard sequences. The optimal value may differ for very short or very long sequences.

labelType

character string, "both" by default. Accepted values are "both", "AA", "number", and "none". "both" shows both the amino acid residue and the residue number. "AA" and "number" show only the amino acid residue or only the residue number, respectively. "none" shows only the graphical values without labels. NOTE: When using "both", everyN, labelLocation, and rotationAngle each require a vector of length = 2, where the first value applies to the "AA" labels and the second value applies to the "number" labels. When using "AA" or "number", everyN, labelLocation, and rotationAngle each require a single value. If a vector is provided, only the first value will be used.

everyN

numeric value or vector of numeric values with length = 2. This is used to show every Nth amino acid and/or residue number. To show every value, set everyN = 1 or everyN = c(1, 1).

labelLocation

character string or vector of character strings with length = 2. When labelLocation = "on", the text is layered on top of the graphical output. When labelLocation = "below", the text is placed below the graphical output. If labelType = "both", do not set labelLocation = c("on", "on") or labelLocation = c("below", "below").

rotationAngle

numeric value or vector of numeric values with length = 2. The angle used to rotate the label text. This is especially useful when printing many residue numbers.

customColors

vector of colors as character strings. NA by default. Used to support custom plot colors. If property is discrete, a character vector of colors with length equal to the number of unique discrete observations is required. If property is continuous, a character vector of three colors in the order c("highColor", "lowColor", "midColor") is required. Set to NA to skip custom colors.

Value

A ggplot object.

See Also

sequenceMapCoordinates for mapping coordinates

Examples

#Get a data frame returned from another function
aaVector <- c("A", "C", "D", "E", "F",
              "G", "H", "I", "K", "L",
              "M", "N", "P", "Q", "R",
              "S", "T", "V", "W", "Y")
## As a continuous property
exampleDF_cont <- chargeCalculationGlobal(sequence = aaVector)
head(exampleDF_cont)
## Or as a discrete property
exampleDF_disc <- structuralTendency(sequence = aaVector)
head(exampleDF_disc)
sequenceMap(sequence = exampleDF_cont$AA,
            property = exampleDF_cont$Charge,
            nbResidues = 3,
            labelType = "both")

sequenceMap(sequence = exampleDF_disc$AA,
            property = exampleDF_disc$Tendency,
            nbResidues = 3,
            labelType = "both")
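
#The sequence can also be supplied as a single character string, as described
## under Arguments. A minimal sketch: the "charged"/"uncharged" grouping below
## is a made-up discrete property for illustration, not an idpr output.
seqString <- "ACDEFGHIKLMNPQRSTVWY"
seqResidues <- strsplit(seqString, split = "")[[1]]
chargeGroup <- ifelse(seqResidues %in% c("D", "E", "H", "K", "R"),
                      "charged", "uncharged")
#property must contain one value per residue
stopifnot(length(chargeGroup) == nchar(seqString))
sequenceMap(sequence = seqString,
            property = chargeGroup,
            nbResidues = 10,
            labelType = "both")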

#Change the layout of labels
sequenceMap(
  sequence = exampleDF_disc$AA,
  property = exampleDF_disc$Tendency,
  nbResidues = 3,
  labelType = "AA") #Only AA residue labels

sequenceMap(
  sequence = exampleDF_disc$AA,
  property = exampleDF_disc$Tendency,
  nbResidues = 3,
  labelType = "number") #Only residue number labels

sequenceMap(
  sequence = exampleDF_disc$AA,
  property = exampleDF_disc$Tendency,
  nbResidues = 3,
  labelType = "none") #No labels

#The text can also be rotated for ease of reading,
## especially helpful for larger sequences.
sequenceMap(
  sequence = exampleDF_disc$AA,
  property = exampleDF_disc$Tendency,
  labelType = "number",
  labelLocation = "on",
  rotationAngle = 90)
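
#With labelType = "both", everyN, labelLocation, and rotationAngle each take
## two values: the first applies to the amino acid labels and the second to
## the residue numbers. A minimal sketch reusing exampleDF_disc from above;
## the specific values are illustrative assumptions, not recommended settings.
ggBoth <- sequenceMap(
  sequence = exampleDF_disc$AA,
  property = exampleDF_disc$Tendency,
  nbResidues = 10,
  labelType = "both",
  everyN = c(1, 5), #every residue letter, every 5th residue number
  labelLocation = c("on", "below"),
  rotationAngle = c(0, 90))
plot(ggBoth)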

#Specify colors for continuous values
sequenceMap(
  sequence = exampleDF_cont$AA,
  property = exampleDF_cont$Charge,
  customColors = c("purple", "pink", "grey90"))

#or discrete values
sequenceMap(
  sequence = exampleDF_disc$AA,
  property = exampleDF_disc$Tendency,
  customColors = c("#999999", "#E69F00", "#56B4E9"))


#Change the number of residues on each line with nbResidues
sequenceMap(
  sequence = exampleDF_disc$AA,
  property = exampleDF_disc$Tendency,
  nbResidues = 1)
sequenceMap(
  sequence = exampleDF_disc$AA,
  property = exampleDF_disc$Tendency,
  nbResidues = 3)
sequenceMap(
  sequence = exampleDF_disc$AA,
  property = exampleDF_disc$Tendency,
  nbResidues = 10)


#Use sequenceMapCoordinates for additional annotations
gg <- sequenceMap(sequence = exampleDF_disc$AA,
                  property = exampleDF_disc$Tendency,
                  nbResidues = 3,
                  labelType = "both")

#Change the nbResidues to correspond to the sequenceMap setting
mapCoordDF <- sequenceMapCoordinates(aaVector,
                                     nbResidues = 3)
head(mapCoordDF)

#Subset to the positively charged residues
mapCoordDF_subset <- mapCoordDF$AA %in% c("K", "R", "H")
mapCoordDF_subset <- mapCoordDF[mapCoordDF_subset,]

library(ggplot2)
gg <- gg + geom_point(inherit.aes = FALSE,
                      data = mapCoordDF_subset,
                      aes(x = col + 0.5, #to center on the residue
                          y = row + 0.2), #to shift the point above the residue
                      color = "purple",
                      size = 3,
                      shape = 3)
plot(gg)


wmm27/idpr documentation built on Jan. 12, 2023, 8:45 a.m.