Structural Tendency Vignette

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width = 6, 
  fig.height = 4
)

Background

The composition of amino acids and the overall chemistry of Intrinsically Disordered Proteins (IDPs) are distinctly different from that of ordered proteins. Each amino acid has a tendency to favor a compact or extended secondary and tertiary structures based on the chemistry of the residue. Disorder-promoting residues, those enriched in IDPs, are typically hydrophilic, charged, or small. Order promoting residues, those enriched in structured proteins, tend to be aliphatic, hydrophobic, aromatic, or form tertiary structures. Disorder neutral residues neither favor order or disordered structures (Uversky, 2019).

Disorder-promoting residues are P, E, S, Q, K, A, and G; order-promoting residues are M, N, V, H, L, F, Y, I, W, and C; disorder‐neutral residues are D, T, and R (Uversky, 2013).

Installation

The package can be installed from Bioconductor with the following line of code. It requires the BiocManager package to be installed

#BiocManager::install("idpr") 

The most recent version of the package can be installed with the following line of code. This requires the devtools package to be installed.

#devtools::install_github("wmm27/idpr")

Quick-use guide

library(idpr)

The structuralTendency function is used to convert an amino acid sequence into a data frame with each residue within the sequence in one column and the corresponding structural tendency in another column.

This example will use the H. sapiens TP53 sequence, acquired from UniProt (UniProt Consortium 2019) and stored within the idpr package for examples.

P53_HUMAN <- TP53Sequences[2]
print(P53_HUMAN)
tendencyDF <- structuralTendency(P53_HUMAN)
head(tendencyDF)

For convenient plotting, use structuralTendencyPlot(). Results can be as a pie chart or bar plot.

structuralTendencyPlot(P53_HUMAN)

structuralTendencyPlot(P53_HUMAN,
                       graphType = "bar")

structuralTendency In Detail

Like most functions in idpr, the structuralTendency function will accept an amino acid sequence as a vector of single-letter residues or as a character string. It will also accept the path to a .fasta file as a character string (".fasta" must be included within the string).

It will result in a data frame with 3 columns. The first column is "Position", which indicates the numeric position of the residue in the submitted sequence. The second column is "AA", which indicates the amino acid residue as a single letter. The third column is "Tendency", which indicates the structural tendency of the residue that was matched by the function.

The following examples will use the M. musculus Tp53 sequence, acquired from UniProt (UniProt Consortium 2019) and stored within the idpr package for examples.

P53_MOUSE <- TP53Sequences[1]
print(P53_MOUSE)

This will get the data frame.

tendencyDF <- structuralTendency(P53_MOUSE)
head(tendencyDF)

The data frame can be used for custom plotting. Another possibility is the use of the sequenceMap() function within idpr.

sequenceMap(
  sequence = tendencyDF$AA,
  property = tendencyDF$Tendency,
  customColors = c("#999999", "#E69F00", "#56B4E9"))  

structuralTendency defines order- and disorder-promoting residues based on Uversky (2013).

However, there are other definitions of these categories. For example, Dunker et al. (2001) defines these categories as:

structuralTendency allows for the user to specify different definitions. An example of this using the Dunker et al. (2001) definitions rather than the default is below.

tendencyDF <- structuralTendency(P53_MOUSE,
                 disorderPromoting = c("A", "R", "G", "Q", "S", "P", "E", "K"),
                 disorderNeutral = c("H", "M", "T", "D"),
                 orderPromoting = c("W", "C", "F", "I", "Y", "V", "L", "N"))
head(tendencyDF)


sequenceMap(
  sequence = P53_MOUSE,
  property = tendencyDF$Tendency,
  customColors = c("#999999", "#E69F00", "#56B4E9"))    

structuralTendencyPlot In Detail

structuralTendencyPlot is a function for summarizing and plotting results from structuralTendency. This function accepts the same arguments as structuralTendency() for sequence and definitions of residue tendency.

The function will get the amino acid composition for the protein and match it to the tendency definition.

The results can be visualized in the form of a pie chart (default) or a bar chart.

structuralTendencyPlot(P53_MOUSE)

structuralTendencyPlot(P53_MOUSE,
                       graphType = "bar",
                       proteinName = names(P53_MOUSE))

In addition to the compositional profile of each residue, a summary of the profile focused only on the structural tendency can be given by setting summarize = TRUE. This shifts the focus from amino acid identity to the general composition. The graphType is preserved.

structuralTendencyPlot(P53_MOUSE,
                       summarize = TRUE)

structuralTendencyPlot(P53_MOUSE,
                       graphType = "bar",
                       proteinName = names(P53_MOUSE),
                       summarize = TRUE)

In addition to the default plotting, a data frame of the compositional profile can be produced by setting graphType = "none". This can be used for further data analysis or for custom plotting. The "summarize" argument is preserved.

compositionDF <- structuralTendencyPlot(P53_MOUSE,
                                        graphType = "none")
head(compositionDF)


summaryDF <- structuralTendencyPlot(P53_MOUSE,
                                    graphType = "none",
                                    summarize = TRUE)
head(summaryDF)

References

Citations

Dunker, A. K., Lawson, J. D., Brown, C. J., Williams, R. M., Romero, P., Oh, J. S., . . . Obradovic, Z. (2001). Intrinsically disordered protein. Journal of Molecular Graphics and Modelling, 19(1), 26-59. doi:https://doi.org/10.1016/S1093-3263(00)00138-8

UniProt Consortium. (2019). UniProt: a worldwide hub of protein knowledge. Nucleic acids research, 47(D1), D506-D515.

Uversky, V. N. (2013). A decade and a half of protein intrinsic disorder: Biology still waits for physics. Protein Science, 22(6), 693-724. doi:10.1002/pro.2261

Uversky, V. N. (2019). Intrinsically Disordered Proteins and Their “Mysterious” (Meta)Physics. Frontiers in Physics, 7(10). doi:10.3389/fphy.2019.00010

Additional Information

R Version

R.version.string

System Information

as.data.frame(Sys.info())
citation()


Try the idpr package in your browser

Any scripts or data that you put into this service are public.

idpr documentation built on Dec. 26, 2020, 6 p.m.