knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) library(chipPCR) library(dplyr) library(ggplot2) library(patchwork) library(tidyr) library(ggh4x)
We assume that six main scenarios can occur during the interaction of two amyloid proteins (see the figure above). Scenarios depend on the stability (permanent/transient) of the binding between interactor and interactee and the impact on the interactee's fibrillization speed (acceleration/inhibition). If there is no interaction between interactor and interactee, amyloid proteins are forming fibrils independently - scenario I occurs. If there is transient contact between interactor and interactee along with fibrillation inhibition - scenario II takes place, but if interactee's fibrillation is accelerated - scenario IV. If the physical binding between interactor and interactee occurs together with fibrillation inhibition - scenario III happens, but if interactee's fibrillation is accelerated - scenario V or VI takes place.
scenario I: no interaction between interactor and interactee & amyloid proteins are forming fibrils independently scenario II: transient contact between interactor and interactee & inhibition of homofibril fibrillization scenario III: physical binding between interactor and interactee & inhibition of homofibril fibrillization scenario IV: transient contact between interactor and interactee & acceleration of homofibril fibrillization scenario V: physical binding between interactor and interactee & acceleration of homofibril fibrillization scenario VI: physical binding between interactor and interactee & acceleration of heterofibril fibrillization
The scenarios are discrete but they represent points in the continuum rather than the real phenomenons. We are aware that depending on the experimental conditions an interaction can vary between scenario III and IV. Therefore, we do not imply that each interaction follows strictly one of these scenarios, but rather presents most dominantly one of them. To distinguish between these interaction scenarios we design three descriptors (described below). Descriptor 1. differentiates between scenarios I (no effect on kinetics) II and III (inhibited aggregation) as well as IV, V and VI (acceleration). Descriptor 2. discriminates between scenarios IV and V/VI. Descriptor 3. differentiates between scenarios V and VI.
For example, if descriptor 1 is faster aggregation, descriptor 2 - yes, direct evidence and descriptor 3 - yes, no or no information, they describe cross-seeding.
General remarks: this descriptor is fully based on the kinetics or any kinetic data. Here, by fibrillization we mean aggregation from low-organisation levels to mature fibrils confirmed by e.g., microscopy images. If the interactor accelerates the speed of the oligomer formation, but they never aggregate into the level of mature fibrils (fibrillization does not occur), it is not an acceleration as we understand it. The commonly used technique to measure the kinetics of fibrillization is Thioflavin T (ThT) assay (e.g., ThT 101: a primer on the use of thioflavin T to investigate amyloid formation (doi: 10.1080/13506129.2017.1304905)). We are aware of the fact ThT is not always quantitative, i.e. a higher (or lower) ThT level - under different conditions (e.g., the presence of the interactor) - can be caused by changes to the fibril structure rather than the amount of fibrils. For the purpose of simplification, we ignore it and always follow the interpretation of authors.
set.seed(1) p1 <- rbind(mutate(AmpSim(cyc = 1:40), curve_type = "faster fibrillization (steeper slope)"), mutate(AmpSim(cyc = 1:40, b.eff = -8), curve_type = "slower fibrillization")) %>% ggplot(aes(x = cyc, y = fluo, color = curve_type)) + geom_line() + scale_x_continuous("Time") + scale_y_continuous("ThT fluorescence") + scale_color_manual("", values = c("#D79B00", "#6C8EBF")) + ggtitle("Slope") p2 <- rbind(mutate(AmpSim(cyc = 1:40), curve_type = "faster fibrillization (shorter lag phase)"), mutate(AmpSim(cyc = 1:40, Cq = 30), curve_type = "slower fibrillization")) %>% ggplot(aes(x = cyc, y = fluo, color = curve_type)) + geom_line() + scale_x_continuous("Time") + scale_y_continuous("ThT fluorescence") + scale_color_manual("", values = c("#D79B00", "#6C8EBF")) + ggtitle("Lag phase") p3 <- rbind(mutate(AmpSim(cyc = 1:40, ampl = 1.5), curve_type = "faster fibrillization (higher maximum ThT emission)"), mutate(AmpSim(cyc = 1:40), curve_type = "slower fibrillization")) %>% ggplot(aes(x = cyc, y = fluo, color = curve_type)) + geom_line() + scale_x_continuous("Time") + scale_y_continuous("ThT fluorescence") + scale_color_manual("", values = c("#D79B00", "#6C8EBF")) + ggtitle("Maximum ThT emission") (p1 + p2 + p3)*(theme_bw(base_size = 9) + theme(legend.position = "bottom", legend.direction = "vertical", plot.background = element_rect(fill = NA), axis.text = element_blank()))
Name of the amyloid protein: was chosen from a list of amyloid proteins considered by us. Every protein on the list has confirmed amyloid-like properties.
Sequence: The sequence is a vector of amino acids.
Source sequence: the UniProt ID of the original protein.
The AmyloGraph database as a single protein treats a protein that can occur in many taxonomic variants or after modifications (e.g., we have human and bovine precursor albumins, P02768 and P02769 as well as the products of the post-translational modifications, Q56G89).
The source sequence may be not identical to the interactor’s or interactee’s sequence. However, interactor or interactee might be a part of the source sequence (as human amyloid beta 1-40 is a part of the P05067) or a mutated variant of a source sequence (when some amino acids are altered compared to the original sequence). For example, in AmyloGraph database the CsgA protein can occur as one of 6 variants, including 4 homologues and 2 mutants.
dat <- AmyloGraph::ag_data_interactions total_number_pubs <- length(unique(dat[["doi"]])) total_number_int <- nrow(dat) total_number_prot <- length(unique(c(dat[["interactee_name"]], dat[["interactor_name"]])))
We started our manuscript collection on amyloid-amyloid interactions by defining the eligibility criteria:
We have started our search with the analysis of 24 manuscripts in our in-house collection of publications. Next, we have expanded our search by repeatedly adding manuscripts cited by manuscripts in our collection or referencing manuscripts in our collections. The final collection had 364 manuscripts.
We have curated the information in collected publications using a two-step procedure: initial curation and validation.
During this procedure, a curator reviewed all interactions described in the manuscripts and annotated them in the dedicated form considering three AmyloGraph descriptors: descriptor 1. the impact on the speed of the fibrillization; descriptor 2. physical binding between interactee and interactors; descriptor 3. presence of the heterogenous fibrils (described in detail in the section Descriptors). They chose names of amyloid proteins involved in the interaction from a list and collected information on the amyloids' sequence. Each record was associated with manuscript's doi.
The final list of interactions after the initial curation covered 863 interactions 49 proteins described in 185 manuscripts.
During this procedure, a curator has independently reviewed the reported interaction records from assigned manuscripts in the dedicated form. The semi-random assignment procedure ensured that the curator who validated a specific record was not involved in its initial curation.
They reviewed interaction records similarly to during the initial curation step. A curator considered three AmyloGraph descriptors: descriptor 1. the impact on the speed of the fibrillization; descriptor 2. physical binding between interactee and interactors; descriptor 3. presence of the heterogenous fibrils Descriptors. They chose amyloid proteins' names from a list, collected information on the sequence of amyloid proteins involved in the interaction, and provided the sequence of an original protein by its UniProt ID. They could also add in missing interaction records or remove false ones.
The final list covers r total_number_int
interactions between r total_number_prot
proteins described in r total_number_pubs
manuscripts.
dat <- data.frame(interactions = c(NA, NA, 863, 883, 883), manuscripts = c(24, 364, 185, 172, 172), step = c("In-house collection", "Extended search", "Initial\ncuration", "Validation", "Contact with authors"), stage = c("A) Pre-screen of manuscripts", "A) Pre-screen of manuscripts", "B) Manual curation", "B) Manual curation", "C) Independent validation")) %>% mutate(stage = factor(stage, levels = unique(stage)), step = factor(step, levels = unique(step))) %>% pivot_longer(cols = c(interactions, manuscripts)) %>% na.omit() %>% mutate(name = factor(name, levels = c("manuscripts", "interactions"), labels = c("Manuscripts", "Interactions"))) ggplot(dat, aes(x = "a", y = value, label = value, fill = name)) + geom_col(position = position_dodge(width = 1)) + geom_label(aes(y = ifelse(value < 50, value + 70, value/2)), position = position_dodge(width = 1), show.legend = FALSE) + scale_x_discrete("") + scale_y_continuous("Count") + scale_fill_manual("", values = c("#f1a340", "#998ec3")) + facet_nested_wrap(~ stage + step, scales = "free_x", nrow = 1) + theme_bw() + theme(axis.text.x = element_blank(), axis.ticks.x = element_blank(), panel.grid.major.x = element_blank(), legend.position = "bottom")
We consulted the final result of the validation with the authors of manuscripts reporting given interactions. To do so, we contacted corresponding author. In the case of more than two corresponding authors, we took the very last author of the publication. If the corresponding author was not available, we tried to contact the first authors' of the publication. If somebody authored more then one manuscript we contacted this author about all of the reported interactions.
We contacted 122 authors. 11 authors confirmed 81 interactions (r round(81/total_number_int, 4)*100
%) in 21 manuscripts (r round(21/total_number_pubs, 4)*100
%). Despite our efforts, we could not find a way to contact the authors of three manuscripts.
Data in AmyloGraph can be filtered using an amino acid motif. A motif that should appear in either interactor's or interactee's sequence. Only interactions between those sequences will be displayed on the graph and in the table.
A motif should consist of the letters representing amino acids with possibility of using the following ambiguous letters: "B" -- either "D" or "N" "J" -- either "I" or "L" "Z" -- either "E" or "Q" "X" -- any standard amino acid
Additionally, the character "*" may be used for a subsequence of any (possibly distinct) amino acids of any length. The character "\^" may be used as the first character of a motif to mark the beginning of the sequence. Similarly, "$" may be used as the last character of a motif to mark the end of a sequence.
Some exemplary motifs:
The articles listed below are the sources of curated data in the AmyloGraph database.
all_refs <- arrange(AmyloGraph::ag_data_references, nm) tmp <- lapply(1L:nrow(all_refs), function(ith_ref_id) { cat("1. ", AmyloGraph::citify(all_refs[ith_ref_id, ]), ".\n\n", sep = "") })
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.