knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

library(chipPCR)
library(dplyr)
library(ggplot2)
library(patchwork)
library(tidyr)
library(ggh4x)

General assumptions

Main definitions

  1. An interactee is always acted upon by an interactor. This is AmyloGraph terminology and is not used in the analyzed manuscripts. Suppose a publication uses words like 'co-incubation', the text itself does not make clear how to distinguish between interactor and interactee, and the authors study the effect of Protein A on Protein B and vice versa. In this case, we annotate two separate interactions: I) A is an interactor over B and II) B is an interactor over A.
  2. General logic. We use the operators AND and OR, defined as follows:
    1. Operator OR applied to $a$ and $b$ means: I) $a$, II) $b$, III) $a$ and $b$.
    2. Operator AND applied to $a$ and $b$ means: $a$ and $b$ simultaneously.
  3. Authors’ interpretation always supersedes ours.
  4. We refrain from using vague terms such as seeding or cross-seeding. Instead, we describe the interaction using three descriptors (described in the section Descriptors).
  5. We refer to the glossary provided by MIRRAGGE – Minimum Information Required for Reproducible AGGregation Experiments (doi: 10.3389/fnmol.2020.582488).
    1. Additional terms:
      • T50: the time required for the amyloid reaction to reach 50% of the final fluorescence intensity.
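The T50 definition above can be illustrated with a short sketch. It assumes a baseline-corrected kinetic curve sampled at increasing time points, takes the last measured value as the final fluorescence intensity, and interpolates linearly between the two measurements that bracket the half-maximum. The function name `estimate_t50` and the simulated logistic curve are illustrative assumptions and are not part of AmyloGraph or MIRRAGGE.

```r
# Hypothetical helper: estimate T50 from a single kinetic curve.
# Assumptions: `time` is increasing, `fluo` is baseline-corrected, and the
# last measurement represents the final (plateau) fluorescence intensity.
estimate_t50 <- function(time, fluo) {
  stopifnot(length(time) == length(fluo), fluo[length(fluo)] > 0)
  half <- 0.5 * fluo[length(fluo)]
  # index of the first measurement at or above half of the final intensity
  i <- which(fluo >= half)[1]
  if (i == 1) return(time[1])
  # linear interpolation between the two bracketing measurements
  time[i - 1] +
    (half - fluo[i - 1]) * (time[i] - time[i - 1]) / (fluo[i] - fluo[i - 1])
}

# Simulated sigmoidal curve with its midpoint at time 20 (arbitrary units)
time <- seq(0, 40, by = 0.5)
fluo <- 1 / (1 + exp(-(time - 20) / 2))
estimate_t50(time, fluo)
```

For noisy experimental data, fitting a sigmoidal model before reading off T50 is likely more robust than interpolating raw measurements.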

General interaction scenarios

We assume that six main scenarios can occur during the interaction of two amyloid proteins (see the figure above). The scenarios depend on the stability (permanent/transient) of the binding between interactor and interactee and on the impact on the interactee's fibrillization speed (acceleration/inhibition). If there is no interaction between interactor and interactee, the amyloid proteins form fibrils independently and scenario I occurs. Transient contact between interactor and interactee leads to scenario II if the interactee's fibrillization is inhibited and to scenario IV if it is accelerated. Physical binding between interactor and interactee leads to scenario III if the interactee's fibrillization is inhibited and to scenario V or VI if it is accelerated.

  • scenario I: no interaction between interactor and interactee & amyloid proteins are forming fibrils independently
  • scenario II: transient contact between interactor and interactee & inhibition of homofibril fibrillization
  • scenario III: physical binding between interactor and interactee & inhibition of homofibril fibrillization
  • scenario IV: transient contact between interactor and interactee & acceleration of homofibril fibrillization
  • scenario V: physical binding between interactor and interactee & acceleration of homofibril fibrillization
  • scenario VI: physical binding between interactor and interactee & acceleration of heterofibril fibrillization

The scenarios are discrete, but they represent points on a continuum rather than real phenomena. We are aware that, depending on the experimental conditions, an interaction can vary between scenarios III and IV. Therefore, we do not imply that each interaction strictly follows one of these scenarios, but rather that it presents one of them most dominantly. To distinguish between these interaction scenarios, we designed three descriptors (described below). Descriptor 1 differentiates between scenario I (no effect on kinetics), scenarios II and III (inhibited aggregation), and scenarios IV, V and VI (acceleration). Descriptor 2 discriminates between scenarios IV and V/VI. Descriptor 3 differentiates between scenarios V and VI.

For example, if descriptor 1 is "faster aggregation", descriptor 2 is "yes, direct evidence", and descriptor 3 is "yes", "no" or "no information", together they describe cross-seeding.
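The relation between descriptor values and scenarios can be sketched as a lookup function. This is only an illustration under stated assumptions: the function name `candidate_scenarios` is hypothetical, "No aggregation" is treated as inhibition, and the answer "No" for descriptor 2 is taken to indicate scenario IV. Because the text does not state which descriptor separates scenario II from scenario III, the sketch returns both as candidates.

```r
# Hypothetical helper: map descriptor answers to candidate scenarios.
# Assumptions (not confirmed by AmyloGraph): "No aggregation" counts as
# inhibition, and descriptor 2 == "No" points to scenario IV.
candidate_scenarios <- function(d1, d2, d3) {
  if (d1 == "No effect") return("I")
  if (d1 %in% c("Slower aggregation", "No aggregation")) return(c("II", "III"))
  if (d1 == "Faster aggregation") {
    if (d2 == "No") return("IV")
    if (d2 %in% c("Yes, direct evidence", "Yes, implied by kinetics")) {
      if (d3 == "Yes") return("VI")
      if (d3 == "No") return("V")
      return(c("V", "VI"))  # descriptor 3 gives no information
    }
    return(c("IV", "V", "VI"))  # descriptor 2 gives no information
  }
  NA_character_  # no kinetic information: scenario cannot be narrowed down
}

candidate_scenarios("Faster aggregation", "Yes, direct evidence", "No information")
```

Returning a vector of candidates rather than a single scenario reflects the remark that scenarios are points on a continuum.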

Descriptors:

Descriptor 1. The impact on the speed of the interactee's fibrillization.

General remarks: this descriptor is based entirely on kinetic data. Here, by fibrillization we mean aggregation from low-organisation levels to mature fibrils confirmed by, e.g., microscopy images. If the interactor accelerates oligomer formation, but the oligomers never aggregate into mature fibrils (fibrillization does not occur), it is not an acceleration as we understand it. The technique commonly used to measure the kinetics of fibrillization is the Thioflavin T (ThT) assay (e.g., ThT 101: a primer on the use of thioflavin T to investigate amyloid formation (doi: 10.1080/13506129.2017.1304905)). We are aware that ThT is not always quantitative, i.e. a higher (or lower) ThT level under different conditions (e.g., in the presence of the interactor) can be caused by changes to the fibril structure rather than to the amount of fibrils. For simplicity, we ignore this and always follow the interpretation of the authors.

  1. Faster aggregation: a) the maximum ThT emission observed at the end of the reaction of the interactee and interactor is higher than the maximum ThT emission for the interactee alone OR b) the slope of the kinetic curve is steeper OR c) the lag phase is shorter OR d) T50 is lower. The fibrillization still occurred.
  2. Slower aggregation: a) the maximum ThT emission observed at the end of the reaction of the interactee and interactor is lower than the maximum ThT emission for the interactee alone AND b) the slope of the kinetic curve is less steep, OR c) the lag phase is longer. So we need (a AND b) OR c. The fibrillization still occurred.
  3. No aggregation: there is no confirmed fibrillization after the interaction.
  4. No effect: a) The slopes of kinetic curves are visibly similar AND b) the maximum ThT emission is similar AND c) the lag phase is similar.
  5. No information: there were no kinetic assays.
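The rules for descriptor 1 can be written down as a small decision function. This is a minimal sketch and not the procedure used by the curators: the function name, the argument names and the qualitative labels ("higher", "steeper" and so on) are assumptions made for illustration. The rules do not say what to do when criteria for faster and slower aggregation hold at the same time, so the sketch reports such cases as ambiguous, in line with the principle that the authors' interpretation supersedes ours.

```r
# Hypothetical helper: apply the descriptor 1 rules to qualitative comparisons
# of the interactee + interactor curve against the interactee-alone curve.
classify_descriptor1 <- function(kinetic_assay, fibrils_confirmed,
                                 max_tht = "similar", slope = "similar",
                                 lag = "similar", t50 = "similar") {
  ambiguous <- "Ambiguous: defer to the authors' interpretation"
  if (!kinetic_assay) return("No information")
  if (!fibrils_confirmed) return("No aggregation")
  # Faster aggregation: a OR b OR c OR d
  faster <- max_tht == "higher" || slope == "steeper" ||
    lag == "shorter" || t50 == "lower"
  # Slower aggregation: (a AND b) OR c
  slower <- (max_tht == "lower" && slope == "less steep") || lag == "longer"
  if (faster && slower) return(ambiguous)  # assumption: no precedence is given
  if (faster) return("Faster aggregation")
  if (slower) return("Slower aggregation")
  # No effect: a AND b AND c
  if (max_tht == "similar" && slope == "similar" && lag == "similar") {
    return("No effect")
  }
  ambiguous
}

classify_descriptor1(TRUE, TRUE, lag = "shorter")
classify_descriptor1(TRUE, TRUE, max_tht = "lower", slope = "less steep")
```

Note that a lower maximum ThT emission alone satisfies neither the "slower" rule nor the "no effect" rule, so the sketch flags it as ambiguous.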
set.seed(1)

p1 <- rbind(mutate(AmpSim(cyc = 1:40), curve_type = "faster fibrillization (steeper slope)"),
      mutate(AmpSim(cyc = 1:40, b.eff = -8), curve_type = "slower fibrillization")) %>% 
  ggplot(aes(x = cyc, y = fluo, color = curve_type)) + 
  geom_line() +
  scale_x_continuous("Time") +
  scale_y_continuous("ThT fluorescence") +
  scale_color_manual("", values = c("#D79B00", "#6C8EBF")) +
  ggtitle("Slope")


p2 <- rbind(mutate(AmpSim(cyc = 1:40), curve_type = "faster fibrillization (shorter lag phase)"),
      mutate(AmpSim(cyc = 1:40, Cq = 30), curve_type = "slower fibrillization")) %>% 
  ggplot(aes(x = cyc, y = fluo, color = curve_type)) + 
  geom_line() +
  scale_x_continuous("Time") +
  scale_y_continuous("ThT fluorescence") +
  scale_color_manual("", values = c("#D79B00", "#6C8EBF")) +
  ggtitle("Lag phase")

p3 <- rbind(mutate(AmpSim(cyc = 1:40, ampl = 1.5), curve_type = "faster fibrillization (higher maximum ThT emission)"),
            mutate(AmpSim(cyc = 1:40), curve_type = "slower fibrillization")) %>% 
  ggplot(aes(x = cyc, y = fluo, color = curve_type)) + 
  geom_line() +
  scale_x_continuous("Time") +
  scale_y_continuous("ThT fluorescence") +
  scale_color_manual("", values = c("#D79B00", "#6C8EBF")) +
  ggtitle("Maximum ThT emission")


(p1 + p2 + p3)*(theme_bw(base_size = 9) +
  theme(legend.position = "bottom",
        legend.direction = "vertical",
        plot.background = element_rect(fill = NA),
        axis.text = element_blank()))

Descriptor 2. Physical binding between interactee and interactor.

  1. Yes, direct evidence: there is experimental evidence that fibrils consist of two different amyloids (labeling; immunolabeling). It also applies if colocalization of an interactee and an interactor is visible in the microscopic images.
  2. Yes, implied by kinetics: seeding is implied by the results of kinetic experiments and is interpreted as such by the authors of the publication. In principle, this answer covers every acceleration of the fibrillization confirmed by kinetic experiments.
  3. No: no effect on the elongation of interactee’s fibrils.
  4. Formation of fibrils by the interactee is inhibited: the formation of interactee’s aggregates was slowed or completely halted by the interactor.
  5. No information: there is no experimental evidence and seeding is not implied by the results of kinetic experiments.

Descriptor 3. Presence of heterogeneous fibrils consisting of interactor and interactee molecules.

  1. Yes: applies when a) there is experimental evidence that fibrils consist of two different amyloids (labeling; immunolabeling) AND b) the mature fibrils are structurally different from fibrils formed in the absence of the interactor OR c) the term co-aggregation/heterogeneous fibrils/hybrid fibrils is used to describe the aggregation process.
  2. No: applies when the resulting amyloid fibrils have dimensions matching those of the interactee aggregating alone. Specifically: a) the mature fibrils are confirmed by a microscopy technique to have the same structure as fibrils formed by the interactee without the interactor OR b) there is no fibrillar product at all OR c) the interactee and the interactor are the same protein.
  3. No information: there is no experimental evidence and seeding is not implied by the results of kinetic experiments.

Information on the sequence of interactor and interactee

Name of the amyloid protein: was chosen from a list of amyloid proteins considered by us. Every protein on the list has confirmed amyloid-like properties.

Sequence: The sequence is a vector of amino acids.

  1. When the exact sequence is not known, we provide the longest possible precursor from UniProt.
  2. If a protein is available only in a complex or was isolated (for example, purified from isolated spleen amyloid fibrils), we consider the sequence unavailable.
  3. If the interactee or interactor is a mutant/fragment of an amyloid protein, we provide only the sequence of the mutant/fragment and not that of the wild-type protein.
  4. We consider sequences that have modified amino acids (e.g., methylated), but we do not include this information in the sequence.
  5. We do NOT consider mutants that contain, instead of standard amino acids, a) non-biogenic amino acids (e.g., tyramine) or b) non-amino acid linkers.
  6. If the sequence of the interactor or the interactee contains modified amino acid residues (e.g., phosphorylated), we do not supply this information in the sequence data.
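As an illustration of the rule excluding non-biogenic amino acids and non-amino acid linkers, a sequence can be checked against the 20 standard one-letter amino acid codes. This is a sketch only: the helper `is_standard_sequence` and the example strings are hypothetical and do not come from the AmyloGraph package or database.

```r
# Hypothetical helper: TRUE if a sequence consists solely of the 20 standard
# one-letter amino acid codes.
standard_aa <- "ACDEFGHIKLMNPQRSTVWY"

is_standard_sequence <- function(sequences) {
  grepl(paste0("^[", standard_aa, "]+$"), sequences)
}

# Made-up examples: a plain peptide, a peptide with a non-amino acid linker,
# and a peptide with an unspecified residue
is_standard_sequence(c("DAEFRHDSGYEVHHQK",
                       "DAEFRHDSGY-linker-EVHHQK",
                       "DAEFXHDSGY"))
# TRUE FALSE FALSE
```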

Source sequence: the UniProt ID of the original protein.

The AmyloGraph database treats as a single protein a protein that can occur in many taxonomic variants or after modifications (e.g., we have human and bovine precursor albumins, P02768 and P02769, as well as the products of post-translational modifications, Q56G89).

The source sequence may not be identical to the interactor’s or interactee’s sequence. The interactor or interactee might be a part of the source sequence (as human amyloid beta 1-40 is a part of P05067) or a mutated variant of the source sequence (when some amino acids are altered compared to the original sequence). For example, in the AmyloGraph database the CsgA protein can occur as one of 6 variants, including 4 homologues and 2 mutants.

Data acquisition

dat <- AmyloGraph::ag_data_interactions

total_number_pubs <- length(unique(dat[["doi"]]))
total_number_int <- nrow(dat)
total_number_prot <- length(unique(c(dat[["interactee_name"]], dat[["interactor_name"]])))

Manuscript collection

We started our manuscript collection on amyloid-amyloid interactions by defining the eligibility criteria:

  1. The manuscript has to be published after 2000.
  2. The manuscript has to report directly experimental results (this excludes review papers and simulations).
  3. The manuscript has to report experiments conducted in vitro.
  4. The manuscript has to report interactions leading to fibrillization.
    1. If the interactor accelerates the speed of the oligomer formation, but they never aggregate into the level of mature fibrils (fibrillization does not occur), it is not an acceleration in our understanding, but inhibition.
    2. In the case of different interactions of the same two amyloids, when these differences stem from the different amyloid formation levels (monomer, oligomer, fiber), pH, concentration, temperature or other experimental conditions, we showcase these interactions as two (or more) different interactions.
  5. The manuscript has to report interactions between two amyloid proteins. The list of amyloid proteins considered by us is available here.
    1. If one of the interaction participants is a non-amyloid protein, it should not be included in the database. The only exceptions are: a) non-aggregating homologs of known amyloid proteins b) non-aggregating mutants of amyloid proteins c) non-aggregating fragments of amyloid proteins.
    2. If the interactee or interactor is a) a mutant of an amyloid protein OR b) a fragment of an amyloid protein OR c) a taxonomic variant of an amyloid protein, we still add them to the database under the name of the original protein. However, in this case, we provide the exact sequence of the interactee/interactor and not the original protein.
    3. If the sequence of the interactor or the interactee contains (due to modifications) non-amino acids or nonbiogenic amino acids, this interaction is rejected.
  6. The manuscript has to report only two-party interactions. The database does not contain interactions with more than two participants, and the only exception is when two out of three participants are the same protein in a different aggregation level.

We started our search with the analysis of 24 manuscripts in our in-house collection of publications. Next, we expanded our search by repeatedly adding manuscripts cited by manuscripts in our collection or referencing manuscripts in our collection. The final collection had 364 manuscripts.

We have curated the information in collected publications using a two-step procedure: initial curation and validation.

Initial curation

During this procedure, a curator reviewed all interactions described in the manuscripts and annotated them in the dedicated form considering three AmyloGraph descriptors: descriptor 1. the impact on the speed of the fibrillization; descriptor 2. physical binding between interactee and interactor; descriptor 3. presence of heterogeneous fibrils (described in detail in the section Descriptors). They chose the names of the amyloid proteins involved in the interaction from a list and collected information on the amyloids' sequences. Each record was associated with the manuscript's DOI.

The final list of interactions after the initial curation covered 863 interactions between 49 proteins described in 185 manuscripts.

Validation

During this procedure, a curator independently reviewed the reported interaction records from the assigned manuscripts in the dedicated form. The semi-random assignment procedure ensured that the curator who validated a specific record was not involved in its initial curation.
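One way to obtain such an assignment is to draw each validator at random from the curators who did not perform the initial curation. The sketch below illustrates this idea only. The actual assignment procedure is not described here, and the function name and curator labels are hypothetical.

```r
# Hypothetical helper: for each record, draw a validator at random from the
# curators other than the one who performed the initial curation.
assign_validators <- function(initial_curator, curators) {
  vapply(initial_curator, function(curator) {
    eligible <- setdiff(curators, curator)
    # sample.int avoids the surprising behaviour of sample() on length-1 input
    eligible[sample.int(length(eligible), 1)]
  }, character(1), USE.NAMES = FALSE)
}

set.seed(1)
curators <- c("curator_A", "curator_B", "curator_C")
initial <- c("curator_A", "curator_B", "curator_C", "curator_A")
validators <- assign_validators(initial, curators)

# No record is validated by the curator who created it
all(validators != initial)
```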

They reviewed interaction records in the same way as during the initial curation step. A curator considered three AmyloGraph descriptors: descriptor 1. the impact on the speed of the fibrillization; descriptor 2. physical binding between interactee and interactor; descriptor 3. presence of heterogeneous fibrils (described in detail in the section Descriptors). They chose the amyloid proteins' names from a list, collected information on the sequences of the amyloid proteins involved in the interaction, and provided the sequence of the original protein by its UniProt ID. They could also add missing interaction records or remove false ones.

The final list covers `r total_number_int` interactions between `r total_number_prot` proteins described in `r total_number_pubs` manuscripts.

dat <- data.frame(interactions = c(NA, NA, 863, 883, 883),
                  manuscripts = c(24, 364, 185, 172, 172),
                  step = c("In-house collection", "Extended search", 
                           "Initial\ncuration", "Validation", "Contact with authors"),
                  stage = c("A) Pre-screen of manuscripts",
                            "A) Pre-screen of manuscripts",
                            "B) Manual curation", "B) Manual curation",
                            "C) Independent validation")) %>% 
  mutate(stage = factor(stage, levels = unique(stage)),
         step = factor(step, levels = unique(step))) %>% 
  pivot_longer(cols = c(interactions, manuscripts)) %>% 
  na.omit() %>% 
  mutate(name = factor(name, 
                       levels = c("manuscripts", "interactions"),
                       labels = c("Manuscripts", "Interactions")))

ggplot(dat, aes(x = "a", y = value, label = value, fill = name)) +
  geom_col(position = position_dodge(width = 1)) + 
  geom_label(aes(y = ifelse(value < 50, value + 70, value/2)), 
             position = position_dodge(width = 1), show.legend = FALSE) +
  scale_x_discrete("") +
  scale_y_continuous("Count") +
  scale_fill_manual("", values = c("#f1a340", "#998ec3")) +
  facet_nested_wrap(~ stage + step, scales = "free_x", nrow = 1) +
  theme_bw() +
  theme(axis.text.x = element_blank(),
        axis.ticks.x = element_blank(),
        panel.grid.major.x = element_blank(),
        legend.position = "bottom")

Contact with authors

We consulted the authors of the manuscripts reporting given interactions about the final result of the validation. To do so, we contacted the corresponding author. In the case of more than two corresponding authors, we chose the very last author of the publication. If the corresponding author was not available, we tried to contact the first author of the publication. If somebody authored more than one manuscript, we contacted this author about all of the reported interactions.

We contacted 122 authors. Of these, 11 authors confirmed 81 interactions (`r round(81/total_number_int, 4)*100`%) in 21 manuscripts (`r round(21/total_number_pubs, 4)*100`%). Despite our efforts, we could not find a way to contact the authors of three manuscripts.

Usage of AmyloGraph

Filter by motif

Data in AmyloGraph can be filtered using an amino acid motif, which should appear in either the interactor's or the interactee's sequence. Only interactions involving such sequences will be displayed on the graph and in the table.

A motif should consist of letters representing amino acids, with the possibility of using the following ambiguous letters:

  • "B" -- either "D" or "N"
  • "J" -- either "I" or "L"
  • "Z" -- either "E" or "Q"
  • "X" -- any standard amino acid

Additionally, the character "*" may be used for a subsequence of any (possibly distinct) amino acids of any length. The character "^" may be used as the first character of a motif to mark the beginning of the sequence. Similarly, "$" may be used as the last character of a motif to mark the end of a sequence.
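The motif syntax described above maps naturally onto regular expressions. The following sketch shows one possible translation. It is not the implementation used by AmyloGraph: the helpers `motif_to_regex` and `has_motif` are hypothetical, "*" is assumed to match zero or more standard amino acids, and the motif is assumed to contain only the characters described above.

```r
# Hypothetical helpers: translate a motif into a regular expression and test
# sequences against it. Not part of the AmyloGraph API.
motif_to_regex <- function(motif) {
  any_aa <- "[ACDEFGHIKLMNPQRSTVWY]"
  translation <- c(
    B = "[DN]",
    J = "[IL]",
    Z = "[EQ]",
    X = any_aa,
    "*" = paste0(any_aa, "*")  # assumption: zero or more amino acids
  )
  chars <- strsplit(motif, "")[[1]]
  # unambiguous letters, "^" and "$" are passed through unchanged
  translated <- ifelse(chars %in% names(translation), translation[chars], chars)
  paste(translated, collapse = "")
}

has_motif <- function(sequences, motif) {
  grepl(motif_to_regex(motif), sequences)
}

# Made-up sequences used only to exercise the helpers
has_motif(c("DAEFRHDSGY", "MKLLDAEF"), "^BAZ")
# TRUE FALSE
has_motif("DAEFRHDSGY", "F*GY$")
# TRUE
```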

Some exemplary motifs:

Supplementary references

The articles listed below are the sources of curated data in the AmyloGraph database.

all_refs <- arrange(AmyloGraph::ag_data_references, nm)

# seq_len() avoids iterating over c(1, 0) when the reference table is empty
tmp <- lapply(seq_len(nrow(all_refs)), function(ith_ref_id) {
  cat("1. ", AmyloGraph::citify(all_refs[ith_ref_id, ]), ".\n\n", sep = "")
})


KotulskaLab/AmyloGraph documentation built on June 30, 2023, 8:48 p.m.