ConnectedComponents: Connected components
In rformassspectrometry/PSMatch: Handling and Managing Peptide Spectrum Matches

Description

Connected components are a useful representation when exploring identification data. They represent the relation between proteins (the connected components) and how they form groups of proteins as defined by shared peptides.

Connected components are stored as ConnectedComponents objects that can be generated using the ConnectedComponents() function.

Usage

ConnectedComponents(object, ...)

ccMatrix(x)

connectedComponents(x, i, simplify = TRUE)

## S4 method for signature 'ConnectedComponents'
length(x)

## S4 method for signature 'ConnectedComponents'
dims(x)

## S4 method for signature 'ConnectedComponents'
ncols(x)

## S4 method for signature 'ConnectedComponents'
nrows(x)

## S4 method for signature 'ConnectedComponents,integer,ANY,ANY'
x[i, j, ..., drop = FALSE]

## S4 method for signature 'ConnectedComponents,logical,ANY,ANY'
x[i, j, ..., drop = FALSE]

## S4 method for signature 'ConnectedComponents,numeric,ANY,ANY'
x[i, j, ..., drop = FALSE]

prioritiseConnectedComponents(x)

prioritizeConnectedComponents(x)

## S4 method for signature 'ConnectedComponents'
adjacencyMatrix(object)

Arguments

`object`	For the `ConnectedComponents` class constructor, either a sparse adjacency matrix of class `Matrix` or an instance of class `PSM`.
`...`	Additional arguments passed to `makeAdjacencyMatrix()` when `object` is an instance of class `PSM`.
`x`	An object of class `ConnectedComponents`.
`i`	`numeric()`, `integer()` or `logical()` to subset the `ConnectedComponents` instance. If a `logical()`, it must be of the same length as the object it subsets.
`simplify`	`logical(1)`. If `TRUE` (default), the output is simplified to a sparse matrix if `i` was of length 1, otherwise a `List` is returned. Always a `List` if `FALSE`.
`j`	Ignored.
`drop`	Ignored.

Value

The ConnectedComponents() constructor returns an instance of class ConnectedComponents. The Creating and manipulating objects section describes the return values of the functions that manipulate ConnectedComponents objects.

Slots

adjMatrix: The sparse adjacency matrix (class Matrix) of dimension p peptides by m proteins that was used to generate the object.
ccMatrix: The sparse connected components matrix (class Matrix) of dimension m by m proteins.
adjMatrices: A List containing the adjacency matrix of each connected component.
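
The dimensions of the first two slots can be illustrated with the Matrix package alone. This is a minimal sketch, not PSMatch code: it assumes that a protein-by-protein matrix comparable to ccMatrix can be obtained as the cross-product of the adjacency matrix, which may differ from what the package stores internally. The toy matrix is the one used in the Examples section.

```r
library(Matrix)

## Toy peptide-by-protein adjacency matrix: p = 5 peptides, m = 5 proteins
adj <- sparseMatrix(i = c(1, 2, 3, 3, 4, 4, 5),
                    j = c(1, 2, 3, 4, 3, 4, 5),
                    x = 1,
                    dimnames = list(paste0("Pep", 1:5),
                                    paste0("Prot", 1:5)))
dim(adj)       ## p peptides by m proteins

## Assumption: the cross-product gives an m-by-m protein matrix where
## entry [a, b] counts the peptides shared by proteins a and b
prot_by_prot <- crossprod(adj)
dim(prot_by_prot)

## Prot3 and Prot4 share peptides Pep3 and Pep4
prot_by_prot["Prot3", "Prot4"]
```

Non-zero off-diagonal entries mark pairs of proteins that share at least one peptide and therefore belong to the same connected component.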

Creating and manipulating objects

Instances of the class are created with the ConnectedComponents() constructor from a PSM() object or directly from a sparse adjacency matrix of class Matrix. Note that if using the latter, the rows and columns must be named.
The sparse peptide-by-protein adjacency matrix is stored in the ConnectedComponents instance and can be accessed with the adjacencyMatrix() function.
The protein-by-protein connected components sparse matrix of object x can be accessed with the ccMatrix(x) function.
The number of connected components of object x can be retrieved with length(x).
The size of the connected components of object x, i.e. the number of proteins in each component, can be retrieved with ncols(x). The number of peptides defining the connected components can be retrieved with nrows(x). Both can be accessed with dims(x).
The connectedComponents(x, i, simplify = TRUE) function returns the peptide-by-protein sparse adjacency matrix (or a List of matrices, if length(i) > 1), i.e. the subset of the adjacency matrix defined by the proteins in connected component(s) i. i is the numeric index (between 1 and length(x)) of the connected component(s). If simplify is TRUE (default) and i is of length 1, a matrix is returned instead of a List of length 1. If set to FALSE, a List is always returned, irrespective of its length.
To help with the exploration of individual connected components, the prioritiseConnectedComponents() function takes an instance of ConnectedComponents and returns a data.frame where the component indices are ordered based on their potential to clean up/flag some peptides and split protein groups into smaller groups or individual proteins, or simply to explore them. The prioritisation is based on a set of metrics computed from the component's adjacency matrix, including its dimensions, row and column sums maxima and minima, its sparsity, and the number of communities and their modularity, which quantifies how well the communities separate (see igraph::modularity()). Note that trivial components, i.e. those composed of a single peptide and protein, are excluded from the prioritised results. This data.frame is ideally suited for a principal component analysis (using for instance prcomp()) for further inspection, and for component visualisation with plotAdjacencyMatrix().
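
The idea behind the functions above can be sketched without PSMatch, using only the Matrix package. The helpers label_components() and component_matrix() and the column names of the metrics data.frame are hypothetical names made up for this illustration. They are not part of the package, and the component numbering they produce need not match the one used by ConnectedComponents(). The sketch groups proteins that are linked through shared peptides, extracts the peptide-by-protein sub-matrix of one group, and computes a few of the simpler metrics mentioned above (dimensions, row and column sums, sparsity).

```r
library(Matrix)

## Toy peptide-by-protein adjacency matrix (same as in the Examples)
adj <- sparseMatrix(i = c(1, 2, 3, 3, 4, 4, 5),
                    j = c(1, 2, 3, 4, 3, 4, 5),
                    x = 1,
                    dimnames = list(paste0("Pep", 1:5),
                                    paste0("Prot", 1:5)))

## Hypothetical helper: assign a component label to each protein by
## repeatedly giving every protein the smallest label found among the
## proteins it shares a peptide with, until nothing changes
label_components <- function(adj) {
    linked <- as.matrix(crossprod(adj)) > 0
    diag(linked) <- TRUE
    labels <- seq_len(ncol(linked))
    repeat {
        new_labels <- vapply(seq_along(labels),
                             function(k) min(labels[linked[k, ]]),
                             integer(1))
        if (identical(new_labels, labels)) break
        labels <- new_labels
    }
    match(labels, unique(labels))
}

## Hypothetical helper: peptide-by-protein sub-matrix of component i
component_matrix <- function(adj, comp, i) {
    prots <- which(comp == i)
    peps <- which(Matrix::rowSums(adj[, prots, drop = FALSE]) > 0)
    adj[peps, prots, drop = FALSE]
}

comp <- label_components(adj)
split(colnames(adj), comp)   ## proteins grouped by component
table(comp)                  ## number of proteins per component

## Sub-matrix of the component that contains Prot3 and Prot4
sub <- component_matrix(adj, comp, comp[3])

## A few simple metrics (illustrative column names)
metrics <- data.frame(ncol = ncol(sub),
                      nrow = nrow(sub),
                      rs_max = max(Matrix::rowSums(sub)),
                      cs_max = max(Matrix::colSums(sub)),
                      sparsity = mean(as.matrix(sub) == 0))
metrics
```

The community and modularity metrics are left out here because they require a graph library such as igraph.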

Examples


## --------------------------------
## From an adjacency matrix
## --------------------------------
library(Matrix)
library(PSMatch) ## provides ConnectedComponents(), PSM(), ...
adj <- sparseMatrix(i = c(1, 2, 3, 3, 4, 4, 5),
                    j = c(1, 2, 3, 4, 3, 4, 5),
                    x = 1,
                    dimnames = list(paste0("Pep", 1:5),
                                   paste0("Prot", 1:5)))
adj
cc <- ConnectedComponents(adj)
cc

length(cc)
ncols(cc)

adjacencyMatrix(cc) ## same as adj above
ccMatrix(cc)

connectedComponents(cc)
connectedComponents(cc, 3) ## a single matrix
connectedComponents(cc, 1:2) ## a List

## --------------------------------
## From a PSM object
## --------------------------------
f <- msdata::ident(full.names = TRUE, pattern = "TMT")
f

psm <- PSM(f) |>
       filterPsmDecoy() |>
       filterPsmRank()

cc <- ConnectedComponents(psm)
cc

length(cc)
table(ncols(cc))

(i <- which(ncols(cc) == 4))
ccomp <- connectedComponents(cc, i)

## A group of 4 proteins that all share peptide RTRYQAEVR
ccomp[[1]]

## Visualise the adjacency matrix - here, we see how the single
## peptide (white node) 'unites' the four proteins (blue nodes)
plotAdjacencyMatrix(ccomp[[1]])

## A group of 4 proteins formed by 7 peptides: THPAERKPRRRKKR is
## found in the two first proteins, KPTARRRKRK was found twice in
## ECA3389, VVPVGLRALVWVQR was found in all 4 proteins, KLKPRRR
## is specific to ECA3399, ...
ccomp[[3]]

## See how VVPVGLRALVWVQR is shared by ECA3406 ECA3415 ECA3389 and
## links the three other components, namely ECA3399, ECA3389 and
## (ECA3415, ECA3406). Filtering that peptide out would split that
## protein group in three.
plotAdjacencyMatrix(ccomp[[3]])

## Colour protein node based on protein names similarity
plotAdjacencyMatrix(ccomp[[3]], 1)

## To select non-trivial components of size > 1
cc2 <- cc[ncols(cc) > 1]
cc2

## Use components features to prioritise their exploration
pri_cc <- prioritiseConnectedComponents(cc)
pri_cc

plotAdjacencyMatrix(connectedComponents(cc, 1082), 1)

rformassspectrometry/PSMatch documentation built on June 1, 2025, 2:47 p.m.