From a topological point of view, polymers can be modeled as open polygonal paths. Upon closure, these paths generate topological objects called knots. Multi-component knots are known as links. Rknots provides functions for the topological analysis of knots and links with a particular focus on biological polymers (e.g. proteins).
This vignette is divided in three main parts:
For more details on the methods, please refer to Comoglio and Rinaldi, 2011 and Comoglio and Rinaldi, 2012.
Rknots has been developed as a generalized framework. Theoretically, any arbitrary structure compliant with the requirements below can be loaded. Practically, this depends on the complexity of the structure to be analyzed, the task to be performed, and available memory.
Rknots deals with knots and links. These are represented as the coordinates of \(N\) points in the three-dimensional space and a set of integer separators. \(N\) points in 3D can be naturally represented by a \(N \times 3\) matrix, where each row is a point and columns are \(x,y,z\) coordinates. A knot is entirely defined by its points. However, an integer vector \(S\) of length \(n-1\) is required to represent \(n\)-component links. The elements of \(S\) are called separators and describe the boundaries between components. Particularly, we defined a separator as the index of the edge that if not removed would connect components across.
For example, the Hopf link represented above is defined by an \(8 \times 3\) matrix and separator \(S={4}\). The edge that would connect the two components is indicated by the gray dashed line.
Rknots provides two datasets:
The Rolfsen table of 250 knots with less than 11 crossings (enumerated according to Rolfsen). Full and minimal stickies representations are stored in the Rolfsen.table and Rolfsen.table.ms datasets, respectively.
r
library(Rknots)
str( head(Rolfsen.table, 5) )
head( names(Rolfsen.table) )
Note that knots are elements of a named list. Thus, instances can also be accessed by name
r
str( Rolfsen.table['3.1'] )
The difference between the full representation of a trefoil knot and its minimal stickies representation is shown below.
Coordinates and separators of 130 links up to 4 components, stored in the link.table dataset.
r
str( head(link.table, 5) )
head( names(link.table) )
Each element of the list has two slots containing link coordinates and separators.
We now randomly sample a knot and a link structure to illustrate the main features of the package in the next sections.
set.seed(123) knot <- makeExampleKnot(k = TRUE) #a knot link <- makeExampleKnot(k = FALSE) #a link
Rknots introduces an S4 class called Knot. An object of class Knot has two mandatory slots:
An new instance of a Knot object can be created either by the class constructor newKnot or with the generic constructor new.
knot.cls <- new('Knot', points3D = knot) link.cls <- new('Knot', points3D = link$points3D, ends = link$ends) ( knot.cls <- newKnot(points3D = knot) ) ( link.cls <- newKnot(points3D = link$points3D, ends = link$ends) )
Slots can be accessed with the accessors getCoordinates and getEnds, or by subsetting with the [ operator. Direct access with @ is discouraged.
head( getCoordinates(knot.cls), 5 ) getEnds(knot.cls) head( link.cls['points3D'], 5) link.cls['ends']
Similarly, slots can be modified by the setters setCoordinates, setEnds or using the [<- operator.
knot.bu <- knot.cls #copy new.coordinates <- matrix( runif(60), ncol = 3 ) #generate random coordinates setCoordinates(knot.cls) <- new.coordinates #replace knot.cls knot.cls <- knot.bu #restore link.bu <- link.cls #copy setEnds(link.cls) <- c(10, 50, 90) #replace separators getEnds(link.cls) link.cls <- link.bu #restore
Note that Rknots does not expect the new vector of separators to be of the same length as the original one, i.e. it is possible to change the link type and this is useful when local operations on the link are performed.
The computational cost of polynomial invariants is generally very high. Therefore, structures must be reduced beforehand by reducing the number of points while retaining the topology of the original structure.
To this end, two structure reduction algorithms have been implemented in Rknots:
The AB algorithm is very efficient, whereas the MSR algorithm, by working on the knot projection, contains intrinsically more information but is significantly slower. The reduction can be applied directly to coordinates and separators
knot.AB <- AlexanderBriggs(points3D = knot, ends = c()) str(knot.AB) knot.msr <- msr(points3D = knot, ends = c()) str(knot.msr) link.AB <- AlexanderBriggs(points3D = link$points3D, ends = link$ends) str(link.AB) link.msr <- msr(points3D = link$points3D, ends = link$ends) str(link.msr)
or to an object of class Knot with the reduceStructure function.
knot.cls.AB <- reduceStructure(knot.cls, algorithm = 'AB' ) knot.cls.MSR <- reduceStructure(knot.cls, algorithm = 'MSR' ) link.cls.AB <- reduceStructure(link.cls, algorithm = 'AB' ) link.cls.MSR <- reduceStructure(link.cls, algorithm = 'MSR' ) link.cls.AB
The function msr (and the MSR method) return both the simplified structure, as well as the intersection matrix \(M\), which contains position and sign of the crossings in the knot/link diagram. This function can also be used to partially reduce structures by controlling the number of iterations (default to 100).
The original and simplified structures can be inspected by plotting a knot diagram with plotDiagram.
par( mfrow=c(1,2) ) #plotDiagram(knot.AB$points3D, knot.AB$ends, lwd = 2, main = 'Alexander-Briggs') #plotDiagram(knot.msr$points3D, knot.msr$ends, lwd = 2, main = 'MSR') plotDiagram(link.AB$points3D, link.AB$ends, lwd = 2, main = 'Alexander-Briggs') plotDiagram(link.msr$points3D, link.msr$ends, lwd = 2, main = 'MSR')
or directly with the plot function is the object is of class Knot.
plot(link.cls.AB, lend = 2, lwd = 3, main = 'using par()')
Rknots can be used to compute the following invariants:
Given an object of class Knot, invariants can be computed with computeInvariant. The function internally discriminates between knots and links and returns the appropriate polynomial as follows:
( delta.k <- computeInvariant( knot.cls.AB, invariant = 'Alexander') ) jones.k <- computeInvariant( knot.cls.AB, invariant = 'Jones', skein.sign = -1) homfly.k <- computeInvariant( knot.cls.AB, invariant = 'HOMFLY', skein.sign = -1) ( delta.l <- computeInvariant( link.cls.AB, invariant = 'Alexander') ) jones.l <- computeInvariant( link.cls.AB, invariant = 'Jones', skein.sign = -1) homfly.l <- computeInvariant( link.cls.AB, invariant = 'HOMFLY', skein.sign = -1)
When possible, Rknots allows to convert a polynomial invariant into another. For example, the HOMFLY polynomial can be converted to the Jones polynomial by
converted <-HOMFLY2Jones( homfly.k ) identical( converted, jones.k)
For some applications, the linking number of a link is desired. The linking number of a polygonal link can be computed with
( computeInvariant( link.cls.AB, invariant = 'LK') )
An illustration of the very beginning of the Rolfsen knot table can be generated as follows.
text <- names(Rolfsen.table)[1 : 6] par(mfrow = c(3,2)) for(i in 1 : 6) { k <- Rolfsen.table[[i]] k <- newKnot(k) plot(k, lwd = 2, main = text[i], sub = paste( computeInvariant(k, 'Alexander'), computeInvariant(k, 'Jones'), computeInvariant(k, 'HOMFLY'), sep = '\n')) }
In this section, we apply Rknots to detect and characterize knots in proteins. Two PDB files are part of the package data and will be used in the following case study:
A protein can be loaded from the file system or fetched from the Protein Data Bank using the loadProtein function, which returns a list of matrices where each element contains the 3D coordinates of a given protein chain. By default, loadProtein performs gap finding for each chain backbone. A cutoff parameter corresponding to the maximum allowed euclidean distance between two consecutive alpha-Carbon residues facilitates a custom definition of a gap. If a distance is greater than cutoff, the chain is split at the corresponding position and subchains are generated.
protein <- loadProtein(system.file("extdata/2K0A.pdb", package="Rknots")) protein<- loadProtein('2K0A') #from the PDB str(protein)
The 3D structure of the imported protein and the corresponding backbone model can be then visualized with plotKnot3D, which leverages on the superb graphics of the rgl package. Particularly, any parameter of the rgl functions lines3d and spheres3d can be supplied.
ramp <- colorRamp(c('blue', 'white', 'red')) pal <- rgb( ramp(seq(0, 1, length = 109)), max = 255) plotKnot3D(protein$A, colors = list( pal ), lwd = 8, radius = .4, showNC = TRUE, text = TRUE)
By setting showNC = TRUE
the chain is oriented by labeling the protein N-ter and C-ter, whereas text = TRUE
adds a residue number to each point. A snaphot of the 3D plot is illustrated below.
To find knots in proteins, a single chain has to be supplied and coerced to a Knot object. The available chains are returned by loadProtein.
protein <- newKnot(protein$A) protein
Closure and projection can then be performed using closeAndProject. This function applies a Principal Component Analysis on the closed structure to simplify the computation of invariants and for visualization purposes.
protein.cp <- closeAndProject( protein, w = 2 )
par(mfrow = c(1,2)) plot(protein, main = 'original', lwd = 2) plot(protein.cp, main = 'closed & projected', lwd = 2)
Next, invariants can be computed with
( delta.p <- computeInvariant( protein.cp, invariant = 'Alexander') ) ( jones.p <- computeInvariant( protein.cp, invariant = 'Jones', skein.sign = -1) ) ( homfly.p <- computeInvariant( protein.cp, invariant = 'HOMFLY', skein.sign = -1) )
For simple knots, the knot type can be assigned automatically using getKnotType
getKnotType(polynomial = delta.p, invariant = 'Alexander') getKnotType(polynomial = homfly.p, invariant = 'HOMFLY') getKnotType(polynomial = homfly.p, invariant = 'HOMFLY', full.output = TRUE)
The results indicate the Rds3p protein has a left-handed knot, which can be compared with the right-handed trefoil polynomial in the Rolfsen knot table by means of another Rknots utility:
trefoil <- Rolfsen.table[[1]] trefoil <- newKnot(trefoil) ( homfly.tr <- computeInvariant(trefoil, 'HOMFLY') ) ( homfly.tl <- HOMFLY2mirror(homfly.tr) ) identical( homfly.p, homfly.tl )
Finally, if a protein has more than one chain one can interate over all possible chains using lapply. The following code illustrates how to perform this analysis by fetching a protein with two chains from the PDB
processChain <- function(protein, i) { chain <- newKnot(protein[[i]]) chain <- closeAndProject( chain ) return( computeInvariant(chain, 'HOMFLY') ) } lengthChain <- function(protein, i) return( nrow(protein[[i]])) protein <- loadProtein('1AJC', join.gaps = FALSE, cutoff = 7) str(protein) chains <- names(protein) polynomials <- lapply( 1: length(chains) , function(i) { ifelse(lengthChain(protein, i) > 6, processChain(protein, i), 1) } ) cbind(chains, polynomials)
The results indicate that the first chain has been split and resulted in a two unknotted subchains. The second chain instead bears a right-handed trefoil knot. Note: by ignoring gap finding, we would have found a knot in the first chain.
protein <- loadProtein('1AJC', join.gaps = TRUE) str(protein) chains <- names(protein) polynomials <- lapply( 1: length(chains) , function(i) { ifelse(lengthChain(protein, i) > 6, processChain(protein, i), 1) } ) cbind(chains, polynomials)
sessionInfo()
Adler D. and Murdoch D. rgl: 3D visualization device system (OpenGL).
Alexander J.W. and Briggs G.B. (1926) On types of knotted curves. Ann of Math, 28, 562-586.
Comoglio F. and Rinaldi M. (2011) A Topological Framework for the Computation of the HOMFLY Polynomial and Its Application to Proteins. PLoS ONE 6(4), e18693.
Comoglio F. and Rinaldi M. (2012) Rknots: topological analysis of knotted biopolymers with R. Bioinformatics 28(10), 1400-1401.
Reidemeister K. (1926), Abh Math Sem Univ Hamburg 5: 24-32.
van Roon A.M., Loening N.M., Obayashi E., Yang J.C., Newman A.J., Hernandez H., Nagai K. and Neuhaus D., (2008) Solution structure of the U2 snRNP protein Rds3p reveals a knotted zinc-finger motif, Proc Natl Acad Sci USA, 105, 9621-9626.
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.