geostas: GeoStaS Domain Finder

View source: R/geostas.R

geostasR Documentation

GeoStaS Domain Finder

Description

Identifies geometrically stable domains in biomolecules

Usage

geostas(...)

## Default S3 method:
geostas(...)

## S3 method for class 'xyz'
geostas(xyz, amsm = NULL, k = 3, pairwise = TRUE,
      clustalg = "kmeans", fit = TRUE, ncore = NULL, verbose=TRUE, ...)

## S3 method for class 'nma'
geostas(nma, m.inds = 7:11, verbose=TRUE, ...)

## S3 method for class 'enma'
geostas(enma, pdbs = NULL, m.inds = 1:5, verbose=TRUE, ...)

## S3 method for class 'pdb'
geostas(pdb, inds = NULL, verbose=TRUE, ...)

## S3 method for class 'pdbs'
geostas(pdbs, verbose=TRUE, ...)

amsm.xyz(xyz, ncore = NULL)

## S3 method for class 'geostas'
print(x, ...)

Arguments

...

arguments passed to and from functions, such as kmeans, and hclust which are called internally in geostas.xyz.

xyz

numeric matrix of xyz coordinates as obtained e.g. by read.ncdf, read.dcd, or mktrj.

amsm

a numeric matrix as obtained by amsm.xyz (convenient e.g. for re-doing only the clustering analysis of the ‘AMSM’ matrix).

k

an integer scalar or vector with the desired number of groups.

pairwise

logical, if TRUE use pairwise clustering of the atomic movement similarity matrix (AMSM), else columnwise.

clustalg

a character string specifing the clustering algorithm. Allowed values are ‘kmeans’ and ‘hclust’.

fit

logical, if TRUE coordinate superposition on identified core atoms is performed prior to the calculation of the AMS matrix.

ncore

number of CPU cores used to do the calculation. ncore>1 requires package ‘parallel’ installed.

verbose

logical, if TRUE details of the geostas calculations are printed to screen.

nma

an ‘nma’ object as obtained from function nma. Function mktrj is used internally to generate a trajectory based on the normal modes.

m.inds

the mode number(s) along which trajectory should be made (see function mktrj).

enma

an ‘enma’ object as obtained from function nma.pdbs. Function mktrj is used internally to generate a trajectory based on the normal modes.

pdbs

a ‘pdbs’ object as obtained from function pdbaln or read.fasta.pdb.

pdb

a ‘pdb’ object as obtained from function read.pdb.

inds

a ‘select’ object as obtained from function atom.select giving the atomic indices at which the calculation should be based. By default the function will attempt to locate C-alpha atoms using function atom.select.

x

a ‘geostas’ object as obtained from function geostas.

Details

This function attempts to identify rigid domains in a protein (or nucleic acid) structure based on an structural ensemble, e.g. obtained from NMR experiments, molecular dynamics simulations, or normal mode analysis.

The algorithm is based on a geometric approach for comparing pairwise traces of atomic motion and the search for their best superposition using a quaternion representation of rotation. The result is stored in a NxN atomic movement similarity matrix (AMSM) describing the correspondence between all pairs of atom motion. Rigid domains are obtained by clustering the elements of the AMS matrix (pairwise=TRUE), or alternatively, the columns similarity (pairwise=FALSE), using either K-means (kmeans) or hierarchical (hclust) clustering.

Compared to the conventional cross-correlation matrix (see function dccm) the “geostas” approach provide functionality to also detect domains involved in rotational motions (i.e. two atoms located on opposite sides of a rotating domain will appear as anti-correlated in the cross-correlation matrix, but should obtain a high similarity coefficient in the AMS matrix).

See examples for more details.

Value

Returns a list object of type ‘geostas’ with the following components:

amsm

a numeric matrix of atomic movement similarity (AMSM).

fit.inds

a numeric vector of xyz indices used for fitting.

grps

a numeric vector containing the domain assignment per residue.

atomgrps

a numeric vector containing the domain assignment per atom (only provided for geostas.pdb).

inds

a list of atom ‘select’ objects with indices to corresponding to the identified domains.

Note

The current implementation in Bio3D uses a different fitting and clustering approach than the original Java implementation. The results will therefore differ.

Author(s)

Julia Romanowska and Lars Skjaerven

References

Romanowska, J. et al. (2012) JCTC 8, 2588–2599. Skjaerven, L. et al. (2014) BMC Bioinformatics 15, 399. Grant, B.J. et al. (2006) Bioinformatics 22, 2695–2696.

See Also

plot.geostas, read.pdb, mktrj, read.ncdf, read.dcd, nma, dccm.

Examples


# PDB server connection required - testing excluded
try({

#### NMR-ensemble example
## Read a multi-model PDB file 
pdb <- read.pdb("1d1d", multi=TRUE)

## Find domains and write PDB
gs  <- geostas(pdb, fit=TRUE)

## Plot a atomic movement similarity matrix
plot.geostas(gs, contour=FALSE)

## Fit all frames to the 'first' domain
domain.inds <- gs$inds[[1]]

xyz <- pdbfit(pdb, inds=domain.inds)

#write.pdb(pdb, xyz=xyz, chain=gs$atomgrps)

}, silent=TRUE)
if(inherits(.Last.value, "try-error")) {
   message("Need internet to run the example")
}


## Not run: 
#### NMA example
## Fetch stucture
pdb <- read.pdb("1crn")

## Calculate (vibrational) normal modes
modes <- nma(pdb)

## Find domains
gs <- geostas(modes, k=2)

## Write NMA trajectory with domain assignment
mktrj(modes, mode=7, chain=gs$grps)

## Redo geostas domain clustering 
gs <- geostas(modes, amsm=gs$amsm, k=5)




#### Trajectory example
## Read inn DCD trajectory file, fit coordinates
dcdfile <- system.file("examples/hivp.dcd", package = "bio3d")
trj <- read.dcd(dcdfile)
xyz <- fit.xyz(trj[1,], trj)

## Find domains
gs <- geostas(xyz, k=3, fit=FALSE)

## Principal component analysis 
pc.md <- pca.xyz(xyz)

## Visualize PCs with colored domains (chain ID)
mktrj(pc.md, pc=1, chain=gs$grps)




#### X-ray ensemble GroEL subunits
# Define the ensemble PDB-ids
ids <- c("1sx4_[A,B,H,I]", "1xck_[A-B]", "1sx3_[A-B]", "4ab3_[A-B]")

# Download and split PDBs by chain ID
raw.files <- get.pdb(ids, path = "raw_pdbs", gzip = TRUE)
files <- pdbsplit(raw.files, ids, path = "raw_pdbs/split_chain/")

# Align structures
pdbs <- pdbaln(files)

# Find domains
gs <- geostas(pdbs, k=4, fit=TRUE)

# Superimpose to core region
pdbs$xyz <- pdbfit(pdbs, inds=gs$fit.inds)

# Principal component analysis 
pc.xray <- pca(pdbs)

# Visualize PCs with colored domains (chain ID)
mktrj(pc.xray, pc=1, chain=gs$grps)


##- Same, but more manual approach 
gaps.pos <- gap.inspect(pdbs$xyz)

# Find core region
core <- core.find(pdbs)

# Fit to core region
xyz <- fit.xyz(pdbs$xyz[1, gaps.pos$f.inds],
               pdbs$xyz[, gaps.pos$f.inds],
               fixed.inds=core$xyz,
               mobile.inds=core$xyz)

# Find domains
gs <- geostas(xyz, k=4, fit=FALSE)

# Perform PCA
pc.xray <- pca.xyz(xyz)

# Make trajectory
mktrj(pc.xray, pc=1, chain=gs$grps)


## End(Not run)

bio3d documentation built on Oct. 30, 2024, 1:08 a.m.