read.fasta.pdb: Read Aligned Structure Data
In bio3d: Biological Structure Analysis

Description

Read aligned PDB structures and store their C-alpha atom data, including xyz coordinates, residue numbers, residue types and B-factors.

Usage

read.fasta.pdb(aln, prefix = "", pdbext = "", fix.ali = FALSE,
             pdblist=NULL, ncore = 1, nseg.scale = 1, progress = NULL, ...)

Arguments

`aln`	an alignment data structure obtained with `read.fasta`.
`prefix`	prefix added to `aln$id` to locate PDB files.
`pdbext`	the file name extension of the PDB files.
`fix.ali`	logical, if TRUE check consistency between `$ali` and `$resno`, and correct `$ali` if they don't match.
`pdblist`	an optional list of `pdb` objects with sequences corresponding to the alignments in `aln`. Primarily used through function `pdbaln` when the PDB objects already exist (avoids reading PDBs from file).
`ncore`	number of CPU cores used to do the calculation. `ncore>1` requires the ‘parallel’ package to be installed.
`nseg.scale`	split input data into specified number of segments prior to running multiple core calculation. See `fit.xyz`.
`progress`	progress bar for use with shiny web app.
`...`	other parameters for `read.pdb`.

Details

The input `aln`, produced with `read.fasta`, must have identifiers (i.e. sequence names) that match the PDB file names. For example, the sequence corresponding to the structure “1bg2.pdb” should have the identifier ‘1bg2’. See examples below.

Sequence mismatches will generate errors. Care should therefore be taken to ensure that the sequences in the alignment match the sequences in their associated PDB files.
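
A quick pre-flight check can catch identifiers that have no matching file before the alignment is read. The sketch below uses base R only. It assumes that file paths are formed by concatenating `prefix`, each identifier and `pdbext`; the identifiers and the temporary directory are hypothetical stand-ins for `aln$id` and a real PDB directory.

```r
# Hypothetical identifiers, standing in for aln$id from read.fasta()
ids <- c("1bg2", "1i6i")

# Fresh temporary directory standing in for a local PDB directory
pdb_dir <- tempfile("pdb_check")
dir.create(pdb_dir)
file.create(file.path(pdb_dir, "1bg2.pdb"))  # only one of the two files exists

# Assumed path construction: prefix + identifier + extension
prefix <- paste0(pdb_dir, "/")
pdbext <- ".pdb"
files  <- paste0(prefix, ids, pdbext)

# Identifiers with no matching file on disk
missing_ids <- ids[!file.exists(files)]
if (length(missing_ids) > 0) {
  message("No PDB file found for: ", paste(missing_ids, collapse = ", "))
}
```

If `missing_ids` is non-empty, either rename the sequences in the alignment or adjust `prefix` and `pdbext` so that every identifier resolves to a file.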

Value

Returns a list of class "pdbs" with the following components:

`xyz`	numeric matrix of aligned C-alpha coordinates.
`resno`	character matrix of aligned residue numbers.
`b`	numeric matrix of aligned B-factor values.
`chain`	character matrix of aligned chain identifiers.
`id`	character vector of PDB sequence/structure names.
`ali`	character matrix of aligned sequences.
`resid`	character matrix of aligned 3-letter residue names.
`sse`	character matrix of aligned helix and strand secondary structure elements as defined in each PDB file.
`call`	the matched call.
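
Each alignment position occupies three consecutive columns (x, y, z) of `xyz`, which is why the examples below index it with `atom2xyz()`. The following base R sketch illustrates that layout with a made-up matrix; `pos2cols` is a hypothetical helper written here for illustration, not the bio3d implementation.

```r
# Toy coordinate matrix: 2 structures (rows), 4 alignment positions (3 columns each)
n_pos <- 4
xyz <- matrix(seq_len(2 * 3 * n_pos), nrow = 2, byrow = TRUE)

# Hypothetical helper: map alignment positions to their x, y, z column indices
pos2cols <- function(pos) {
  as.vector(rbind(3 * pos - 2, 3 * pos - 1, 3 * pos))
}

cols <- pos2cols(2:3)  # column indices 4 to 9
sub  <- xyz[, cols]    # coordinates of positions 2 and 3 in both structures
```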

Note

The sequence character ‘X’ is useful for masking unusual or unknown residues, as it can match any other residue type.
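
To illustrate the masking idea, the sketch below compares two equal-length sequences position by position and treats ‘X’ as a wildcard. It is a simplified stand-in written in base R, not the matching procedure used inside `read.fasta.pdb`; the function name `seq_match` and the example sequences are hypothetical.

```r
# Hypothetical helper: TRUE if two equal-length sequences agree at every
# position, with 'X' in either sequence matching any residue
seq_match <- function(a, b) {
  a <- strsplit(a, "")[[1]]
  b <- strsplit(b, "")[[1]]
  stopifnot(length(a) == length(b))
  all(a == b | a == "X" | b == "X")
}

seq_match("MKVAG", "MKVAG")  # TRUE: identical sequences
seq_match("MKXAG", "MKVAG")  # TRUE: 'X' masks the third residue
seq_match("MKLAG", "MKVAG")  # FALSE: L and V differ at position 3
```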

Author(s)

Barry Grant

References

Grant, B.J. et al. (2006) Bioinformatics 22, 2695–2696.

Examples


# Redundant testing excluded
try({

# Read sequence alignment
file <- system.file("examples/kif1a.fa",package="bio3d")
aln  <- read.fasta(file)

# Read aligned PDBs
pdbs <- read.fasta.pdb(aln)

# Structure/sequence names/ids
basename( pdbs$id )

# Alignment positions 335 to 339
pdbs$ali[,335:339]
pdbs$resid[,335:339]
pdbs$resno[,335:339]
pdbs$b[,335:339]

# Alignment C-alpha coordinates for these positions
pdbs$xyz[, atom2xyz(335:339)]

# See 'fit.xyz()' function for actual coordinate superposition
#  e.g. fit to first structure
# xyz <- fit.xyz(pdbs$xyz[1,], pdbs)
# xyz[, atom2xyz(335:339)]

}, silent=TRUE)
if(inherits(.Last.value, "try-error")) {
   message("Need internet to run the example")
}

bio3d documentation built on Oct. 30, 2024, 1:08 a.m.