getShape: Predict DNA shape from a FASTA file


View source: R/getShape.R

Description

DNA shape prediction uses a sliding pentamer window in which structural features unique to each of the 512 distinct pentamers define a vector of minor groove width (MGW), Roll, propeller twist (ProT), and helix twist (HelT) at each nucleotide position (Zhou, et al., 2013). MGW and ProT are base-pair parameters, whereas Roll and HelT are base pair-step parameters.

The values for each DNA shape feature as a function of its pentamer sequence were derived from all-atom Monte Carlo simulations in which DNA structure is sampled in collective and internal degrees of freedom in combination with explicit counter ions (Zhang, et al., 2014). The Monte Carlo simulations were analyzed with a modified Curves approach (Zhou, et al., 2013). Through data mining, average values for each shape feature were calculated over the occurrences of each pentamer (44 on average) in an ensemble of Monte Carlo trajectories for 2,121 DNA fragments of 12-27 base pairs in length.

DNAshapeR predicts four DNA shape features that have been observed in various co-crystal structures to play an important role in specific protein-DNA binding. The core prediction algorithm, implemented in C++, enables ultra-fast, high-throughput predictions of shape features for thousands of genomic sequences. Because features describing additional structural properties, or equivalent features derived from different experimental or computational sources, are likely to become available, the package has a flexible modular design that allows future expansion.

The latest version adds 9 DNA shape features to the previous set of 4, for a total of 13 features: 6 inter-base pair (base pair-step) parameters (HelT, Rise, Roll, Shift, Slide, and Tilt), 6 intra-base pair (base-pair) parameters (Buckle, Opening, ProT, Shear, Stagger, and Stretch), and MGW.
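The sliding pentamer window described above can be illustrated with a minimal base-R sketch. It is not the package's C++ implementation: the function names (`revcomp`, `sliding_pentamer`), the lookup table `mgw_table`, and its numeric values are hypothetical and chosen only for illustration. The first part shows one way to arrive at 512 pentamers, assuming a pentamer and its reverse complement are counted once; the second part assigns a looked-up value to the central nucleotide of each window.

```r
# Reverse complement of a DNA string (illustrative helper, not from DNAshapeR).
revcomp <- function(s) {
  paste(rev(strsplit(chartr("ACGT", "TGCA", s), "")[[1]]), collapse = "")
}

# Enumerate all 4^5 = 1024 pentamers.
bases <- c("A", "C", "G", "T")
grid <- expand.grid(rep(list(bases), 5), stringsAsFactors = FALSE)
all5 <- apply(grid, 1, paste, collapse = "")

# Assumption: a pentamer and its reverse complement are treated as one entry.
rc <- vapply(all5, revcomp, character(1), USE.NAMES = FALSE)
canonical <- unique(ifelse(all5 < rc, all5, rc))
n_unique <- length(canonical)

# Hypothetical lookup table: pentamer -> feature value (made-up numbers).
mgw_table <- c(AAAAA = 1.0, AAAAC = 2.0, AAACG = 3.0)

# Slide a 5-nt window along the sequence and assign the looked-up value to the
# central nucleotide. Positions without a full window, or with a pentamer
# missing from the table, stay NA.
sliding_pentamer <- function(seq, table) {
  n <- nchar(seq)
  out <- rep(NA_real_, n)
  if (n < 5) return(out)
  for (i in 3:(n - 2)) {
    pentamer <- substr(seq, i - 2, i + 2)
    out[i] <- unname(table[pentamer])
  }
  out
}

toy <- sliding_pentamer("AAAAACG", mgw_table)
```

In this sketch the two positions at each end of the sequence receive `NA` because no full pentamer is centered on them.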

Usage

getShape(filename, shapeType = 'Default', parse = TRUE,
methylate = FALSE, methylatedPosFile = NULL)

Arguments

filename

The name of the input FASTA file, including the full path if the file is located outside the current working directory.

shapeType

A character string indicating the shape parameter to predict: "MGW", "ProT", "Roll", "HelT", or "All" (all four of these shapes).

parse

A logical value indicating whether to parse the prediction result.

methylate

A logical value indicating whether to consider methylation.

methylatedPosFile

The name of the input position file indicating the methylated positions.

Details

Predict biophysical feature

Our previous work explained protein-DNA binding specificity based on correlations between MGW and electrostatic potential (EP) observed in experimentally available structures (Joshi, et al., 2007). However, A/T and C/G base pairs carry different partial charge distributions in the minor groove (due primarily to the guanine amino group), which affects minor-groove EP. We developed a high-throughput method to predict minor-groove EP based on data mining of results from nonlinear Poisson-Boltzmann calculations (Honig & Nicholls, 1995) on 2,297 DNA structures derived from Monte Carlo simulations. DNAshapeR includes EP as an additional feature.

Value

shapeList

A list containing the shape prediction results.

Author(s)

Federico Comoglio & Tsu-Pei Chiu

Examples

fn <- system.file("extdata", "CGRsample.fa", package = "DNAshapeR")
pred <- getShape(fn)
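
As a hedged follow-up to the example above, the prediction can be restricted to a single feature through `shapeType`; "MGW" is one of the values listed under Arguments. The `requireNamespace` guard and the variable name `pred_mgw` are illustrative additions, and the returned list is inspected with `str()` rather than assuming its element names.

```r
# Locate the sample FASTA file shipped with the package ("" if not installed).
fn <- system.file("extdata", "CGRsample.fa", package = "DNAshapeR")

pred_mgw <- NULL
# Only run the prediction when DNAshapeR is installed and the file was found.
if (requireNamespace("DNAshapeR", quietly = TRUE) && nzchar(fn)) {
  pred_mgw <- DNAshapeR::getShape(fn, shapeType = "MGW")
  # Show the top-level structure of the returned list.
  str(pred_mgw, max.level = 1)
}
```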

TsuPeiChiu/DNAshapeR documentation built on July 6, 2021, 9:07 p.m.