ramseylab/regshape: An ensemble classifier for regulatory elements based on DNA sequence-based shape parameters

Transcription factor binding site (TFBS) sequence patterns are often characterized by a position-nucleotide weight matrix (PWM) because a PWM can be estimated from a small number of representative TFBS sequences. However, the PWM probability model assumes independence between the individual positions within the binding site, so the PWMs for some TFs discriminate poorly between TFBS sequences and non-binding-site, noncoding DNA. TFs recognize three-dimensional DNA structure, which is a determinant of binding specificity that depends on multi-base patterns. We therefore developed a weak classifier, based on DNA shape parameter features extracted from DNA sequence, that predicts whether an oligonucleotide sequence (of variable length, but at least six bp) lies within a cis-regulatory element (specifically, a transcription factor binding site). The classifier returns its predictions as voting fraction scores that range from 0 to 1. These scores can be used in tandem with a standard PWM score (such as one obtained using the R package TFBSTools), as described in the accompanying paper "A DNA shape-based regulatory prior improves position-weight matrix-based recognition of transcription factor binding sites" (Yang and Ramsey, Dec. 2014).
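To make the independence assumption concrete, here is a minimal base-R sketch of how a log-odds PWM might be estimated from a handful of aligned sites and used to score a sequence. It is an illustration only and not part of regshape. The example sites, the pseudocount of 1, the uniform background of 0.25, and the helper name score_pwm are all assumptions made for this sketch. In practice a package such as TFBSTools would be used for PWM scoring.

```r
# Toy aligned binding-site sequences (hypothetical, for illustration only).
sites <- c("TGACTCA", "TGAGTCA", "TGACTCA", "TTACTCA", "TGACTAA")
bases <- c("A", "C", "G", "T")

# One row per site, one column per position.
site_matrix <- do.call(rbind, strsplit(sites, ""))

# Count each base at each position, adding a pseudocount of 1 (an assumption)
# so that unseen bases do not produce a log of zero.
counts <- vapply(seq_len(ncol(site_matrix)), function(j) {
  as.numeric(table(factor(site_matrix[, j], levels = bases))) + 1
}, numeric(length(bases)))
rownames(counts) <- bases

# Per-position base probabilities, then log-odds against a uniform
# background of 0.25 per base (also an assumption).
probs <- sweep(counts, 2, colSums(counts), "/")
pwm <- log2(probs / 0.25)

# Hypothetical helper: score a sequence by summing per-position log-odds.
# Summing is where the model treats positions as independent: each term
# depends on one base only, so multi-base patterns are not captured.
score_pwm <- function(seq, pwm) {
  chars <- strsplit(seq, "")[[1]]
  stopifnot(length(chars) == ncol(pwm), all(chars %in% rownames(pwm)))
  row_index <- match(chars, rownames(pwm))
  sum(pwm[cbind(row_index, seq_len(ncol(pwm)))])
}

consensus_score <- score_pwm("TGACTCA", pwm)
other_score <- score_pwm("AAAAAAA", pwm)
```

Because each position contributes its own term to the sum, the score cannot reward or penalize combinations of neighboring bases. That is the gap a shape-based score is meant to complement.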

Package details

Author: Jichen Yang, Stephen Ramsey
Maintainer: Stephen Ramsey (lab.saramsey.org) <PLEASE_GET_MY_EMAIL_ADDRESS_AT_MY_WEBSITE@nowhere.com>
License: file LICENSE
Version: 1.0
URL: http://lab.saramsey.org http://github.com/ramseylab/regshape
Package repository: GitHub
Installation

Install the latest version of this package by entering the following in R:
install.packages("remotes")
remotes::install_github("ramseylab/regshape")
ramseylab/regshape documentation built on May 26, 2019, 10:55 p.m.