kebabsData: KeBABS Sequence Data

Description

The package contains two small sequence datasets for demonstration of the package functionality.

TFBS is a subset of the EP300/CREBBP binding data provided with the publication by Lee et al., 2011. The data are based on binding sites identified with ChIP-seq by Visel et al., 2009. Please note that, due to package size restrictions, only a small subset of the data used in Lee et al., 2011 is included in the package. The variables defined by this dataset are described under Format.

CCoil is a set of heptad-annotated amino acid sequences of coiled coil proteins forming dimers or trimers, taken from the web site of the package PrOCoil by Mahrenholz et al., 2011. The data contain the sequences with heptad annotation, the oligomerization state and the group assignment for each sequence. The grouping was performed through single linkage clustering of sequence similarities based on pairwise ungapped alignment. The variables defined by this dataset are described under Format.

Format

TFBS contains the 259 positive and 241 negative sequences as a DNAStringSet and the corresponding labels as a numeric vector with a value of 1 for positive and -1 for negative samples.

CCoil contains the 477 amino acid sequences as an AAStringSet and the corresponding labels as a factor. The heptad annotation is stored as a character vector and the group assignment as a numeric vector.
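
The following is a minimal sketch of how the two datasets might be loaded and inspected. It assumes that the kebabs package is installed and that the datasets can be loaded with data(TFBS) and data(CCoil). The environments tfbs and ccoil and the helper describe() are hypothetical names introduced only for this illustration. Each dataset is loaded into its own environment so that the variables it defines can be listed without assuming their names.

```r
# Assumes the Bioconductor package kebabs is installed.
library(kebabs)

# Load each dataset into a separate environment (illustrative names)
# so that the objects it defines can be listed with ls().
tfbs <- new.env()
data(TFBS, package = "kebabs", envir = tfbs)

ccoil <- new.env()
data(CCoil, package = "kebabs", envir = ccoil)

# Hypothetical helper: report name, first class and length of every
# object found in an environment.
describe <- function(env) {
  nms <- ls(env)
  data.frame(
    variable = nms,
    class = vapply(nms, function(n) class(get(n, envir = env))[1],
                   character(1)),
    length = vapply(nms, function(n) length(get(n, envir = env)),
                    integer(1)),
    row.names = NULL
  )
}

print(describe(tfbs))
print(describe(ccoil))
```

If the datasets match the description above, the TFBS summary should list a DNAStringSet and a numeric vector, and the CCoil summary an AAStringSet, a factor, a character vector and a numeric vector.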

Source

TFBS: http://www.beerlab.org/p300enhancer

CCoil: http://www.bioinf.jku.at/software/procoil/data.html

References

(Lee, 2011) – D. Lee, R. Karchin and M. A. Beer. Discriminative prediction of mammalian enhancers from DNA sequence. Genome Research, 21(12):2167-2180, 2011.

(Visel, 2009) – A. Visel, M. J. Blow, Z. Li, T. Zhang, J. A. Akiyama, A. Holt, I. Plajzer-Frick, M. Shoukry, C. Wright, F. Chen, V. Afzal, B. Ren, E. M. Rubin and L. A. Pennacchio. ChIP-seq accurately predicts tissue-specific activity of enhancers. Nature, 457(7231):854-858, 2009.

(Mahrenholz, 2011) – C. Mahrenholz, I. Abfalter, U. Bodenhofer, R. Volkmer and S. Hochreiter. Complex networks govern coiled-coil oligomerizations - predicting and profiling by means of a machine learning approach.


kebabs documentation built on Nov. 8, 2020, 7:38 p.m.