SELEX Package

Description

Functions to assist in discovering transcription factor DNA binding specificities from SELEX-seq experimental data, following the approach described in Slattery et al. (2011). For a more comprehensive example, see the package vignette. The sample data used in Slattery et al. is stored in the package's extdata folder and can be accessed with either the base R function system.file or the package function selex.exampledata.
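As a minimal sketch of the base R route mentioned above, the following locates the bundled extdata folder with system.file and lists its contents. The variable names extdataDir and exampleFiles are illustrative, not part of the package, and the snippet assumes nothing about which files the folder contains. If SELEX is not installed, system.file returns an empty string and the sketch simply reports that.

```r
# Locate the example data shipped with the SELEX package using base R only.
# system.file() returns "" when the package or the folder cannot be found.
extdataDir <- system.file("extdata", package = "SELEX")

if (nzchar(extdataDir)) {
  # List the bundled files with their full paths (illustrative variable name)
  exampleFiles <- list.files(extdataDir, full.names = TRUE)
  print(basename(exampleFiles))
} else {
  # Fall back to an empty result so later code can still check the length
  exampleFiles <- character(0)
  message("SELEX is not installed, so no example data could be located.")
}
```

The package function selex.exampledata, used in the Examples section, instead extracts the example files into a working directory and returns their paths.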

Functions available:

selex.affinities Construct a K-mer affinity table
selex.config Set SELEX system parameters
selex.counts Construct or retrieve a K-mer count table
selex.countSummary Summarize available K-mer count tables
selex.defineSample Define annotation for an individual sample
selex.exampledata Extract example data files
selex.fastqPSFM Construct a diagnostic PSFM for a FASTQ file
selex.getAttributes Display sample handle attributes
selex.getRound0 Obtain round zero sample handle
selex.getSeqfilter Display sequence filter attributes
selex.infogain Compute or retrieve information gain between rounds
selex.infogainSummary Summarize available information gain values
selex.jvmStatus Display current JVM memory usage
selex.kmax Calculate kmax for a dataset
selex.kmerPSFM Construct a PSFM from a K-mer table
selex.loadAnnotation Load a sample annotation file
selex.mm Build or retrieve a Markov model
selex.mmProb Compute prior probability of sequence using Markov model
selex.mmSummary Summarize Markov model properties
selex.revcomp Create forward-reverse complement data pairs
selex.run Run a standard SELEX analysis
selex.sample Create a sample handle
selex.sampleSummary Show samples visible to the current SELEX session
selex.saveAnnotation Save sample annotations to file
selex.seqfilter Create a sequence filter
selex.setwd Set or change the working directory
selex.split Randomly split a dataset
selex.summary Display all count table, Markov model, and information gain summaries

Details

Package: SELEX
Type: Package
Version: .99
Date: 2014-11-3
License: GPL

Author(s)

Chaitanya Rastogi, Dahong Liu, and Harmen Bussemaker

Maintainer: Harmen Bussemaker <hjb2004@columbia.edu>

References

Slattery, M., Riley, T.R., Liu, P., Abe, N., Gomez-Alcala, P., Dror, I., Zhou, T., Rohs, R., Honig, B., Bussemaker, H.J., and Mann, R.S. (2011) Cofactor binding evokes latent differences in DNA binding specificity between Hox proteins. Cell 147:1270–1282.

Riley, T.R., Slattery, M., Abe, N., Rastogi, C., Liu, D., Mann, R.S., and Bussemaker, H.J. (2014) SELEX-seq: a method for characterizing the complete repertoire of binding site preferences for transcription factor complexes. Methods Mol. Biol. 1196:255–278.

Examples

# Initialize the SELEX package
# (set the JVM memory limit before the package is loaded)
options(java.parameters="-Xmx1500M")
library(SELEX)

# Configure the current session
workDir = file.path(".", "SELEX_workspace")
selex.config(workingDir=workDir, verbose=FALSE, maxThreadNumber=4)

# Extract sample data from package, including XML database
sampleFiles = selex.exampledata(workDir)

# Load & display all sample files using XML database
selex.loadAnnotation(sampleFiles[3])
selex.sampleSummary()

# Create sample handles
r0 = selex.sample(seqName="R0.libraries", sampleName="R0.barcodeGC", round=0)
r2 = selex.sample(seqName='R2.libraries', sampleName='ExdHox.R2', round=2)

# Split the r0 sample into testing and training sets
r0.split = selex.split(sample=r0)
r0.split

# Display all currently loaded samples
selex.sampleSummary() 

# Find kmax on the test dataset
k = selex.kmax(sample=r0.split$test)

# Build the Markov model on the training dataset
mm = selex.mm(sample=r0.split$train, order=NA, crossValidationSample=r0.split$test)
# See Markov model R^2 values
selex.mmSummary()

# K-mer counting with an offset
t1 = selex.counts(sample=r2, k=2, offset=14, markovModel=NULL)
# K-mer counting with a Markov model (produces expected counts)
t2 = selex.counts(sample=r2, k=4, markovModel=mm)
# Display all available K-mer statistics
selex.countSummary()

# Calculate information gain
ig = selex.infogain(sample=r2, k=8, markovModel=mm)
# View information gain results
selex.infogainSummary()

# Perform the default analysis
selex.run(trainingSample=r0.split$train, crossValidationSample=r0.split$test, 
  infoGainSample=r2)

# View all stats
selex.summary()
