SELEX: SELEX Package
In SELEX: Functions for analyzing SELEX-seq data

Functions to assist in discovering transcription factor DNA binding specificities from SELEX-seq experimental data, following the approach of Slattery et al. (2011). For a more comprehensive example, see the package vignette. The sample data used in Slattery et al. is stored in the package's extdata folder and can be accessed with either the base R function system.file or the package function selex.exampledata.
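As a minimal sketch of the base R route mentioned above, the snippet below looks up the extdata folder with system.file. It assumes the package is installed under the name SELEX. The variable name extdataDir is only illustrative, and the files listed will depend on the installed version.

```r
# Locate the example data folder shipped with the package (base R only).
# system.file() returns an empty string when the package or folder is not found.
extdataDir = system.file("extdata", package = "SELEX")

if (nzchar(extdataDir)) {
  # List the example files available in the folder
  print(list.files(extdataDir))
} else {
  message("extdata folder not found; check that the SELEX package is installed")
}
```

Unlike this lookup, selex.exampledata takes a destination directory as its argument and returns the paths of the extracted files, as in the examples further down this page.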

Functions available:

`selex.affinities`	Construct a K-mer affinity table
`selex.config`	Set SELEX system parameters
`selex.counts`	Construct or retrieve a K-mer count table
`selex.countSummary`	Summarize available K-mer count tables
`selex.defineSample`	Define annotation for an individual sample
`selex.exampledata`	Extract example data files
`selex.fastqPSFM`	Construct a diagnostic PSFM for a FASTQ file
`selex.getAttributes`	Display sample handle attributes
`selex.getRound0`	Obtain round zero sample handle
`selex.getSeqfilter`	Display sequence filter attributes
`selex.infogain`	Compute or retrieve information gain between rounds
`selex.infogainSummary`	Summarize available information gain values
`selex.jvmStatus`	Display current JVM memory usage
`selex.kmax`	Calculate kmax for a dataset
`selex.kmerPSFM`	Construct a PSFM from a K-mer table
`selex.loadAnnotation`	Load a sample annotation file
`selex.mm`	Build or retrieve a Markov model
`selex.mmProb`	Compute prior probability of sequence using Markov model
`selex.mmSummary`	Summarize Markov model properties
`selex.revcomp`	Create forward-reverse complement data pairs
`selex.run`	Run a standard SELEX analysis
`selex.sample`	Create a sample handle
`selex.sampleSummary`	Show samples visible to the current SELEX session
`selex.saveAnnotation`	Save sample annotations to file
`selex.seqfilter`	Create a sequence filter
`selex.setwd`	Set or change the working directory
`selex.split`	Randomly split a dataset
`selex.summary`	Display all count table, Markov model, and information gain summaries

Package:	SELEX
Type:	Package
Version:	.99
Date:	2014-11-3
License:	GPL

Chaitanya Rastogi, Dahong Liu, and Harmen Bussemaker

Maintainer: Harmen Bussemaker hjb2004@columbia.edu

Slattery, M., Riley, T.R., Liu, P., Abe, N., Gomez-Alcala, P., Dror, I., Zhou, T., Rohs, R., Honig, B., Bussemaker, H.J., and Mann, R.S. (2011) Cofactor binding evokes latent differences in DNA binding specificity between Hox proteins. Cell 147:1270–1282.

Riley, T.R., Slattery, M., Abe, N., Rastogi, C., Liu, D., Mann, R.S., and Bussemaker, H.J. (2014) SELEX-seq: a method for characterizing the complete repertoire of binding site preferences for transcription factor complexes. Methods Mol. Biol. 1196:255–278.

# Initialize the SELEX package (set the JVM memory limit before loading it)
#options(java.parameters="-Xmx1500M")
#library(SELEX) 

# Configure the current session
workDir = file.path(".", "SELEX_workspace")
selex.config(workingDir=workDir, verbose=FALSE, maxThreadNumber=4)

# Extract sample data from package, including XML database
sampleFiles = selex.exampledata(workDir)

# Load & display all sample files using XML database
selex.loadAnnotation(sampleFiles[3])
selex.sampleSummary()

# Create sample handles
r0 = selex.sample(seqName="R0.libraries", sampleName="R0.barcodeGC", round=0)
r2 = selex.sample(seqName="R2.libraries", sampleName="ExdHox.R2", round=2)

# Split the r0 sample into testing and training sets
r0.split = selex.split(sample=r0)
r0.split

# Display all currently loaded samples
selex.sampleSummary() 

# Find kmax on the test dataset
k = selex.kmax(sample=r0.split$test)

# Build the Markov model on the training dataset
mm = selex.mm(sample=r0.split$train, order=NA, crossValidationSample=r0.split$test)
# See Markov model R^2 values
selex.mmSummary()

# K-mer counting with an offset
t1 = selex.counts(sample=r2, k=2, offset=14, markovModel=NULL)
# K-mer counting with a Markov model (produces expected counts)
t2 = selex.counts(sample=r2, k=4, markovModel=mm)
# Display all available K-mer statistics
selex.countSummary()

# Calculate information gain
ig = selex.infogain(sample=r2, k=8, markovModel=mm)
# View information gain results
selex.infogainSummary()

# Perform the default analysis
selex.run(trainingSample=r0.split$train, crossValidationSample=r0.split$test, 
  infoGainSample=r2)

# View all stats
selex.summary()