selex.counts: Construct or retrieve a K-mer count table
In SELEX: Functions for analyzing SELEX-seq data

Description Usage Arguments Details Value References See Also Examples

Counts and returns the number of times each K-mer of length k occurs within the sample's variable regions. If an offset value is provided, K-mers are counted only at that fixed position within the variable region of each read. If a Markov model is supplied, the expected count and the probability of observing the K-mer are also returned.

selex.counts(sample, k, minCount=100, top=-1, numSort=TRUE, offset=NULL, markovModel=NULL, forceCalculation=FALSE, seqfilter=NULL, outputPath = "")

`sample`	A sample handle to the dataset on which K-mer counting should be performed.
`k`	K-mer length(s) to be counted.
`minCount`	The minimum number of counts for a K-mer to be output.
`top`	Return only the first N K-mers (by count).
`numSort`	Sort K-mers in descending order by count. If `FALSE`, K-mers are sorted in ascending order.
`offset`	Location of the window in which K-mers should be counted. If not provided, K-mers are counted across all windows.
`markovModel`	Markov model handle to use to predict previous round probabilities and expected counts.
`forceCalculation`	Forces K-mer counting to be performed again, even if a previous result exists.
`seqfilter`	A sequence filter object to include/exclude sequences that are read in from the FASTQ file.
`outputPath`	Writes the computed K-mer table to a plain text file at the given path. This is useful when the number of unique K-mers in the dataset exceeds R's memory limit.
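
To make the roles of `k` and `minCount` concrete, here is a minimal base R sketch of sliding-window K-mer counting across all windows. It is not the package's implementation (which runs in Java); the helper `count_kmers`, its arguments, and the column names `Kmer` and `ObservedCount` are assumptions chosen for illustration.

```r
# Hypothetical helper, not part of the SELEX package: counts K-mers of
# length k over every window of each variable region.
count_kmers <- function(regions, k, min_count = 1) {
  kmers <- unlist(lapply(regions, function(s) {
    n_windows <- nchar(s) - k + 1
    if (n_windows < 1) return(character(0))
    starts <- seq_len(n_windows)
    substring(s, starts, starts + k - 1)
  }))
  tab <- table(kmers)
  out <- data.frame(Kmer = names(tab),
                    ObservedCount = as.integer(tab),
                    stringsAsFactors = FALSE)
  # Drop K-mers below the minimum count (the role minCount plays above)
  out <- out[out$ObservedCount >= min_count, , drop = FALSE]
  # Sort in descending order by count, breaking ties alphabetically
  out <- out[order(-out$ObservedCount, out$Kmer), , drop = FALSE]
  rownames(out) <- NULL
  out
}

regions <- c("ACGTAC", "CGTACG")   # toy variable regions
counts <- count_kmers(regions, k = 4, min_count = 2)
print(counts)
```

With these two toy regions, only the 4-mers that occur in both reads survive the `min_count = 2` filter.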

The offset feature counts K-mers of length k that start offset bp from the 5' end of the variable region. For example, with 16-mer variable regions, K-mers of length 12, and an offset of 3 bp, only the K-mer at the bracketed position in the variable region is counted:

5' NNN[NNNNNNNNNNNN]N 3'
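
The same window can be picked out with base R string indexing. This is only a sketch: the example region and variable names are made up, and it assumes that the offset is the number of bases skipped from the 5' end before the K-mer starts.

```r
# Illustrative only: the region and variable names are made up.
region <- "ACGTACGTACGTACGT"   # a 16 bp variable region
k <- 12
offset <- 3

# Skip `offset` bases from the 5' end, then take the next k bases.
# R strings are 1-indexed, so the window starts at offset + 1.
window <- substr(region, offset + 1, offset + k)
print(window)
```

Under this assumption the window covers positions 4 through 15 of the region, leaving three bases upstream and one base downstream.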

Minimum count refers to the lowest count observed for a K-mer of length k in a given sample. Total count is the sum of counts over all K-mers of length k in a given sample. Use selex.countSummary to view these statistics for every K-mer length and sample on which counting has been performed. When a new seqfilter object is provided, K-mer counting is redone. See selex.seqfilter for more details.
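
As a quick illustration of these two statistics, the sketch below computes them from a small, made-up count table. The data frame and its column names are assumptions for the example, not output from the package.

```r
# Hypothetical count table for one sample and one K-mer length
counts <- data.frame(Kmer = c("ACGT", "CGTA", "GTAC"),
                     ObservedCount = c(120L, 45L, 7L),
                     stringsAsFactors = FALSE)

# Minimum count: the lowest count observed for any K-mer of this length
min_count <- min(counts$ObservedCount)

# Total count: the sum of counts over all K-mers of this length
total_count <- sum(counts$ObservedCount)
```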

See ‘References’ for more details regarding the K-mer counting process.

selex.counts returns a data frame containing the K-mer sequence and observed counts for a given sample if a Markov model has not been supplied.

If a Markov model is supplied, a data frame containing K-mer sequence, observed counts, predicted previous round probability, and predicted previous round expected counts is returned.

If the number of unique K-mers exceeds R's memory limit, selex.counts will cause R to crash when returning a data frame containing the K-mers. The outputPath option can be used to avoid such a situation, as the Java code will directly write the table to a plain text file at the specified location instead.

Slattery, M., Riley, T.R., Liu, P., Abe, N., Gomez-Alcala, P., Dror, I., Zhou, T., Rohs, R., Honig, B., Bussemaker, H.J., and Mann, R.S. (2011) Cofactor binding evokes latent differences in DNA binding specificity between Hox proteins. Cell 147:1270–1282.

Riley, T.R., Slattery, M., Abe, N., Rastogi, C., Liu, D., Mann, R.S., and Bussemaker, H.J. (2014) SELEX-seq: a method for characterizing the complete repertoire of binding site preferences for transcription factor complexes. Methods Mol. Biol. 1196:255–278.

selex.affinities, selex.countSummary, selex.infogain, selex.kmax, selex.mm, selex.run

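# r2 is assumed to be an existing sample handle and mm an existing
# Markov model handle (see selex.mm); neither is created in this snippet.

# Kmer counting for a specific length on a given dataset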
t1 = selex.counts(sample=r2, k=8, minCount=1)

# Kmer counting with an offset
t2 = selex.counts(sample=r2, k=2, offset=14, markovModel=NULL)

# Kmer counting with a Markov model (produces expected counts)
t3 = selex.counts(sample=r2, k=4, markovModel=mm)

# Display all available kmer statistics
selex.countSummary()