selex.counts: Construct or retrieve a K-mer count table

Description

Counts how many times each K-mer of length k occurs within the variable regions of a sample's reads, and returns the result. If an offset value is provided, K-mers are counted only at that fixed position within the variable region of each read. If a Markov model is supplied, the expected count and the probability of observing each K-mer are also returned.

Usage

selex.counts(sample, k, minCount=100, top=-1, numSort=TRUE, offset=NULL,
  markovModel=NULL, forceCalculation=FALSE, seqfilter=NULL, outputPath = "")

Arguments

sample

A sample handle to the dataset on which K-mer counting should be performed.

k

K-mer length(s) to be counted.

minCount

The minimum count a K-mer must have to be included in the output.

top

Return only the first N K-mers (ranked by count).

numSort

Sort K-mers in descending order by count. If FALSE, K-mers are sorted in ascending order.

offset

Location of the window in which K-mers should be counted. If not provided, K-mers are counted across all windows.

markovModel

Markov model handle to use to predict previous round probabilities and expected counts.

forceCalculation

Forces K-mer counting to be performed again, even if a previous result exists.

seqfilter

A sequence filter object to include/exclude sequences that are read in from the FASTQ file.

outputPath

Path of a plain text file to which the computed K-mer table is written. This is useful when the number of unique K-mers in the dataset exceeds R's memory limit.
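
The interplay of minCount, top and numSort can be illustrated on a toy table in base R. This is only a sketch of the documented semantics, not the package's implementation: the helper filter_counts, the column names Kmer and ObservedCount, the example counts, and the reading of a non-positive top as "no limit" are all assumptions made for illustration.

```r
# Toy K-mer count table; the sequences and counts are made up for illustration.
counts <- data.frame(
  Kmer = c("ACGT", "TTTT", "GATC", "CCGG"),
  ObservedCount = c(250L, 40L, 900L, 120L),
  stringsAsFactors = FALSE
)

# Hypothetical helper mimicking the documented meaning of minCount, top and numSort.
filter_counts <- function(tbl, minCount = 100, top = -1, numSort = TRUE) {
  # Keep only K-mers whose count reaches the threshold.
  tbl <- tbl[tbl$ObservedCount >= minCount, , drop = FALSE]
  # Descending order when numSort is TRUE, ascending otherwise.
  tbl <- tbl[order(tbl$ObservedCount, decreasing = numSort), , drop = FALSE]
  # Assumption: a non-positive 'top' means "no limit".
  if (top > 0) tbl <- head(tbl, top)
  rownames(tbl) <- NULL
  tbl
}

res <- filter_counts(counts, minCount = 100, top = 2)
print(res)
```

With these made-up counts, TTTT falls below the threshold and the two most frequent remaining K-mers are kept.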

Details

The offset feature counts only the K-mer of length k that starts offset bp from the 5' end of the variable region. For example, if we have 16-mer variable regions and wish to count K-mers of length 12 at an offset of 3 bp, we look only at the K-mer found at the position indicated by the bracketed nucleotides in the variable region:

5' NNN[NNNNNNNNNNNN]N 3'
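
The difference between counting at a fixed offset and counting across all windows can be sketched in base R. The helpers kmer_at_offset and kmers_all_windows and the two 16-mer sequences are hypothetical, and the sketch looks at the forward strand only; it is meant to show the window arithmetic, not how the package performs counting.

```r
# Hypothetical helper: extract the K-mer of length k that starts 'offset' bp
# downstream of the 5' end of each variable region.
kmer_at_offset <- function(regions, k, offset) {
  substring(regions, offset + 1, offset + k)
}

# Hypothetical helper: extract the K-mers from every sliding window.
kmers_all_windows <- function(regions, k) {
  unlist(lapply(regions, function(s) {
    starts <- seq_len(nchar(s) - k + 1)
    substring(s, starts, starts + k - 1)
  }))
}

regions <- c("ACGTACGTACGTACGT", "TTTTACGTACGTACGA")  # made-up 16-mers

# One 12-mer per read when the offset is fixed at 3 bp.
fixed <- table(kmer_at_offset(regions, k = 12, offset = 3))

# 16 - 12 + 1 = 5 windows per read when no offset is given.
all_windows <- table(kmers_all_windows(regions, k = 12))

print(fixed)
print(sum(all_windows))
```

A 16 bp variable region contains five 12 bp windows, so the fixed-offset table here is built from two K-mers and the all-windows table from ten.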

Minimum count refers to the lowest count observed for a K-mer of length k in a given sample. Total count is the sum of counts over all K-mers of length k in a given sample. These statistics can be viewed with selex.countSummary for every K-mer length and sample on which counting has been performed. When a new seqfilter object is provided, K-mer counting is redone. See selex.seqfilter for more details.
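
As a small worked example of these two statistics, the following base R sketch computes them from a made-up vector of counts for k = 2. The K-mers and their counts are invented for illustration and do not come from any dataset.

```r
# Made-up observed counts for a handful of 2-mers in one sample.
kmer_counts <- c(AA = 12L, AC = 7L, AG = 3L, AT = 20L)

# Minimum count: the lowest count observed for any K-mer of this length.
min_count <- min(kmer_counts)

# Total count: the sum of counts over all K-mers of this length.
total_count <- sum(kmer_counts)

print(c(min = min_count, total = total_count))
```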

See ‘References’ for more details regarding the K-mer counting process.

Value

selex.counts returns a data frame containing the K-mer sequence and observed counts for a given sample if a Markov model has not been supplied.

If a Markov model is supplied, a data frame containing K-mer sequence, observed counts, predicted previous round probability, and predicted previous round expected counts is returned.
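
To give a feel for how a probability and an expected count relate, here is a deliberately simplified base R sketch using an order-0 model, in which a K-mer's probability is the product of its base frequencies. The base frequencies, the helper kmer_prob, and the total of 10000 windows are assumptions; the package's own Markov models (see selex.mm) may be of higher order and are not reproduced here.

```r
# Assumed base frequencies of an order-0 model (hypothetical values).
base_prob <- c(A = 0.3, C = 0.2, G = 0.2, T = 0.3)

# Hypothetical helper: probability of a K-mer as the product of its base probabilities.
kmer_prob <- function(kmer, probs) {
  prod(probs[strsplit(kmer, "")[[1]]])
}

total_windows <- 10000  # assumed total number of counted K-mer windows

p <- kmer_prob("ACGT", base_prob)

# Expected count under this toy model: probability times number of windows.
expected <- p * total_windows

print(c(probability = p, expected = expected))
```

Comparing such an expected count with the observed count is what makes enrichment between rounds visible.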

If the number of unique K-mers exceeds R's memory limit, selex.counts will cause R to crash when returning a data frame containing the K-mers. The outputPath option can be used to avoid such a situation, as the Java code will directly write the table to a plain text file at the specified location instead.

References

Slattery, M., Riley, T.R., Liu, P., Abe, N., Gomez-Alcala, P., Dror, I., Zhou, T., Rohs, R., Honig, B., Bussemaker, H.J., and Mann, R.S. (2011) Cofactor binding evokes latent differences in DNA binding specificity between Hox proteins. Cell 147:1270–1282.

Riley, T.R., Slattery, M., Abe, N., Rastogi, C., Liu, D., Mann, R.S., and Bussemaker, H.J. (2014) SELEX-seq: a method for characterizing the complete repertoire of binding site preferences for transcription factor complexes. Methods Mol. Biol. 1196:255–278.

See Also

selex.affinities, selex.countSummary, selex.infogain, selex.kmax, selex.mm, selex.run

Examples

# r2 is assumed to be an existing sample handle and mm an existing
# Markov model handle (see selex.mm)

# K-mer counting for a specific length on a given dataset
t1 = selex.counts(sample=r2, k=8, minCount=1)

# K-mer counting with an offset
t2 = selex.counts(sample=r2, k=2, offset=14, markovModel=NULL)

# K-mer counting with a Markov model (produces expected counts)
t3 = selex.counts(sample=r2, k=4, markovModel=mm)

# Display all available K-mer statistics
selex.countSummary()

SELEX documentation built on Nov. 8, 2020, 5:22 p.m.