Counts and returns the number of times each K-mer of length k appears within the sample's variable regions. If an offset value is provided, K-mer counting takes place at a fixed position within the variable region of each read. If a Markov model is supplied, the expected count and the probability of observing each K-mer are also returned.
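
To make the counting idea concrete, here is a minimal base-R sketch that slides a window of width k across a few toy variable regions and tabulates every K-mer. The reads, the helper `kmers_in_read`, and the column names `Kmer` and `ObservedCount` are illustrative assumptions; this is not the package's Java implementation and it does not call selex.counts.

```r
# Toy variable regions (hypothetical data, not from a real SELEX sample)
reads <- c("ACGTAC", "CGTACG", "ACGTTT")
k <- 3

# Hypothetical helper: return every K-mer found by sliding a window of
# width k across one read (no K-mers if the read is shorter than k)
kmers_in_read <- function(read, k) {
  n_windows <- max(0, nchar(read) - k + 1)
  starts <- seq_len(n_windows)
  substring(read, starts, starts + k - 1)
}

# Pool the K-mers from all reads and tabulate them
all_kmers <- unlist(lapply(reads, kmers_in_read, k = k))
counts <- as.data.frame(table(Kmer = all_kmers),
                        responseName = "ObservedCount",
                        stringsAsFactors = FALSE)

# Sort in descending order by count
counts <- counts[order(counts$ObservedCount, decreasing = TRUE), ]
print(counts)
```

Each 6-bp toy read contributes four windows, so the counts in the table sum to 12.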
sample | A sample handle to the dataset on which K-mer counting should be performed.
k | K-mer length(s) to be counted.
minCount | The minimum number of counts for a K-mer to be output.
top | Give the first N K-mers (by count).
numSort | Sort K-mers in descending order by count. If
offset | Location of the window for which K-mers should be counted. If not provided, K-mers are counted across all windows.
markovModel | Markov model handle to use to predict previous round probabilities and expected counts.
forceCalculation | Forces K-mer counting to be performed again, even if a previous result exists.
seqfilter | A sequence filter object to include/exclude sequences that are read in from the FASTQ file.
outputPath | Prints the computed K-mer table to a plain text file. This is useful when the number of unique K-mers in the dataset exceeds R's memory limit.
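
The effect of minCount, top, and numSort can be pictured as post-processing of a count table. The base-R sketch below is only an illustration: the function `filter_counts`, its default values, and the column names are hypothetical, and the real function applies these options internally. Because the behavior with numSort disabled is not spelled out here, the sketch simply leaves the row order unchanged in that case.

```r
# Hypothetical count table; column names are illustrative
counts <- data.frame(
  Kmer = c("ACG", "CGT", "GTA", "TAC", "GTT", "TTT"),
  ObservedCount = c(3, 3, 2, 2, 1, 1),
  stringsAsFactors = FALSE
)

# Hypothetical helper mimicking the three output options
filter_counts <- function(tbl, minCount = 1, top = NULL, numSort = TRUE) {
  # Keep only K-mers whose count reaches the threshold
  tbl <- tbl[tbl$ObservedCount >= minCount, , drop = FALSE]
  # Optionally sort in descending order by count
  if (numSort) {
    tbl <- tbl[order(tbl$ObservedCount, decreasing = TRUE), , drop = FALSE]
  }
  # Optionally keep only the first N rows
  if (!is.null(top)) {
    tbl <- head(tbl, top)
  }
  tbl
}

res <- filter_counts(counts, minCount = 2, top = 3)
print(res)
```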
The offset feature counts K-mers of length k that start offset bp away from the 5' end of the variable region. For example, with 16-mer variable regions, counting K-mers of length 12 at an offset of 3 bp considers only the K-mer at the position indicated by the bracketed nucleotides in the variable region:
5' NNN[NNNNNNNNNNNN]N 3'
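
The same example can be sketched in base R. This assumes, following the description above, that the offset is the number of bases skipped from the 5' end before the window begins; the reads are toy data and the code is not the package's implementation.

```r
# Toy 16-mer variable regions (hypothetical data)
reads <- c("ACGTACGTACGTACGT",
           "GGGGGGGGGGGGGGGG",
           "ACGTACGTACGTACGT")
k <- 12
offset <- 3

# Skip `offset` bases from the 5' end, then take the next k bases
window <- substr(reads, offset + 1, offset + k)

# Tabulate the K-mers seen at this single fixed position
offset_counts <- sort(table(window), decreasing = TRUE)
print(offset_counts)
```

Only one K-mer per read is counted, so the counts sum to the number of reads rather than to the number of windows.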
Minimum count refers to the lowest count observed for a K-mer of length k in a given sample. Total count is the sum of counts over all K-mers of length k in a given sample. These statistics can be viewed for all K-mer lengths and samples on which counting was performed using selex.countSummary. When a new seqfilter object is provided, K-mer counting is redone. See selex.seqfilter for more details.
See ‘References’ for more details regarding the K-mer counting process.
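
As a small illustration of the two summary statistics defined above, the following base-R sketch computes them from a toy count table. The table, the column names, and the layout of `summary_row` are assumptions made for the example; the actual columns reported by selex.countSummary may differ.

```r
# Hypothetical count table for K-mers of length 3 in one sample
counts <- data.frame(
  Kmer = c("ACG", "CGT", "GTA", "TAC", "GTT", "TTT"),
  ObservedCount = c(3, 3, 2, 2, 1, 1),
  stringsAsFactors = FALSE
)

# Minimum count: the lowest count observed for any K-mer of this length
min_count <- min(counts$ObservedCount)

# Total count: the sum of counts over all K-mers of this length
total_count <- sum(counts$ObservedCount)

# Illustrative one-row summary (column names are hypothetical)
summary_row <- data.frame(K = 3, MinCount = min_count, TotalCount = total_count)
print(summary_row)
```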
selex.counts returns a data frame containing the K-mer sequence and observed counts for a given sample if a Markov model has not been supplied.
If a Markov model is supplied, a data frame containing K-mer sequence, observed counts, predicted previous round probability, and predicted previous round expected counts is returned.
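
To give a feel for how a Markov model can yield a probability and an expected count for a K-mer, here is a generic first-order Markov chain sketch in base R. Everything in it is an assumption for illustration: the start probabilities, the transition matrix, the helper `kmer_probability`, and the scaling of the probability by a number of windows. The order of the package's model and the way it derives expected counts are not described in this section.

```r
bases <- c("A", "C", "G", "T")

# Hypothetical parameters: uniform start probabilities and a made-up
# transition matrix (rows = previous base, columns = next base)
start <- setNames(rep(0.25, 4), bases)
trans <- matrix(c(0.4, 0.2, 0.2, 0.2,
                  0.2, 0.4, 0.2, 0.2,
                  0.2, 0.2, 0.4, 0.2,
                  0.2, 0.2, 0.2, 0.4),
                nrow = 4, byrow = TRUE,
                dimnames = list(bases, bases))

# Probability of a K-mer under a first-order chain:
# P(b1) * P(b2 | b1) * ... * P(bk | b(k-1))
kmer_probability <- function(kmer, start, trans) {
  b <- strsplit(kmer, "")[[1]]
  p <- start[[b[1]]]
  for (i in seq_along(b)[-1]) {
    p <- p * trans[b[i - 1], b[i]]
  }
  p
}

# Assumed scaling: expected count = probability * number of windows counted
n_windows <- 1000
p <- kmer_probability("AACG", start, trans)
expected <- p * n_windows
print(c(probability = p, expected_count = expected))
```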
If the number of unique K-mers exceeds R's memory limit, selex.counts will cause R to crash when returning a data frame containing the K-mers. The outputPath option can be used to avoid this, as the Java code will instead write the table directly to a plain text file at the specified location.
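
Once a table has been written to disk, it can be processed in chunks so that it never has to fit in memory at once. The sketch below writes a toy file and reads it back a few lines at a time. The tab-delimited layout, the header row, and the column names are assumptions made for the example; the actual format of the file written via outputPath is not specified here and should be checked before adapting this.

```r
# Write a toy tab-delimited table to a temporary file
# (a stand-in for a file produced via outputPath; the format is assumed)
path <- tempfile(fileext = ".txt")
toy <- data.frame(Kmer = c("ACG", "CGT", "GTA", "TAC", "GTT", "TTT"),
                  ObservedCount = c(3, 3, 2, 2, 1, 1))
write.table(toy, path, sep = "\t", quote = FALSE, row.names = FALSE)

# Read the file back in small chunks through an open connection
con <- file(path, open = "r")
header <- strsplit(readLines(con, n = 1), "\t")[[1]]
chunk_size <- 4
total <- 0
n_rows <- 0
repeat {
  lines <- readLines(con, n = chunk_size)
  if (length(lines) == 0) break
  chunk <- read.table(text = lines, sep = "\t", col.names = header,
                      colClasses = c("character", "numeric"))
  # Accumulate running statistics instead of keeping every row
  total <- total + sum(chunk$ObservedCount)
  n_rows <- n_rows + nrow(chunk)
}
close(con)
print(c(rows = n_rows, total_count = total))
```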
Slattery, M., Riley, T.R., Liu, P., Abe, N., Gomez-Alcala, P., Dror, I., Zhou, T., Rohs, R., Honig, B., Bussemaker, H.J., and Mann, R.S. (2011) Cofactor binding evokes latent differences in DNA binding specificity between Hox proteins. Cell 147:1270–1282.
Riley, T.R., Slattery, M., Abe, N., Rastogi, C., Liu, D., Mann, R.S., and Bussemaker, H.J. (2014) SELEX-seq: a method for characterizing the complete repertoire of binding site preferences for transcription factor complexes. Methods Mol. Biol. 1196:255–278.
selex.affinities, selex.countSummary, selex.infogain, selex.kmax, selex.mm, selex.run
# Assumes r2 is a sample handle and mm is a Markov model handle created earlier in the session
# Kmer counting for a specific length on a given dataset
t1 = selex.counts(sample=r2, k=8, minCount=1)
# Kmer counting with an offset
t2 = selex.counts(sample=r2, k=2, offset=14, markovModel=NULL)
# Kmer counting with a Markov model (produces expected counts)
t3 = selex.counts(sample=r2, k=4, markovModel=mm)
# Display all available kmer statistics
selex.countSummary()