Tabulate n-grams

Description

Builds a contingency table of the n-gram counts versus their class labels.

Usage

1
table_ngrams(seq, ngrams, target)

Arguments

seq

vector or matrix describing sequence(s).

ngrams

vector of n-grams.

target

integer vector with target information (e.g. class labels). Must have at least two values.

Value

a data frame with the number of columns equal to the length of the target plus 1. The first column contains names of the n-grams. Further columns represents counts of n-grams for respective value of the target.

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
seqs_pos <- matrix(sample(c("a", "c", "g", "t"), 100, replace = TRUE, 
            prob = c(0.2, 0.4, 0.35, 0.05)), ncol = 5)
seqs_neg <- matrix(sample(c("a", "c", "g", "t"), 100, replace = TRUE), 
            ncol = 5)
tab <- table_ngrams(seq = rbind(seqs_pos, seqs_neg), 
                    ngrams = c("1_c.t_0", "1_g.g_0", "2_t.c_0", "2_g.g_0", "3_c.c_0", "3_g.c_0"), 
                    target = c(rep(1, 20), rep(0, 20)))
# see the results
print(tab)
# easily plot the results using ggplot2

Questions? Problems? Suggestions? or email at ian@mutexlabs.com.

Please suggest features or report bugs with the GitHub issue tracker.

All documentation is copyright its authors; we didn't write any of that.