qgrams | R Documentation |
Get a table of q-gram counts from one or more character vectors.
qgrams(..., .list = NULL, q = 1L, useBytes = FALSE, useNames = !useBytes)
... |
any number of (named) arguments, that will be coerced to character with |
.list |
Will be concatenated with the |
q |
size of q-gram, must be non-negative. |
useBytes |
Determine byte-wise qgrams. |
useNames |
Add q-grams as column names. If |
A table with q-gram counts. Detected q-grams are the column names and the argument names are the row names. If no argument names were provided, they will be generated.
The input is converted to character. If useBytes=TRUE, each element is converted to utf8 and then to integer as in stringdist. Next, the data is passed to the underlying routine.
Strings with fewer than q characters and elements containing NA are skipped. Using q=0 therefore counts the number of empty strings "" occurring in each argument.
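The skipping rule described above can be illustrated with a minimal base R sketch. The helper count_qgrams_sketch is hypothetical and is not part of stringdist; it assumes q >= 1, handles a single character vector only, and does not reproduce the byte-wise (useBytes) path or the q=0 case.

```r
# Hypothetical helper for illustration only; not the stringdist implementation.
count_qgrams_sketch <- function(x, q = 1L) {
  stopifnot(q >= 1L)                  # q = 0 is outside the scope of this sketch
  x <- as.character(x)                # coerce the input to character
  x <- x[!is.na(x) & nchar(x) >= q]   # skip NA and strings shorter than q
  grams <- as.character(unlist(lapply(x, function(s) {
    starts <- seq_len(nchar(s) - q + 1L)   # start position of every q-gram
    substring(s, starts, starts + q - 1L)  # extract all q-grams of s
  })))
  table(grams)                        # tabulate q-gram occurrences
}

# "hi" is shorter than q = 3 and NA is missing, so both are skipped
counts <- count_qgrams_sketch(c("hello", NA, "hi"), q = 3L)
banana <- count_qgrams_sketch("banana", q = 2L)
```

Unlike qgrams, this sketch returns a one-dimensional table rather than one row per argument.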
stringdist, amatch
library(stringdist)
qgrams('hello world', q=3)
# q-grams are counted uniquely over a character vector
qgrams(rep('hello world', 2), q=3)
# to count them separately, do something like
x <- c('hello', 'world')
lapply(x, qgrams, q=3)
# output rows may be named, and you can pass any number of character vectors
x <- "I will not buy this record, it is scratched"
y <- "My hovercraft is full of eels"
z <- c("this", "is", "a", "dead", "parrot")
qgrams(A = x, B = y, C = z, q=2)
# a tongue twister, showing the effects of useBytes and useNames
x <- "peter piper picked a peck of pickled peppers"
qgrams(x, q=2)
qgrams(x, q=2, useNames=FALSE)
qgrams(x, q=2, useBytes=TRUE)
qgrams(x, q=2, useBytes=TRUE, useNames=TRUE)