txt_previousgram | R Documentation |
If you have annotated your text using udpipe_annotate, your text is tokenised into a sequence of words. Based on this vector of words in sequence, getting n-grams comes down to looking at the previous word, the word before that, and so forth. These words can be pasted together to form an n-gram containing, in order, the second previous word, the previous word, the current word ...
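The idea described above can be sketched in base R. This is a minimal illustration only, not the package's implementation: `previousgram_sketch` is a hypothetical helper, and its choice to return NA wherever an n-gram is incomplete (too few preceding terms, or an NA inside the window) is an assumption of the sketch.

```r
# Hypothetical helper, not part of udpipe: builds n-grams from preceding terms.
previousgram_sketch <- function(x, n = 2, sep = " ") {
  n <- as.integer(n)
  stopifnot(n >= 1)
  if (n == 1) return(x)
  len <- length(x)
  # Lagged copies of x: lag n-1 first (oldest term), lag 0 last (current term).
  lagged <- lapply(seq(n - 1, 0), function(k) {
    if (k >= len) {
      rep(NA_character_, len)
    } else {
      c(rep(NA_character_, k), x[seq_len(len - k)])
    }
  })
  # Paste the lagged vectors element-wise, oldest term first.
  out <- do.call(paste, c(lagged, sep = sep))
  # paste() turns NA into the string "NA", so incomplete n-grams are
  # set to NA explicitly (an assumption made for this sketch).
  incomplete <- Reduce(`|`, lapply(lagged, is.na))
  out[incomplete] <- NA_character_
  out
}

previousgram_sketch(c("the", "quick", "brown", "fox"), n = 2)
# NA "the quick" "quick brown" "brown fox"
previousgram_sketch(c("the", "quick", "brown", "fox"), n = 3, sep = "_")
# NA NA "the_quick_brown" "quick_brown_fox"
```

Each lagged copy shifts the vector one more position to the right, so pasting them element-wise lines up every term with the terms that came before it.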
txt_previousgram(x, n = 2, sep = " ")
x |
a character vector where each element is just 1 term or word |
n |
an integer indicating the ngram. A value of 1 will keep x as is, a value of 2 will prepend the previous term to the current term, a value of 3 will prepend the second previous term and the previous term to the current term |
sep |
a character element indicating how to paste the terms together |
a character vector of the same length as x with the n-grams
paste, shift
library(udpipe)

x <- sprintf("%s%s", LETTERS, 1:26)
txt_previousgram(x, n = 2)

data.frame(words = x,
           bigram = txt_previousgram(x, n = 2),
           trigram = txt_previousgram(x, n = 3, sep = "-"),
           quatrogram = txt_previousgram(x, n = 4, sep = ""),
           stringsAsFactors = FALSE)

x <- c("A1", "A2", "A3", NA, "A4", "A5")
data.frame(x,
           bigram = txt_previousgram(x, n = 2, sep = "_"),
           stringsAsFactors = FALSE)