txt_context | R Documentation |
If you have annotated your text using udpipe_annotate, your text is tokenised into a sequence of words. Based on this vector of words in sequence, getting n-grams comes down to looking at the previous/next word, the word before/after that, and so forth. These words can be pasted together to form an n-gram.
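The shift-and-paste idea described above can be sketched in base R. This is only an illustration, not the udpipe implementation: `shift_sketch` and `context_sketch` are hypothetical helpers, and returning NA whenever a neighbouring word is missing is an assumption made for this sketch.

```r
# Hypothetical helper, not part of udpipe: shift a vector by k positions.
# k < 0 looks back (previous words), k > 0 looks ahead (next words);
# positions that fall outside the vector become NA.
shift_sketch <- function(x, k) {
  len <- length(x)
  idx <- seq_len(len) + k
  idx[idx < 1 | idx > len] <- NA
  x[idx]
}

# Hypothetical helper: paste the shifted copies together, one per offset in n.
# Assumption of this sketch: a position yields NA as soon as one of the
# words it needs is missing.
context_sketch <- function(x, n = c(-1, 0, 1), sep = " ") {
  shifted <- lapply(n, function(k) shift_sketch(x, k))
  incomplete <- Reduce(`|`, lapply(shifted, is.na))
  out <- do.call(paste, c(shifted, list(sep = sep)))
  out[incomplete] <- NA_character_
  out
}

words <- c("We", "walked", "anxiously", "to", "the", "doctor", "!")
trigrams <- context_sketch(words, n = c(-1, 0, 1))
data.frame(words, trigrams)
```

In this sketch the first and last positions come out as NA because they lack a previous or next word, while the second position gives "We walked anxiously".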
txt_context(x, n = c(-1, 0, 1), sep = " ", na.rm = FALSE)
x | a character vector where each element is just 1 term or word
n | an integer vector indicating how many terms to look back and ahead
sep | a character element indicating how to
na.rm | logical, if set to
a character vector of the same length as x with the n-grams
txt_paste, txt_next, txt_previous, shift
library(udpipe)

x <- c("We", "walked", "anxiously", "to", "the", "doctor", "!")

## Look 1 word before + word itself
y <- txt_context(x, n = c(-1, 0), na.rm = FALSE)
data.frame(x, y)

## Look 1 word before + word itself + 1 word after
y <- txt_context(x, n = c(-1, 0, 1), na.rm = FALSE)
data.frame(x, y)
y <- txt_context(x, n = c(-1, 0, 1), na.rm = TRUE)
data.frame(x, y)

## Look 2 words before + word itself + 1 word after
## even if not all words are there
y <- txt_context(x, n = c(-2, -1, 0, 1), na.rm = TRUE, sep = "_")
data.frame(x, y)
y <- txt_context(x, n = c(-2, -1, 1, 2), na.rm = FALSE, sep = "_")
data.frame(x, y)

x <- c("We", NA, NA, "to", "the", "doctor", "!")
y <- txt_context(x, n = c(-1, 0), na.rm = FALSE)
data.frame(x, y)
y <- txt_context(x, n = c(-1, 0), na.rm = TRUE)
data.frame(x, y)

library(data.table)
data(brussels_reviews_anno, package = "udpipe")
x <- as.data.table(brussels_reviews_anno)
x <- subset(x, doc_id %in% txt_sample(unique(x$doc_id), n = 10))
x <- x[, context := txt_context(lemma), by = list(doc_id, sentence_id)]
head(x, 20)
x$term <- sprintf("%s/%s", x$lemma, x$upos)
x <- x[, context := txt_context(term), by = list(doc_id, sentence_id)]
head(x, 20)