txt_context: Based on a vector with a word sequence, get n-grams (looking...
In udpipe: Tokenization, Parts of Speech Tagging, Lemmatization and Dependency Parsing with the 'UDPipe' 'NLP' Toolkit

txt_context

R Documentation

Based on a vector with a word sequence, get n-grams (looking forward + backward)

Description

If you have annotated your text using udpipe_annotate, your text is tokenised into a sequence of words. Given this vector of words in sequence, getting n-grams comes down to looking at the previous/next word, the word before/after that, and so forth. These words can be pasted together to form an n-gram.
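The look-back/look-ahead idea described above can be sketched in base R. This is only an illustration of the technique, not the udpipe implementation; the helper `shift_sketch` is a hypothetical name introduced here.

```r
# Hypothetical helper (not part of udpipe): shift a vector by k positions.
# k < 0 looks back, k > 0 looks ahead; positions outside the vector become NA.
shift_sketch <- function(x, k) {
  idx <- seq_along(x) + k
  idx[idx < 1 | idx > length(x)] <- NA_integer_
  x[idx]
}

x <- c("We", "walked", "anxiously", "to", "the", "doctor", "!")

previous_word <- shift_sketch(x, -1)
next_word     <- shift_sketch(x, 1)

# Paste previous word, word itself and next word into a trigram;
# any position with a missing neighbour is set to NA.
trigram <- paste(previous_word, x, next_word, sep = " ")
trigram[is.na(previous_word) | is.na(next_word)] <- NA_character_
data.frame(x, trigram)
```

The first and last positions have no complete trigram in this sketch, because there is no word before "We" and no word after "!".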

Usage

txt_context(x, n = c(-1, 0, 1), sep = " ", na.rm = FALSE)

Arguments

`x`	a character vector where each element is just 1 term or word
`n`	an integer vector indicating how many terms to look back and ahead: negative values look back, `0` is the word itself, positive values look ahead
`sep`	a character element indicating how to `paste` the subsequent words together
`na.rm`	logical. If `TRUE`, keeps all available text even if it cannot look back/ahead the amount specified by `n`. If `FALSE`, the result is `NA` if at least one element is `NA` or if it cannot look back/ahead the amount specified by `n`.
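To make the interplay of `n`, `sep` and `na.rm` concrete, here is a minimal base R sketch. It rests on an assumption: with `na.rm = TRUE`, missing parts are simply dropped and the remaining words are pasted together. The function `context_sketch` is a hypothetical re-implementation for illustration, and the exact output of `txt_context` may differ.

```r
# Hypothetical re-implementation for illustration only; not udpipe's source.
context_sketch <- function(x, n = c(-1, 0, 1), sep = " ", na.rm = FALSE) {
  # One shifted copy of x per offset in n; out-of-range positions become NA.
  shifted <- lapply(n, function(k) {
    idx <- seq_along(x) + k
    idx[idx < 1 | idx > length(x)] <- NA_integer_
    x[idx]
  })
  parts <- do.call(cbind, shifted)  # one row per position, one column per offset
  apply(parts, 1, function(row) {
    if (na.rm) {
      row <- row[!is.na(row)]       # assumption: drop missing parts, keep the rest
      if (length(row) == 0) return(NA_character_)
    } else if (anyNA(row)) {
      return(NA_character_)         # any missing part makes the whole n-gram NA
    }
    paste(row, collapse = sep)
  })
}

x <- c("We", "walked", "anxiously", "to", "the", "doctor", "!")
strict  <- context_sketch(x, n = c(-1, 0, 1), na.rm = FALSE)
lenient <- context_sketch(x, n = c(-1, 0, 1), na.rm = TRUE, sep = "_")
data.frame(x, strict, lenient)
```

In this sketch the strict variant yields `NA` at both ends of the sequence, while the lenient variant returns the shorter n-gram that is still available.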

Value

a character vector of the same length as x with the n-grams

Examples

x <- c("We", "walked", "anxiously", "to", "the", "doctor", "!")

## Look 1 word before + word itself
y <- txt_context(x, n = c(-1, 0), na.rm = FALSE)
data.frame(x, y)
## Look 1 word before + word itself + 1 word after
y <- txt_context(x, n = c(-1, 0, 1), na.rm = FALSE)
data.frame(x, y)
## Same window, but keep the text even where it cannot look back/ahead
y <- txt_context(x, n = c(-1, 0, 1), na.rm = TRUE)
data.frame(x, y)

## Look 2 words before + word itself + 1 word after 
## even if not all words are there
y <- txt_context(x, n = c(-2, -1, 0, 1), na.rm = TRUE, sep = "_")
data.frame(x, y)
## Look 2 words before + 2 words after, without the word itself
y <- txt_context(x, n = c(-2, -1, 1, 2), na.rm = FALSE, sep = "_")
data.frame(x, y)

## Missing values in the word sequence
x <- c("We", NA, NA, "to", "the", "doctor", "!")
y <- txt_context(x, n = c(-1, 0), na.rm = FALSE)
data.frame(x, y)
y <- txt_context(x, n = c(-1, 0), na.rm = TRUE)
data.frame(x, y)

## n-grams computed within each sentence of annotated data, using data.table
library(data.table)
data(brussels_reviews_anno, package = "udpipe")
x      <- as.data.table(brussels_reviews_anno)
x      <- subset(x, doc_id %in% txt_sample(unique(x$doc_id), n = 10))
x      <- x[, context := txt_context(lemma), by = list(doc_id, sentence_id)]
head(x, 20)
x$term <- sprintf("%s/%s", x$lemma, x$upos)
x      <- x[, context := txt_context(term), by = list(doc_id, sentence_id)]
head(x, 20)

udpipe documentation built on Jan. 6, 2023, 5:06 p.m.
