tokens_subset: Extract a subset of a tokens object

Description

Returns the documents of a tokens object that meet certain conditions, including direct logical operations on docvars (document-level variables). tokens_subset functions identically to subset.data.frame, using non-standard evaluation to evaluate conditions based on the docvars in the tokens object.
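
Because the behaviour is modelled on subset.data.frame, a base R sketch may help show what non-standard evaluation means here. The data frame `dv` below is hypothetical and simply stands in for a set of docvars; it is not part of quanteda.

```r
# A plain data frame standing in for document-level variables (hypothetical data).
dv <- data.frame(doc = c("d1", "d2", "d3", "d4"),
                 grp = c(1, NA, 2, 3),
                 stringsAsFactors = FALSE)

# `grp` is looked up inside `dv` first, so it can be written as a bare name.
kept <- subset(dv, grp > 1)

# The comparison for d2 yields NA, which subset() treats as FALSE,
# so d2 is dropped along with d1.
kept$doc
```

The condition is written in terms of column names rather than `dv$grp`, which is the same convenience tokens_subset offers for docvars.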

Usage

tokens_subset(x, subset, select, ...)

Arguments

x

tokens object to be subsetted

subset

logical expression indicating the documents to keep: missing values are taken as false

select

expression, indicating the docvars to select from the tokens; or a tokens object, in which case the returned tokens will contain the same documents in the same order as the original tokens, even if these are empty.

...

not used

Value

tokens object, with a subset of documents (and docvars) selected according to arguments
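
One way to confirm what was returned is to inspect the document names and docvars of the result. The following is a minimal sketch, assuming the quanteda package is installed; the object names `toks` and `kept` are illustrative only.

```r
library(quanteda)

# Small corpus with one document-level variable, grp.
corp <- corpus(c(d1 = "a b c d", d2 = "a a b e",
                 d3 = "b b c e", d4 = "e e f a b"),
               docvars = data.frame(grp = c(1, 1, 2, 3)))
toks <- tokens(corp)

# Keep only the documents whose grp value exceeds 1.
kept <- tokens_subset(toks, grp > 1)

# The result is still a tokens object; its documents and docvars
# are restricted to the rows that satisfied the condition.
ndoc(kept)
docnames(kept)
docvars(kept)$grp
```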

See Also

subset.data.frame

Examples

library(quanteda)

# build a small corpus with one docvar, grp
corp <- corpus(c(d1 = "a b c d", d2 = "a a b e",
                 d3 = "b b c e", d4 = "e e f a b"),
               docvars = data.frame(grp = c(1, 1, 2, 3)))
toks1 <- tokens(corp)
# selecting on a docvars condition
tokens_subset(toks1, grp > 1)
# selecting on a supplied vector
tokens_subset(toks1, c(TRUE, FALSE, TRUE, FALSE))

# selecting on a tokens object
toks2 <- tokens(c(d1 = "a b b c", d2 = "b b c d"))
toks3 <- tokens(c(d1 = "x y z", d2 = "a b c c d", d3 = "x x x"))
tokens_subset(toks2, subset = toks3)
tokens_subset(toks2, subset = toks3[c(3,1,2)])

quanteda/quanteda documentation built on June 26, 2019, 3:38 a.m.