text_types: Text Type Sets

View source: R/text_types.R

Description

Get or measure the set of types (unique token values).

Usage

text_types(x, filter = NULL, collapse = FALSE, ...)

text_ntype(x, filter = NULL, collapse = FALSE, ...)

Arguments

x

a text or character vector.

filter

if non-NULL, a text filter to use instead of the default text filter for x.

collapse

a logical value indicating whether to aggregate over all rows of the input, rather than reporting a result for each text separately.

...

additional properties to set on the text filter.

Details

text_ntype counts the number of unique types in each text; text_types returns the set of unique types, as a character vector. Types are determined according to the filter argument.

Value

If collapse = FALSE, then text_ntype produces a numeric vector with the same length and names as the input text, with the elements giving the number of unique types in the corresponding texts. For text_types, the result is a list of character vectors, with each vector giving the unique types in the corresponding text, ordered according to the sort function.

If collapse = TRUE, then we aggregate over all rows of the input. In this case, text_ntype produces a scalar indicating the number of unique types in x, and text_types produces a character vector with the unique types.
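
The difference between the two collapse settings can be sketched in base R. This is only an illustration under stated assumptions: toy_tokens, toy_types, and toy_ntype are hypothetical helpers that are not part of the corpus package, and the stand-in tokenizer (lower-case, then split on whitespace) is much cruder than a real text filter, so the types it yields will generally differ from what text_types returns.

```r
# Hypothetical stand-in tokenizer: lower-case, then split on runs of
# whitespace. This is NOT the corpus text filter; it only mimics the idea.
toy_tokens <- function(x) {
  strsplit(tolower(x), "[[:space:]]+")
}

# Type sets: unique token values, sorted with sort().
# collapse = FALSE gives one character vector per text (a list);
# collapse = TRUE pools all texts into a single character vector.
toy_types <- function(x, collapse = FALSE) {
  tokens <- toy_tokens(x)
  if (collapse) {
    sort(unique(unlist(tokens)))
  } else {
    lapply(tokens, function(tok) sort(unique(tok)))
  }
}

# Type counts: the sizes of the type sets above.
toy_ntype <- function(x, collapse = FALSE) {
  types <- toy_types(x, collapse = collapse)
  if (collapse) length(types) else lengths(types)
}

x <- c(a = "the cat saw the dog", b = "The dog ran")

toy_ntype(x)                   # named vector: a = 4, b = 3
toy_ntype(x, collapse = TRUE)  # 5, since "the" and "dog" are shared
toy_types(x, collapse = TRUE)
```

Note that the collapsed count is not the sum of the per-text counts, because a type that occurs in several texts is counted only once.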

See Also

text_filter, text_tokens.

Examples

text <- c("I saw Mr. Jones today.",
          "Split across\na line.",
          "What. Are. You. Doing????",
          "She asked 'do you really mean that?' and I said 'yes.'")

# count the number of unique types
text_ntype(text)
text_ntype(text, collapse = TRUE)

# get the type sets
text_types(text)
text_types(text, collapse = TRUE)

corpus documentation built on May 2, 2021, 9:06 a.m.