concat: Return the concatenator character from an object

View source: R/tokens.R

concat    R Documentation

Return the concatenator character from an object

Description

Get the concatenator character from a tokens object.

Usage

concat(x)

concatenator(x)

Arguments

x

a tokens object

Details

The concatenator character is a special delimiter used to link separate tokens in multi-token phrases. It is embedded in the meta-data of tokens objects and used in downstream operations, such as tokens_compound() or tokens_lookup(). It can be extracted using concat() and set using tokens(x, concatenator = ...) when x is a tokens object.

The default _ is recommended since it will not be removed during normal cleaning and tokenization, whereas nearly all other punctuation characters (at least those in the Unicode punctuation class [P]) will be removed.
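
The behaviour described above can be sketched as follows. This is a minimal illustration rather than part of the official examples: the sample sentence and the "+" replacement character are arbitrary choices, and it assumes a quanteda version in which tokens_compound() takes its default concatenator from the tokens object.

```r
library(quanteda)

# A small tokens object; the sentence is an arbitrary example
toks <- tokens("The United States of America")

# Extract the concatenator stored in the object's meta-data
concat(toks)

# Downstream functions such as tokens_compound() use it to join phrases
comp <- tokens_compound(toks, pattern = phrase("United States"))
as.character(comp)

# Set a different concatenator on an existing tokens object
# ("+" is only an illustration; the default is the recommended choice)
toks_plus <- tokens(toks, concatenator = "+")
concat(toks_plus)
```

With the default settings, the compounded phrase should appear as the single token United_States.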

Value

a character vector of length 1

Examples

toks <- tokens(data_corpus_inaugural[1:5])
concat(toks)


quanteda documentation built on Sept. 11, 2024, 6:08 p.m.