text_tokenize: generic for gregexpr wrappers to tokenize text


Description

generic for gregexpr wrappers to tokenize text

default method for text_tokenize generic

Usage

text_tokenize(
  string,
  regex = NULL,
  ignore.case = FALSE,
  fixed = FALSE,
  perl = FALSE,
  useBytes = FALSE,
  non_token = FALSE
)

## Default S3 method:
text_tokenize(
  string,
  regex = NULL,
  ignore.case = FALSE,
  fixed = FALSE,
  perl = FALSE,
  useBytes = FALSE,
  non_token = FALSE
)

Arguments

string

text to be tokenized

regex

regex expressing where to cut (see grep)

ignore.case

whether or not regex matching should ignore case (see grep)

fixed

whether regex should be matched as-is (literally) or interpreted as a regular expression (see grep)

perl

whether or not Perl compatible regex should be used (see grep)

useBytes

whether regex matching should be done byte-by-byte instead of character-by-character (see grep)

non_token

whether information on non-tokens, i.e. the patterns by which the text was split, should be returned as well


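Examples

A minimal usage sketch, assuming the stringb package is installed and the default method is dispatched; only parameters documented above are used, and the exact structure of the returned object is not shown here and may differ.

library(stringb)

# tokenize a sentence into words by cutting at runs of whitespace
text_tokenize("the quick brown fox", regex = "\\s+")

# also return the non-tokens, i.e. the whitespace patterns the text was split by
text_tokenize("the quick brown fox", regex = "\\s+", non_token = TRUE)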