Weka_tokenizers: R/Weka Tokenizers
In RWeka: R/Weka Interface

Description

R interfaces to Weka tokenizers.

Usage

AlphabeticTokenizer(x, control = NULL)
NGramTokenizer(x, control = NULL)
WordTokenizer(x, control = NULL)

Arguments

`x`	a character vector with strings to be tokenized.
`control`	an object of class `Weka_control`, a character vector of control options, or `NULL` (default). Available options can be listed with the Weka Option Wizard `WOW` or looked up in the Weka documentation.
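
As a minimal sketch of the two ways to supply `control`, the following assumes that the RWeka package and a working Java installation are available. The option names `min` and `max` are taken from Weka's NGramTokenizer; the input string and variable names are made up for illustration.

```r
library(RWeka)

# Ask the Weka Option Wizard which options the tokenizer accepts.
WOW("NGramTokenizer")

x <- "the quick brown fox"  # illustrative input, not part of the package

# Options passed as a Weka_control object ...
bigrams_ctrl <- NGramTokenizer(x, Weka_control(min = 2, max = 2))

# ... or as a character vector of Weka command-line style options.
bigrams_chr <- NGramTokenizer(x, c("-min", "2", "-max", "2"))
```

Both calls request only two-word n-grams, so they should be interchangeable; `Weka_control` is usually the more readable form.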

Details

AlphabeticTokenizer is an alphabetic string tokenizer that forms tokens only from contiguous alphabetic sequences.

NGramTokenizer splits strings into n-grams, with n ranging between given minimal and maximal values.

WordTokenizer is a simple word tokenizer.
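
To compare the three tokenizers, they can be run on the same input. This is only a sketch, assuming RWeka and Java are installed; the sample strings and variable names are hypothetical, and the exact tokens depend on each tokenizer's default delimiters.

```r
library(RWeka)

# Illustrative input: a character vector with two strings.
x <- c("Tokenizers split text.", "RWeka wraps Weka's tokenizers!")

# Tokens built from contiguous alphabetic sequences only.
alpha_tokens <- AlphabeticTokenizer(x)

# Simple word tokens, using the tokenizer's default settings.
word_tokens <- WordTokenizer(x)

# Unigrams and bigrams (n from 1 to 2).
ngram_tokens <- NGramTokenizer(x, Weka_control(min = 1, max = 2))

print(alpha_tokens)
print(word_tokens)
print(ngram_tokens)
```

Each call returns a single character vector holding the tokens from all input strings.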

Value

A character vector with the tokenized strings.

RWeka documentation built on March 7, 2023, 6:21 p.m.
