tokenize-asweka: Weka-like n-gram Tokenization
In ngram: Fast n-Gram 'Tokenization'

Tokenize-AsWeka	R Documentation

Weka-like n-gram Tokenization

Description

An n-gram tokenizer whose output is identical to that of the NGramTokenizer function in the RWeka package.

Usage

ngram_asweka(str, min = 2, max = 2, sep = " ")

Arguments

`str`	The input text.
`min`, `max`	The minimum and maximum 'n' as in 'n-gram'.
`sep`	A set of separator characters for the "words". This works differently from `sep` arguments in other R functions: `sep` is treated as a set of individual characters, not as a single separator string.
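To illustrate what treating `sep` as a set of characters means, here is a small base-R sketch. It assumes, as the wording "a set of separator characters" suggests, that every character in `sep` is a separator on its own and that consecutive separators do not produce empty words. `split_on_any()` is a hypothetical helper written for illustration; it is not part of ngram, and ngram's own implementation may differ in edge cases.

```r
# Hypothetical helper (not part of ngram): split `str` at any character
# that appears in `sep`, assuming each character of `sep` is a separator.
split_on_any <- function(str, sep = " ") {
  first <- substr(sep, 1, 1)
  # chartr() maps each character of `sep` to `first`, one-for-one,
  # so afterwards there is only one separator character to split on.
  unified <- chartr(sep, strrep(first, nchar(sep)), str)
  words <- strsplit(unified, first, fixed = TRUE)[[1]]
  # Runs of separators leave empty strings behind; drop them.
  words[words != ""]
}

split_on_any("a, b,c", sep = ", ")
```

With `sep = ", "`, both the comma and the space act as separators, so `"a, b,c"` splits into `"a"`, `"b"` and `"c"`.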

Details

This n-gram tokenizer behaves like the RWeka tokenizer in both its input and its return value. Unlike ngram(), which returns an object of a special class built on external pointers, ngram_asweka() returns an ordinary vector, so the result can be serialized with save() or saveRDS().
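A quick way to check the serialization point is an .rds round trip. This is a minimal sketch; it is guarded with `requireNamespace()` so it only runs when the ngram package is installed, and the temporary file path is arbitrary.

```r
# Round-trip the result of ngram_asweka() through an .rds file.
# Because the return value is an ordinary vector rather than an
# external pointer, the restored copy should be identical.
if (requireNamespace("ngram", quietly = TRUE)) {
  grams <- ngram::ngram_asweka("A B A C A B B", min = 2, max = 3)
  path <- tempfile(fileext = ".rds")
  saveRDS(grams, path)
  restored <- readRDS(path)
  print(identical(restored, grams))
  unlink(path)  # clean up the temporary file
}
```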

Value

A vector of n-grams grouped into blocks by n, from the largest n down to the smallest; within each block, the n-grams appear in the order in which they occur in the text. The output matches that of RWeka's n-gram tokenizer.
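The ordering described under Value can be sketched in base R. `asweka_sketch()` is a hypothetical helper, not part of ngram: it assumes words are separated by spaces and that the words of each n-gram are joined with a single space, and it only illustrates the block ordering, not the package's or RWeka's actual code.

```r
# Hypothetical helper (not part of ngram): build n-grams in blocks of
# decreasing n, keeping text order within each block.
asweka_sketch <- function(str, min = 2, max = 2) {
  stopifnot(min >= 1, min <= max)
  words <- strsplit(str, " ", fixed = TRUE)[[1]]
  words <- words[words != ""]  # ignore repeated spaces
  out <- character(0)
  for (n in seq(max, min)) {   # largest n first
    if (n > length(words)) next  # no n-grams of this size
    starts <- seq_len(length(words) - n + 1)
    grams <- vapply(starts, function(i) {
      paste(words[i:(i + n - 1)], collapse = " ")
    }, character(1))
    out <- c(out, grams)
  }
  out
}

asweka_sketch("A B A C A B B", min = 2, max = 4)
```

For the string used in the Examples, the sketch returns the four 4-grams first (`"A B A C"` through `"C A B B"`), then the five 3-grams, then the six 2-grams: 15 strings in all.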

Examples

library(ngram)

str = "A B A C A B B"
ngram_asweka(str, min=2, max=4)

ngram documentation built on May 29, 2024, 6:18 a.m.
