token: Tokenize (or split) text and emit n-word combinations from a...
In teradata-aster-field/toaster: Big Data in-Database Analytics that Scales with Teradata Aster Distributed Platform


When `n = 1`, simply tokenizes the text and emits words with counts. When `n > 1`, the tokenized words are combined into permutations of length `n` within each document.
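The behaviour described above can be illustrated outside the database with a small base R sketch. This is not toaster's implementation: `sketch_terms` is a hypothetical helper, and it assumes that for `n > 1` each set of `n` distinct words is emitted once per document in sorted order, which may differ from the ordering and counting the actual parser uses.

```r
# Hypothetical helper for illustration only; not part of toaster.
# Splits one document into words and, for n > 1, joins n distinct words
# with tokenSep.
sketch_terms <- function(text, n = 1, tokenSep = "+",
                         delimiter = "[ \\t\\b\\f\\r]+") {
  # perl = TRUE so that \t, \b, \f and \r are understood inside the class
  words <- strsplit(text, delimiter, perl = TRUE)[[1]]
  words <- words[nzchar(words)]
  if (n == 1) {
    return(words)  # duplicates kept so that table() yields counts
  }
  # Assumption: combine distinct words only, in sorted order
  distinct <- sort(unique(words))
  if (length(distinct) < n) {
    return(character(0))
  }
  vapply(combn(distinct, n, simplify = FALSE), paste, character(1),
         collapse = tokenSep)
}

# n = 1: words with counts for a single document
table(sketch_terms("big data big analytics", n = 1))

# n = 2: two-word terms joined with the default tokenSep
sketch_terms("big data analytics", n = 2)
# "analytics+big" "analytics+data" "big+data"
```

In the package itself this work is done in-database by the parser that `token` returns, so the sketch is only useful for reasoning about what kind of terms to expect.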

token(n, tokenSep = "+", ignoreCase = FALSE,
  delimiter = "[ \\t\\b\\f\\r]+", punctuation = NULL,
  stemming = FALSE, stopWords = FALSE, sep = " ", minLength = 1)

`n`	number of words
`tokenSep`	a character string to separate the tokens when `n > 1`
`ignoreCase`	logical: treat text as-is (`FALSE`) or convert it to all lowercase (`TRUE`). Default is `FALSE`. Note that if `stemming` is set to `TRUE`, tokens are always converted to lowercase, so this option is ignored.
`delimiter`	character or string that divides one word from the next. You can use a regular expression as the `delimiter` value.
`punctuation`	a regular expression that specifies the punctuation characters the parser removes before it evaluates the input text.
`stemming`	logical: if `TRUE`, apply Porter2 stemming to each token to reduce it to its root form. Default is `FALSE`.
`stopWords`	logical or string with the name of the file that contains stop words that should be ignored when parsing text. Each stop word is specified on a separate line.
`sep`	a character string to separate multiple text columns.
`minLength`	exclude tokens shorter than minLength characters.
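To make the effect of these arguments concrete, here is a minimal base R sketch that mimics `punctuation`, `ignoreCase`, `delimiter`, `minLength` and `stopWords` on a single string. `sketch_tokenize` is a hypothetical helper, not toaster code. Two things are assumptions: the order in which the steps are applied, and passing stop words as a character vector rather than as a file name. Stemming is left out.

```r
# Hypothetical helper for illustration only; not part of toaster.
sketch_tokenize <- function(text, ignoreCase = FALSE,
                            delimiter = "[ \\t\\b\\f\\r]+",
                            punctuation = NULL,
                            stopWords = character(0), minLength = 1) {
  # Remove punctuation before the text is evaluated
  if (!is.null(punctuation)) {
    text <- gsub(punctuation, "", text, perl = TRUE)
  }
  # Optionally fold everything to lowercase
  if (ignoreCase) {
    text <- tolower(text)
  }
  # Split on the delimiter regular expression
  words <- strsplit(text, delimiter, perl = TRUE)[[1]]
  # Exclude tokens shorter than minLength characters
  words <- words[nchar(words) >= minLength]
  # Drop stop words (assumption: given as a vector, not a file)
  words[!(words %in% stopWords)]
}

sketch_tokenize("The quick, brown fox!", ignoreCase = TRUE,
                punctuation = "[.,!?]", stopWords = "the", minLength = 4)
# "quick" "brown"
```

With `ignoreCase = FALSE` the stop word `"the"` would not match `"The"` in this sketch; whether the real parser matches stop words case-sensitively is not stated in this page.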

Value: a pluggable token parser.

teradata-aster-field/toaster documentation built on May 31, 2019, 8:36 a.m.
