tokenize: Tokenize a character vector

View source: R/tokenize.R

tokenize {lexRankr}    R Documentation

Tokenize a character vector

Description

Parse the elements of a character vector into a list of cleaned tokens.

Usage

tokenize(text, removePunc = TRUE, removeNum = TRUE, toLower = TRUE,
  stemWords = TRUE, rmStopWords = TRUE)

Arguments

text

The character vector to be tokenized.

removePunc

TRUE or FALSE indicating whether to remove punctuation from text. Defaults to TRUE.

removeNum

TRUE or FALSE indicating whether to remove numbers from text. Defaults to TRUE.

toLower

TRUE or FALSE indicating whether to coerce all of text to lowercase. Defaults to TRUE.

stemWords

TRUE or FALSE indicating whether to stem the resulting tokens. If TRUE, the output tokens will be stemmed using SnowballC::wordStem(). Defaults to TRUE.

rmStopWords

TRUE, FALSE, or a character vector of stopwords to remove. If TRUE, words in lexRankr::smart_stopwords will be removed prior to stemming. If FALSE, no stopword removal will occur. If a character vector is passed, that vector will be used as the list of stopwords to remove. Defaults to TRUE.

Examples

tokenize("Mr. Feeny said the test would be on Sat. At least I'm 99.9% sure that's what he said.")
tokenize("Bill is trying to earn a Ph.D. in his field.", rmStopWords=FALSE)
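A further example (not from the package documentation; the input sentence and stopword vector are illustrative) showing the character-vector form of rmStopWords alongside the other switches:

```r
# Illustrative: supply a custom stopword vector instead of TRUE/FALSE,
# and keep numbers, case, and word stems intact
tokenize("The quick brown fox jumped over 2 lazy dogs.",
         rmStopWords = c("the", "over"),
         removeNum   = FALSE,
         toLower     = FALSE,
         stemWords   = FALSE)
```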

AdamSpannbauer/lexRankr documentation built on Dec. 9, 2022, 3:44 a.m.